How the NAC works, and how it handles operations such as addition and subtraction

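DeepMind recently published a new paper, "Neural Arithmetic Logic Units" (NALU) (https://arxiv.org/abs/1808.00508). It is an interesting paper that tackles an important problem in deep learning: teaching neural networks to do arithmetic. Surprisingly, neural networks achieve outstanding results on many tasks, such as lung cancer classification, yet they often struggle with simple tasks such as counting.
The paper includes an experiment showing how networks struggle to extrapolate to new data. The researchers found that networks could handle training data containing numbers between -5 and 5 with near-perfect accuracy, but they almost completely failed to generalize to numbers outside that training range.
The paper offers a two-part solution. Below I briefly explain how the NAC works and how it handles operations such as addition and subtraction. After that, I introduce the NALU, which can handle more complex operations such as multiplication and division. I include code you can use to try these out; see the paper linked above for more details.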
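Part 1: Neural Accumulator (NAC)
The neural accumulator (NAC) is a linear transformation of its inputs. What does that mean? Its transformation matrix W is the element-wise product of tanh(W_hat) and sigmoid(M_hat), and the NAC's output is simply W multiplied by the input x. Because tanh lies in (-1, 1) and sigmoid in (0, 1), every entry of W stays in (-1, 1).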
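To make "linear transformation" concrete, here is a small NumPy sketch. It is an illustration only, not the paper's code, and the helper names are hypothetical. It builds W from hand-picked W_hat and M_hat values. The values ±10 are assumptions chosen to saturate tanh and sigmoid, so W ends up close to [1, -1] and the NAC computes x1 - x2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac_weights(w_hat, m_hat):
    # W = tanh(W_hat) * sigmoid(M_hat), element-wise; entries lie in (-1, 1)
    return np.tanh(w_hat) * sigmoid(m_hat)

def nac_forward(x, w):
    # purely linear, no bias: each output is a weighted sum of the inputs
    return x @ w

# hypothetical hand-picked parameters that saturate both squashing functions
w_hat = np.array([[10.0], [-10.0]])
m_hat = np.array([[10.0], [10.0]])
w = nac_weights(w_hat, m_hat)
print(np.round(w, 3))  # [[ 1.] [-1.]] after rounding

x = np.array([[7.0, 3.0],
              [100.0, 40.0]])
print(nac_forward(x, w))  # close to x1 - x2, i.e. about 4 and 60
```

Because W can only scale and sum inputs, a NAC with weights near ±1 or 0 naturally expresses addition and subtraction. In training, these weights are learned rather than set by hand.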
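NAC in Python
import tensorflow as tf  # TensorFlow 1.x API

# example input: a batch of rows with 2 features (illustrative values)
in_dim = tf.placeholder(tf.float32, shape=[None, 2])
shape = (2, 1)  # (number of input features, number of outputs)

# NAC
w_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
m_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))

# element-wise product keeps every weight in (-1, 1)
w = tf.tanh(w_hat) * tf.sigmoid(m_hat)
# forward propagation
a = tf.matmul(in_dim, w)
NAC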
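Part 2: Neural Arithmetic Logic Unit (NALU)
The neural arithmetic logic unit, or NALU for short, is built from two NAC cells that share the same weights W. The first is the ordinary NAC, a = Wx, which handles addition and subtraction. The second operates in log space, m = exp(W · log(|x| + epsilon)), and handles multiplication and division. A learned gate g = sigmoid(Gx) decides how much of each path to use, and the output is y = g · a + (1 - g) · m.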
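The following NumPy sketch shows why the log-space path can multiply and divide. It is illustrative only: the weights are hand-picked rather than learned, and the helper names are hypothetical. Adding logarithms and exponentiating gives a product, and subtracting them gives a quotient. The sketch also shows one side effect of the |x| in the formula, which is that the sign of the inputs is lost.

```python
import numpy as np

EPS = 1e-7  # plays the role of epsilon in the article's code: keeps log() finite at 0

def log_space_nac(x, w):
    # m = exp(log(|x| + eps) @ W): adding logs multiplies inputs, subtracting logs divides them
    return np.exp(np.log(np.abs(x) + EPS) @ w)

def nalu_mix(a, m, g):
    # y = g * a + (1 - g) * m: g near 1 selects the additive path, g near 0 the multiplicative one
    return g * a + (1.0 - g) * m

x = np.array([[6.0, 3.0]])
w_add = np.array([[1.0], [1.0]])   # hypothetical weights: x1 + x2, or x1 * x2 in log space
w_sub = np.array([[1.0], [-1.0]])  # hypothetical weights: x1 - x2, or x1 / x2 in log space

print(log_space_nac(x, w_add))  # about [[18.]]
print(log_space_nac(x, w_sub))  # about [[2.]]

a = x @ w_add                   # additive path: 6 + 3 = 9
m = log_space_nac(x, w_add)     # multiplicative path: about 18
print(nalu_mix(a, m, 1.0))      # [[9.]]
print(nalu_mix(a, m, 0.0))      # about [[18.]]

# |x| discards the sign: (-6) * 3 comes out as +18, not -18
print(log_space_nac(np.array([[-6.0, 3.0]]), w_add))  # about [[18.]]
```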
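NALU in Python
import tensorflow as tf  # TensorFlow 1.x API

# NALU: builds on in_dim, shape, w and a from the NAC snippet above
epsilon = 1e-7  # keeps log() finite when an input is 0
G = tf.Variable(tf.truncated_normal(shape, stddev=0.02))  # gate weights

# multiplicative path: the NAC weights w applied in log space
m = tf.exp(tf.matmul(tf.log(tf.abs(in_dim) + epsilon), w))
# learned gate between the additive and multiplicative paths
g = tf.sigmoid(tf.matmul(in_dim, G))

y = g * a + (1 - g) * m
NALU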
Testing the NAC by learning addition
Now let's test it, starting by turning the NAC into a function.
import tensorflow as tf  # TensorFlow 1.x API

# neural accumulator
def nac(in_dim, out_dim):

    in_features = int(in_dim.shape[1])

    # define w_hat and m_hat
    # note: tf.get_variable raises an error if 'w_hat' already exists in the graph,
    # so wrap repeated calls in separate tf.variable_scope blocks
    w_hat = tf.get_variable(name='w_hat', initializer=tf.initializers.random_uniform(minval=-2, maxval=2), shape=[in_features, out_dim], trainable=True)
    m_hat = tf.get_variable(name='m_hat', initializer=tf.initializers.random_uniform(minval=-2, maxval=2), shape=[in_features, out_dim], trainable=True)

    # W = tanh(W_hat) * sigmoid(M_hat), element-wise
    w = tf.nn.tanh(w_hat) * tf.nn.sigmoid(m_hat)

    # linear transformation of the input
    a = tf.matmul(in_dim, w)

    return a, w
NAC function in Python
Next, let's create some toy data for training and testing. NumPy has a handy function, numpy.arange, which we'll use to create the dataset.
import numpy as np

# generate a series of input numbers x1 and x2 for training
x1 = np.arange(0, 10000, 5, dtype=np.float32)
x2 = np.arange(5, 10005, 5, dtype=np.float32)

y_train = x1 + x2

x_train = np.column_stack((x1, x2))

print(x_train.shape)  # (2000, 2)
print(y_train.shape)  # (2000,)

# generate a series of input numbers x1 and x2 for testing
x1 = np.arange(1000, 2000, 8, dtype=np.float32)
x2 = np.arange(1000, 1500, 4, dtype=np.float32)

x_test = np.column_stack((x1, x2))
y_test = x1 + x2

print()
print(x_test.shape)  # (125, 2)
print(y_test.shape)  # (125,)
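Toy data for addition
Now we can write the boilerplate code to train the model. First we define placeholders x and y to feed data at run time. Next we define the NAC network (y_pred, w = nac(in_dim=x, out_dim=1)). For the loss we use the mean squared error, computed with tf.reduce_mean(). There are two hyperparameters: alpha, the learning rate, and the number of epochs to train for. Before running the training loop, we define an optimizer; here tf.train.AdamOptimizer() minimizes the loss.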
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf  # TensorFlow 1.x API

# define the placeholders to feed values at run time
x = tf.placeholder(dtype=tf.float32, shape=[None, 2])  # number of samples x number of features (number of inputs to be added)
y = tf.placeholder(dtype=tf.float32, shape=[None])

# define the network
# here the network contains only one NAC cell (for testing)
y_pred, w = nac(in_dim=x, out_dim=1)
y_pred = tf.squeeze(y_pred)  # remove extra dimensions, if any

# mean squared error (MSE)
loss = tf.reduce_mean((y_pred - y) ** 2)

# training parameters
alpha = 0.05  # learning rate
epochs = 22000

optimize = tf.train.AdamOptimizer(learning_rate=alpha).minimize(loss)

with tf.Session() as sess:

    cost_history = []

    sess.run(tf.global_variables_initializer())

    # evaluate before training
    print("Pre-training MSE:", sess.run(loss, feed_dict={x: x_test, y: y_test}))
    print()
    for i in range(epochs):
        _, cost = sess.run([optimize, loss], feed_dict={x: x_train, y: y_train})
        print("epoch: {}, MSE: {}".format(i, cost))
        cost_history.append(cost)

    # plot the MSE over each iteration
    plt.plot(np.arange(epochs), np.log(cost_history))  # plot MSE on a log scale
    plt.xlabel("epoch")
    plt.ylabel("MSE")
    plt.show()

    print()
    print(w.eval())
    print()
    # loss after training
    print("Post-training MSE:", sess.run(loss, feed_dict={x: x_test, y: y_test}))

    print("Actual sum:", y_test[0:10])
    print()
    print("Predicted sum:", sess.run(y_pred[0:10], feed_dict={x: x_test, y: y_test}))
After training, the cost plot looks like this:
Figure: NAC cost after training (log of MSE vs. epoch)
Actual sum: [2000. 2012. 2024. 2036. 2048. 2060. 2072. 2084. 2096. 2108.]
Predicted sum: [1999.9021 2011.9015 2023.9009 2035.9004 2047.8997 2059.8992 2071.8984 2083.898 2095.8975 2107.8967]
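As a quick sanity check, the NumPy snippet below works only with the ten printed values, copied from the output above. The full test set has 125 rows, so the result is not the post-training MSE the script prints. The snippet computes the mean squared error of these ten values and the ratio of predicted to actual sums.

```python
import numpy as np

# first ten test targets and predictions, copied from the printed output above
actual = np.array([2000, 2012, 2024, 2036, 2048, 2060, 2072, 2084, 2096, 2108], dtype=np.float64)
predicted = np.array([1999.9021, 2011.9015, 2023.9009, 2035.9004, 2047.8997,
                      2059.8992, 2071.8984, 2083.898, 2095.8975, 2107.8967])

errors = predicted - actual
mse = np.mean(errors ** 2)  # same formula as tf.reduce_mean((y_pred - y) ** 2)
ratio = predicted / actual

print(mse)           # roughly 0.01
print(errors.max())  # negative: every prediction is about 0.1 too low
print(ratio)         # close to 0.99995 for every sample
```

A nearly constant ratio just below 1 is consistent with learned NAC weights that are close to 1 but not exactly 1. That is expected, since tanh and sigmoid only approach their limits asymptotically.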
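Although the NAC can handle operations such as addition and subtraction, it cannot handle multiplication and division. This is where the NALU comes in: it can handle these more complex operations.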
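To see why a purely linear cell struggles here, the sketch below uses an ordinary least-squares fit, y ≈ w1·x1 + w2·x2 with no bias, as a stand-in for a NAC. This is a swapped-in technique, not the article's model. A NAC is also linear but additionally constrains its weights to (-1, 1), so its training error can be no lower than this unconstrained fit's. The sketch reuses the article's training and test ranges with a product as the target.

```python
import numpy as np

# same input ranges as the article's toy data, but with a product as the target
x1_train = np.arange(0, 10000, 5, dtype=np.float64)
x2_train = np.arange(5, 10005, 5, dtype=np.float64)
x_train = np.column_stack((x1_train, x2_train))
y_train = x1_train * x2_train

x1_test = np.arange(1000, 2000, 8, dtype=np.float64)
x2_test = np.arange(1000, 1500, 4, dtype=np.float64)
x_test = np.column_stack((x1_test, x2_test))
y_test = x1_test * x2_test

# best unconstrained linear map (no bias), fitted by least squares
w, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

pred = x_test @ w
rel_error = np.abs(pred - y_test) / y_test
print(w)
print(rel_error.mean())  # far above 1: the linear fit is nowhere near x1 * x2
```

Whatever weights the fit finds, a linear map's output grows linearly with its inputs, whereas x1 · x2 grows quadratically. The error therefore cannot stay small once the input distribution changes.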
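Testing the NALU by learning multiplication
To do this, we add the pieces that turn the NAC into a NALU.
The neural accumulator (NAC) is a linear transformation of its inputs. The neural arithmetic logic unit (NALU) uses two NACs with tied weights to enable addition/subtraction (smaller purple cell) and multiplication/division (larger purple cell), controlled by a gate (orange cell).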
import tensorflow as tf  # TensorFlow 1.x API

# the neural arithmetic logic unit
def nalu(in_dim, out_dim):

    shape = (int(in_dim.shape[-1]), out_dim)
    epsilon = 1e-7  # keeps log() finite when an input is 0

    # NAC
    w_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
    m_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
    G = tf.Variable(tf.truncated_normal(shape, stddev=0.02))  # gate weights

    w = tf.tanh(w_hat) * tf.sigmoid(m_hat)
    # forward propagation (additive path)
    a = tf.matmul(in_dim, w)

    # NALU
    # multiplicative path: the same weights w applied in log space
    m = tf.exp(tf.matmul(tf.log(tf.abs(in_dim) + epsilon), w))
    # learned gate that mixes the two paths
    g = tf.sigmoid(tf.matmul(in_dim, G))
    y = g * a + (1 - g) * m

    return y
NALU function in Python
Now create the toy data again, this time with two changes.
import numpy as np

# test the network by learning multiplication

# generate a series of input numbers x1 and x2 for training
x1 = np.arange(0, 10000, 5, dtype=np.float32)
x2 = np.arange(5, 10005, 5, dtype=np.float32)

y_train = x1 * x2

x_train = np.column_stack((x1, x2))

print(x_train.shape)
print(y_train.shape)

# generate a series of input numbers x1 and x2 for testing
x1 = np.arange(1000, 2000, 8, dtype=np.float32)
x2 = np.arange(1000, 1500, 4, dtype=np.float32)

x_test = np.column_stack((x1, x2))
y_test = x1 * x2

print()
print(x_test.shape)
print(y_test.shape)
Toy data for multiplication
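The only changes are in the two lines that compute y_train and y_test, where the addition operator is switched to multiplication.
Now we can train the NALU network. The only change needed is to define the network with NALU instead of NAC: y_pred = nalu(in_dim=x, out_dim=1). Since nalu returns only its output, there is no w to unpack.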
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf  # TensorFlow 1.x API

# define the placeholders to feed values at run time
x = tf.placeholder(dtype=tf.float32, shape=[None, 2])  # number of samples x number of features (number of inputs)
y = tf.placeholder(dtype=tf.float32, shape=[None])

# define the network
# here the network contains only one NALU cell (for testing)
y_pred = nalu(in_dim=x, out_dim=1)
y_pred = tf.squeeze(y_pred)  # remove extra dimensions, if any

# mean squared error (MSE)
loss = tf.reduce_mean((y_pred - y) ** 2)

# training parameters
alpha = 0.05  # learning rate
epochs = 22000

optimize = tf.train.AdamOptimizer(learning_rate=alpha).minimize(loss)

with tf.Session() as sess:

    cost_history = []

    sess.run(tf.global_variables_initializer())

    # evaluate before training
    print("Pre-training MSE:", sess.run(loss, feed_dict={x: x_test, y: y_test}))
    print()
    for i in range(epochs):
        _, cost = sess.run([optimize, loss], feed_dict={x: x_train, y: y_train})
        print("epoch: {}, MSE: {}".format(i, cost))
        cost_history.append(cost)

    # plot the loss over each iteration
    plt.plot(np.arange(epochs), np.log(cost_history))  # plot MSE on a log scale
    plt.xlabel("epoch")
    plt.ylabel("MSE")
    plt.show()

    # loss after training
    print("Post-training MSE:", sess.run(loss, feed_dict={x: x_test, y: y_test}))

    print("Actual product:", y_test[0:10])
    print()
    print("Predicted product:", sess.run(y_pred[0:10], feed_dict={x: x_test, y: y_test}))
Figure: NALU cost after training (log of MSE vs. epoch)
Actual product: [1000000. 1012032. 1024128. 1036288. 1048512. 1060800. 1073152. 1085568. 1098048. 1110592.]
Predicted product: [1000000.2 1012032. 1024127.56 1036288.6 1048512.06 1060800.8 1073151.6 1085567.6 1098047.6 1110592.8 ]
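For scale, the snippet below computes absolute and relative errors for the ten printed products. The numbers are copied from the output above, so this is arithmetic on the reported results, not a rerun of the model. The last line shows that float32, the dtype used for the toy data, can only represent values near one million in steps of 0.0625. Errors below 1 therefore amount to only a small number of float32 steps at this magnitude.

```python
import numpy as np

# first ten test targets and NALU predictions, copied from the printed output above
actual = np.array([1000000, 1012032, 1024128, 1036288, 1048512,
                   1060800, 1073152, 1085568, 1098048, 1110592], dtype=np.float64)
predicted = np.array([1000000.2, 1012032.0, 1024127.56, 1036288.6, 1048512.06,
                      1060800.8, 1073151.6, 1085567.6, 1098047.6, 1110592.8])

abs_error = np.abs(predicted - actual)
rel_error = abs_error / actual

print(abs_error.max())  # about 0.8
print(rel_error.max())  # below 1e-6

# gap between adjacent float32 values near one million
print(np.spacing(np.float32(1_000_000)))  # 0.0625
```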
Full implementation in TensorFlow
