TensorFlow conv2d gives different results on CPU and GPU

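We have been working on this problem for a while, and we see discrepancies between results caused by factors that, in theory, should have no effect.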

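In our problem we are dealing with data sets whose inputs vary in size. During training, each input is fixed to a specific size by padding or chopping. For performance and speed, however, we want to exploit the fact that we should be able to precompute the result for an input that consists only of padding passed through the convolutional layers of our network. So, for an input much smaller than the fixed size set in training, we determine the minimum amount of padding needed so that the activations produced by the final conv layer match those of the same input padded to the large fixed size, apart from the precomputed all-pad activations that need to be appended to the result.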

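The conv-net portion of the model we are experimenting with consists of 9 layers: the first 8 each perform a conv2d followed by a maxpool, and the last is a single conv2d. The model was trained on a GPU; for testing, however, we only have CPUs available. Testing on a large amount of data with the variable-padding approach gave results that were mostly identical to what we expect from fixed-size inputs. In a few cases, though, we observed differences between the values obtained with the two approaches, sometimes on the order of 1e-4.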

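The problem seems to show up only for inputs at the small end of the size range. Digging into these cases showed that the activations at every layer of the model are always identical until you reach the final convolutional layer, where the difference appears. Surprisingly, we found that adding enough padding (still far below the fixed size) can change the values of the first few activations of the last layer, even though those units do not depend on the extra padding at all.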

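We were able to reproduce the problem with made-up data, and can even demonstrate it with a single conv2d operation. In the code below, a simple conv2d with a single length-5, stride-1 filter is applied to a length-7 tensor to produce 3 activations. However, when you append 1 or 2 extra values to the input, the values of the 2nd and 3rd activations change, even though they do not depend on the new input values at all. Replicating the result manually with tf.mul and reduce_sum ops yields the same values as the inputs with the extra values, which suggests that those are the correct results. This is only observed when the ops are executed on the CPU; assigning them to the GPU instead always produces the same values, regardless of whether the input has length 7, 8, or 9. Is this expected when executing a conv on the CPU? Is there a way to explain the variance in the conv output, e.g. does it occur because of the algorithm used for small inputs? Everything above refers to conv2d with VALID padding.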
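Before the full TensorFlow example, the manual multiply-and-sum check mentioned above can be sketched without TensorFlow. This is a minimal sketch under stated assumptions, not the code used in the question: the helpers `f32` and `f32_sum` are hypothetical names that emulate float32 arithmetic with the standard library, and the window `[10, 20, 30, 40, 50]` is the slice of the example input that feeds the 2nd activation. It also sums the terms in every possible order, because a different accumulation order is one plausible (unconfirmed) way for two kernels to disagree in the last bits. Whether more than one distinct float32 value shows up depends on the inputs.

```python
import struct
from itertools import permutations


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


# Filter, bias, and the input window behind the 2nd activation (x0[1:6]).
filt = [-0.7313, -1.1043, 1.8492, 1.3007, -0.1033]
bias = 0.0401
window = [10.0, 20.0, 30.0, 40.0, 50.0]

# Elementwise multiply, rounding every product to float32,
# then treat the bias as one more term to be added.
terms = [f32(f32(w) * f32(x)) for w, x in zip(filt, window)] + [f32(bias)]


def f32_sum(values):
    """Accumulate left to right, rounding to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


# Every accumulation order of the six terms (hypothetical experiment).
results = {f32_sum(order) for order in permutations(terms)}

# Float64 reference computed from the decimal constants.
reference = sum(w * x for w, x in zip(filt, window)) + bias

print('float64 reference:', reference)
for r in sorted(results):
    print('float32 result   :', repr(r), ' error:', r - reference)
```

Any spread between the printed float32 results comes purely from rounding, since every ordering uses the same six terms.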

Example:

# Written against the TensorFlow 1.x graph API (uses tf.Session).
import tensorflow as tf

# Consider a length-T tensor x0 where T>=5.  Create another tensor
# x1 by adding an element onto the end of x0.  If you convolve both
# x0 and x1 with a length-5 filter with VALID padding, you'd expect
# the first (T-4) elements of the resulting tensors to have the
# exact same value because the calculation is being done on the
# same set of numbers.  It turns out you can get discrepancies
# though on the order of 1e-5 if the convolution is done on the CPU.

#device = '/gpu:0'
device = '/cpu:0'

T = 7  # >=5

def expand_4d(f, n):
    f = tf.expand_dims(f, n[0])
    f = tf.expand_dims(f, n[1])
    f = tf.expand_dims(f, n[2])
    return f

# Convolution Filter and Bias
cnv = tf.cast([-0.7313, -1.1043, 1.8492, 1.3007, -0.1033], tf.float32)
cnv = expand_4d(cnv, [0, -1, -1])
bias = tf.constant([0.0401], tf.float32)

# Input
x0 = 10.0 * tf.cast(tf.range(T), tf.float32)
x1 = 10.0 * tf.cast(tf.range(T+1), tf.float32)
x2 = 10.0 * tf.cast(tf.range(T+2), tf.float32)
x0 = expand_4d(x0, [0, 0, -1])
x1 = expand_4d(x1, [0, 0, -1])
x2 = expand_4d(x2, [0, 0, -1])

# Run Convolution
def my_conv(x):
    with tf.device(device):
        return tf.nn.bias_add(tf.nn.conv2d(x, cnv, strides=[1,1,1,1], padding='VALID'), bias)

y0 = my_conv(x0)
y1 = my_conv(x1)
y2 = my_conv(x2)
n = T - 4   # length of y0

sess = tf.Session()
y0_, y1_, y2_ = sess.run([y0, y1, y2])
print('T =', T)
print('device =', device)
print('y0 = convolution with length-T tensor x0')
print(y0_)
print('y1 = convolution with length-(T+1) tensor x1')
print(y1_)
print('y2 = convolution with length-(T+2) tensor x2')
print(y2_)
# Compare the first n elements of each tensor (should all be equal)
print('y0 - y1[0:%s]' % n)
print(y0_[0][0] - y1_[0][0][0:n])
print('y1[0:%s] - y2[0:%s]' % (n, n))
print(y1_[0][0][0:n] - y2_[0][0][0:n])

Sample output:

T = 7
device = /cpu:0
y0 = convolution with length-T tensor x0
[[[[ 60.87010574]
   [ 72.98010254]
   [ 85.09010315]]]]
y1 = convolution with length-(T+1) tensor x1
[[[[ 60.87010574]
   [ 72.98009491]
   [ 85.09008789]
   [ 97.20009613]]]]
y2 = convolution with length-(T+2) tensor x2
[[[[  60.87010574]
   [  72.98009491]
   [  85.09008789]
   [  97.20009613]
   [ 109.31010437]]]]
y0 - y1[0:3]
[[  0.00000000e+00]
 [  7.62939453e-06]
 [  1.52587891e-05]]
y1[0:3] - y2[0:3]
[[ 0.]
 [ 0.]
 [ 0.]]
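One way to read the nonzero differences in the sample output is to compare them with the float32 spacing (ULP) at the magnitude of the activations. The sketch below uses only the standard library, and `f32_ulp` is a hypothetical helper name, not a TensorFlow function. It takes the values printed above and expresses each difference as a multiple of the local float32 ULP. A small whole number of ULPs would point to last-bit rounding rather than a dependence on the appended inputs, although it does not by itself identify which CPU code path is responsible.

```python
import struct


def f32_ulp(x):
    """Gap between float32(x) and the next float32 above it (for x > 0)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    here = struct.unpack('<f', struct.pack('<I', bits))[0]
    above = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return above - here


# (activation from y0, printed difference y0 - y1) taken from the sample output.
observed = [
    (72.98010254, 7.62939453e-06),
    (85.09010315, 1.52587891e-05),
]

ulps = []
for value, diff in observed:
    ulp = f32_ulp(value)
    ulps.append(diff / ulp)
    print('value', value, 'float32 ULP', ulp, 'difference in ULPs', diff / ulp)
```

Both activations lie between 64 and 128, where consecutive float32 values are 2**-17 apart, so the helper returns the same spacing for each.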
