首页 文章

训练后,TensorFlow Hessian矩阵不会更新

提问于
浏览
0

我试图使用tf.hessians函数得到Hessian矩阵 . 在每次训练后,损失值和变量都会更新,而Hessian矩阵值保持不变 . 而且,它们不依赖于可以手动设置的初始变量值 . 实际上,我的问题类似于this one,但尚未得到答案 . 这是我用于测试的代码:

import tensorflow as tf

# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)

# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares

# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

hess = tf.hessians(loss, [W, b])

# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(10):
    sess.run(train, {x: x_train, y: y_train})
    cur_hess, curr_W, curr_b, curr_loss = sess.run([hess, W, b, loss], {x: x_train, y: y_train})
    print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
    print('cur_hess', cur_hess)

以下是打印结果:

W: [-0.21999997] b: [-0.456] loss: 4.0181446
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.39679998] b: [-0.49552] loss: 1.8198745
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.459616] b: [-0.4965184] loss: 1.5448234
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.48454273] b: [-0.48487374] loss: 1.4825068
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.49684232] b: [-0.4691753] loss: 1.444397
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5049019] b: [-0.45227283] loss: 1.409699
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5115062] b: [-0.43511063] loss: 1.3761029
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.51758033] b: [-0.41800055] loss: 1.3433373
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.523432] b: [-0.40104443] loss: 1.3113549
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.52916396] b: [-0.38427448] loss: 1.2801344
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]

所以,cur_hess没有更新,顺便说一下它只包含2个元素而不是4个 . 如何解决?此外,我尝试按照建议here两次应用tf.gradients,但是不会像tf.hessians那样更新值 . 同时,tf.gradients正确计算一阶导数,并在每次训练循环后更改它们 . 谢谢 .

1 回答

  • 2

    在这种情况下持续使用hessian是正常的,因为,

    loss = Σ [(Wx + b - y)^2]
    

    该方程是二次的,二次方的双导数是常数 .

    ∂2(loss)/∂W2 = Σ 2x^2 = 2 * (1 + 4 + 9 + 16) = 60 ;(x = [1,2,3,4])
    
    ∂2(loss)/∂b2 = Σ 2 = 2 + 2 + 2 + 2 = 8 ;(4 samples with constant derivative)
    

相关问题