Gradient of an output w.r.t. network weights, keeping the other output constant

Suppose I have a simple MLP

[figure: diagram of the MLP]

and I have the gradient of some loss function with respect to the output layer, G = [0, -1] (that is, increasing the second output variable decreases the loss function).

If I take the gradient of G with respect to my network parameters and apply a gradient-descent weight update, the second output variable should increase, but nothing is said about the first output variable, and the scaled application of the gradient will almost certainly change it (increase or decrease it).

How can I modify my loss function, or any of the gradient computations, to make sure the first output does not change?

1 Answer

    Update: I misunderstood the question. Here is a new answer.

    To do this, you need to update only the connections between the hidden layer and the second output unit, while keeping the connections between the hidden layer and the first output unit intact.

    The first approach is to introduce two sets of variables: one for the connections between the hidden layer and the first output unit, and another for everything else. Then you can combine them with tf.stack and pass var_list to get the corresponding derivatives. It looks something like the snippet below (for illustration only; untested; use with care), followed by a fuller self-contained sketch:

    # separate per-unit variables: W_h_to_out1/b_h_to_out1 feed the first output
    # unit, W_h_to_out2/b_h_to_out2 feed the second (definitions not shown here)
    out1 = tf.matmul(hidden, W_h_to_out1) + b_h_to_out1  # shape [N, 1]
    out2 = tf.matmul(hidden, W_h_to_out2) + b_h_to_out2  # shape [N, 1]
    out = tf.stack([out1, out2])                         # shape [2, N, 1]
    out = tf.transpose(tf.reshape(out, [2, -1]))         # back to shape [N, 2]
    loss = some_function_of(out)
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    # only the variables feeding the second output unit are updated
    train_op_second_unit = optimizer.minimize(loss, var_list=[W_h_to_out2, b_h_to_out2])
    
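    For completeness, here is a minimal self-contained sketch of this split-variable approach, assuming the same [3, 4, 2] network and toy dataset as in the mask example below; it uses tf.concat instead of the stack/reshape/transpose combination, which gives the same [N, 2] output:

    import tensorflow as tf
    import numpy as np
    
    # toy dataset as in the mask example: y1 = x1+x2+x3, y2 = x1^2+x2^2+x3^2
    n_sample = 8
    data_x = np.random.random((n_sample, 3))
    data_y = np.stack([np.sum(data_x, axis=1), np.sum(data_x**2, axis=1)], axis=1)
    
    x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 2], name='y')
    
    # shared hidden layer
    W1 = tf.Variable(tf.random_normal(shape=[3, 4], stddev=0.1), name='W1')
    b1 = tf.Variable(tf.random_normal(shape=[4], stddev=0.1), name='b1')
    hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
    
    # one set of variables per output unit
    W_h_to_out1 = tf.Variable(tf.random_normal(shape=[4, 1], stddev=0.1), name='W_h_to_out1')
    b_h_to_out1 = tf.Variable(tf.random_normal(shape=[1], stddev=0.1), name='b_h_to_out1')
    W_h_to_out2 = tf.Variable(tf.random_normal(shape=[4, 1], stddev=0.1), name='W_h_to_out2')
    b_h_to_out2 = tf.Variable(tf.random_normal(shape=[1], stddev=0.1), name='b_h_to_out2')
    
    out1 = tf.matmul(hidden, W_h_to_out1) + b_h_to_out1  # shape [N, 1]
    out2 = tf.matmul(hidden, W_h_to_out2) + b_h_to_out2  # shape [N, 1]
    out = tf.concat([out1, out2], axis=1)                # shape [N, 2]
    
    loss = tf.reduce_mean(tf.square(out - y))
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    # W_h_to_out1, b_h_to_out1 (and the hidden layer) are excluded from var_list,
    # so the first output cannot change under this train op
    train_op_second_unit = optimizer.minimize(loss, var_list=[W_h_to_out2, b_h_to_out2])
    
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    print(sess.run([out, loss], feed_dict={x: data_x, y: data_y}))
    sess.run(train_op_second_unit, feed_dict={x: data_x, y: data_y})
    # the first column of out is unchanged, the second has moved toward y2
    print(sess.run([out, loss], feed_dict={x: data_x, y: data_y}))
    sess.close()
    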

    Another approach is to use a mask. This is easier to implement and more flexible when you use a framework (e.g. slim, Keras, etc.), and I would recommend it. The idea is to make the first output unit invisible to the loss function while leaving the second output unit unchanged. This can be done with a binary mask: multiply an entry by 1 if you want to keep it, and by 0 to drop it. Here is the code:

    import tensorflow as tf
    import numpy as np
    
    # let's make our tiny dataset: (x, y) pairs, where x = (x1, x2, x3), y = (y1, y2),
    # and y1 = x1+x2+x3, y2 = x1^2+x2^2+x3^2
    
    # n_sample data points
    n_sample = 8
    data_x = np.random.random((n_sample, 3))
    data_y = np.zeros((n_sample, 2))
    data_y[:, 0] += np.sum(data_x, axis=1)
    data_y[:, 1] += np.sum(data_x**2, axis=1)
    data_y += 0.01 * np.random.random((n_sample, 2))  # add some noise
    
    
    # build graph
    # suppose we have a network of shape [3, 4, 2], i.e.: one hidden layer of size 4.
    
    x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 2], name='y')
    mask = tf.placeholder(tf.float32, shape=[None, 2], name='mask')
    
    W1 = tf.Variable(tf.random_normal(shape=[3, 4], stddev=0.1), name='W1')
    b1 = tf.Variable(tf.random_normal(shape=[4], stddev=0.1), name='b1')
    hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
    W2 = tf.Variable(tf.random_normal(shape=[4, 2], stddev=0.1), name='W2')
    b2 = tf.Variable(tf.random_normal(shape=[2], stddev=0.1), name='b2')
    out = tf.matmul(hidden, W2) + b2
    loss = tf.reduce_mean(tf.square(out - y))
    
    # multiply out by mask, so out[:, 0] is "invisible" to loss2 and no gradient
    # is propagated through it
    masked_out = mask * out
    loss2 = tf.reduce_mean(tf.square(masked_out - y))
    
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train_op_all = optimizer.minimize(loss)  # update all variables in the network
    train_op12 = optimizer.minimize(loss, var_list=[W2, b2])  # update hidden -> output layer
    train_op2 = optimizer.minimize(loss2, var_list=[W2, b2])  # update hidden -> second output unit
    
    
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    mask_out1 = np.zeros((n_sample, 2))
    mask_out1[:, 1] += 1.0
    # print(mask_out1)
    print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y: data_y, mask: mask_out1}))
    
    # In this case, only out2 is updated. You will see that both loss and loss2 decrease.
    sess.run(train_op2, feed_dict={x: data_x, y:data_y, mask: mask_out1})
    print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y:data_y, mask: mask_out1}))
    
    # In this case, both out1 and out2 are updated. You will see that both loss and loss2 decrease.
    sess.run(train_op12, feed_dict={x: data_x, y:data_y, mask: mask_out1})
    print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y:data_y, mask: mask_out1}))
    
    # In this case, everything is updated. You will see that both loss and loss2 decrease.
    sess.run(train_op_all, feed_dict={x: data_x, y:data_y, mask: mask_out1})
    print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y:data_y, mask: mask_out1}))
    sess.close()
    
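    As a sanity check, one can verify that the first output stays exactly constant under train_op2: only the second column of W2 and the second entry of b2 receive a non-zero gradient through loss2, and the hidden layer is not in var_list. A small sketch (assuming the graph above; it has to run before sess.close()):

    out_before = sess.run(out, feed_dict={x: data_x})
    sess.run(train_op2, feed_dict={x: data_x, y: data_y, mask: mask_out1})
    out_after = sess.run(out, feed_dict={x: data_x})
    print(np.allclose(out_before[:, 0], out_after[:, 0]))  # True: first output unchanged
    print(np.allclose(out_before[:, 1], out_after[:, 1]))  # typically False: second output moved
    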

    ======================= Below is the old answer =======================

    To get the derivatives w.r.t. different variables, you can pass var_list to decide which variables to update. Here is an example:

    import tensorflow as tf
    import numpy as np
    
    # let's make our tiny dataset: (x, y) pairs, where x = (x1, x2, x3), y = (y1, y2),
    # and y1 = x1+x2+x3, y2 = x1^2+x2^2+x3^2
    
    # n_sample data points
    n_sample = 8
    data_x = np.random.random((n_sample, 3))
    data_y = np.zeros((n_sample, 2))
    data_y[:, 0] += np.sum(data_x, axis=1)
    data_y[:, 1] += np.sum(data_x**2, axis=1)
    data_y += 0.01 * np.random.random((n_sample, 2))  # add some noise
    
    
    # build graph
    # suppose we have a network of shape [3, 4, 2], i.e.: one hidden layer of size 4.
    
    x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 2], name='y')
    
    W1 = tf.Variable(tf.random_normal(shape=[3, 4], stddev=0.1), name='W1')
    b1 = tf.Variable(tf.random_normal(shape=[4], stddev=0.1), name='b1')
    hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
    W2 = tf.Variable(tf.random_normal(shape=[4, 2], stddev=0.1), name='W2')
    b2 = tf.Variable(tf.random_normal(shape=[2], stddev=0.1), name='b2')
    out = tf.matmul(hidden, W2) + b2
    
    loss = tf.reduce_mean(tf.square(out - y))
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    # You can pass a variable list to decide which variable(s) to minimize.
    train_op_second_layer = optimizer.minimize(loss, var_list=[W2, b2])
    # If there is no var_list, all variables will be updated.
    train_op_all = optimizer.minimize(loss)
    
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y:data_y}))
    
    # In this case, only W2 and b2 are updated. You will see the loss decrease.
    sess.run(train_op_second_layer, feed_dict={x: data_x, y:data_y})
    print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y:data_y}))
    
    # In this case, all variables are updated. You will see the loss decrease.
    sess.run(train_op_all, feed_dict={x: data_x, y:data_y})
    print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y:data_y}))
    sess.close()
    
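    Under the hood, minimize(loss, var_list=...) is compute_gradients restricted to those variables followed by apply_gradients, so an equivalent, more explicit form (assuming the graph above) would be:

    grads_and_vars = optimizer.compute_gradients(loss, var_list=[W2, b2])
    # a list of (gradient, variable) pairs for W2 and b2 only; applying them
    # leaves W1 and b1 untouched
    train_op_manual = optimizer.apply_gradients(grads_and_vars)
    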
