How can I share weights between different RNN cells that are fed different inputs in TensorFlow?

I'm curious whether there is a good way to share the weights between different RNN cells while still feeding each cell different inputs.

The graph I would like to build looks like this:

[diagram of the intended graph]

There are three orange LSTM cells operating in parallel, and I would like to share the weights between them.

I have managed to implement something similar to what I want using a placeholder (see the code below). However, using a placeholder breaks the optimizer's gradient calculation, and nothing past the point where I use the placeholder gets trained. Is there a better way to achieve this in TensorFlow?

I am using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

import tensorflow as tf

def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data,ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights) 
                                          + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

The training mini-batches are computed in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})

Thanks for your help. Any ideas would be appreciated.

2 Answers

  • 0

    You have 3 different inputs: input_1, input_2, input_3. Feed them into an LSTM model with shared parameters, then concatenate the outputs of the 3 LSTMs and pass the result to a final LSTM layer. The code should look something like this:

    # Create input placeholder for the network
     input_1 = tf.placeholder(...)
     input_2 = tf.placeholder(...)
     input_3 = tf.placeholder(...)
    
     # create a shared rnn layer 
     def shared_rnn(...):
        ...
        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
    
     # generate the outputs for each input
     with tf.variable_scope('lower_lstm') as scope:
        out_input_1 = shared_rnn(...)
        scope.reuse_variables() # the variables will be reused.
        out_input_2 = shared_rnn(...)
        scope.reuse_variables()
        out_input_3 = shared_rnn(...)
    
     # verify whether the variables are reused
     for v in tf.global_variables():
        print(v.name)
    
     # concat the three outputs
     output = tf.concat...  
    
     # Pass it to the final_lstm layer and out the logits
     logits = final_layer(output, ...)
    
     train_op = ...
    
     # train
     sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
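
    To make the reuse pattern above concrete, here is a minimal, self-contained sketch under assumed sizes (the names batch_size, seq_len, n_in, n_hidden and the shared_rnn helper are illustrative stand-ins, not the asker's actual model):

     import tensorflow as tf

     batch_size, seq_len, n_in, n_hidden = 4, 5, 3, 8

     def shared_rnn(inputs, n_hidden):
         # inputs: a list of seq_len tensors, each of shape [batch_size, n_in]
         cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
         outputs, state = tf.contrib.rnn.static_rnn(cell, inputs, dtype=tf.float32)
         return outputs

     def as_list(x):
         # split [batch_size, seq_len, n_in] into a list of [batch_size, n_in] tensors
         return tf.unstack(x, axis=1)

     input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
     input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
     input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])

     with tf.variable_scope('lower_lstm') as scope:
         out_1 = shared_rnn(as_list(input_1), n_hidden)
         scope.reuse_variables()  # later calls reuse the same LSTM kernel and bias
         out_2 = shared_rnn(as_list(input_2), n_hidden)
         out_3 = shared_rnn(as_list(input_3), n_hidden)

     # only one set of lower LSTM variables should be listed here
     for v in tf.trainable_variables():
         print(v.name)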
    
  • 2

    I ended up rethinking my architecture and came up with a more workable solution.

    Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run is stored in a 'buffer', something like a tf.Variable, and that whole variable is then used as the input to the final LSTM layer. I drew a diagram here

    Implementing it this way allows for valid outputs after 3 time steps, and it does not break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN are still trainable).

    The only tricky part was making sure the buffer held the results in the correct sequential order for the final RNN. A rough sketch of the idea is below.
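
    For illustration, here is a rough, standalone sketch of the buffer idea, using tf.stack in place of an explicit tf.Variable buffer and tf.zeros as stand-ins for the three outputs of the shared lower cell (all sizes and names here are assumptions, not the actual code):

     import tensorflow as tf

     batch_size, n_lower, n_final = 4, 8, 16

     # stand-ins for the last output of each of the three runs of the shared
     # lower cell; in the real graph these come from the lower LSTM, so
     # gradients can flow back through them
     out_1 = tf.zeros([batch_size, n_lower])
     out_2 = tf.zeros([batch_size, n_lower])
     out_3 = tf.zeros([batch_size, n_lower])

     # the "buffer": the three results kept in sequential order along a time axis
     buffered = tf.stack([out_1, out_2, out_3], axis=1)  # [batch_size, 3, n_lower]

     # feed the buffered length-3 sequence into the final LSTM layer
     with tf.variable_scope('final_lstm'):
         final_cell = tf.nn.rnn_cell.BasicLSTMCell(n_final)
         final_outputs, _ = tf.contrib.rnn.static_rnn(
             final_cell, tf.unstack(buffered, axis=1), dtype=tf.float32)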
