I am curious whether there is a good way to share the weights of different RNN cells while still feeding each cell a different input.
The graph I would like to build looks like this:
where there are three orange LSTM cells operating in parallel, and I would like to share the weights between them.
I have managed to implement something similar to what I want using a placeholder (see the code below). However, using a placeholder breaks the optimizer's gradient computation, and nothing before the point where I use the placeholder gets trained. Is there a better way to achieve this in TensorFlow?
I am using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.
Code:
def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out
def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i], [cls.batch_size,
                                                     cls.n_rnn_inputs])
        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)
        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)
        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))
        # Encode the output of the RNN into one estimate per entry in
        # the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights)
                                + out_biases)
    return predict_list
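The projection loop at the end of rnn_lower_model maps each timestep's LSTM output to one scalar estimate per batch entry. A minimal NumPy sketch of that computation (all sizes are hypothetical, chosen just for illustration):

```python
import numpy as np

# Hypothetical sizes, standing in for cls.batch_size, cls.sequence_length,
# and cls.n_rnn_nodes_lower from the question.
batch_size, sequence_length, n_rnn_nodes_lower = 2, 5, 8
rng = np.random.default_rng(0)

# static_rnn returns a Python list with one (batch, hidden) output per timestep.
outputs = [rng.standard_normal((batch_size, n_rnn_nodes_lower))
           for _ in range(sequence_length)]
out_weights = rng.standard_normal((n_rnn_nodes_lower, 1))
out_biases = rng.standard_normal(1)

# One (batch_size, 1) estimate per entry in the input sequence.
predict_list = [out @ out_weights + out_biases for out in outputs]
print(len(predict_list), predict_list[0].shape)  # 5 (2, 1)
```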
def create_graph(cls, sess):
    # Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])
    # Define placeholder to provide variable input into the
    # RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])
    # Global step used in the optimizer
    global_step = tf.Variable(0, trainable=False)
    # Create ANN
    ann_output = cls.ann_model(cls.c)
    # Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)
    # Create an 'unrolled' RNN by creating sequence_length many RNN cells
    # that share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        # Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2
A training mini-batch is computed in two steps:
RNNinput = sess.run(cls.rnn_input, feed_dict={cls.x: batch_x,
                                              cls.y: batch_y,
                                              cls.c: batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place: RNNinput,
                                       cls.y: batch_y,
                                       cls.x: batch_x,
                                       cls.c: batch_c})
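The reason this two-step scheme stops training anything before the placeholder can be seen with plain calculus: the value fed back in through cls.input_place is a constant as far as the second sess.run is concerned, so the chain rule terminates there. A minimal scalar sketch in NumPy-free Python (all numbers are arbitrary illustrations, not from the question):

```python
# Forward pass: x -> h = w1 * x -> y = w2 * h, loss L = 0.5 * (y - t)^2.
# w1 plays the role of the ANN weights, h the role of RNNinput.
w1, w2, x, t = 3.0, 2.0, 1.5, 10.0

h = w1 * x              # intermediate value (the "RNNinput" analogue)
y = w2 * h
dL_dy = y - t

# End-to-end graph: the gradient reaches w1 through the chain rule.
dL_dw1_connected = dL_dy * w2 * x

# Two-step run: h is re-fed as a constant, so dh/dw1 is effectively 0
# and w1 (the ANN weights in the question) receives no gradient.
dL_dw1_cut = dL_dy * w2 * 0.0

print(dL_dw1_connected, dL_dw1_cut)
```

Anything downstream of the placeholder still trains normally; only the upstream parameters are frozen, which matches the behavior described in the question.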
Thanks for your help. Any ideas would be appreciated.
2 Answers
You have three different inputs:
input_1, input_2, input_3
Feed them into an LSTM model with shared parameters, then concatenate the outputs of the three LSTMs and pass the result to a final LSTM layer. The code should look something like this:

I ended up rethinking my architecture and came up with a more workable solution.
Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run is stored in a buffer similar to a tf.Variable, and that whole variable is then used as the input to the final LSTM layer. I drew a diagram here
Implementing it this way allows valid outputs after three time steps, and does not break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN can still train).
The only tricky part is making sure the buffer is in the correct sequential order for the final RNN.
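The approach described above (run one cell three times, buffer the results in order, then feed the buffer to the final RNN) can be sketched framework-agnostically in NumPy. The cell below is a plain tanh cell rather than an LSTM, and all shapes are assumptions made for illustration; the point is that weight sharing falls out automatically because every run uses the same parameter arrays:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 4, 8

# One shared parameter set: every run of the cell uses these same arrays,
# which is exactly what "running the same cell three times" buys you.
W = rng.standard_normal((n_in, n_hid))
b = np.zeros(n_hid)

def cell(x):
    """A simple tanh cell standing in for the shared LSTM cell."""
    return np.tanh(x @ W + b)

inputs = [rng.standard_normal(n_in) for _ in range(3)]

# Buffer the three runs in sequence order, as the answer describes,
# then stack them into the input of the final RNN layer.
buffer = []
for x in inputs:                      # same cell, run three times
    buffer.append(cell(x))
final_rnn_input = np.stack(buffer)    # shape (3, n_hid), order preserved
print(final_rnn_input.shape)          # (3, 8)
```

Because the buffer is filled in input order, row i of final_rnn_input corresponds to input i, which is the ordering guarantee the answer calls the "only tricky part".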