Converting a Keras model to a TensorFlow implementation: results differ

Update: I've found the problem. The decay parameter of RMSPropOptimizer defaults to 0.9 in TensorFlow. Reducing it to 0 fixes the strange behavior, and the results are now much better.
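
To illustrate what this parameter controls, here is a minimal NumPy sketch of the RMSProp update, assuming the form used by TensorFlow 1.x, where `decay` weights the running average of squared gradients. The helper names `rmsprop_step` and `run` and the gradient values are made up for illustration. Note that the `decay` argument of the Keras `RMSprop` optimizer appears to be a learning-rate decay, while the averaging factor there is called `rho` (0.9 by default), so the two `decay` arguments may not mean the same thing.

```python
import numpy as np

def rmsprop_step(param, grad, mean_square, lr=0.001, decay=0.9, eps=1e-10):
    """One RMSProp update (assumed TF 1.x form, no momentum)."""
    # Running average of squared gradients; `decay` is the averaging factor.
    mean_square = decay * mean_square + (1.0 - decay) * grad ** 2
    # Scale the gradient by the root of the running average.
    param = param - lr * grad / np.sqrt(mean_square + eps)
    return param, mean_square

def run(decay, grads):
    """Apply the update to a scalar parameter and return the step taken each time."""
    param, mean_square = 0.0, 0.0
    steps = []
    for g in grads:
        new_param, mean_square = rmsprop_step(param, g, mean_square, decay=decay)
        steps.append(new_param - param)
        param = new_param
    return np.array(steps)

# Illustrative gradients of very different magnitudes.
grads = np.array([1.0, 0.01, 100.0, 0.5])
steps_no_avg = run(0.0, grads)  # decay=0: each step is normalized by its own gradient
steps_avg = run(0.9, grads)     # decay=0.9: steps depend on the gradient history

print(steps_no_avg)
print(steps_avg)
```

With decay=0 every step has a magnitude close to the learning rate regardless of the gradient size, while with decay=0.9 the step size depends on the history of past gradients.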

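I'm having trouble reimplementing my existing Keras model in TensorFlow. Here is the Keras model. It processes a sequence of features extracted from video frames to classify the activity in the video: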

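from keras.models import Sequential
from keras.layers import LSTM, Activation, Bidirectional, Dense, Reshape
from keras.optimizers import RMSprop

# Bidirectional LSTM over the frame features, flattened and fed to a softmax classifier.
self.model = Sequential()
self.model.add(Bidirectional(LSTM(self.model_config.rnn_size, return_sequences=True),
                             input_shape=(self.model_config.sequence_length, input_embedding_size)))
self.model.add(Reshape((-1,)))
self.model.add(Dense(output_size))
self.model.add(Activation('softmax'))
opt = RMSprop(lr=0.001, decay=0.)
self.model.compile(loss='categorical_crossentropy', optimizer=opt)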

Here is my implementation of this model in TensorFlow:

import tensorflow as tf  # TensorFlow 1.x API

with tf.name_scope("inputs"):
    self.input_x = tf.placeholder(tf.float32,
                                  [None, self.model_config.sequence_length, input_embedding_size])
    self.input_y = tf.placeholder(tf.int32, [None])
    input_y_one_hot = tf.one_hot(self.input_y, output_size)

with tf.name_scope("rnn"):
    fw_cell = tf.nn.rnn_cell.LSTMCell(rnn_size, activation=tf.sigmoid)
    bw_cell = tf.nn.rnn_cell.LSTMCell(rnn_size, activation=tf.sigmoid)

    (output_fw, output_bw), states = tf.nn.bidirectional_dynamic_rnn(fw_cell, bw_cell, self.input_x,
                                                                     dtype=tf.float32)

# Concatenate forward and backward outputs, then flatten the time dimension.
output = tf.concat([output_fw, output_bw], axis=-1)
output = tf.reshape(output, (-1, output.shape[1] * output.shape[2]))
output = tf.tanh(output)

# Dense layer producing the class logits.
with tf.name_scope("nn"):
    w = tf.get_variable("w", [output.shape[1], output_size], dtype=output.dtype)
    b = tf.get_variable("b", [output_size], dtype=output.dtype)

    l = tf.nn.xw_plus_b(output, w, b)

with tf.name_scope(name="loss"):
    self.loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.input_y, logits=l))

self.optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(self.loss)

with tf.name_scope("prediction"):
    self.prediction = tf.cast(tf.argmax(tf.nn.softmax(l, axis=-1), axis=-1), dtype=tf.int32)
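
The two listings compute the loss differently (softmax followed by categorical cross-entropy on one-hot labels in Keras, sparse softmax cross-entropy on logits in TensorFlow). The NumPy sketch below, with made-up logits and labels and hypothetical helper names, suggests that the two formulations should give the same value, so the loss definition itself is unlikely to explain the gap.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot, probs):
    # Keras-style: cross-entropy between one-hot targets and softmax probabilities.
    return -(one_hot * np.log(probs)).sum(axis=-1)

def sparse_softmax_cross_entropy(labels, logits):
    # TF-style: log-softmax on the logits, then pick the log-probability of the true class.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

# Illustrative data: 4 samples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 2])
one_hot = np.eye(3)[labels]

loss_keras_style = categorical_crossentropy(one_hot, softmax(logits)).mean()
loss_tf_style = sparse_softmax_cross_entropy(labels, logits).mean()
print(loss_keras_style, loss_tf_style)
```

The same reasoning applies to the prediction: the argmax of the softmax output equals the argmax of the logits, because softmax preserves the ordering of its inputs.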

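Both models are trained with the same parameters, such as batch_size, learning rate, and LSTM size, but they give different results. The only difference I found is that the Keras LSTM layer uses hard sigmoid while I use sigmoid from TensorFlow. I checked and, as far as I can tell, neither dropout nor regularization is applied by default (dropout prob is 0).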
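
To get a feeling for how far apart the two gate activations are, here is a small NumPy comparison. It assumes the Keras 2 definition of hard sigmoid, clip(0.2 * x + 0.5, 0, 1); the function names and the input range are chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Assumed Keras 2 definition: a piecewise-linear approximation of the sigmoid.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

# Compare both functions on an evenly spaced grid.
x = np.linspace(-6.0, 6.0, 121)
gap = np.abs(sigmoid(x) - hard_sigmoid(x))
max_gap = gap.max()

print("largest absolute difference:", max_gap)
print("at x =", x[gap.argmax()])
```

Under this assumption the two functions never differ by more than about 0.08, which makes it plausible that the gate activation alone is not the main cause of the different training curves.
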
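Here is the training history (red - loss, blue - validation dataset accuracy):
Keras model implementation. It converges quickly and the loss is smooth.
[Figure: Keras model training history]

TensorFlow model implementation. It converges very slowly, the accuracy is generally worse, and each epoch takes longer to train.
[Figure: TF model training history]
Could you please help me figure out two things: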

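The nuances of the internal Keras implementation: which well-chosen default parameters do they predefine that I have missed? The Keras model works much better!
What is the strange peak/drop in the middle of the TF model's training history?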

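Thanks!

[Update]
After removing the activation layer after the RNN (and leaving the tanh recurrent activation for the LSTM), the model takes much longer to converge, and the final accuracy after 60 epochs is 56%:
[Figure: training history after removing the activation layer]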

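After looking at the TF source, I understand that the tf keras backend uses different implementations of all the building blocks of a layer than regular TensorFlow does. Compare LSTMCell in the keras package with the regular LSTMCell implementation. So identical results cannot be expected.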
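
One concrete difference can be sketched in NumPy. As far as the TF 1.x documentation describes it, the `activation` argument of `tf.nn.rnn_cell.LSTMCell` sets the activation of the inner states (tanh by default), not the gate activation, so `activation=tf.sigmoid` replaces tanh rather than hard sigmoid. The single-step cell below is a simplified illustration with hypothetical names, random weights, and one shared gate ordering; it is not the actual code of either library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # Assumed Keras 2 definition of the default gate activation.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_step(x, h, c, W, U, b, gate_act, cell_act):
    """One LSTM time step with pluggable gate and inner-state activations."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)  # gate order chosen for this sketch only
    i, f, o = gate_act(i), gate_act(f), gate_act(o)
    c_new = f * c + i * cell_act(g)
    h_new = o * cell_act(c_new)
    return h_new, c_new

# Illustrative sizes: batch of 2, 3 input features, 4 units.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3))
h = rng.normal(size=(2, 4))
c = rng.normal(size=(2, 4))
W = rng.normal(size=(3, 16))
U = rng.normal(size=(4, 16))
b = np.zeros(16)

# Keras defaults: hard sigmoid gates, tanh inner-state activation.
h_keras, _ = lstm_step(x, h, c, W, U, b, gate_act=hard_sigmoid, cell_act=np.tanh)
# The TF cell from the question: sigmoid gates, sigmoid inner-state activation.
h_question, _ = lstm_step(x, h, c, W, U, b, gate_act=sigmoid, cell_act=sigmoid)

print(h_keras)
print(h_question)
```

In this sketch the outputs of the second configuration are always positive, whereas a tanh inner-state activation allows outputs of either sign, so the two cells are not equivalent even with identical weights.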
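With AdamOptimizer the TF model converges quickly, but the final accuracy is much lower: 67%. I suppose the RMSProp optimizer from TF is not tuned as well as the one from the keras package :(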
