
Feeding CNN features into an LSTM


I want to build an end-to-end trainable model with the following characteristics:

  • A CNN extracts features from an image

  • The features are reshaped into a matrix

  • Each row of that matrix is then fed into LSTM1

  • Each column of that matrix is fed into LSTM2

  • The outputs of LSTM1 and LSTM2 are concatenated to form the final output

(It is more or less like Figure 2 in this paper: https://arxiv.org/pdf/1611.07890.pdf)

My question now is: after the reshape, how do I feed the values of the feature matrix into the LSTMs using Keras or TensorFlow?
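For reference, the row/column split described above could be sketched with the Keras functional API. The feature size (4096), LSTM units (256), and number of classes (10) below are illustrative assumptions, not values taken from the paper:

```python
from tensorflow.keras.layers import (Input, Reshape, Permute, LSTM,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

# Vectorized CNN features come in as a flat 4096-dim vector (assumption)
feat = Input(shape=(4096,))
mat = Reshape((64, 64))(feat)               # 64 x 64 feature matrix

row_lstm = LSTM(256)(mat)                   # rows as timesteps -> LSTM1
col_lstm = LSTM(256)(Permute((2, 1))(mat))  # transpose: columns as timesteps -> LSTM2

merged = Concatenate()([row_lstm, col_lstm])  # concatenate both outputs
out = Dense(10, activation='softmax')(merged)

model = Model(feat, out)
```

`Permute((2, 1))` transposes the matrix so that the second LSTM iterates over columns instead of rows; both LSTM outputs are then concatenated before the final dense layer.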

Here is my code so far, using the VGG16 network (it is also linked from a Keras issue):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape, LSTM

# VGG16
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))

# reshape the 4096-dim feature vector into a 64 x 64 matrix
model.add(Reshape((64, 64)))

# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct: 
# model.add(LSTM(256, input_shape=(64, 1)))  # 256 hidden units, sequence length = 64, feature dim = 1
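In fact, Keras's LSTM layer expects input of shape (timesteps, features), so after `Reshape((64, 64))` an LSTM can consume the matrix directly: each of the 64 rows becomes one timestep of dimension 64. A minimal sketch (the dense layer stands in for the VGG16 top, and 256 units is an assumption):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, LSTM

model = Sequential()
model.add(Dense(4096, activation='relu', input_shape=(4096,)))  # stand-in for the VGG16 top
model.add(Reshape((64, 64)))   # 64 timesteps, each a 64-dim row
model.add(LSTM(256))           # consumes the rows as a sequence
```

No `input_shape` is needed on the LSTM itself, since it infers (64, 64) from the preceding Reshape layer.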

1 Answer


    Consider building your CNN model with Conv2D and MaxPooling2D layers up to and including the Flatten layer, because the vectorized output of the Flatten layer is what you will feed into the LSTM part of the structure.

    So, build your CNN model like this:

    model_cnn = Sequential()
    model_cnn.add(Conv2D...)
    model_cnn.add(MaxPooling2D...)
    ...
    model_cnn.add(Flatten())
    

    Now, here is the interesting point: the current version of Keras has some incompatibilities with certain TensorFlow constructs that prevent you from stacking all of the layers in a single Sequential object.

    So it is time to finish the neural network with a Keras Model object, using a small trick:

    input_lay = Input(shape=(None, ?, ?, ?)) #dimensions of your data
    time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay) # keras.layers.Lambda is essential to make our trick work :)
    lstm_lay = LSTM(?)(time_distribute)
    output_lay = Dense(?, activation='?')(lstm_lay)
    

    Finally, it is time to combine our two separate models into one:

    model = Model(inputs=[input_lay], outputs=[output_lay])
    model.compile(...)
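    Putting the pieces above together, a runnable end-to-end sketch might look like the following. All sizes (frame size 32x32, 64 LSTM units, 5 classes) are illustrative assumptions; note that in recent versions of tf.keras, `TimeDistributed` accepts a model directly, so the Lambda wrapper is usually not needed:

```python
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                                     TimeDistributed, LSTM, Dense)
from tensorflow.keras.models import Model, Sequential

# Small stand-in CNN; substitute your own Conv2D/MaxPooling2D stack
model_cnn = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
])

input_lay = Input(shape=(None, 32, 32, 3))               # (timesteps, H, W, C)
time_distribute = TimeDistributed(model_cnn)(input_lay)  # apply the CNN per frame
lstm_lay = LSTM(64)(time_distribute)
output_lay = Dense(5, activation='softmax')(lstm_lay)

model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```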
    

    OBS: Note that you can substitute your VGG (without its top layers) for my model_cnn example, since the vectorized output from the VGG's Flatten layer will be the input of the LSTM model.
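    That substitution could be sketched with the built-in `keras.applications.VGG16` (here `weights=None` avoids downloading pretrained weights; use `weights='imagenet'` in practice; the LSTM units and class count are assumptions):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Flatten, TimeDistributed, LSTM, Dense
from tensorflow.keras.models import Model, Sequential

# VGG16 convolutional base without the dense top layers
vgg = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
cnn = Sequential([vgg, Flatten()])           # vectorize the VGG feature maps

seq_in = Input(shape=(None, 224, 224, 3))    # a sequence of frames
feats = TimeDistributed(cnn)(seq_in)         # per-frame feature vectors
out = Dense(2, activation='softmax')(LSTM(32)(feats))
model = Model(seq_in, out)
```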
