
Feeding CNN features into an LSTM


I want to build an end-to-end trainable model with the following characteristics:

  • A CNN extracts features from an image

  • The features are reshaped into a matrix

  • Each row of that matrix is then fed into LSTM1

  • Each column of that matrix is fed into LSTM2

  • The outputs of LSTM1 and LSTM2 are concatenated to form the final output

(It is more or less like Figure 2 in this paper: https://arxiv.org/pdf/1611.07890.pdf)

My question now is: after the reshape, how do I feed the values of the feature matrix into the LSTMs using Keras or TensorFlow?
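For reference, the row/column split described above could be sketched with the Keras functional API. The feature size (4096), LSTM units (256), and number of classes (10) below are illustrative assumptions, not values taken from the paper:

```python
from tensorflow.keras.layers import (Input, Reshape, Permute, LSTM,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

# Vectorized CNN features come in as a flat 4096-dim vector (assumption)
feat = Input(shape=(4096,))
mat = Reshape((64, 64))(feat)               # 64 x 64 feature matrix

row_lstm = LSTM(256)(mat)                   # rows as timesteps -> LSTM1
col_lstm = LSTM(256)(Permute((2, 1))(mat))  # transpose: columns as timesteps -> LSTM2

merged = Concatenate()([row_lstm, col_lstm])  # concatenate both outputs
out = Dense(10, activation='softmax')(merged)

model = Model(feat, out)
```

`Permute((2, 1))` transposes the matrix so that the second LSTM iterates over columns instead of rows; both LSTM outputs are then concatenated before the final dense layer.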

Here is my code so far, using the VGG16 network (it is also linked from a Keras issue):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape, LSTM

# VGG16
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))

# reshape the 4096-dim feature vector into a 64 x 64 matrix
model.add(Reshape((64, 64)))

# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct: 
# model.add(LSTM(256, input_shape=(64, 1)))  # 256 hidden units, sequence length = 64, feature dim = 1
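In fact, Keras's LSTM layer expects input of shape (timesteps, features), so after `Reshape((64, 64))` an LSTM can consume the matrix directly: each of the 64 rows becomes one timestep of dimension 64. A minimal sketch (the dense layer stands in for the VGG16 top, and 256 units is an assumption):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, LSTM

model = Sequential()
model.add(Dense(4096, activation='relu', input_shape=(4096,)))  # stand-in for the VGG16 top
model.add(Reshape((64, 64)))   # 64 timesteps, each a 64-dim row
model.add(LSTM(256))           # consumes the rows as a sequence
```

No `input_shape` is needed on the LSTM itself, since it infers (64, 64) from the preceding Reshape layer.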

1 Answer


    Consider building your CNN model with Conv2D and MaxPooling2D layers up to and including the Flatten layer, because the vectorized output of the Flatten layer is what you will feed into the LSTM part of the structure.

    So, build your CNN model like this:

    model_cnn = Sequential()
    model_cnn.add(Conv2D...)
    model_cnn.add(MaxPooling2D...)
    ...
    model_cnn.add(Flatten())
    

    Now, here is the interesting point: the current version of Keras has some incompatibilities with certain TensorFlow constructs that prevent you from stacking all of the layers in a single Sequential object.

    So it is time to finish the neural network with a Keras Model object, using a small trick:

    input_lay = Input(shape=(None, ?, ?, ?)) #dimensions of your data
    time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay) # keras.layers.Lambda is essential to make our trick work :)
    lstm_lay = LSTM(?)(time_distribute)
    output_lay = Dense(?, activation='?')(lstm_lay)
    

    Finally, it is time to combine our two separate models into one:

    model = Model(inputs=[input_lay], outputs=[output_lay])
    model.compile(...)
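    Putting the pieces above together, a runnable end-to-end sketch might look like the following. All sizes (frame size 32x32, 64 LSTM units, 5 classes) are illustrative assumptions; note that in recent versions of tf.keras, `TimeDistributed` accepts a model directly, so the Lambda wrapper is usually not needed:

```python
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                                     TimeDistributed, LSTM, Dense)
from tensorflow.keras.models import Model, Sequential

# Small stand-in CNN; substitute your own Conv2D/MaxPooling2D stack
model_cnn = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
])

input_lay = Input(shape=(None, 32, 32, 3))               # (timesteps, H, W, C)
time_distribute = TimeDistributed(model_cnn)(input_lay)  # apply the CNN per frame
lstm_lay = LSTM(64)(time_distribute)
output_lay = Dense(5, activation='softmax')(lstm_lay)

model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```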
    

    OBS: Note that you can substitute your VGG (without its top layers) for my model_cnn example, since the vectorized output from the VGG's Flatten layer will be the input of the LSTM model.
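    That substitution could be sketched with the built-in `keras.applications.VGG16` (here `weights=None` avoids downloading pretrained weights; use `weights='imagenet'` in practice; the LSTM units and class count are assumptions):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Flatten, TimeDistributed, LSTM, Dense
from tensorflow.keras.models import Model, Sequential

# VGG16 convolutional base without the dense top layers
vgg = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
cnn = Sequential([vgg, Flatten()])           # vectorize the VGG feature maps

seq_in = Input(shape=(None, 224, 224, 3))    # a sequence of frames
feats = TimeDistributed(cnn)(seq_in)         # per-frame feature vectors
out = Dense(2, activation='softmax')(LSTM(32)(feats))
model = Model(seq_in, out)
```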
