How to stack an LSTM on top of a CNN in Keras?

I built the following neural network model for sound recognition. The flow chart is as follows:

cnn-lstm-dense-hybrid (flow chart image)

The idea is as follows:

I have two different input layers, called A and B.

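(i) Input A has 100 time steps, and each step has a 64-dimensional feature vector.

(ii) A 1D CNN layer (time distributed) extracts features from each time step. The CNN layer contains 64 filters, each 16 taps long. A max-pooling layer then takes the single maximum value of each convolution output, so a total of 64 features is extracted at each time step.

(iii) The output of the CNN layer is fed into an LSTM layer with 64 neurons. The number of recurrences equals the number of input time steps, i.e. 100. The LSTM layer should return a sequence of 64-dimensional outputs (sequence length == number of time steps == 100, so there should be 100 * 64 = 6400 numbers).

(iv) Meanwhile, input B also has 100 time steps, and each step has a 65-dimensional feature vector, but it is processed differently from input A.

(v) Input B is fed into a dense layer (time distributed) of 65 neurons, so it should produce a 65-dimensional output at each time step.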
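
The feature count in step (ii) follows from the output-length arithmetic of an unpadded ("valid") convolution. Here is a minimal sketch of that arithmetic, assuming stride 1 for the convolution and a pooling window that spans every convolution position. The helper name `valid_output_length` is made up for illustration and is not a Keras function.

```python
def valid_output_length(input_length, window, stride=1):
    """Number of output positions of a 1D sliding window with no padding."""
    return (input_length - window) // stride + 1

feature_dim = 64   # length of the feature vector at one time step
kernel_size = 16   # taps per filter
num_filters = 64

# Convolution with no padding: 64 - 16 + 1 = 49 positions per filter.
conv_positions = valid_output_length(feature_dim, kernel_size)

# Max pooling whose window covers all 49 positions leaves one value per filter.
pool_size = conv_positions
pooled_positions = valid_output_length(conv_positions, pool_size, stride=pool_size)

features_per_step = pooled_positions * num_filters

print(conv_positions, pooled_positions, features_per_step)
```

This is why the pooling size in the code further down is written as `64-16+1`: it matches the number of convolution positions, so pooling collapses them to a single maximum per filter.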

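Now, at each time step, we have the outputs of the LSTM layer (64 neurons) and the dense layer (65 neurons), and we concatenate them in a merge layer. This gives a 129-dimensional vector at each time step.
We feed this vector into another dense layer, which produces the output (a single neuron representing the probability of "is target sound").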
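
The merge and output steps can be checked on shapes alone. The following is a rough NumPy sketch, not the Keras model itself: random arrays stand in for the two branch outputs, the batch size of 2 and the weights `w` and `b` are invented for illustration, and a sigmoid is assumed so that the single output can be read as a probability.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps = 2, 100  # batch size is an arbitrary choice for this sketch

lstm_out = rng.standard_normal((batch, steps, 64))  # stand-in for the LSTM sequence output
aux_out = rng.standard_normal((batch, steps, 65))   # stand-in for the time-distributed dense output

# Concatenate along the feature axis: 64 + 65 = 129 values per time step.
merged = np.concatenate([lstm_out, aux_out], axis=-1)

# Hypothetical weights for a single-neuron output layer, shared across time steps.
w = rng.standard_normal((129, 1)) * 0.01
b = np.zeros(1)

# Assumed sigmoid activation, giving one value in (0, 1) per time step.
prob = 1.0 / (1.0 + np.exp(-(merged @ w + b)))

print(merged.shape, prob.shape)
```

Applying the same weights at every time step is what a time-distributed dense layer does, which is why the output keeps the 100-step axis.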

A hand-drawn illustration (image)

However, I got stuck at the very beginning, trying to make step (i) work. The code that builds the network is as follows:

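from keras.layers import (Input, Conv1D, MaxPooling1D, TimeDistributed,
                          BatchNormalization, Dropout, LSTM, Dense, concatenate)
from keras.models import Model

mfcc_input = Input(shape=(100,64), dtype='float', name='mfcc_input')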
print(mfcc_input)

CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)
CNN_out = Dropout(0.4)(CNN_out)


LSTM_out = LSTM(64,return_sequences=True)(CNN_out)

## Auxiliary branch
delta_input = Input(shape=(100,64), dtype='float', name='delta_input')
zcr_input   = Input(shape=(100,1), dtype='float', name='zcr_input')
aux_input   = concatenate([delta_input, zcr_input])
aux_out     = TimeDistributed(Dense(64+1))(aux_input) 

### Merge branches
merged_layer   = concatenate([LSTM_out, aux_out])

## Output layer
output = TimeDistributed(Dense(1))(merged_layer)

model = Model(inputs=[mfcc_input, delta_input, zcr_input], outputs=[output])

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
          loss_weights=[1., 0.2])
...(other code here) ...

The line "CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)" fails with: IndexError: list index out of range

Can anyone help? Thanks a lot!
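
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: `TimeDistributed` removes the time axis before calling the wrapped layer, so with an input of shape (100, 64) the `Conv1D` receives a single 64-element vector per step, with no channel axis to convolve over. A minimal sketch of a workaround under that assumption is to add a trailing channel axis with `Reshape`, so that each time step becomes a (64, 1) sequence, and to flatten the pooled result before the LSTM. The names `x` and `lstm_out` are illustrative, and only the input A branch is shown.

```python
from keras.layers import (Input, Reshape, TimeDistributed, Conv1D,
                          MaxPooling1D, Flatten, LSTM)
from keras.models import Model

mfcc_input = Input(shape=(100, 64), dtype='float32', name='mfcc_input')

# Add a channel axis: each time step is now a length-64 sequence with 1 channel.
x = Reshape((100, 64, 1))(mfcc_input)

# Per time step: 64 filters of length 16, no padding -> 49 positions x 64 filters.
x = TimeDistributed(Conv1D(64, 16, activation='relu'))(x)

# Per time step: pool over all 49 positions -> 1 position x 64 filters.
x = TimeDistributed(MaxPooling1D(pool_size=64 - 16 + 1))(x)

# Per time step: drop the singleton axis so the LSTM sees 64 features per step.
x = TimeDistributed(Flatten())(x)

lstm_out = LSTM(64, return_sequences=True)(x)

model = Model(inputs=mfcc_input, outputs=lstm_out)
print(tuple(lstm_out.shape))
```

The batch normalization and dropout layers from the question are left out to keep the sketch short. They could be reinserted between these layers.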
