I built the following neural network model for sound recognition. The flowchart is:

cnn-lstm-dense-hybrid

The idea is as follows:

  • I have two different input layers, called A and B.

(i) Input A has 100 time steps, and each step is a 64-dimensional feature vector.

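(ii) A time-distributed 1D CNN layer extracts features from each time step. The CNN layer has 64 filters, each 16 taps long. With 'valid' padding, each filter yields 64 - 16 + 1 = 49 values per time step. A max-pooling layer then takes the single maximum of each filter's output, which gives a total of 64 features per time step.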

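(iii) The CNN output is fed into an LSTM layer with 64 units. The LSTM is unrolled over the same number of time steps as the input, i.e. 100. It should return a sequence of 64-dimensional outputs. The sequence length equals the number of time steps (100), so there should be 100 × 64 = 6400 numbers.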

(iv) Meanwhile, input B also has 100 time steps, and each step is a 65-dimensional feature vector. Input B is processed differently from input A.

(v) Input B is fed into a time-distributed dense layer of 65 neurons, so it produces a 65-dimensional output at each time step.

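  • At each time step we now have the output of the LSTM layer (64 units) and the output of the dense layer (65 neurons). We concatenate them in a merge layer, which gives a 129-dimensional vector at each time step.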

  • This vector is fed into another dense layer that produces the output: a single neuron giving the probability that the step "is target sound".
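The shape arithmetic in steps (i) to (v) can be checked with a NumPy-only sketch of one forward pass. It is not the Keras model, only the same arithmetic under stated assumptions. The assumptions are 'valid' convolution with ReLU, a plain LSTM returning every hidden state, a linear Dense for branch B, and a sigmoid on the output neuron. All weights and inputs are random placeholders used purely for shape checking.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_A, D_B = 100, 64, 65          # time steps, input A dims, input B dims
N_FILT, TAPS, H = 64, 16, 64       # CNN filters, taps per filter, LSTM units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_maxpool(x, kernels, bias):
    """One time step: 'valid' 1D cross-correlation over the 64 values,
    ReLU, then the max over all 64 - 16 + 1 = 49 positions, per filter."""
    taps = kernels.shape[0]
    n_pos = x.shape[0] - taps + 1
    windows = np.stack([x[p:p + taps] for p in range(n_pos)])   # (49, 16)
    return np.maximum(windows @ kernels + bias, 0.0).max(axis=0)  # (64,)

def lstm_sequence(x, W, U, b):
    """Plain LSTM that returns the hidden state at every step
    (the return_sequences=True behaviour). Gates are split as i, f, g, o."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    outs = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

# Hypothetical random inputs and weights (shape checking only)
A = rng.standard_normal((T, D_A))
B = rng.standard_normal((T, D_B))
K, kb = rng.standard_normal((TAPS, N_FILT)) * 0.1, np.zeros(N_FILT)
W = rng.standard_normal((N_FILT, 4 * H)) * 0.1
U = rng.standard_normal((H, 4 * H)) * 0.1
lb = np.zeros(4 * H)
Wd, bd = rng.standard_normal((D_B, D_B)) * 0.1, np.zeros(D_B)
Wo, bo = rng.standard_normal((H + D_B, 1)) * 0.1, np.zeros(1)

cnn_out = np.stack([conv_maxpool(a_t, K, kb) for a_t in A])  # (100, 64)
lstm_out = lstm_sequence(cnn_out, W, U, lb)                  # (100, 64) -> 6400 numbers
dense_out = B @ Wd + bd                                      # (100, 65): same Dense at every step
merged = np.concatenate([lstm_out, dense_out], axis=-1)      # (100, 129)
prob = sigmoid(merged @ Wo + bo)                             # (100, 1)

print(cnn_out.shape, lstm_out.shape, dense_out.shape, merged.shape, prob.shape)
# (100, 64) (100, 64) (100, 65) (100, 129) (100, 1)
```

Branch B needs no loop. A time-distributed Dense uses the same weights at every step, so it reduces to a single matrix product over the last axis.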

[Image: hand-drawn illustration of the network]

However, I'm stuck at the very beginning, trying to get 1(i) to work. The code that builds the network is:

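from keras.layers import (Input, Conv1D, MaxPooling1D, BatchNormalization, Dropout,
                          LSTM, Dense, TimeDistributed, concatenate)
from keras.models import Model

## Main branch (input A): 100 time steps x 64 MFCC features
mfcc_input = Input(shape=(100,64), dtype='float', name='mfcc_input')
print(mfcc_input)

# Per-time-step CNN: 64 filters x 16 taps, then max over the 64-16+1 conv outputs
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)  # IndexError is raised here
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)
CNN_out = Dropout(0.4)(CNN_out)

# LSTM with 64 units, returning one 64-dim output per time step
LSTM_out = LSTM(64, return_sequences=True)(CNN_out)

## Auxiliary branch (input B): 64 delta features + 1 zcr feature = 65 dims per step
delta_input = Input(shape=(100,64), dtype='float', name='delta_input')
zcr_input   = Input(shape=(100,1), dtype='float', name='zcr_input')
aux_input   = concatenate([delta_input, zcr_input])
aux_out     = TimeDistributed(Dense(64+1))(aux_input)

### Merge branches: 64 + 65 = 129 features per time step
merged_layer   = concatenate([LSTM_out, aux_out])

## Output layer: one neuron per time step
output = TimeDistributed(Dense(1))(merged_layer)

model = Model(inputs=[mfcc_input, delta_input, zcr_input], outputs=[output])

# note: two loss weights are given, but the model has only one output
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])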
...(other code here) ...

The line "CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)" raises: IndexError: list index out of range

Can anyone help? Thanks a lot!
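A likely cause, stated with some hedging: TimeDistributed applies the wrapped layer to each time slice of its input. With input shape (100, 64), Conv1D therefore receives slices of shape (batch, 64). Conv1D expects 3D input of the form (batch, steps, channels), so the 2D slices fail.

Below is a minimal sketch of one possible fix. It assumes tf.keras (TensorFlow 2.x) and keeps the layer sizes from the post. It makes four changes:

1. Add a channel axis with Reshape, so that each time step becomes a (64, 1) sequence.
2. Drop the length-1 pooled axis before the LSTM, because the LSTM needs (batch, time, features).
3. Put a sigmoid on the output neuron so that it yields a probability.
4. Remove loss_weights, since there is a single output.

```python
from tensorflow.keras.layers import (Input, Reshape, TimeDistributed, Conv1D, BatchNormalization,
                                     MaxPooling1D, Dropout, LSTM, Dense, concatenate)
from tensorflow.keras.models import Model

T, N_MFCC, TAPS = 100, 64, 16

mfcc_input = Input(shape=(T, N_MFCC), dtype='float32', name='mfcc_input')
# Give every time step an explicit channel axis: (64,) -> (64, 1)
CNN_out = Reshape((T, N_MFCC, 1))(mfcc_input)                            # (None, 100, 64, 1)
CNN_out = TimeDistributed(Conv1D(64, TAPS, activation='relu'))(CNN_out)   # (None, 100, 49, 64)
CNN_out = BatchNormalization(axis=-1)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=N_MFCC - TAPS + 1))(CNN_out)  # (None, 100, 1, 64)
# Drop the length-1 pooled axis so the LSTM sees (time, features)
CNN_out = Reshape((T, 64))(CNN_out)                                      # (None, 100, 64)
CNN_out = Dropout(0.4)(CNN_out)
LSTM_out = LSTM(64, return_sequences=True)(CNN_out)                      # (None, 100, 64)

delta_input = Input(shape=(T, 64), dtype='float32', name='delta_input')
zcr_input = Input(shape=(T, 1), dtype='float32', name='zcr_input')
aux_input = concatenate([delta_input, zcr_input])                        # (None, 100, 65)
aux_out = TimeDistributed(Dense(64 + 1))(aux_input)                      # (None, 100, 65)

merged_layer = concatenate([LSTM_out, aux_out])                          # (None, 100, 129)
# Sigmoid so each step's output is a probability for binary_crossentropy
output = TimeDistributed(Dense(1, activation='sigmoid'))(merged_layer)   # (None, 100, 1)

model = Model(inputs=[mfcc_input, delta_input, zcr_input], outputs=output)
# One output, so no loss_weights list is needed
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()
```

The first Reshape is only one option. An alternative is to declare mfcc_input with shape (100, 64, 1) and add the trailing axis to the data itself, for example with NumPy's `x[..., np.newaxis]`.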