I built the following neural network model for sound recognition. The flow chart is shown below:
[Figure: cnn-lstm-dense-hybrid]
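The idea is as follows:
1. I have two different input layers, called A and B.
   (i) Input A has 100 time steps, and each step has a 64-dimensional feature vector.
   (ii) A 1D CNN layer (time distributed) extracts features from each time step. The CNN layer contains 64 filters, each 16 taps long. A max-pooling layer then takes the single maximum value of each convolution output, so a total of 64 features are extracted at each time step.
   (iii) The output of the CNN layer is fed into an LSTM layer with 64 neurons. The number of recurrences equals the number of input time steps, i.e. 100. The LSTM layer should return a sequence of 64-dimensional outputs (sequence length == number of time steps == 100, so there should be 100 * 64 = 6400 numbers).
   (iv) At the same time, input B also has 100 time steps, and each step has a 65-dimensional feature vector, but it is processed differently from input A.
   (v) Input B is fed into a dense layer (time distributed) of 65 neurons, so it should produce a 65-dimensional output at each time step.
2. Now, at each time step, we have the outputs of the LSTM layer (64 neurons) and the dense layer (65 neurons), and we concatenate them in a merge layer. This gives a 129-dimensional vector at each time step.
3. We feed this vector into another dense layer, which produces the output (a single neuron representing the probability of "is target sound").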
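The dimensions described above can be checked with a little shape arithmetic. The following is a minimal sketch in plain Python, not Keras code. The helper names `conv1d_valid_length` and `maxpool1d_valid_length` are hypothetical, and the sketch assumes 'valid' padding with stride 1 for the convolution and a pooling window that spans the whole convolution output.

```python
# Shape bookkeeping for the intended architecture (batch axis omitted).
# The helper functions below are hypothetical and only do arithmetic.

def conv1d_valid_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (input_length - kernel_size) // stride + 1

def maxpool1d_valid_length(input_length, pool_size):
    """Output length of 1D max pooling with 'valid' padding, stride == pool_size."""
    return (input_length - pool_size) // pool_size + 1

TIME_STEPS = 100   # time steps in inputs A and B
FEATURES_A = 64    # feature vector size of input A
FEATURES_B = 65    # feature vector size of input B
N_FILTERS = 64     # filters in the 1D CNN layer
KERNEL_SIZE = 16   # taps per filter
LSTM_UNITS = 64    # neurons in the LSTM layer

# Per time step: convolve along the 64 features, then pool over the whole result.
conv_len = conv1d_valid_length(FEATURES_A, KERNEL_SIZE)   # 64 - 16 + 1
pool_len = maxpool1d_valid_length(conv_len, conv_len)     # one value per filter
cnn_features = pool_len * N_FILTERS                       # features per time step

# Sequence-level shapes, written as (time steps, features).
lstm_shape = (TIME_STEPS, LSTM_UNITS)
dense_b_shape = (TIME_STEPS, FEATURES_B)
merged_shape = (TIME_STEPS, lstm_shape[1] + dense_b_shape[1])
output_shape = (TIME_STEPS, 1)

print("conv length per step:", conv_len)
print("CNN features per step:", cnn_features)
print("LSTM output numbers:", lstm_shape[0] * lstm_shape[1])
print("merged shape:", merged_shape)
print("output shape:", output_shape)
```

Under these assumptions the pooling window of `64 - 16 + 1` matches the `pool_size` used in the code further down, and the merged vector has 64 + 65 entries per time step.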
However, I am stuck right at the start, trying to get step 1(i) to work. The code that builds the network is as follows:
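from keras.layers import (Input, Conv1D, MaxPooling1D, BatchNormalization,
                          Dropout, LSTM, Dense, TimeDistributed, concatenate)
from keras.models import Model

## Main branch: input A
mfcc_input = Input(shape=(100,64), dtype='float', name='mfcc_input')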
print(mfcc_input)
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)
CNN_out = Dropout(0.4)(CNN_out)
LSTM_out = LSTM(64,return_sequences=True)(CNN_out)
## Auxiliary branch
delta_input = Input(shape=(100,64), dtype='float', name='delta_input')
zcr_input = Input(shape=(100,1), dtype='float', name='zcr_input')
aux_input = concatenate([delta_input, zcr_input])
aux_out = TimeDistributed(Dense(64+1))(aux_input)
## Merge branches
merged_layer = concatenate([LSTM_out, aux_out])
## Output layer
output = TimeDistributed(Dense(1))(merged_layer)
model = Model(inputs=[mfcc_input, delta_input, zcr_input], outputs=[output])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
...(other code here) ...
The line "CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)" raises the error: IndexError: list index out of range
Can anyone help? Thank you very much!
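For reference, one plausible reading of the error can be sketched as shape bookkeeping. This is an assumption, not a confirmed diagnosis: it supposes that `TimeDistributed` hands the wrapped layer the input shape with the time axis removed, and that a 1D convolution expects one spatial axis between the batch and channel axes. The helpers `per_step_shape` and `conv1d_spatial_dims` are hypothetical and are not Keras functions.

```python
# Hypothetical helpers that mimic shape bookkeeping only; they are not Keras APIs.

def per_step_shape(input_shape):
    """Shape a wrapped layer would see if the time axis (axis 1) is dropped."""
    return (input_shape[0],) + tuple(input_shape[2:])

def conv1d_spatial_dims(input_shape):
    """Axes between batch and channels; a 1D convolution needs exactly one."""
    return list(input_shape[1:-1])

# Shape of mfcc_input including the batch axis: (batch, time, features).
original = (None, 100, 64)
step = per_step_shape(original)         # per-step shape has no spatial axis left
spatial = conv1d_spatial_dims(step)     # empty list

error_message = None
try:
    spatial[0]                          # indexing the missing spatial axis
except IndexError as exc:
    error_message = str(exc)

# Assumed alternative: give every time step an explicit channel axis.
reshaped = (None, 100, 64, 1)
step_ok = per_step_shape(reshaped)            # (batch, 64, 1)
spatial_ok = conv1d_spatial_dims(step_ok)     # one spatial axis of length 64

print("per-step shape:", step, "spatial axes:", spatial)
print("error when indexing:", error_message)
print("per-step shape with channel axis:", step_ok, "spatial axes:", spatial_ok)
```

If this reading is right, declaring the input with a trailing channel axis, for example `shape=(100, 64, 1)`, might let `Conv1D` treat each time step as a length-64 sequence with one channel. That has not been tested here, and the pooled output would then probably need flattening per time step before it reaches the LSTM.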