Using a stateful LSTM with mini-batch input and variable time steps in Keras?

I am new to Keras and am trying to implement this network:
[image: network architecture]

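The network takes the video frames as x = {x1, ..., xT}, where T is the number of frames in the video and each x is the visual feature vector of a frame, of size 2048.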

Following a reference, I tried using a stateful LSTM, since each sample consists of many frames.

Here is my model:

x = Input(batch_shape=(1, None, 2048), name='x')
lstmR = LSTM(256, return_sequences=True, name='lstmR', stateful=True)(x)
lstmL = LSTM(256, return_sequences=True, go_backwards=True,name='lstmL', stateful=True)(x)
merge = merge([x, lstmR, lstmL], mode='concat', name='merge')
dense = Dense(256, activation='sigmoid', name='dense')(merge)
y = Dense(1, activation='sigmoid', name='y')(dense)
model = Model(input=x, output=y)
model.compile(loss='mean_squared_error',
          optimizer=SGD(lr=0.01),
          metrics=['accuracy'])

I then tried to train the model with mini-batches:

for epoch in range(15):
    mean_tr_acc = []
    mean_tr_loss = []
    for i in range(nb_samples):
        x, y = get_train_sample(i)
        for j in range(len(x)):
            sample_x = x[j]
            tr_loss, tr_acc = model.train_on_batch(np.expand_dims(np.expand_dims(sample_x, axis=0), axis=0),np.expand_dims(y, axis=0))
            mean_tr_acc.append(tr_acc)
            mean_tr_loss.append(tr_loss)
        model.reset_states()

But the model does not seem to converge: it only reaches an accuracy of 0.3.

I also tried a stateless LSTM with input shape (None, 1024), but it did not converge either.

1 Answer

1
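I think your LSTM is not able to extract relevant features from the video frames, which it would need in order to reach a good accuracy.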

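The approach that usually gives the best results when working with images (or video frames) is to extract features with a stack of convolution + ReLU + max pooling layers (see https://arxiv.org/abs/1612.02903, a survey on facial expression recognition in which all the approaches use convolutions to extract useful features from the images).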

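These work best on two-dimensional input, but I see that you represent each video frame as a vector of size 2048 rather than as a matrix. Images are usually represented with a shape like (rows, cols, color_channels).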
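To make the difference between the two representations concrete, here is a small NumPy sketch. The sizes `T`, `rows`, `cols` and `color_channels` are made-up values for illustration only; they do not come from the question.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
T, rows, cols, color_channels = 4, 32, 32, 3

# Image-style representation: one array per frame, stacked along time.
frame = np.zeros((rows, cols, color_channels), dtype=np.float32)
video = np.stack([frame] * T)              # (T, rows, cols, color_channels)
video_batch = np.expand_dims(video, 0)     # add a batch axis of size 1

# Vector-style representation, as in the question: 2048 numbers per frame.
feature_vector = np.zeros(2048, dtype=np.float32)
feature_batch = np.stack([feature_vector] * T)[np.newaxis]  # (1, T, 2048)

print(video_batch.shape)    # (1, 4, 32, 32, 3)
print(feature_batch.shape)  # (1, 4, 2048)
```

Convolutions need the spatial axes of the first layout; the second layout has already collapsed them into a single feature axis.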

In your case, the input would have shape (1, None, rows, cols, color_channels), and the convolutions would then look something like this:
```
from keras.layers import Input, LSTM, Conv2D, MaxPool2D, TimeDistributed, Flatten

# rows, cols and color_channels are the frame dimensions and must be defined beforehand
x = Input(batch_shape=(1, None, rows, cols, color_channels), name='x')
# Convolution + ReLU + max pooling, applied to every frame independently
convs = TimeDistributed(Conv2D(16, kernel_size=(3,3), activation='relu', padding='same'))(x)
convs = TimeDistributed(MaxPool2D(pool_size=(2,2)))(convs)
convs = TimeDistributed(Conv2D(32, kernel_size=(3,3), activation='relu', padding='same'))(convs)
convs = TimeDistributed(MaxPool2D(pool_size=(2,2)))(convs)
# Flatten each frame's feature maps into one vector per time step
lstm_input = TimeDistributed(Flatten())(convs)
# Forward and backward LSTMs over the per-frame feature vectors
lstmR = LSTM(256, return_sequences=True, name='lstmR', stateful=True)(lstm_input)
lstmL = LSTM(256, return_sequences=True, go_backwards=True, name='lstmL', stateful=True)(lstm_input)
...
```
where TimeDistributed applies the given layer to each time step.
Answered on 2024-04-30T11:15:26+08:00
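As a rough illustration of what "applying a layer to each time step" means, here is a NumPy-only sketch. The helpers `time_distributed`, `max_pool_2x2` and `flatten` are hypothetical stand-ins written for this example; they are not the Keras implementation, and the 8x8 RGB frame size is an assumption.

```python
import numpy as np

def time_distributed(layer_fn, x):
    """Apply layer_fn independently to every time step of x.

    x has shape (batch, time, ...). Conceptual stand-in only.
    """
    batch, time = x.shape[:2]
    folded = x.reshape((batch * time,) + x.shape[2:])   # merge batch and time
    out = layer_fn(folded)                              # one call, all frames
    return out.reshape((batch, time) + out.shape[1:])   # split them again

def max_pool_2x2(frames):
    """2x2 max pooling on (n, rows, cols, channels); rows and cols must be even."""
    n, r, c, ch = frames.shape
    return frames.reshape(n, r // 2, 2, c // 2, 2, ch).max(axis=(2, 4))

def flatten(frames):
    """Collapse everything except the first axis into one feature axis."""
    return frames.reshape(frames.shape[0], -1)

# Hypothetical input: 1 video with 5 frames of 8x8 RGB.
video = np.random.rand(1, 5, 8, 8, 3)
pooled = time_distributed(max_pool_2x2, video)
features = time_distributed(flatten, pooled)

print(pooled.shape)    # (1, 5, 4, 4, 3)
print(features.shape)  # (1, 5, 48)
```

The time axis is preserved throughout, so the result is still a sequence with one feature vector per frame, which is the kind of input an LSTM expects.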
