I am training a bidirectional LSTM on the IMDB dataset for sentiment analysis, using Keras with TensorFlow as the backend. This is the example that ships with Keras. After training, accuracy quickly climbs to 90%+ on the training set and about 84% on the validation set. So far so good.
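For reference, the working baseline is roughly the bidirectional LSTM from the Keras IMDB example (a sketch from memory, without the attention decoder; hyperparameters may differ slightly from the shipped script):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

max_features = 20000
maxlen = 80

# plain bidirectional LSTM classifier: embed -> BiLSTM -> dropout -> sigmoid
baseline = Sequential()
baseline.add(Embedding(max_features, 128, input_length=maxlen))
baseline.add(Bidirectional(LSTM(64)))
baseline.add(Dropout(0.5))
baseline.add(Dense(1, activation='sigmoid'))
baseline.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])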
But when I add a custom attention decoder layer and train the network, training and validation accuracy stay flat from epoch 1 through epoch 10.
Below is my training code for the IMDB dataset, which uses the custom attention decoder layer.

from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, LSTM, Reshape, Dense
# AttentionDecoder is my custom attention decoder layer, imported from its own module

max_features = 20000
maxlen = 80
batch_size = 32
timesteps = 80
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

x_train = x_train[:5000]
y_train = y_train[:5000]
x_test = x_test[:5000]
y_test = y_test[:5000]

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print(x_train[0])           # x_train has shape (5000, 80)
print(y_train[0])           # y_train has shape (5000,)


print('Build model...')
encoder_units=64
decoder_units=64
n_labels=1
trainable=True
return_probabilities=False

def modelnmt():
    input_ = Input(shape=(80,), dtype='float32')
    print (input_.get_shape())
    input_embed = Embedding(max_features, 128 ,input_length=80)(input_)
    print (input_embed.get_shape())
    rnn_encoded = Bidirectional(LSTM(encoder_units, return_sequences=True),
        name='bidirectional_1',
        merge_mode='concat')(input_embed)
    print (rnn_encoded.get_shape())
    y_adec = AttentionDecoder(decoder_units,
        name='attention_decoder_1',
        output_dim=n_labels,
        return_probabilities=return_probabilities,
        trainable=trainable)(rnn_encoded)   # (None, 80, 1): one value per timestep
    y_adec = Reshape((80,))(y_adec)
    y_hat = Dense(1, activation='sigmoid')(y_adec)
    model = Model(inputs=input_, outputs=y_hat)
    model.summary()

    return model

model = modelnmt()

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=10,
          validation_data=(x_test, y_test))

Here is the output:


Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 80)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 80, 128)           2560000   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 80, 128)           98816     
_________________________________________________________________
attention_decoder_1 (Attenti (None, 80, 1)             58050     
_________________________________________________________________
reshape_1 (Reshape)          (None, 80)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 81        
=================================================================
Total params: 2,716,947
Trainable params: 2,716,947
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
5000/5000 [==============================] - 289s - loss: 0.6955 - acc: 0.5056 - val_loss: 0.6935 - val_acc: 0.4956
Epoch 2/10
5000/5000 [==============================] - 348s - loss: 0.6944 - acc: 0.4956 - val_loss: 0.6936 - val_acc: 0.4956

Let me briefly explain the model. The words are first embedded, then passed to a bidirectional LSTM, and then to the attention decoder. The attention decoder outputs one number per timestep, i.e. a tensor of shape (None, 80, 1). This output is then reshaped and passed to a Dense layer to compute the overall sentiment of the sentence (as a probability). The output of the attention decoder can later be used to visualize the contribution of each word in the sentence.
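For that visualization step, the idea is roughly the following (a sketch only, assuming the trained model from above; viz_model and per_word are just illustrative names):

from keras.models import Model

# Sub-model that stops at the attention decoder, so we can inspect the
# per-timestep values it produces for each word position of a review.
viz_model = Model(inputs=model.input,
                  outputs=model.get_layer('attention_decoder_1').output)

per_word = viz_model.predict(x_test[:1])   # shape (1, 80, 1)
print(per_word.squeeze())                  # one value per word position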
What could be the possible reasons for such output?