Connecting an attention layer to the decoder input of a seq2seq model in Keras

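I am trying to implement an attention-based sequence-to-sequence model using the Keras library. The block diagram of the model is as follows:

[Figure: block diagram of the model]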

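The model embeds the input sequence into a 3D tensor. A bidirectional LSTM then forms the encoding layer. Next, the encoded sequence is sent to a custom attention layer, which returns a 2D tensor holding the attention weights for each hidden node. The decoder input is fed into the model as one-hot vectors. In the decoder (another bi-LSTM), both the decoder input and the attention weights are passed in as input. The decoder output is sent to a time-distributed dense layer with softmax activation, so that the output at each time step is a probability distribution. The code for the model is as follows: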

```
encoder_input = Input(shape=(MAX_LENGTH_Input, ))
embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)
encoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Input, embedding_width), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)
attention = Attention(MAX_LENGTH_Input)(encoder)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])
decoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Output, vocab_size_output))(merge))
output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)
```

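The problem arises when I concatenate the attention layer with the decoder input. Because the decoder input is a 3D tensor while the attention output is a 2D tensor, I get the following error: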

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert the 2D attention tensor into a 3D tensor?

1 Answer

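According to your block diagram, it looks like you pass the same attention vector to the decoder at every time step. In that case, you need RepeatVector to copy the attention vector across the time steps, which turns the 2D attention tensor into a 3D tensor: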
```
# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention) # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output,vocab_size_output))
merge = concatenate([attention, decoder_input]) # (?, 10, 1024+8281)
# ...
```
Note that this repeats the same attention vector at every time step.
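The shape arithmetic can be checked outside Keras. The following is a minimal NumPy sketch, not Keras code: `np.repeat` over a new axis is used here as a stand-in for what RepeatVector does, and the batch size of 2 is an arbitrary assumption. The sizes 1024, 10, and 8281 are taken from the error message in the question.

```python
import numpy as np

# Sizes from the error message; the batch size is an arbitrary assumption.
batch, steps, attn_dim, vocab = 2, 10, 1024, 8281

attention = np.random.rand(batch, attn_dim)        # 2D: (batch, 1024)
decoder_input = np.zeros((batch, steps, vocab))    # 3D: (batch, 10, 8281)

# Stand-in for RepeatVector(steps): add a time axis, then copy along it.
repeated = np.repeat(attention[:, np.newaxis, :], steps, axis=1)

# With matching ranks, concatenation along the last axis is valid.
merged = np.concatenate([repeated, decoder_input], axis=-1)

# Every time step carries an identical copy of the attention vector.
same = all(
    np.array_equal(repeated[:, 0, :], repeated[:, t, :]) for t in range(steps)
)

print(repeated.shape)  # (2, 10, 1024)
print(merged.shape)    # (2, 10, 9305)
print(same)            # True
```

The last check makes the caveat concrete: the decoder sees no time-dependent attention signal, only a fixed context vector appended to each one-hot input.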
Answered on 2024-04-29T00:14:54+08:00
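For reference, here is one way the whole model might look with the RepeatVector fix applied. This is only a sketch under several assumptions that are not in the question: `GlobalAveragePooling1D` is a stand-in for the custom `Attention` layer (whose code was not shown), the `build_model` helper and its small default sizes are hypothetical, the decoder LSTM uses `return_sequences=True` so that `TimeDistributed` receives a 3D tensor, and the final `Dense` layer is sized to `vocab_size_output` on the assumption that the softmax runs over output tokens. Dropout is omitted for brevity.

```python
from tensorflow.keras.layers import (
    Input, Embedding, Bidirectional, LSTM, GlobalAveragePooling1D,
    RepeatVector, Concatenate, TimeDistributed, Dense,
)
from tensorflow.keras.models import Model


def build_model(max_length_input=12, max_length_output=10,
                vocab_size_input=100, vocab_size_output=50,
                embedding_width=16, hidden_size=8):
    """Hypothetical helper; all default sizes are illustrative assumptions."""
    encoder_input = Input(shape=(max_length_input,))
    embedded = Embedding(input_dim=vocab_size_input,
                         output_dim=embedding_width,
                         trainable=False)(encoder_input)
    # Encoder output: (None, max_length_input, 2 * hidden_size)
    encoder = Bidirectional(LSTM(hidden_size, return_sequences=True))(embedded)

    # Stand-in for the question's custom Attention layer (assumption):
    # any layer that reduces the encoder output to a 2D tensor fits here.
    attention = GlobalAveragePooling1D()(encoder)            # (None, 2 * hidden_size)
    # Repeat the 2D vector along a new time axis to make it 3D.
    attention = RepeatVector(max_length_output)(attention)   # (None, T_out, 2 * hidden_size)

    decoder_input = Input(shape=(max_length_output, vocab_size_output))
    # Both tensors are now 3D and differ only on the last axis.
    merge = Concatenate(axis=-1)([attention, decoder_input])

    # return_sequences=True keeps one output per time step (assumption).
    decoder = Bidirectional(LSTM(hidden_size, return_sequences=True))(merge)
    output = TimeDistributed(
        Dense(vocab_size_output, activation="softmax"))(decoder)
    return Model(inputs=[encoder_input, decoder_input], outputs=output)


model = build_model()
print(model.output_shape)
```

Note that the closing parenthesis of `Bidirectional(...)` comes before `(merge)` here, so the wrapper is applied to the LSTM layer itself and the resulting bidirectional layer is then called on the merged tensor.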
