TensorFlow sequence-to-sequence model using the seq2seq API (version 1.1 and later)

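I'm using TensorFlow v1.1 and I want to implement a sequence-to-sequence model with the tf.contrib.seq2seq API. However, I'm having a hard time understanding how to use all the provided functions (BasicDecoder, dynamic_decode, Helper, TrainingHelper, ...) to build my model.

Here is my setup: I want to "translate" a sequence of feature vectors of shape (batch_size, encoder_max_seq_len, feature_dim) into a sequence of a different length, of shape (batch_size, decoder_max_len, 1).

I already have the encoder, an RNN with LSTM cells, and I get its final state, which I want to feed to the decoder as its initial state. I also already have the decoder cell, a MultiRNNCell of LSTMs. Could you help me build the last part using tf.contrib.seq2seq and dynamic_decode? Example code or an explanation would be much appreciated.

Here is my code: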

import tensorflow as tf
from tensorflow.contrib import seq2seq
from tensorflow.contrib import rnn
import math

from data import gen_sum_2b2

class Seq2SeqModel:
    def __init__(self,
                 in_size,
                 out_size,
                 embed_size,
                 n_symbols,
                 cell_type,
                 n_units,
                 n_layers):
        self.in_size = in_size
        self.out_size = out_size
        self.embed_size = embed_size
        self.n_symbols = n_symbols
        self.cell_type = cell_type
        self.n_units = n_units
        self.n_layers = n_layers

        self.build_graph()

    def build_graph(self):
        self.init_placeholders()
        self.init_cells()
        self.encoder()
        # decoder_train(), loss() and training() are not written yet:
        # this is the part the question is about.
        self.decoder_train()
        self.loss()
        self.training()

    def init_placeholders(self):
        with tf.name_scope('Placeholders'):
            # Encoder input: (batch_size, encoder_max_seq_len, in_size) feature vectors
            self.encoder_inputs = tf.placeholder(shape=(None, None, self.in_size),
                                                 dtype=tf.float32, name='encoder_inputs')
            # Decoder targets: (batch_size, decoder_max_len) integer symbols
            self.decoder_targets = tf.placeholder(shape=(None, None),
                                                  dtype=tf.int32, name='decoder_targets')
            self.seqs_len = tf.placeholder(dtype=tf.int32)
            self.batch_size = tf.placeholder(tf.int32, name='dynamic_batch_size')
            self.max_len = tf.placeholder(tf.int32, name='dynamic_seq_len')
            decoder_inputs = tf.reshape(self.decoder_targets, shape=(self.batch_size,
                                        self.max_len, self.out_size))
            self.decoder_inputs = tf.cast(decoder_inputs, tf.float32)
            self.eos_step = tf.ones([self.batch_size, 1], dtype=tf.float32, name='EOS')
            self.pad_step = tf.zeros([self.batch_size, 1], dtype=tf.float32, name='PAD')

    def RNNCell(self):
        # Stack of n_layers cells; each layer gets its own cell instance
        return rnn.MultiRNNCell([self.cell_type(self.n_units) for _ in range(self.n_layers)])

    def init_cells(self):
        with tf.variable_scope('RNN_enc_cell'):
            self.encoder_cell = self.RNNCell()
        with tf.variable_scope('RNN_dec_cell'):
            # Project the decoder output onto n_symbols logits at every step
            self.decoder_cell = rnn.OutputProjectionWrapper(self.RNNCell(), self.n_symbols)

    def encoder(self):
        with tf.variable_scope('Encoder'):
            self.init_state = self.encoder_cell.zero_state(self.batch_size, tf.float32)
            # Only the final state is kept; it will initialise the decoder
            _, self.encoder_final_state = tf.nn.dynamic_rnn(self.encoder_cell, self.encoder_inputs,
                                                            initial_state=self.init_state)

Answer
Decoding layer:

Decoding consists of two parts, because it differs between training and inference:

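At inference time, the decoder input at a given time step comes from the decoder's own output at the previous time step. During training, however, the input is fixed to the actual target (the ground-truth target is fed back as input, a technique known as teacher forcing), which has been found to improve performance.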
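To make the difference concrete, here is a framework-free Python sketch (not TensorFlow code). It assumes integer target symbols and a hypothetical GO_ID start symbol: under teacher forcing the decoder input at step t is the ground-truth target at step t-1, whereas at inference it is the model's own previous prediction.

```python
GO_ID = 0  # hypothetical start-of-sequence symbol (assumption, not from the question)


def teacher_forcing_inputs(targets, go_id=GO_ID):
    """Training: the decoder input at step t is the true target at step t-1."""
    return [go_id] + targets[:-1]


def autoregressive_inputs(predict_next, max_len, go_id=GO_ID):
    """Inference: the decoder input at step t is the model's own prediction at t-1.

    predict_next is a stand-in for one decoder step: it maps the current
    input symbol to the predicted next symbol.
    """
    inputs, current = [], go_id
    for _ in range(max_len):
        inputs.append(current)
        current = predict_next(current)
    return inputs


print(teacher_forcing_inputs([5, 7, 2]))                  # [0, 5, 7]
print(autoregressive_inputs(lambda s: s + 1, max_len=3))  # [0, 1, 2]
```

The training inputs are just the targets shifted right by one step, so they can all be computed up front; the inference inputs can only be produced one step at a time.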

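Both are handled with methods from tf.contrib.seq2seq.
1. The main decoder function is seq2seq.dynamic_decode(), which performs dynamic decoding:
tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_seq_len)

It takes a Decoder instance and maximum_iterations (the maximum output sequence length, max_seq_len above) as input.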
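Conceptually, dynamic_decode keeps calling the decoder's step until the sequence reports it is finished or maximum_iterations is reached. The sketch below is a plain-Python, single-sequence simplification (no batching, no TensorFlow); CountdownDecoder is a hypothetical toy decoder, and its initialize/step return tuples only loosely follow the shape of the tf.contrib.seq2seq Decoder interface.

```python
class CountdownDecoder:
    """Hypothetical toy decoder: emits start, start-1, ... and finishes after emitting 1."""

    def __init__(self, start):
        self.start = start

    def initialize(self):
        # (finished, initial_inputs, initial_state)
        return False, None, self.start

    def step(self, time, inputs, state):
        # (outputs, next_state, next_inputs, finished)
        output = state
        next_state = state - 1
        return output, next_state, output, next_state <= 0


def dynamic_decode_sketch(decoder, maximum_iterations=None):
    """Run decoder.step until it reports finished or the iteration cap is hit."""
    finished, inputs, state = decoder.initialize()
    outputs, time = [], 0
    while not finished:
        if maximum_iterations is not None and time >= maximum_iterations:
            break
        output, state, inputs, finished = decoder.step(time, inputs, state)
        outputs.append(output)
        time += 1
    return outputs, state


print(dynamic_decode_sketch(CountdownDecoder(3)))                        # ([3, 2, 1], 0)
print(dynamic_decode_sketch(CountdownDecoder(3), maximum_iterations=2))  # ([3, 2], 1)
```

The cap matters most at inference time, where a decoder that never emits its end symbol would otherwise loop forever.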

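1.1 The Decoder instance comes from:

seq2seq.BasicDecoder(cell, helper, initial_state, output_layer)

Its inputs are: cell (an RNNCell instance), helper (a Helper instance), initial_state (the decoder's initial state, which should be the encoder's final state) and output_layer (an optional dense layer applied to the outputs to produce predictions).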
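Within each step, BasicDecoder roughly runs the cell, applies output_layer if one was given, asks the helper for a sample id, and then asks the helper for the next input. The plain-Python sketch below shows only that order of operations; toy_cell, toy_output_layer, argmax and toy_next_inputs are hypothetical stand-ins, not TensorFlow objects.

```python
def basic_decoder_step(cell, output_layer, sample_fn, next_inputs_fn, time, inputs, state):
    """One decoding step: cell -> optional output layer -> sample -> next input.

    cell(inputs, state)             -> (cell_output, next_state)
    output_layer(cell_output)       -> logits (skipped when output_layer is None)
    sample_fn(logits)               -> sampled symbol id
    next_inputs_fn(time, sample_id) -> (finished, next_inputs)
    """
    cell_output, next_state = cell(inputs, state)
    logits = output_layer(cell_output) if output_layer is not None else cell_output
    sample_id = sample_fn(logits)
    finished, next_inputs = next_inputs_fn(time, sample_id)
    return (logits, sample_id), next_state, next_inputs, finished


def toy_cell(inputs, state):
    return [inputs, state], state + 1


def toy_output_layer(h):
    return [h[0] * 1.0, h[1] * 2.0]


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def toy_next_inputs(time, sample_id):
    # Hypothetical rule: symbol 1 ends the sequence; the sample is fed back as input
    return sample_id == 1, float(sample_id)


print(basic_decoder_step(toy_cell, toy_output_layer, argmax, toy_next_inputs,
                         time=0, inputs=3.0, state=1))
# (([3.0, 2.0], 0), 2, 0.0, False)
```

Because the sampling and next-input logic live in the helper, the same decoder step can be reused for training and inference just by swapping helpers.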

1.2 The RNNCell instance can be an rnn.MultiRNNCell().
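A MultiRNNCell feeds each layer's output into the next layer and, by default, keeps one state per layer in a tuple; this is why the encoder's final state can be handed directly to a decoder built from an identically shaped stack. A plain-Python sketch with hypothetical toy cells (make_toy_cell is illustrative only):

```python
def multi_rnn_cell_step(cells, inputs, states):
    """Stacked-cell step: layer i's output becomes layer i+1's input.

    cells:  list of callables cell(inputs, state) -> (output, new_state)
    states: tuple with one state per layer
    Returns the top layer's output and a tuple of the new per-layer states.
    """
    new_states = []
    layer_input = inputs
    for cell, state in zip(cells, states):
        layer_input, new_state = cell(layer_input, state)
        new_states.append(new_state)
    return layer_input, tuple(new_states)


def make_toy_cell(scale):
    # Hypothetical cell: output = scale * x + s, new state = s + x
    return lambda x, s: (scale * x + s, s + x)


print(multi_rnn_cell_step([make_toy_cell(2), make_toy_cell(10)], 1, (0, 0)))  # (20, (1, 2))
```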

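1.3 The helper instance differs between training and inference. During training we want to feed the ground-truth inputs to the decoder, while during inference we want the decoder's output at time step (t) to be passed as the input to the decoder at time step (t+1).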

For training: we use the helper seq2seq.TrainingHelper(inputs, sequence_length), which simply reads the inputs.
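TrainingHelper ignores the decoder's own predictions: after step t it hands back inputs[t + 1] and marks the sequence finished once t + 1 reaches sequence_length. A single-sequence plain-Python sketch of that behaviour; TrainingHelperSketch is a hypothetical name, and the real helper works on batched tensors (returning zeros rather than None once everything is finished).

```python
class TrainingHelperSketch:
    """Hypothetical single-sequence illustration of what TrainingHelper does."""

    def __init__(self, inputs, sequence_length):
        self.inputs = inputs
        self.sequence_length = sequence_length

    def initialize(self):
        # (finished, first_input)
        finished = self.sequence_length == 0
        return finished, (None if finished else self.inputs[0])

    def next_inputs(self, time):
        # The decoder's prediction is not consulted at all: just read the next input
        next_time = time + 1
        finished = next_time >= self.sequence_length
        return finished, (None if finished else self.inputs[next_time])


helper = TrainingHelperSketch(["<go>", "a", "b"], sequence_length=3)
print(helper.initialize())    # (False, '<go>')
print(helper.next_inputs(0))  # (False, 'a')
print(helper.next_inputs(2))  # (True, None)
```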

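For inference: we use the helper seq2seq.GreedyEmbeddingHelper() or seq2seq.SampleEmbeddingHelper(). The difference is whether it takes the argmax() of the output or samples from the output distribution; either way, the chosen symbol is passed through an embedding layer to obtain the next input.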
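The greedy/sampling distinction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the TensorFlow helpers themselves: logits is one step's output scores, embedding is a hypothetical toy lookup table indexed by symbol id, and end_token marks the end of the sequence.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_input(logits, embedding, end_token):
    """Greedy: take the argmax symbol, then look up its embedding as the next input."""
    sample_id = max(range(len(logits)), key=lambda i: logits[i])
    return sample_id, embedding[sample_id], sample_id == end_token


def sample_next_input(logits, embedding, end_token, rng=random):
    """Sampling: draw a symbol from softmax(logits), then look up its embedding."""
    probs = softmax(logits)
    sample_id = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return sample_id, embedding[sample_id], sample_id == end_token


embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical 3-symbol table
print(greedy_next_input([0.1, 2.0, -1.0], embedding, end_token=2))  # (1, [1.0, 0.0], False)
```

Greedy decoding is deterministic; sampling trades that for more varied outputs, which can be useful when generating rather than translating.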

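Putting it together: the Seq2Seq model
- Take the encoder state from the encoder layer and pass it to the decoder as initial_state.
- Use seq2seq.dynamic_decode() to get the outputs of both the training decoder and the inference decoder. When you build the two, make sure they share weights (use variable_scope with reuse to reuse the weights).
- Then train the network with the loss function seq2seq.sequence_loss.
Example code is here and here.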
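With its default averaging over timesteps and batch, seq2seq.sequence_loss computes a weighted average of per-token cross-entropy, where the weights (often built from sequence lengths, e.g. with tf.sequence_mask) zero out padding positions. A plain-Python sketch of that computation, assuming unnormalised logits and 0/1 weights; it is not the TensorFlow implementation.

```python
import math


def sequence_loss_sketch(logits, targets, weights):
    """Weighted mean of per-token cross-entropy over batch and time.

    logits:  [batch][time][num_symbols] unnormalised scores
    targets: [batch][time] integer symbol ids
    weights: [batch][time] floats, typically 1.0 for real tokens and 0.0 for padding
    """
    total, weight_sum = 0.0, 0.0
    for seq_logits, seq_targets, seq_weights in zip(logits, targets, weights):
        for step_logits, target, w in zip(seq_logits, seq_targets, seq_weights):
            m = max(step_logits)
            log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
            total += w * (log_z - step_logits[target])  # -log softmax(target)
            weight_sum += w
    return total / (weight_sum + 1e-12)  # small epsilon avoids division by zero


# One sequence of two steps; the second step is padding (weight 0.0)
print(round(sequence_loss_sketch([[[0.0, 0.0], [0.0, 0.0]]], [[0, 1]], [[1.0, 0.0]]), 4))  # 0.6931
```

Without the weights, padding steps would contribute to the loss and to the average, biasing training toward predicting padding.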
Answered on 2024-05-15T02:01:04+08:00
