Accumulating the output of a graph using tf.while_loop (TensorFlow)

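Long story short, I have an RNN stacked on top of a CNN. The CNN was created and trained separately. To make things concrete, assume the CNN takes its input through a placeholder of shape [BATCH SIZE, H, W, C] (H = height, W = width, C = number of channels).

Now, with the RNN stacked on top, the overall input to the combined network has the shape [BATCH SIZE, TIME SEQUENCE, H, W, C], i.e. each sample in the minibatch consists of TIME SEQUENCE images. Moreover, the time sequences are variable in length. A separate placeholder named sequence_lengths, of shape [BATCH SIZE], holds one scalar per sample giving that sample's length. TIME SEQUENCE is the maximum possible sequence length; shorter samples are padded with zeros.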
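The input layout described above can be sketched with NumPy. This is only an illustration under assumed toy sizes: the names samples and batch, and the dimensions H = W = 2, C = 1, are hypothetical and not part of the original code.

```python
import numpy as np

H, W, C = 2, 2, 1

# Hypothetical variable-length samples holding 3, 1 and 2 images respectively.
samples = [np.ones((n, H, W, C), dtype=np.float32) for n in (3, 1, 2)]

# One length per sample, shape [BATCH SIZE].
sequence_lengths = np.array([len(s) for s in samples], dtype=np.int32)
max_sequence_length = int(sequence_lengths.max())

# Zero-padded batch of shape [BATCH SIZE, TIME SEQUENCE, H, W, C].
batch = np.zeros((len(samples), max_sequence_length, H, W, C), dtype=np.float32)
for i, s in enumerate(samples):
    batch[i, :len(s)] = s  # real images first, zeros fill the rest

print(batch.shape, sequence_lengths)
```

Here batch plays the role of input_image_sequence, and sequence_lengths tells the loop how many leading time steps of each sample are real images.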

What I want to do

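I want to accumulate the CNN outputs into a tensor of shape [BATCH SIZE, TIME SEQUENCE, 1] (the last dimension holds just the final score the CNN outputs for each time step of each batch element), so that I can forward this whole block to the RNN stacked on top of the CNN. The tricky part is that I also want to backpropagate the error from the RNN into the CNN (the CNN is already pre-trained, but I would like to fine-tune its weights a little), so I have to stay inside the graph, i.e. I cannot make any calls to session.run().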

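Option A: The simplest way would be to reshape the whole network input tensor to [BATCH SIZE * TIME SEQUENCE, H, W, C]. The problem is that BATCH SIZE * TIME SEQUENCE can be as large as 2000, so I am bound to run out of memory when feeding a batch that big into my CNN. That batch size is too large for training anyway. Also, many of the sequences are mostly zero padding, which is wasted computation.
Option B: Use tf.while_loop. My idea is to treat all the images along the time axis of a single minibatch element as one minibatch for the CNN. Essentially, the CNN processes a batch of size at most [TIME SEQUENCE, H, W, C] on each iteration (not always that many images; the exact number depends on the sequence length). My code currently looks like this: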

# The output tensor that I want populated
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))

# Counter for the loop. I'll process one batch element per iteration.
# One batch element contains a variable number of images for each time step. All these images will form a minibatch for the CNN.
loop_counter = tf.get_variable('loop_counter', dtype=tf.int32, initializer=0)

# Loop variables that will be passed to the body and cond methods
loop_vars = [input_image_sequence, sequence_lengths, image_output_sequence, loop_counter]
# input_image_sequence: [BATCH SIZE, TIME SEQUENCE, H, W, C]
# sequence_lengths: [BATCH SIZE]
# image_output_sequence: [BATCH SIZE, TIME SEQUENCE, 1]

# abbreviations for vars in loop_vars:
# iis --> input_image_sequence
# sl --> sequence_lengths
# ios --> image_output_sequence
# lc --> loop_counter
def cond(iis, sl, ios, lc):  
    return tf.less(lc, batch_size)

def body(iis, sl, ios, lc):
    seq_len = sl[lc]  # the sequence length of the current batch element
    cnn_input_batch = iis[lc, :seq_len]  # extract the relevant portion (the rest are just padded zeros)

    # propagate this 'batch' through the CNN
    # (assumes process_input returns the per-image scores, shape [seq_len, 1])
    cnn_output = my_cnn_model.process_input(cnn_input_batch)

    # Pad the remaining time steps up to max_sequence_length
    padding = [[0, max_sequence_length - seq_len], [0, 0]]
    padded_cnn_output = tf.pad(cnn_output, paddings=padding, mode='CONSTANT', constant_values=0)

    # The problematic part: assign these processed values to the output tensor
    ios[lc].assign(padded_cnn_output)
    return [iis, sl, ios, lc + 1]

_, _, result, _ = tf.while_loop(cond, body, loop_vars, swap_memory=True)

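Inside my_cnn_model.process_input, I simply pass the input through a vanilla CNN. All variables created in it use tf.AUTO_REUSE, which should ensure that the while loop reuses the same weights in every iteration.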

The exact problem

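image_output_sequence is a Variable, but somehow by the time tf.while_loop calls the body method it has turned into an object of type Tensor, which does not support assignment. I get the error message: Sliced assignment is only supported for variables

The problem persists even if I use another format, such as a tuple of BATCH SIZE tensors, each of shape [TIME SEQUENCE, H, W, C].

I am also fine with completely redesigning the code, as long as it gets the job done nicely.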

1 Answer

The solution is to use an object of type TensorArray, which is made specifically for this kind of problem. The following line:

image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))

is replaced with:

image_output_sequence = tf.TensorArray(size=batch_size, dtype=tf.float32, element_shape=[max_sequence_length, 1], infer_shape=True)

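Strictly speaking, a TensorArray does not need element_shape to be specified; it is given here because every element has the same known shape.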

Then, inside the body function, replace:

ios[lc].assign(padded_cnn_output)

with:

ios = ios.write(lc, padded_cnn_output)

After the tf.while_loop statement, the TensorArray can be stacked to form a regular Tensor for further processing:

stacked_tensor = result.stack()
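Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. It assumes TensorFlow 2.x with eager execution, whereas the original code targets the 1.x graph API. The function toy_cnn and the variable weights are hypothetical stand-ins for my_cnn_model.process_input and its weights, and the sizes are toy values chosen for illustration.

```python
import tensorflow as tf

batch_size, max_len, H, W, C = 3, 4, 2, 2, 1

# Hypothetical stand-in for the CNN weights: maps a flattened image to one score.
weights = tf.Variable(tf.ones([H * W * C, 1]))

def toy_cnn(images):
    # images: [n, H, W, C] -> scores: [n, 1]
    flat = tf.reshape(images, [-1, H * W * C])
    return tf.matmul(flat, weights)

input_image_sequence = tf.ones([batch_size, max_len, H, W, C])
sequence_lengths = tf.constant([4, 2, 1], dtype=tf.int32)

def accumulate_scores(iis, sl):
    # One slot per batch element, each holding a [max_len, 1] tensor.
    outputs = tf.TensorArray(dtype=tf.float32, size=batch_size,
                             element_shape=[max_len, 1])

    def cond(lc, ta):
        return tf.less(lc, batch_size)

    def body(lc, ta):
        seq_len = sl[lc]
        scores = toy_cnn(iis[lc, :seq_len])  # only the unpadded images
        # Pad along the time axis back up to max_len.
        padded = tf.pad(scores, [[0, max_len - seq_len], [0, 0]])
        # write() returns the updated TensorArray, which must be passed on.
        return lc + 1, ta.write(lc, padded)

    _, outputs = tf.while_loop(cond, body, [tf.constant(0), outputs])
    return outputs.stack()  # regular Tensor of shape [batch_size, max_len, 1]

with tf.GradientTape() as tape:
    result = accumulate_scores(input_image_sequence, sequence_lengths)
    loss = tf.reduce_sum(result)

# Gradients flow through the loop back to the stand-in CNN weights.
grads = tape.gradient(loss, weights)
print(result.shape, grads.shape)
```

The point of the sketch is that the TensorArray is threaded through the loop as a loop variable and rebound on every write, so nothing is assigned in place, and the stacked result stays differentiable with respect to the CNN weights.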

Answered on 2024-05-19T14:07:58+08:00
