TensorFlow CIFAR10 multi-GPU: why are the losses combined?

In the TensorFlow CIFAR10 example, when training on multiple GPUs, the losses for each "tower" appear to be combined, and the gradients are computed from this combined loss.

    # Build the portion of the Graph calculating the losses. Note that we will
    # assemble the total_loss using a custom function below.
    _ = cifar10.loss(logits, labels)

    # Assemble all of the losses for the current tower only.
    losses = tf.get_collection('losses', scope)

    # Calculate the total loss for the current tower.
    total_loss = tf.add_n(losses, name='total_loss')

    # Attach a scalar summary to all individual losses and the total loss; do the
    # same for the averaged version of the losses.
    for l in losses + [total_loss]:
        # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
        # session. This helps the clarity of presentation on tensorboard.
        loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
        tf.contrib.deprecated.scalar_summary(loss_name, l)

    return total_loss

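I am new to TensorFlow, but as I understand it, every call to cifar10.loss runs tf.add_to_collection('losses', cross_entropy_mean), which stores the loss for the current batch in the collection.

Then losses = tf.get_collection('losses', scope) is called and retrieves all the losses from the collection. The tf.add_n op then adds together all of the loss tensors retrieved for this "tower".

I expected the loss to come only from the current training step/batch, not from all batches.

Am I misunderstanding something? Or is there a reason to combine the losses?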

2 Answers

1

If weight decay is enabled, it is also added to the losses collection. So for each tower (scope), all of that tower's losses are summed: cross_entropy_mean and weight_decay.
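To make the per-tower summation concrete, here is a minimal plain-Python sketch. It is not TensorFlow code: it mimics a graph collection as a list of (name, value) pairs and filters it by name prefix, roughly the way tf.get_collection('losses', scope) filters by scope. The helper functions, the op names and the numeric loss values are all made up for illustration.

```python
import re

# Toy stand-in for a graph collection: (op name, loss value) pairs.
# Names and values are hypothetical; in TensorFlow these would be tensors.
losses_collection = []


def add_to_collection(name, value):
    """Register one loss term under its fully scoped name."""
    losses_collection.append((name, value))


def get_collection(scope):
    """Return only the terms whose name starts with the given scope."""
    return [(n, v) for n, v in losses_collection if re.match(scope, n)]


def tower_loss(scope):
    """Sum every loss term that belongs to a single tower."""
    return sum(v for _, v in get_collection(scope))


# Each tower registers one cross-entropy term plus its weight-decay terms.
add_to_collection("tower_0/cross_entropy", 2.0)
add_to_collection("tower_0/local3/weight_loss", 0.25)
add_to_collection("tower_0/local4/weight_loss", 0.25)
add_to_collection("tower_1/cross_entropy", 1.5)
add_to_collection("tower_1/local3/weight_loss", 0.25)
add_to_collection("tower_1/local4/weight_loss", 0.25)

tower_0_total = tower_loss("tower_0/")
tower_1_total = tower_loss("tower_1/")
print(tower_0_total, tower_1_total)
```

In this toy model the sum for a tower combines different loss terms of the same batch (cross entropy plus regularization), not losses from earlier batches.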

The gradients are then computed for each tower (scope). Finally, the gradients from the different towers (scopes) are averaged in average_gradients.
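The averaging step can be sketched in plain Python as follows. This is a simplified illustration, not the code from the example: scalars stand in for gradient tensors, strings stand in for variables, and the input layout (one list of (gradient, variable) pairs per tower, in the same variable order) is an assumption made for this sketch.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads: one list per tower of (gradient, variable_name) pairs,
    with the variables in the same order in every tower.
    """
    averaged = []
    # zip(*...) groups the entries for the same variable across towers.
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        mean_grad = sum(grads) / len(grads)
        # Variables are shared between towers, so the first name is enough.
        var_name = grads_and_vars[0][1]
        averaged.append((mean_grad, var_name))
    return averaged


# Hypothetical gradients from two towers for two shared variables.
tower_grads = [
    [(0.5, "conv1/weights"), (-1.0, "conv1/biases")],  # tower 0
    [(1.5, "conv1/weights"), (-3.0, "conv1/biases")],  # tower 1
]
result = average_gradients(tower_grads)
print(result)
```

Each tower contributes one gradient per shared variable, and a single averaged gradient per variable is what gets applied in the update.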

Answered on 2024-05-04T18:32:47+08:00
1

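Why the losses are combined

The example you cite is an example of data parallelism across multiple GPUs. Data parallelism helps train deeper models with a larger batch_size. In this setup you need to combine the results from the GPUs, because each GPU holds only part of the input batch (and the loss and gradients corresponding to that part). The example below illustrates this, following the TensorFlow data parallelism example.

Note: with model parallelism, different subgraphs of the model run on separate GPUs, and the intermediate outputs are collected by the master.

Example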

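Suppose you want to train a deeper model (e.g. ResNet / Inception) with a batch size of 256, and that batch may not fit on a single GPU (e.g. 8 GB of memory). You can split the batch into two batches of 128, run the forward pass of the model on each of them on a separate GPU, and compute the loss and gradients there. The results computed on each GPU (loss, gradients) are then collected and averaged, and the averaged gradients are used to update the model parameters.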
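The following plain-Python sketch checks the arithmetic behind this scheme on a toy problem. It assumes a one-parameter linear model with a mean-squared-error loss and synthetic data, none of which come from the CIFAR10 example; the "GPUs" are just two halves of a list. With equally sized shards and a mean loss, the average of the two shard gradients should match the full-batch gradient up to floating-point rounding.

```python
import math
import random


def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Synthetic data for illustration only: y is roughly 3*x plus noise.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(256)]
ys = [3.0 * x + random.gauss(0.0, 0.1) for x in xs]
w = 0.0

# Gradient computed on the full batch of 256 examples.
full_grad = mse_gradient(w, xs, ys)

# Split into two shards of 128, one per simulated GPU.
grad_gpu0 = mse_gradient(w, xs[:128], ys[:128])
grad_gpu1 = mse_gradient(w, xs[128:], ys[128:])
avg_grad = (grad_gpu0 + grad_gpu1) / 2.0

print(full_grad, avg_grad)
```

The equivalence relies on the shards having the same size; with unequal shards, a weighted average would be needed instead.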

Answered on 2024-05-04T18:32:47+08:00
