TensorFlow: Multi-GPU training on portions of the graph

All code assumes TensorFlow 1.3 and Python 3.x.

We are working on a GAN algorithm with an interesting loss function.

Stage 1 - Compute only the completion/generator loss portion of the network
          Iterates over the completion portion of the GAN for X iterations.  

Stage 2 - Compute only the discriminator loss portion of the network
          Iterates over the discriminator portion for Y iterations (but 
          don't train on Stage 1)

Stage 3 - Compute the full loss on the network
          Iterate over both completion and discriminator for Z iterations 
          (training on the entire network).
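
As a rough illustration of the schedule above, the sketch below maps a global step to the stage that should be trained. It assumes the three stages run back to back exactly once, which the description does not state. `stage_for_step` and the iteration counts are hypothetical names chosen for the example.

```python
def stage_for_step(step, x_iters, y_iters, z_iters):
    """Return the stage a 0-based step falls into.

    Assumption: stages run sequentially, once each (X, then Y, then Z).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < x_iters:
        return "gen_only"    # Stage 1: completion/generator loss only
    if step < x_iters + y_iters:
        return "discm_only"  # Stage 2: discriminator loss only
    if step < x_iters + y_iters + z_iters:
        return "full"        # Stage 3: full loss on the whole network
    return None              # schedule finished


# Example with hypothetical iteration counts X=2, Y=3, Z=1.
schedule = [stage_for_step(s, 2, 3, 1) for s in range(7)]
print(schedule)
```

In a training loop, the returned name would select which of the three train ops to run for that step.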

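We have this working on a single GPU. Because training takes a long time, we would like to get it working on multiple GPUs.

We have looked at Tensorflow/models/tutorials/Images/cifar10/cifar10_multi_gpu_train.py, which talks about tower losses, averaging the towers together, computing the gradients on the GPUs, and then applying them on the CPU. It is a great start. However, because our loss is more complicated, it complicates everything for us.

The code is rather complicated, but it roughly resembles this: https://github.com/timsainb/Tensorflow-MultiGPU-VAE-GAN (it won't run because it was written around Tensorflow 0.1, so it has some oddities that I haven't gotten working, but it should give you an idea of what we are doing).

When we compute the gradients, it looks something like this (pseudo-code that tries to highlight the important parts):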

for i in range(num_gpus):
    with tf.device('/gpu:%d' % gpus[i]):
        with tf.name_scope('Tower_%d' % gpus[i]) as scope:
            with tf.variable_scope( "generator" ):
                generator = build_generator()

        with tf.variable_scope( "discriminator" ):
            with tf.variable_scope( "real_discriminator" ) :
                real_discriminator = build_discriminator(x)

            with tf.variable_scope( "fake_discriminator", reuse = True ):
                fake_discriminator = build_discriminator(generator) 

        gen_only_loss, discm_only_loss, full_loss = build_loss( generator, 
            real_discriminator, fake_discriminator )

        tf.get_variable_scope().reuse_variables()

        gen_only_grads = gen_only_opt.compute_gradients(gen_only_loss)
        tower_gen_only_grads.append(gen_only_grads)

        discm_only_train_vars= tf.get_collection( 
            tf.GraphKeys.TRAINABLE_VARIABLES, "discriminator" )
        discm_only_train_vars= discm_only_train_vars+ tf.get_collection( 
            tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "discriminator" )

        discm_only_grads = discm_only_opt.compute_gradients(discm_only_loss, 
            var_list = discm_only_train_vars)
        tower_discm_only_grads.append(discm_only_grads)

        full_grads = full_opt.compute_gradients(full_loss)
        tower_full_grads.append(full_grads)

# average_gradients is the same code from cifar10_multi_gpu_train.py.
# We haven't changed it. It just iterates over the gradients and averages
# them... this is part of the problem...
gen_only_grads = average_gradients(tower_gen_only_grads)
gen_only_train = gen_only_opt.apply_gradients(gen_only_grads,
    global_step=global_step)

discm_only_grads = average_gradients(tower_discm_only_grads)
discm_only_train = discm_only_opt.apply_gradients(discm_only_grads, 
    global_step=global_step)

full_grads = average_gradients(tower_full_grads)
full_train = full_opt.apply_gradients(full_grads, global_step=global_step)

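If we only call compute_gradients(full_loss), the algorithm works properly across multiple GPUs. That is pretty much equivalent to the code in the cifar10_multi_gpu_train.py example. The tricky part comes when we need to restrict training to part of the network in Stage 1 or Stage 2.

compute_gradients(full_loss) has a var_list parameter with a default of None, which means it trains all variables. How does it know not to train the Tower_0 variables while it is in Tower_1? I ask because, when we deal with compute_gradients(discm_only_loss, var_list = discm_only_train_vars), I need to know how to collect the right variables to restrict training to that portion of the network. I found a thread that talks about this, but found it inaccurate/incomplete: "freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize.

The reason is that, if you look at the code in compute_gradients, when None is passed in, var_list is populated with the combination of the trainable variables and the trainable resource variables. So that is how I restrict it as well. This all works fine as long as we are not trying to split across multiple GPUs.

Question 1: Now that I have split the network across towers, am I also responsible for collecting the variables of the current tower? Do I need to add lines like these?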

discm_only_train_vars= tf.get_collection( tf.GraphKeys.TRAINABLE_VARIABLES, "Tower_{}/discriminator".format( i ) )
discm_only_train_vars= discm_only_train_vars + tf.get_collection( tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "Tower_{}/discriminator".format( i ) )

in order to train the appropriate variables for the tower (and to make sure I don't miss training any of them)?
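
Whether the per-tower filter in Question 1 matches anything depends on how the variables are named. `tf.get_collection(key, scope)` keeps the items whose `name` matches `scope` under `re.match`, so a plain string behaves as a prefix filter. The sketch below imitates that filtering in pure Python on hypothetical variable names. It assumes the `build_*` functions (not shown in the question) create their variables with `tf.get_variable`, whose names ignore `tf.name_scope`, so no `Tower_N/` prefix appears. With `tf.Variable` the names would differ.

```python
import re


def filter_by_scope(names, scope):
    """Imitate the scope filter of tf.get_collection (prefix match via re.match)."""
    return [n for n in names if re.match(scope, n)]


# Hypothetical names, assuming the variables were created with tf.get_variable
# inside tf.variable_scope (tf.name_scope does not prefix such names).
trainable_names = [
    "generator/conv1/weights:0",
    "discriminator/real_discriminator/conv1/weights:0",
    "discriminator/real_discriminator/conv1/biases:0",
]

shared = filter_by_scope(trainable_names, "discriminator")
per_tower = filter_by_scope(trainable_names, "Tower_{}/discriminator".format(0))

print(shared)     # the two discriminator names
print(per_tower)  # empty under the naming assumption above
```

If that naming assumption holds, the unprefixed "discriminator" filter already selects the variables shared by every tower, and the tower-prefixed filter would select nothing. Printing the names in the trainable collection of the real graph is the quickest way to check which case applies.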

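Question 2: Probably the same answer as Question 1. Getting compute_gradients(gen_only_loss) to work is a bit harder. In the non-tower version, gen_only_loss never touched the discriminator, so it activated only the tensors it needed in the graph and all was good. In this version, however, when I call compute_gradients it returns gradient entries for tensors that were never activated, so some of the entries are [(None, tf.Variable), (None, tf.Variable)]. This causes average_gradients to crash because it cannot convert a None value to a Tensor. This makes me think I need to restrict these as well.

The confusing thing about all of this is that the cifar example and my full_loss example don't care about training a specific tower, but I am guessing that once I specify var_list, whatever magic compute_gradients uses to know which variables to train for which tower goes away? Do I need to worry about grabbing any other variables?

1 Answer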
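
One way to read the cifar10 behaviour: that example calls `tf.get_variable_scope().reuse_variables()` after building the first tower, so later towers look up the variables the first tower created instead of making new ones. Under that reading there is a single set of trainable variables, and `var_list=None` has nothing tower-specific to sort out. The toy model below is plain Python, not TensorFlow, and `ToyVariableStore` is a hypothetical name. It only illustrates the sharing idea.

```python
class ToyVariableStore:
    """Toy stand-in for variable sharing by name; not TensorFlow code."""

    def __init__(self):
        self.variables = {}  # name -> value
        self.trainable = []  # names in creation order

    def get_variable(self, name, initial, reuse=False):
        if reuse:
            # Reuse mode: the variable must already exist and is returned as is.
            if name not in self.variables:
                raise ValueError("variable %s does not exist" % name)
            return name
        if name in self.variables:
            raise ValueError("variable %s already exists" % name)
        self.variables[name] = initial
        self.trainable.append(name)
        return name


store = ToyVariableStore()
tower_vars = []
for tower in range(2):
    reuse = tower > 0  # mirrors calling reuse_variables() after the first tower
    tower_vars.append([
        store.get_variable("generator/w", 0.0, reuse),
        store.get_variable("discriminator/w", 0.0, reuse),
    ])

print(tower_vars[0] == tower_vars[1])  # both towers refer to the same names
print(store.trainable)                 # one entry per variable, not per tower
```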


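For Question 1: yes, if you split things up manually, you are responsible for the collection.

For Question 2: you probably want to either restrict the call to compute_gradients or filter the result.

Answered 2024-05-01T18:55:05+08:00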
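
The second option, filtering the result, could look like the sketch below. It follows the shape of the data described in the question (one list of (gradient, variable) pairs per tower) but uses plain floats and strings in place of tensors and variables. `average_gradients_skip_none` is a hypothetical name, not the function from cifar10_multi_gpu_train.py.

```python
def average_gradients_skip_none(tower_grads):
    """Average (grad, var) pairs across towers, skipping None gradients.

    tower_grads holds one list of (grad, var) pairs per tower, with the
    variables in the same order in every tower. Gradients are plain floats
    here as a stand-in for tensors.
    """
    averaged = []
    for grad_and_vars in zip(*tower_grads):
        var = grad_and_vars[0][1]
        grads = [g for g, _ in grad_and_vars if g is not None]
        if not grads:
            continue  # no tower produced a gradient for this variable
        averaged.append((sum(grads) / len(grads), var))
    return averaged


tower_grads = [
    [(1.0, "generator/w"), (None, "discriminator/w")],
    [(3.0, "generator/w"), (None, "discriminator/w")],
]
result = average_gradients_skip_none(tower_grads)
print(result)  # [(2.0, 'generator/w')]
```

The same filter could instead be applied to the output of compute_gradients before it is appended to the per-tower list, which keeps the averaging function unchanged.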
