TensorFlow and Batch Normalization with Batch Size == 1 => Outputs All Zeros


I have a question about my understanding of BatchNorm (BN from here on).

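I have a convnet that works well, and I am writing tests to check shapes and output ranges. I noticed that when I set batch_size = 1, my model outputs zeros (both logits and activations).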

I prototyped the simplest possible network with BN:

Input => Conv + ReLU => BN => Conv + ReLU => BN => Conv Layer + Tanh

The model is initialized with Xavier initialization. My guess is that BN, during training, performs some computation that requires batch_size > 1.

I found a PyTorch issue that seems to discuss this: https://github.com/pytorch/pytorch/issues/1381

Can anyone explain this? It is still a bit blurry to me.

Example Run:

Important: this script requires the TensorLayer library: pip install tensorlayer

import tensorflow as tf
import tensorlayer as tl

import numpy as np

def conv_net(inputs, is_training):

    xavier_initilizer = tf.contrib.layers.xavier_initializer(uniform=True)
    normal_initializer = tf.random_normal_initializer(mean=1., stddev=0.02)

    # Input Layers
    network = tl.layers.InputLayer(inputs, name='input')

    fx = [64, 128, 256, 256, 256]

    for i, n_out_channel in enumerate(fx):

        with tf.variable_scope('h' + str(i + 1)):

            network = tl.layers.Conv2d(
                network,
                n_filter    = n_out_channel,
                filter_size = (5, 5),
                strides     = (2, 2),
                padding     = 'VALID',
                act         = tf.identity,
                W_init      = xavier_initilizer,
                name        = 'conv2d'
            )

            network = tl.layers.BatchNormLayer(
                network,
                act        = tf.identity,
                is_train   = is_training,
                gamma_init = normal_initializer,
                name       = 'batch_norm'
            )

            network = tl.layers.PReluLayer(
                layer  = network,
                a_init = tf.constant_initializer(0.2),
                name   ='activation'
            )

    ############# OUTPUT LAYER ###############

    with tf.variable_scope('h' + str(len(fx) + 1)):
        '''

        network = tl.layers.FlattenLayer(network, name='flatten')

        network = tl.layers.DenseLayer(
            network,
            n_units = 100,
            act     = tf.identity,
            W_init  = xavier_initilizer,
            name    = 'dense'
        )

        '''

        output_filter_size = tuple([int(i) for i in network.outputs.get_shape()[1:3]])

        network = tl.layers.Conv2d(
            network,
            n_filter    = 100,
            filter_size = output_filter_size,
            strides     = (1, 1),
            padding     = 'VALID',
            act         = tf.identity,
            W_init      = xavier_initilizer,

            name        = 'conv2d'
        )

        network = tl.layers.BatchNormLayer(
            network,
            act        = tf.identity,
            is_train   = is_training,
            gamma_init = normal_initializer,
            name       = 'batch_norm'
        )

        net_logits = network.outputs

        network.outputs = tf.nn.tanh(
            x        = network.outputs,
            name     = 'activation'
        )

        net_output = network.outputs

    return network, net_output, net_logits


if __name__ == '__main__':

    tf.logging.set_verbosity(tf.logging.DEBUG)

    #################################################
    #                MODEL DEFINITION               #
    #################################################

    PLH_SHAPE = [None, 256, 256, 3]

    input_plh = tf.placeholder(tf.float32, PLH_SHAPE, name='input_placeholder')

    convnet, net_out, net_logits = conv_net(input_plh, is_training=True)


    with tf.Session() as sess:
        tl.layers.initialize_global_variables(sess)

        convnet.print_params(details=True)

        #################################################
        #                  LAUNCH A RUN                 #
        #################################################

        for BATCH_SIZE in [1, 2]:

            INPUT_SHAPE = [BATCH_SIZE, 256, 256, 3]

            batch_data = np.random.random(size=INPUT_SHAPE)

            output, logits = sess.run(
                [net_out, net_logits],
                feed_dict={
                    input_plh: batch_data
                }
            )

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("\n\n###########################")

                print("\nBATCH SIZE = %d\n" % BATCH_SIZE)

            tf.logging.debug("output => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                output.shape,
                output.mean(),
                output.std(),
                output.min(),
                output.max()
            ))

            tf.logging.debug("logits => Shape: %s - Mean: %e - Std: %f - Min: %f - Max: %f" % (
                logits.shape,
                logits.mean(),
                logits.std(),
                logits.min(),
                logits.max()
            ))

            if tf.logging.get_verbosity() == tf.logging.DEBUG:
                print("###########################")

Gives the following output:

###########################

BATCH SIZE = 1

DEBUG:tensorflow:output => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
DEBUG:tensorflow:logits => Shape: (1, 1, 1, 100) - Mean: 0.000000e+00 - Std: 0.000000 - Min: 0.000000 - Max: 0.000000
###########################


###########################

BATCH SIZE = 2

DEBUG:tensorflow:output => Shape: (2, 1, 1, 100) - Mean: -1.430511e-08 - Std: 0.760749 - Min: -0.779634 - Max: 0.779634
DEBUG:tensorflow:logits => Shape: (2, 1, 1, 100) - Mean: -4.768372e-08 - Std: 0.998715 - Min: -1.044437 - Max: 1.044437
###########################

2 Answers

You should read an explanation of batch normalization, such as this one. You can also have a look at TensorFlow's related doc.

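Basically, there are two ways to perform batch norm, and both have problems with a batch size of 1:
- Using a moving mean and variance per pixel, so they are tensors with the same shape as each sample in the batch. This is the one used in @layog's answer, and (I think) the one used in the original paper and the most common.
- Using a moving mean and variance over the whole image/feature space, so they are just vectors (rank 1) of shape (n_channels,).
In both cases, you will have: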
```
output = gamma * (input - mean) / sigma + beta
```
Beta is usually set to 0 and gamma to 1, because linear functions follow right after the BN.

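During training, mean and variance are computed on the current batch, which causes problems when its size is 1:
- In the first case, you get mean=input, so output=0.
- In the second case, mean is the average over all pixels, so it behaves better; but if your width and height are also 1, you get mean=input again, so output=0.
I think most people (and the original method) use the first way, which is why you get 0 (although the TF doc seems to suggest that the second method is also common). The argument in the link you provided seems to consider the second method.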
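
The two cases above can be checked numerically. The following is a minimal NumPy sketch, not the TensorLayer or TensorFlow implementation; the helper name `batch_norm_train`, the `eps` value, and the NHWC layout are assumptions made for illustration. It normalizes with the statistics of the current batch only (gamma = 1, beta = 0), reducing either over the batch axis alone (first case) or over batch, height, and width (second case).

```python
import numpy as np

def batch_norm_train(x, axes, eps=1e-5):
    """Normalize x with statistics of the current batch (gamma=1, beta=0).

    x is assumed to be NHWC; `axes` lists the axes the statistics are
    reduced over. Hypothetical helper, for illustration only.
    """
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# A batch holding a single 4x4 feature map with 3 channels.
x = rng.random((1, 4, 4, 3))

# Case 1: per-pixel statistics (reduce over the batch axis only).
# With a single sample, mean == input, so every value becomes 0.
per_pixel = batch_norm_train(x, axes=(0,))

# Case 2: per-channel statistics (reduce over batch, height, width).
# The 16 pixels of each channel still give a usable mean and variance.
per_channel = batch_norm_train(x, axes=(0, 1, 2))

# Case 2 with a 1x1 spatial size: only one value per channel is left,
# so mean == input again and the output is 0.
x_1x1 = rng.random((1, 1, 1, 3))
per_channel_1x1 = batch_norm_train(x_1x1, axes=(0, 1, 2))

print(np.abs(per_pixel).max())        # 0.0
print(per_channel.std())              # close to 1
print(np.abs(per_channel_1x1).max())  # 0.0
```

The question's final layer has output shape (1, 1, 1, 100), so with a batch of one it is left with a single value per channel, which is the situation of the last case in the sketch.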

In any case (whichever one you use), BN only gives good results with a larger batch size (say, at least 10).
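
Since the batch statistics are only used during training, the zero output is specific to training mode. At inference time, batch norm layers normally normalize with the stored moving averages instead, so a batch of one is not a problem there. The sketch below illustrates this under stated assumptions: `SimpleBatchNorm`, its `momentum` and `eps` defaults, and the update rule are a simplified stand-in written for this example, not the TensorLayer code.

```python
import numpy as np

class SimpleBatchNorm:
    """Per-channel batch norm for NHWC input (hypothetical, illustration only)."""

    def __init__(self, n_channels, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(n_channels)
        self.beta = np.zeros(n_channels)
        self.moving_mean = np.zeros(n_channels)
        self.moving_var = np.ones(n_channels)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, is_training):
        if is_training:
            # Statistics come from the current batch ...
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # ... and are folded into the moving averages.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Statistics come from the stored moving averages,
            # so the batch size no longer matters.
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(n_channels=3)

# "Train" on batches of 16 samples so the moving averages settle.
for _ in range(200):
    bn(rng.random((16, 1, 1, 3)), is_training=True)

single = rng.random((1, 1, 1, 3))
out_infer = bn(single, is_training=False)  # moving statistics: non-zero
out_train = bn(single, is_training=True)   # batch statistics: all zeros

print(out_infer.ravel())
print(out_train.ravel())
```

In the question's script this corresponds to building the network with `is_training=False` for the batch-size-1 check, once the model has been trained with larger batches.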
Answered on 2024-05-08T02:30:02+08:00

Batch Normalization normalizes each output over the whole batch using the following (from the original paper).
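
For reference, the batch normalizing transform is commonly written as below, where $m$ is the batch size, $x_i$ is one output for the $i$-th sample, $\epsilon$ is a small constant added for numerical stability, and $\gamma$, $\beta$ are the learned scale and shift. The notation follows the usual presentation of the paper.

```latex
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
\qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\qquad
y_i = \gamma \, \hat{x}_i + \beta
```

With $m = 1$, $\mu_B = x_1$ and $\sigma_B^2 = 0$, so $\hat{x}_1 = 0$ and $y_1 = \beta$.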

For example, with a batch size of 2, suppose you have the following outputs (of size 3):
```
[2, 4, 6]
[4, 6, 8]
```
The mean of each output over the batch is then:
```
[3, 5, 7]
```
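Now look at the numerator in the formula above. It subtracts the mean from each element of the output. If the batch size is 1, however, the mean is exactly the same as the output, so the numerator evaluates to 0.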

As a side note, the denominator would also evaluate to 0, but TensorFlow seems to output 0 in the 0/0 case.
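
The arithmetic of this example can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions: the helper `batch_norm` and its `eps` default are made up for illustration, with gamma = 1 and beta = 0. It also shows why no 0/0 occurs in practice: implementations add a small epsilon to the variance (for instance the `variance_epsilon` argument of `tf.nn.batch_normalization`), so the denominator is sqrt(eps) rather than exactly 0, and 0 / sqrt(eps) is simply 0.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each output position over the batch (gamma=1, beta=0).

    `batch` is a list of equally sized rows. Hypothetical helper.
    """
    m = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / m for col in columns]
    variances = [sum((v - mu) ** 2 for v in col) / m
                 for col, mu in zip(columns, means)]
    return [[(v - mu) / math.sqrt(var + eps)
             for v, mu, var in zip(row, means, variances)]
            for row in batch]

# Batch size 2: means are [3, 5, 7], variances are [1, 1, 1].
two = batch_norm([[2, 4, 6], [4, 6, 8]])

# Batch size 1: the mean equals the sample and the variance is 0.
one = batch_norm([[2, 4, 6]])

print(two)  # roughly [[-1, -1, -1], [1, 1, 1]]
print(one)  # [[0.0, 0.0, 0.0]]
```

With a batch of two, each position normalizes to about -1 for one sample and +1 for the other, which is consistent with the logits close to ±1 in the question's batch-size-2 run.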
Answered on 2024-05-08T02:30:02+08:00