What is the output tensor of a Max Pooling 2D Layer in TensorFlow? - Java 学习之路

I'm trying to understand some TensorFlow basics and am having trouble with the documentation for the max pooling 2D layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1

This is how max_pooling2d is specified:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

where conv1 is a tensor of shape [batch_size, image_width, image_height, channels], which in this case is [batch_size, 28, 28, 32].

So our input is a tensor of shape [batch_size, 28, 28, 32].

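My understanding of the max pooling 2D layer is that it applies a filter of size pool_size (2x2 in this case) and moves the sliding window by stride (also 2 in each direction). This means both the width and the height of the image are halved, so each channel ends up with 14x14 pixels (32 channels in total), and the output should be a tensor of shape [batch_size, 14, 14, 32].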
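The shape arithmetic above can be checked with a small sketch. It assumes TensorFlow's default 'valid' padding and the channels_last layout [batch, height, width, channels]; the helper name pooled_shape is hypothetical and not part of any TensorFlow API.

```python
def pooled_shape(input_shape, pool_size=(2, 2), strides=(2, 2)):
    """Hypothetical helper: output shape of 2D max pooling with 'valid' padding
    for a channels_last tensor [batch, height, width, channels]."""
    batch, height, width, channels = input_shape
    # With 'valid' padding only windows that fit entirely inside the input are used.
    out_h = (height - pool_size[0]) // strides[0] + 1
    out_w = (width - pool_size[1]) // strides[1] + 1
    # Pooling acts on each channel separately, so the channel count is unchanged.
    return (batch, out_h, out_w, channels)

print(pooled_shape((None, 28, 28, 32)))  # (None, 14, 14, 32)
```

Under these assumptions, (28 - 2) // 2 + 1 = 14 for both spatial dimensions, while the last dimension stays 32.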

However, according to the link above, the output tensor has shape [batch_size, 14, 14, 1]:

Our output tensor produced by max_pooling2d() (pool1) has a shape of 
[batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.

What am I missing here?

How does 32 turn into 1?

They apply the same logic here: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2

But this time it is correct: [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (the number of channels stays the same).
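To see why the channel count cannot collapse to 1, here is a minimal pure-Python sketch of 2D max pooling on a single image stored as nested lists image[row][col][channel]. It is an illustration under those assumptions ('valid' padding, a hypothetical function max_pool2d), not TensorFlow's implementation: every output value is the maximum of one window within one channel, so each channel yields its own pooled map.

```python
def max_pool2d(image, pool=2, stride=2):
    """Hypothetical helper: 'valid' max pooling on one image laid out as image[row][col][channel]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    out_h = (height - pool) // stride + 1
    out_w = (width - pool) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each channel k is pooled on its own; values from different channels are never mixed.
            pixel = [
                max(image[i * stride + di][j * stride + dj][k]
                    for di in range(pool) for dj in range(pool))
                for k in range(channels)
            ]
            row.append(pixel)
        out.append(row)
    return out

# A 2x2 image with 2 channels: each channel keeps its own maximum.
tiny = [[[1, 5], [2, 6]],
        [[3, 7], [4, 8]]]
print(max_pool2d(tiny))  # [[[4, 8]]]

# A 28x28x32 image (like one element of conv1) pools to 14x14x32.
big = [[[r + c + k for k in range(32)] for c in range(28)] for r in range(28)]
pooled = max_pool2d(big)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 14 14 32
```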

2 Answers

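Yes, a 2x2 max pool with strides = 2x2 halves the height and width, and the output depth does not change. Here is my test code; the output shape is (14, 14, 32), so perhaps the documentation is wrong?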

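#!/usr/bin/env python
# Written against the TensorFlow 1.x API (tf.placeholder, tf.layers, tensorflow.examples); these were removed in TF 2.x.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Loading MNIST is not needed for the shape check; it only produces the "Extracting ..." lines below.
mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

# Stand-in for the first conv layer's output: [batch, 28, 28, 32]
conv1 = tf.placeholder(tf.float32, [None, 28, 28, 32])
# 2x2 max pooling with stride 2 halves height and width; channels are untouched
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
print(pool1.get_shape())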

The output is:

Extracting ./MNIST_data/train-images-idx3-ubyte.gz
Extracting ./MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./MNIST_data/t10k-labels-idx1-ubyte.gz
(?, 14, 14, 32)

Answered on 2024-05-03T15:07:41+08:00

Nikolai, this has already been corrected, just as you expected:
- Documentation fixes for TF Layers tutorial (see #8301)
- Feedback on "A Guide to TF Layers: Building a Convolutional Neural Network" tutorial #8301
I ran into this question while learning about convolution and pooling. Thanks for asking it; it led me to the informative documentation.
Answered on 2024-05-03T15:07:41+08:00
