
Training a fully convolutional neural network with variable-size inputs takes unreasonably long in Keras/TensorFlow


I am trying to implement an FCNN for image classification that can accept variable-size inputs. The model is built in Keras with the TensorFlow backend.

Consider the following toy example:

model = Sequential()

# width and height are None because we want to process images of variable size 
# nb_channels is either 1 (grayscale) or 3 (rgb)
model.add(Convolution2D(32, 3, 3, input_shape=(nb_channels, None, None), border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(16, 1, 1))
model.add(Activation('relu'))

model.add(Convolution2D(8, 1, 1))
model.add(Activation('relu'))

# reduce the number of dimensions to the number of classes
model.add(Convolution2D(nb_classes, 1, 1))
model.add(Activation('relu'))

# do global pooling to yield one value per class
model.add(GlobalAveragePooling2D())

model.add(Activation('softmax'))

The model runs fine, but I am running into a performance problem. Training on variable-size images takes unreasonably long compared to training on fixed-size inputs. If I resize all images to the largest size in the dataset, training the model takes far less time than training on variable-size inputs. Is input_shape=(nb_channels, None, None) the right way to specify variable-size inputs? Is there any way to mitigate this performance problem?
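One likely contributor to the slowdown is that images of different spatial sizes cannot be stacked into a single input tensor, which forces an effective batch size of 1 (one forward/backward pass per image). A hedged workaround, assuming the training set is held as channels-first NumPy arrays, is to bucket examples by spatial size so that each bucket forms a uniform batch (`bucket_by_shape` is a hypothetical helper for illustration, not a Keras API):

```python
from collections import defaultdict
import numpy as np

def bucket_by_shape(images, labels):
    """Group (image, label) pairs by the image's array shape so that
    each group can be stacked into one uniform batch tensor."""
    buckets = defaultdict(list)
    for img, lab in zip(images, labels):
        buckets[img.shape].append((img, lab))
    # Stack each group: images -> (n, channels, h, w), labels -> (n,)
    return {shape: (np.stack([i for i, _ in pairs]),
                    np.array([l for _, l in pairs]))
            for shape, pairs in buckets.items()}
```

Each bucket can then be fed to `model.train_on_batch(batch_x, batch_y)` in turn, so most updates use a full batch instead of a single image; only sizes that occur rarely still fall back to tiny batches.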

Update

model.summary() for the model with 3 classes and grayscale images:

Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 32, None, None 320         convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 32, None, None 0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 32, None, None 0           activation_1[0][0]               
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 32, None, None 9248        maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 32, None, None 0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 16, None, None 528         maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 16, None, None 0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 8, None, None) 136         activation_2[0][0]               
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 8, None, None) 0           convolution2d_4[0][0]            
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 3, None, None) 27          activation_3[0][0]               
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 3, None, None) 0           convolution2d_5[0][0]            
____________________________________________________________________________________________________
globalaveragepooling2d_1 (Global (None, 3)             0           activation_4[0][0]               
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 3)             0           globalaveragepooling2d_1[0][0]   
====================================================================================================
Total params: 10,259
Trainable params: 10,259
Non-trainable params: 0

1 Answer


Images of different sizes mean images of similar things at different scales. If that scale difference is significant, the relative position of a similar object will shift from the center of the frame toward the upper left as the image size decreases. The (simple) network architecture shown is spatially aware, so the model's convergence rate will degrade consistently as the data become inconsistent across very different scales. This architecture is not well suited to finding the same thing in different or multiple places.

A certain amount of shearing, rotation, and mirroring will help the model generalize, but only after rescaling to a consistent size. So when you resize, you fix the scale problem and make the input data spatially consistent.

In short, I believe this network architecture is not suited to, or capable of, the task you have given it, namely handling varying scales.
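Following the answer's recommendation to rescale everything to one consistent size, here is a minimal sketch of such a preprocessing step, assuming channels-first `(nb_channels, H, W)` arrays. `resize_nearest` is a hypothetical nearest-neighbour helper written in plain NumPy for illustration; in practice a library resampler (e.g. Pillow's `Image.resize` or `scipy.ndimage.zoom`) would normally be used:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a channels-first (c, h, w) array
    to (c, out_h, out_w), by picking the closest source pixel."""
    c, h, w = img.shape
    rows = np.arange(out_h) * h // out_h  # source row per output row
    cols = np.arange(out_w) * w // out_w  # source column per output column
    # Broadcast the row/column index grids across all channels.
    return img[:, rows[:, None], cols[None, :]]
```

Applying this to every image (e.g. to the largest size in the dataset, as the question already tried) yields tensors of one shape, so full batches can be formed and training speed returns to the fixed-size case.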
