Why is my convolutional autoencoder not converging properly? I have a very simple stack of layers.

Encoder: Conv/ReLU (kernel size: 7x7, stride = 1, padding = 0) => MaxPool (kernel size = 2x2, stride = 2) => Conv/ReLU (kernel size: 5x5, stride = 1, padding = 0) => MaxPool (kernel size = 2x2, stride = 2)

Decoder: Nearest-neighbor upsampling => Deconv/ReLU => Nearest-neighbor upsampling => Deconv/ReLU

The training images are of size 30x30x1.
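As a quick check of the plumbing, the feature-map sizes implied by this stack for a 30x30 input work out cleanly. This is only a sketch, assuming VALID convolutions, SAME 2x2/stride-2 pooling, and the 32/64 filter counts used in the code below:

# Spatial sizes through the autoencoder for a 30x30x1 input
def valid_conv(size, k):      # VALID convolution, stride 1
    return size - k + 1

def valid_deconv(size, k):    # VALID transposed convolution, stride 1
    return size + k - 1

s = valid_conv(30, 7)   # 24 -> 24x24x32 after the 7x7 conv
s = s // 2              # 12 -> 12x12x32 after 2x2 max-pool
s = valid_conv(s, 5)    # 8  -> 8x8x64   after the 5x5 conv
s = s // 2              # 4  -> 4x4x64   latent code
s = 2 * s               # 8  -> nearest-neighbor upsample to 8x8x64
s = valid_deconv(s, 5)  # 12 -> 12x12x32 after the first deconv
s = 2 * s               # 24 -> nearest-neighbor upsample to 24x24x32
s = valid_deconv(s, 7)  # 30 -> 30x30x1 reconstruction
print(s)                # 30, matching the input size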

I tried training it on 1000 images for more than 1000 epochs, but the error (MSE) is still 120.

import tensorflow as tf

BATCH_SIZE = 100
IMAGE_SIZE = 30
NUM_CHANNELS = 1
num_images = 1000

def init_weights(shape):
    # small random-normal initialization for the conv / deconv filters
    return tf.Variable(tf.random_normal(shape, stddev=0.01))
def encoder(X, w, w2, wd, wd2):
    # Despite its name, this builds the full autoencoder (encoder + decoder).
    # Encoder: 30x30x1 -> 24x24x32 -> 12x12x32 -> 8x8x64 -> 4x4x64
    l1a = tf.nn.relu(tf.nn.conv2d(X, w, strides=[1, 1, 1, 1], padding='VALID'))
    l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    l2a = tf.nn.relu(tf.nn.conv2d(l1, w2, strides=[1, 1, 1, 1], padding='VALID'))
    l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Decoder: 4x4x64 -> 8x8x64 -> 12x12x32 -> 24x24x32 -> 30x30x1
    # method=1 selects nearest-neighbor resizing in this TF version.
    l1da = tf.image.resize_images(l2, 8, 8, 1, align_corners=False)
    output_shapel1d = tf.convert_to_tensor([BATCH_SIZE, 12, 12, 32], dtype=tf.int32)
    l1d = tf.nn.relu(tf.nn.conv2d_transpose(l1da, wd, output_shapel1d, strides=[1, 1, 1, 1], padding='VALID'))
    l2da = tf.image.resize_images(l1d, 24, 24, 1, align_corners=False)
    output_shapel2d = tf.convert_to_tensor([BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS], dtype=tf.int32)
    l2d = tf.nn.relu(tf.nn.conv2d_transpose(l2da, wd2, output_shapel2d, strides=[1, 1, 1, 1], padding='VALID'))
    return l2d


complete_image = extract_data(0, 1000)  # loads the images (helper not shown here)
# Autoencoder: the targets are the inputs themselves
trX = complete_image[0:900]
trY = trX
teX = complete_image[900:1000]
teY = teX

X = tf.placeholder("float", [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS])
Y = tf.placeholder("float", [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS])

w = init_weights([7, 7, 1, 32])    # encoder conv1: 7x7, 1 -> 32 channels
w2 = init_weights([5, 5, 32, 64])  # encoder conv2: 5x5, 32 -> 64 channels
wd = init_weights([5, 5, 32, 64])  # decoder deconv1: 64 -> 32 (conv2d_transpose filters are [h, w, out_channels, in_channels])
wd2 = init_weights([7, 7, 1, 32])  # decoder deconv2: 32 -> 1 channel
py_x = encoder(X, w, w2, wd, wd2)  # reconstruction of the input batch
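# Optional sanity check (not in the original code): the reconstruction should
# end up with the same static shape as the input placeholder, otherwise the
# MSE below will broadcast or fail.
print "input shape: ", X.get_shape()     # (100, 30, 30, 1)
print "output shape:", py_x.get_shape()  # should match; if it prints as unknown,
                                         # inspect sess.run(py_x, ...).shape instead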
cost = tf.reduce_mean(tf.squared_difference(py_x, Y))  # mean squared error
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = py_x
global_step = tf.Variable(0, name='global_step', trainable=False)
saver = tf.train.Saver()
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    start = global_step.eval()  # get last global_step
    print "Start from:", start
    if FLAGS.output == "train":
        for i in range(start, 500):
            # (start, end) index pairs for each mini-batch of the epoch
            training_batch = zip(range(0, num_images - BATCH_SIZE, BATCH_SIZE),
                                 range(BATCH_SIZE, num_images - BATCH_SIZE, BATCH_SIZE))
            total_epoch_cost = 0.0
            for batch_start, batch_end in training_batch:
                sess.run(train_op, feed_dict={X: trX[batch_start:batch_end], Y: trY[batch_start:batch_end]})
                total_epoch_cost += sess.run(cost, feed_dict={X: trX[batch_start:batch_end], Y: trY[batch_start:batch_end]})
            avg_epoch_cost = total_epoch_cost / len(training_batch)  # average over mini-batches
            print "cost during epoch", i, "is", avg_epoch_cost

I have put the complete code, with minor modifications, in this gist. I trained it with about 10,000 images, and the error after 488 epochs is 74.8.