Training accuracy increases, test accuracy plateaus

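When training a convolutional neural network following an article, the accuracy on the training set increases a great deal while the accuracy on the test set plateaus.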

Below is an example with 6400 training examples, randomly chosen at each epoch (so some examples may have been seen in earlier epochs and some may be new), and the same 6400 test examples.

With a larger dataset (64000 or 100000 training examples), the increase in training accuracy is even more abrupt, reaching 98 by the third epoch.

I also tried using the same 6400 training examples every epoch, just randomly shuffled. As expected, the results were worse.

epoch 3  loss 0.54871 acc 79.01 
learning rate 0.1
nr_test_examples 6400    
TEST epoch 3  loss 0.60812 acc 68.48 
nr_training_examples 6400
tb 91
epoch 4  loss 0.51283 acc 83.52 
learning rate 0.1
nr_test_examples 6400
TEST epoch 4  loss 0.60494 acc 68.68 
nr_training_examples 6400
tb 91
epoch 5  loss 0.47531 acc 86.91 
learning rate 0.05
nr_test_examples 6400
TEST epoch 5  loss 0.59846 acc 68.98 
nr_training_examples 6400
tb 91
epoch 6  loss 0.42325 acc 92.17 
learning rate 0.05
nr_test_examples 6400
TEST epoch 6  loss 0.60667 acc 68.10 
nr_training_examples 6400
tb 91
epoch 7  loss 0.38460 acc 95.84 
learning rate 0.05
nr_test_examples 6400
TEST epoch 7  loss 0.59695 acc 69.92 
nr_training_examples 6400
tb 91
epoch 8  loss 0.35238 acc 97.58 
learning rate 0.05
nr_test_examples 6400
TEST epoch 8  loss 0.60952 acc 68.21

Here is my model (I use a ReLU activation after each convolution):

conv 5x5 (1, 64)
max-pooling 2x2
dropout
conv 3x3 (64, 128)
max-pooling 2x2
dropout
conv 3x3 (128, 256)
max-pooling 2x2
dropout
conv 3x3 (256, 128)
dropout
fully_connected(18*18*128, 128)
dropout
output(128, 128)

What could be the cause?

I am using the Momentum Optimizer with learning rate decay:

batch = tf.Variable(0, trainable=False)

train_size = 6400

learning_rate = tf.train.exponential_decay(
    0.1,                 # Base learning rate.
    batch * batch_size,  # Current index into the dataset.
    train_size * 5,      # Decay step.
    0.5,                 # Decay rate.
    staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
                                       0.9).minimize(cost, global_step=batch)
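
For reference, the schedule this configuration produces can be worked out without TensorFlow. The sketch below assumes the documented staircase behaviour of `tf.train.exponential_decay`, that is `base_lr * decay_rate ** floor(global_step / decay_steps)`; the function name `staircase_exponential_decay` is only an illustrative helper, not part of any library. Under that assumption the rate halves after every 5 epochs' worth of examples, which looks consistent with the drop from 0.1 to 0.05 in the log above.

```python
def staircase_exponential_decay(base_lr, global_step, decay_steps, decay_rate):
    """Learning rate after `global_step` examples, decayed in discrete steps."""
    # staircase=True: the exponent is an integer, hence the floor division.
    return base_lr * decay_rate ** (global_step // decay_steps)


train_size = 6400
for epoch in range(10):
    # Number of examples consumed before this epoch starts (epochs counted from 0).
    examples_seen = epoch * train_size
    lr = staircase_exponential_decay(0.1, examples_seen, train_size * 5, 0.5)
    print(f"epoch {epoch}: learning rate {lr}")
```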

1 Answer

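This is very much expected. This problem is called over-fitting: your model starts "memorizing" the training examples without actually learning anything useful for the test set. In fact, this is exactly why we use a test set in the first place. With a sufficiently complex model we can always fit the data perfectly, even when the fit is meaningless. The test set tells us what the model has actually learned.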
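
One quick way to see the over-fitting in numbers is to look at the gap between training and test accuracy per epoch. The sketch below just copies the accuracies from the log in the question; `generalization_gaps` is a made-up helper name for illustration.

```python
# (epoch, training accuracy %, test accuracy %) copied from the log in the question.
results = [
    (3, 79.01, 68.48),
    (4, 83.52, 68.68),
    (5, 86.91, 68.98),
    (6, 92.17, 68.10),
    (7, 95.84, 69.92),
    (8, 97.58, 68.21),
]


def generalization_gaps(rows):
    """Return (epoch, train accuracy minus test accuracy) for each row."""
    return [(epoch, round(train - test, 2)) for epoch, train, test in rows]


for epoch, gap in generalization_gaps(results):
    print(f"epoch {epoch}: train - test = {gap:.2f} points")
```

The gap grows at every epoch while the test accuracy stays within roughly two points of 68-70, which is the usual signature of a model fitting the training set without generalizing further.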

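It is also useful to use a validation set, which is like a test set except that you use it to decide when to stop training. When the validation error stops decreasing, you stop training. Why not use the test set for this? The test set is there to tell you how your model will perform in the real world. If you start using information from the test set to make choices about your training process, it is like cheating, and your test error will no longer be representative of your real-world error.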
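
As a rough sketch of the stopping rule, here is early stopping with a "patience" counter: stop once the validation loss has not improved for `patience` epochs in a row, and keep the weights from the best epoch. The function name and the `patience` parameter are my own choices for illustration. The loss values are the test losses from the log in the question, used here only as stand-in numbers to exercise the function; in a real run they would come from a separate validation split.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_index, best_index) for a list of per-epoch validation losses.

    Indices are positions in `val_losses`, not epoch numbers.
    """
    best_index = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_index]:
            best_index = i  # new best validation loss
        elif i - best_index >= patience:
            return i, best_index  # no improvement for `patience` epochs
    # Never ran out of patience: stop at the last epoch available.
    return len(val_losses) - 1, best_index


# Stand-in values (epochs 3 to 8 of the log); index 0 corresponds to epoch 3.
losses = [0.60812, 0.60494, 0.59846, 0.60667, 0.59695, 0.60952]
stop, best = early_stopping_epoch(losses, patience=1)
print(f"stop at index {stop}, keep weights from index {best}")
```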

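Finally, convolutional neural networks are notorious for their ability to over-fit. It has been shown that conv-nets can reach zero training error even if you shuffle the labels, or even replace the images with random pixels. This means a conv-net does not need a real pattern in the data in order to fit it. Which in turn means you have to regularize a conv-net; that is, you have to use things like Dropout, batch normalization, and early stopping.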
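
To make the Dropout part concrete, here is a minimal sketch of "inverted" dropout on a plain list of activations. It is not how TensorFlow implements it internally, and the function signature is my own; it only illustrates the idea that units are zeroed at random during training and the survivors are rescaled, while nothing is dropped at test time.

```python
import random


def dropout(activations, drop_prob, training, rng=random):
    """Inverted dropout: zero each unit with probability `drop_prob` while training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # at test time the layer is the identity
    keep_prob = 1.0 - drop_prob
    # Kept units are divided by keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, training=True))
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, training=False))
```

A common mistake worth ruling out is leaving dropout switched on while computing test accuracy, since that alone would depress the test numbers.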

If you want to read more, I'll leave a couple of links:

Over-fitting, validation, early stopping: https://elitedatascience.com/overfitting-in-machine-learning

Networks fitting random labels: https://arxiv.org/pdf/1611.03530.pdf (this paper is a bit advanced, but it is easy to skim)

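P.S. To actually improve test accuracy, you will need to change your model or train with data augmentation. You may also want to try transfer learning.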
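
A minimal sketch of what data augmentation could look like for single-channel images, written in plain Python on a list of rows so it runs anywhere. The `augment` function and its `max_shift` parameter are hypothetical, and whether a horizontal flip or a small shift preserves the label depends on your data, which I don't know; treat it as an illustration only.

```python
import random


def augment(image, max_shift=2, rng=random):
    """Return a randomly flipped and shifted copy of a 2-D image (list of rows)."""
    height, width = len(image), len(image[0])
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal flip
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    shifted = [[0.0] * width for _ in range(height)]  # pad with zeros
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


sample = [[float(4 * r + c) for c in range(4)] for r in range(4)]
for row in augment(sample, max_shift=1):
    print(row)
```

Applying a fresh random transformation to each training example at every epoch means the network rarely sees exactly the same input twice, which makes pure memorization harder.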

Answered on 2024-04-26T03:09:25+08:00
