首页 文章

在训练CNN进行图像分割时,我的损失怎么会突然增加?

提问于
浏览
1

我使用带有tensorflow 1.4.0后端的keras 1.2.2 .

我使用的是unet架构,我有708个650x650像素和6个chanel的图像 . 我用镜像和旋转增加了数据集,共计4248个图像 .

我有2个班,我的损失函数就是这个:

def jaccard_coef_loss(y_true, y_pred):
    smooth = 1e-12
    intersection = K.sum(y_true * y_pred, axis=[0, -1, -2])
    sum_ = K.sum(y_true + y_pred, axis=[0, -1, -2])
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return 1 - K.mean(jac)

我的优化器:

optimizer = SGD(lr=0.01, momentum=0.9, nesterov=True)

我有一个大约30%的图像验证集,batch_size为4,shuffle设置为True . 该模型遍历每个时代的每个训练图像 . 预定了200个时期,但如果10个时期的验证集没有改进,则学习将停止 .

以下是最终时期的培训日志

Epoch 10/200
4248/4248 [==============================] - 3192s - loss: 0.1388 - acc: 0.0868 - jaccard_coef: 0.8612 - jaccard_coef_int: 0.8613 - val_loss: 0.2957 - val_acc: 0.0536 - val_jaccard_coef: 0.7043 - val_jaccard_coef_int: 0.7043
Epoch 11/200
4248/4248 [==============================] - 3167s - loss: 0.1375 - acc: 0.0901 - jaccard_coef: 0.8625 - jaccard_coef_int: 0.8626 - val_loss: 0.2968 - val_acc: 0.0632 - val_jaccard_coef: 0.7032 - val_jaccard_coef_int: 0.7033
Epoch 12/200
4248/4248 [==============================] - 3272s - loss: 0.1964 - acc: 0.1084 - jaccard_coef: 0.8036 - jaccard_coef_int: 0.8037 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2793e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 13/200
4248/4248 [==============================] - 3112s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 4.6290e-15 - jaccard_coef_int: 5.5532e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 14/200
4248/4248 [==============================] - 2032s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.5857e-15 - jaccard_coef_int: 5.1207e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 15/200
4248/4248 [==============================] - 2260s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6600e-15 - jaccard_coef_int: 5.0932e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 16/200
4248/4248 [==============================] - 2914s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3220e-15 - jaccard_coef_int: 4.8916e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 17/200
4248/4248 [==============================] - 2928s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6034e-15 - jaccard_coef_int: 6.3645e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 18/200
4248/4248 [==============================] - 2738s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3913e-15 - jaccard_coef_int: 4.7182e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
Epoch 19/200
4248/4248 [==============================] - 2922s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 6.2745e-15 - jaccard_coef_int: 5.0041e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18

我不知道12和13之间发生了什么 . 这是我的错还是有一个已知的bug可以通过升级到更新版本的keras / tf来解决?

1 回答

  • 1

    看起来您的优化过程有所不同:可能您有非常大的渐变,导致您的模型预测垃圾 . 尝试将学习率降低到0.001并从第12次迭代中恢复

相关问题