I use PyCharm to run my script. The script runs a loop; each iteration: 1. selects a dataset, 2. trains a new Keras model, 3. evaluates that model.
The code ran perfectly for two weeks, but after I installed a new Anaconda environment it suddenly started failing after two iterations of the loop.
The first two Siamese neural network models train perfectly, then before the third iteration the process crashes with exit code -1073741819 (0xC0000005).
1/32 [..............................] - ETA: 0s - loss: 0.5075
12/32 [==========>...................] - ETA: 0s - loss: 0.5112
27/32 [========================>.....] - ETA: 0s - loss: 0.4700
32/32 [==============================] - 0s 4ms/step - loss: 0.4805
eval run time : 0.046851396560668945
For LOOCV run 2 out of 32. Model is SNN. Time taken for instance = 6.077638149261475
Post-training results:
acc = 1.0 , ce = 0.6019332906978302 , f1 score = 1.0 , mcc = 0.0
cm =
[[1]]
####################################################################################################
Process finished with exit code -1073741819 (0xC0000005)
The strange thing is that the code used to work perfectly well, and even when I drop the Anaconda environment and go back to the environment I was using before, it still exits with the same exit code.
When I use another type of model (a dense neural network), it also crashes, after 4 iterations. Could this be related to running out of memory? Here is an example of the loop. The exact model does not matter; it always crashes at the train-model line (between Point 2 and Point 3) after a certain number of iterations.
# Run k model instance to perform skf
predicted_labels_store = []
acc_store = []
ce_store = []
f1s_store = []
mcc_store = []
folds = []
val_features_c = []
val_labels = []
for fold, fl_tuple in enumerate(fl_store):
    instance_start = time.time()
    (ss_fl, i_ss_fl) = fl_tuple  # ss_fl is training fl, i_ss_fl is validation fl
    if model_mode == 'SNN':
        # Run SNN
        model = SNN(hparams, ss_fl.features_c_dim)
        loader = Siamese_loader(model.siamese_net, ss_fl, hparams)
        loader.train(loader.hparams.get('epochs', 100), loader.hparams.get('batch_size', 32),
                     verbose=loader.hparams.get('verbose', 1))
        predicted_labels, acc, ce, cm, f1s, mcc = loader.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    elif model_mode == 'cDNN':
        # Run DNN
        print('Point 1')
        model = DNN_classifer(hparams, ss_fl)
        print('Point 2')
        model.train_model(ss_fl)
        print('Point 3')
        predicted_labels, acc, ce, cm, f1s, mcc = model.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    del model
    K.clear_session()
    instance_end = time.time()
    if cv_mode == 'skf':
        print('\nFor k-fold run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, k_folds, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    else:
        print('\nFor LOOCV run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, fl.count, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    # Preparing output dataframe that consists of all the validation dataset and its predicted labels
    folds.extend([fold] * i_ss_fl.count)  # Make a col that contains the fold number for each example
    val_features_c = np.concatenate((val_features_c, i_ss_fl.features_c_a),
                                    axis=0) if val_features_c != [] else i_ss_fl.features_c_a
    val_labels.extend(i_ss_fl.labels)
K.clear_session()
And the exit log for the dense neural network:
For LOOCV run 4 out of 32. Model is cDNN. Time taken for instance = 0.7919328212738037
Post-training results:
acc = 0.0 , ce = 0.7419472336769104 , f1 score = 0.0 , mcc = 0.0
cm =
[[0 1]
[0 0]]
####################################################################################################
Point 1
Point 2
Process finished with exit code -1073741819 (0xC0000005)
Any help is greatly appreciated, thanks!
1 Answer
Here is an explanation of what I suggested in the comments, in case anyone else faces the same problem.
1. Manually set the Keras session at the start of each loop instead of relying on the default one.
2. Delete the loader variable as well, since that object must hold a reference to the original model object (I can see you are calling train() on it).
3. After deleting these variables, call gc.collect() at the end of each loop to explicitly reclaim all the freed memory, so there is enough memory to build the next model.

So the takeaway is: when running multiple independent models in a loop, make sure you explicitly set up the TensorFlow session so that it can be cleared once the iteration ends, releasing all the resources it used. Delete every reference that may still be tied to TensorFlow objects from that iteration, and collect the freed memory.
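The reference problem in point 2 can be illustrated with a small, framework-free sketch. The Model and Loader classes below are hypothetical stand-ins for the SNN model and Siamese_loader objects in the question; the point is that del model alone cannot free the model while the loader still references it, which is why the loader must be deleted too and gc.collect() called (alongside K.clear_session() in the real Keras loop):

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for a Keras model."""
    pass

class Loader:
    """Hypothetical stand-in for Siamese_loader: it holds a reference to the model."""
    def __init__(self, model):
        self.model = model

model = Model()
tracker = weakref.ref(model)   # lets us observe when the object is truly freed
loader = Loader(model)

del model                      # not enough: loader.model still holds a strong reference
assert tracker() is not None   # the model object is still alive

del loader                     # drop the remaining reference as well
gc.collect()                   # reclaim any leftover reference cycles right away
assert tracker() is None       # now the model has actually been freed
```

In the question's loop this corresponds to adding del loader next to del model in the SNN branch, and calling gc.collect() after K.clear_session() at the end of each iteration.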