
Keras exits with code -1073741819 (0xC0000005) after training 2 models


I use PyCharm to run my script. The script contains a loop; on each iteration it: 1. selects a dataset, 2. trains a new Keras model, and 3. evaluates that model.

The code had been running perfectly for two weeks, but after I installed a new Anaconda environment it suddenly started failing after two iterations of the loop.

Two models of the Siamese neural network train perfectly, then on the third loop the process crashes with exit code -1073741819 (0xC0000005).

1/32 [..............................] - ETA: 0s - loss: 0.5075
12/32 [==========>...................] - ETA: 0s - loss: 0.5112
27/32 [========================>.....] - ETA: 0s - loss: 0.4700
32/32 [==============================] - 0s 4ms/step - loss: 0.4805
eval run time : 0.046851396560668945

For LOOCV run 2 out of 32. Model is SNN. Time taken for instance = 6.077638149261475
Post-training results: 
acc = 1.0 , ce = 0.6019332906978302 , f1 score = 1.0 , mcc = 0.0
cm = 
[[1]]
####################################################################################################

Process finished with exit code -1073741819 (0xC0000005)

The strange thing is that the code used to work perfectly, and even when I drop the new Anaconda environment and switch back to the environment I was using before, it still exits with the same exit code.

It also crashes when I use another type of model (a dense neural network), in that case after 4 iterations. Could it be related to running out of memory? Here is an example of the loop. The exact model does not matter; it always crashes after some number of iterations on the train-model line (between Point 2 and Point 3).

import time

import numpy as np
from keras import backend as K

# Run k model instances to perform skf
predicted_labels_store = []
acc_store = []
ce_store = []
f1s_store = []
mcc_store = []
folds = []
val_features_c = []
val_labels = []
for fold, fl_tuple in enumerate(fl_store):
    instance_start = time.time()
    (ss_fl, i_ss_fl) = fl_tuple  # ss_fl is the training fl, i_ss_fl is the validation fl
    if model_mode == 'SNN':
        # Run SNN
        model = SNN(hparams, ss_fl.features_c_dim)
        loader = Siamese_loader(model.siamese_net, ss_fl, hparams)
        loader.train(loader.hparams.get('epochs', 100), loader.hparams.get('batch_size', 32),
                     verbose=loader.hparams.get('verbose', 1))
        predicted_labels, acc, ce, cm, f1s, mcc = loader.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    elif model_mode == 'cDNN':
        # Run DNN
        print('Point 1')
        model = DNN_classifer(hparams, ss_fl)
        print('Point 2')
        model.train_model(ss_fl)
        print('Point 3')
        predicted_labels, acc, ce, cm, f1s, mcc = model.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    del model
    K.clear_session()
    instance_end = time.time()
    if cv_mode == 'skf':
        print('\nFor k-fold run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, k_folds, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    else:
        print('\nFor LOOCV run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, fl.count, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    # Prepare the output dataframe consisting of all validation examples and their predicted labels
    folds.extend([fold] * i_ss_fl.count)  # Column containing the fold number for each example
    val_features_c = np.concatenate((val_features_c, i_ss_fl.features_c_a),
                                    axis=0) if len(val_features_c) > 0 else i_ss_fl.features_c_a
    val_labels.extend(i_ss_fl.labels)
    K.clear_session()

And the exit code for the dense neural network:

For LOOCV run 4 out of 32. Model is cDNN. Time taken for instance = 0.7919328212738037
Post-training results: 
acc = 0.0 , ce = 0.7419472336769104 , f1 score = 0.0 , mcc = 0.0
cm = 
[[0 1]
 [0 0]]
####################################################################################################
Point 1
Point 2

Process finished with exit code -1073741819 (0xC0000005)
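One quick way to probe the out-of-memory hypothesis is to record Python-side allocations once per iteration with the standard-library tracemalloc module; this is only a sketch, and the `leak` list here is a hypothetical stand-in for per-fold state that is accidentally kept alive (it will not catch leaks inside TensorFlow's native code):

```python
import tracemalloc

tracemalloc.start()
per_fold = []
leak = []  # simulates objects accidentally retained across folds
for fold in range(3):
    leak.append([0.0] * 100_000)             # stand-in for the per-fold training work
    current, _peak = tracemalloc.get_traced_memory()
    per_fold.append(current)                 # record usage after each fold
tracemalloc.stop()

# steadily growing numbers point at memory accumulating across iterations
assert per_fold[0] < per_fold[1] < per_fold[2]
```

In the real loop, printing `tracemalloc.get_traced_memory()` right before the train-model line would show whether Python-level memory grows fold over fold.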

Any help is greatly appreciated, thanks!

1 Answer


    Here is an explanation of what I suggested in the comments, in case anyone faces the same problem.

    Set the Keras session manually at the start of each loop instead of using the default one:

    sess = tf.Session()  
    K.set_session(sess) 
    #..... train your model
    K.clear_session()
    

    Also delete the loader variable, since that object must hold a reference to the original model object (I can see you are calling train() on it).

    Explicitly reclaim all the memory released by deleting these variables by calling gc.collect() after each loop, so that there is enough memory to build the next model.

    So the gist is: when running multiple independent models in a loop, make sure you explicitly set the TensorFlow session so that it can be cleared after each iteration, releasing all resources it used. Delete every reference that may be tied to TensorFlow objects from that iteration, then collect the freed memory.
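The delete-then-collect part of this advice can be sanity-checked in plain Python; `Model` below is a hypothetical stand-in for a Keras model, used only to show that each fold's object really is reclaimed before the next one is built:

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for a Keras model holding large buffers."""
    def __init__(self):
        self.weights = [0.0] * 1_000_000  # simulate a big allocation

refs = []
for fold in range(3):
    model = Model()                  # build a fresh model for this fold
    refs.append(weakref.ref(model))  # track it without keeping it alive
    # ... training and evaluation would happen here ...
    del model      # drop the loop's reference, as suggested above
    gc.collect()   # reclaim the memory before the next fold starts

# every fold's model was freed before the next one was built
assert all(r() is None for r in refs)
```

If any weak reference were still alive after `del` and `gc.collect()`, something else (like the `loader` in the question) would still be holding the model, which is exactly the situation the answer tells you to break up.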
