
iterator.get_next() causes "terminate called after throwing an instance of 'std::system_error'"

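I am training a ResNet50 with TensorFlow on a shared server with the following setup:

Ubuntu 16.04, 3 GTX 1080 GPUs, TensorFlow 1.3, Python 2.7

The first two epochs always complete, but in the third epoch I get this error: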

terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
Aborted

This is the code that turns the TFRecord file into a dataset:

import tensorflow as tf

filenames = ["balanced_t.tfrecords"]
dataset = tf.contrib.data.TFRecordDataset(filenames)

def parser(record):
    # Both features are stored as raw byte strings in the TFRecord file.
    keys_to_features = {
        "mhot_label_raw": tf.FixedLenFeature((), tf.string, default_value=""),
        "mel_spec_raw": tf.FixedLenFeature((), tf.string, default_value=""),
    }
    parsed = tf.parse_single_example(record, keys_to_features)

    # Decode the raw bytes back into float64 tensors.
    mel_spec1d = tf.decode_raw(parsed['mel_spec_raw'], tf.float64)
    # label = tf.cast(parsed["label"], tf.string)
    mhot_label = tf.decode_raw(parsed['mhot_label_raw'], tf.float64)
    mel_spec = tf.reshape(mel_spec1d, [96, 64])
    return {"mel_data": mel_spec}, mhot_label

dataset = dataset.map(parser)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()

This is the input pipeline:

while True:
    try:
        (features, labels) = sess.run(iterator.get_next())
    except tf.errors.OutOfRangeError:
        print("end of training dataset")

After adding some print statements to my code, I found that the following line causes the error:

(features, labels) = sess.run(iterator.get_next())

However, I have not been able to fix it.

1 Answer

  • 4

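    Your code has a (subtle) memory leak, so the process is probably running out of memory and being terminated. The problem is that calling iterator.get_next() in every loop iteration adds a new node to the TensorFlow graph, which eventually consumes a lot of memory.

    To stop the memory leak, rewrite the while loop as follows: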

    # Call `get_next()` once outside the loop to create the TensorFlow operations once.
    next_element = iterator.get_next()
    
    while True:
        try:
            (features, labels) = sess.run(next_element)
        except tf.errors.OutOfRangeError:
            print("end of training dataset")
            # Leave the loop once the dataset is exhausted.
            break
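
The graph growth described in this answer can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow code: `ToyGraph` and `ToyIterator` are hypothetical stand-ins that only mimic the fact that every `get_next()` call registers a new node. The `finalize()` method imitates `tf.Graph.finalize()`, which in real TensorFlow 1.x makes the graph read-only, so an accidental op creation inside the loop raises an error instead of silently growing the graph.

```python
class ToyGraph(object):
    """Hypothetical stand-in for a TensorFlow graph: it only records node names."""

    def __init__(self):
        self.nodes = []
        self.finalized = False

    def add_node(self, name):
        # A finalized graph rejects any further modification.
        if self.finalized:
            raise RuntimeError("graph is finalized and cannot be modified")
        self.nodes.append(name)
        return name

    def finalize(self):
        self.finalized = True


class ToyIterator(object):
    """Hypothetical stand-in for a dataset iterator bound to a graph."""

    def __init__(self, graph):
        self.graph = graph

    def get_next(self):
        # Each call registers a brand-new node in the graph.
        return self.graph.add_node("IteratorGetNext_%d" % len(self.graph.nodes))


def leaky_loop(steps):
    """Calls get_next() inside the loop: one new node per step."""
    graph = ToyGraph()
    iterator = ToyIterator(graph)
    for _ in range(steps):
        iterator.get_next()
    return len(graph.nodes)


def fixed_loop(steps):
    """Calls get_next() once, finalizes the graph, then reuses the node."""
    graph = ToyGraph()
    iterator = ToyIterator(graph)
    next_element = iterator.get_next()
    graph.finalize()
    for _ in range(steps):
        _ = next_element  # stands in for sess.run(next_element)
    return len(graph.nodes)


print(leaky_loop(1000))  # 1000 nodes: the graph grows with every step
print(fixed_loop(1000))  # 1 node: the graph stays constant
```

In a real training script, calling `tf.get_default_graph().finalize()` right before the loop is one way to catch this kind of leak early, since the first stray `iterator.get_next()` call inside the loop would then fail immediately.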
    
