System information

  • What is the top-level directory of the model you are using: tensorflow/models/tree/master/research/seq2species

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04

  • TensorFlow installed from (source or binary): binary

  • TensorFlow version (use command below): ('v1.10.1-0-g4dcfddc5d1', '1.10.1')

  • Bazel version (if compiling from source)

  • CUDA/cuDNN version: CUDA 9.0 / cuDNN 7.0.5.15

  • GPU model and memory: GTX 1080 Ti, 11 GB

Describe the problem

Running the test command python seq2species/run_training_test.py fails with a cuDNN initialization error followed by a segmentation fault. Full output:

$ python seq2species/run_training_test.py
/home/dunan/anaconda3/envs/python2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Running tests under Python 2.7.15: /home/dunan/anaconda3/envs/python2/bin/python
[ RUN ] RunTrainingTest.test_run_training(['test_target_1'])
Current Hyperparameters:
  lrelu_slope : 0.0
  min_read_length : 5
  optimizer : adam
  filter_widths : [3]
  pooling_type : avg
  keep_prob : 1.0
  lr_init : 0.001
  weight_scale : 1.0
  use_depthwise_separable : True
  num_fc_units : 455
  pointwise_depths : [64]
  grad_clip_norm : 20.0
  lr_decay : 0.1
  num_fc_layers : 2
  optimizer_hp : 0.9
  train_steps : 10
  filter_depths : [1]
Constructing TensorFlow Graph.
Starting model training.
I0924 19:17:14.124703 140026229511936 tf_logging.py:115] Create CheckpointSaverHook.
I0924 19:17:14.242932 140026229511936 tf_logging.py:115] Graph was finalized.
2018-09-24 19:17:14.243126: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-24 19:17:14.335339: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-24 19:17:14.335705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.25GiB
2018-09-24 19:17:14.335720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-24 19:17:15.268438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-24 19:17:15.268464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-24 19:17:15.268470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-24 19:17:15.268897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9910 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
I0924 19:17:15.436043 140026229511936 tf_logging.py:115] Running local_init_op.
I0924 19:17:15.459621 140026229511936 tf_logging.py:115] Done running local_init_op.
I0924 19:17:15.692641 140026229511936 tf_logging.py:115] Saving checkpoints for 0 into /tmp/absl_testing/train:1/model.ckpt.
2018-09-24 19:17:16.151795: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-24 19:17:16.151853: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 390.87.0
Segmentation fault (core dumped)

Source code / logs

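I also use Keras with the TensorFlow backend, and that works without any problems.

I tried installing NVIDIA driver 384 and hit the same problem. I also tried cuDNN 7.1.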