TensorFlow multi-GPU InvalidArgumentError: cifar10_multi_gpu.py

I am trying to train my model on multiple GPUs, so I ran cifar10_multi_gpu.py (https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py).

1. My environment:

OS platform: Linux version 3.10.0-327.el7.x86_64

TensorFlow installed with: pip install --upgrade ./tensorflow_gpu-1.0.0rc0-cp35-cp35m-linux_x86_64.whl

Python version: Python 3.5.2

CUDA / cuDNN version: cuda_8.0.61_375.26_linux.run / cudnn-8.0-linux-x64-v5.1.tgz

2. The GPU setup is correct

import tensorflow as tf
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
    b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
with tf.device('/gpu:1'):
    c = a + b
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(c)

add: (Add): /job:localhost/replica:0/task:0/gpu:1
I tensorflow/core/common_runtime/simple_placer.cc:841] add: (Add)/job:localhost/replica:0/task:0/gpu:1
b: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/cpu:0
array([ 2.,  4.,  6.], dtype=float32)

3. InvalidArgumentError when running: python cifar10_multi_gpu.py

I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'tower_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
Traceback (most recent call last):
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1000, in _run_fn
    self._extend_graph()
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1049, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/xx/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'tower_0/softmax_linear/weight_loss_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
     [[Node: tower_0/softmax_linear/weight_loss_1 = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](tower_0/softmax_linear/weight_loss_1/tags, tower_0/softmax_linear/weight_loss)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 280, in <module>
    tf.app.run()
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 276, in main
    train()
  File "cifar10_multi_gpu_train.py", line 237, in train
    sess.run(init)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'tower_0/softmax_linear/weight_loss_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
     [[Node: tower_0/softmax_linear/weight_loss_1 = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](tower_0/softmax_linear/weight_loss_1/tags, tower_0/softmax_linear/weight_loss)]]

I have tried many solutions, but none of them worked. Thanks in advance for any suggestions.

1 Answer

0
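Sorry you ran into this problem! I checked with one of the original authors of that script, and here is his response:

It looks like device placement is not working well.
- Based on the poster's test, he verified that he can access 'cpu:0' and 'gpu:1', but he never checked 'gpu:0'. I would check that.
- The poster should also set allow_soft_placement=True in the session config, so that TensorFlow is allowed to relax device placement.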
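
To make the two suggestions concrete, here is a minimal sketch that repeats the poster's small test against '/gpu:0' with soft placement switched on. The helper names make_session_config and add_on_gpu0 are made up for illustration and are not part of the tutorial script. The tf1 alias is an assumption for portability: newer releases expose the session API under tf.compat.v1, while on the 1.0.0rc0 build from the question the same names live directly on tf.

```python
import tensorflow as tf

# Assumption: fall back to `tf` itself on old 1.x builds that have no tf.compat.v1.
tf1 = getattr(tf.compat, "v1", tf)


def make_session_config():
    """Session config that relaxes placement and logs where each op runs."""
    return tf1.ConfigProto(
        allow_soft_placement=True,   # fall back to another device if the requested one cannot run the op
        log_device_placement=True,   # print the device chosen for each op
    )


def add_on_gpu0():
    """Repeat the poster's test, but request '/gpu:0' instead of '/gpu:1'."""
    with tf.Graph().as_default():
        with tf.device('/cpu:0'):
            a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
            b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
        with tf.device('/gpu:0'):
            c = a + b
        with tf1.Session(config=make_session_config()) as sess:
            return sess.run(c)


if __name__ == "__main__":
    print(add_on_gpu0())
```

If the placement log shows the add op landing on a CPU, then '/gpu:0' is probably not usable on that machine, which would be worth sorting out before going back to the multi-GPU script.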
Answered on 2024-04-27T10:13:04+08:00
