Can't run distributed ImageNet Inception model (connection failed)

I am using two Ubuntu servers to run distributed TensorFlow. Each server has TensorFlow 0.8.0 installed.

I first start the ps server on server1:

```
ubuntu@i-mxdcqm20:/data1T5/org_models/inception$ sudo bazel-bin/inception/imagenet_distributed_train \
--job_name='ps' \
--task_id=0 \
--ps_hosts='43.254.55.221:2222' \
--worker_hosts='61.160.41.85:2222'
```

The log shows:

INFO:tensorflow:PS hosts are: ['43.254.55.221:2222']
INFO:tensorflow:Worker hosts are: ['61.160.41.85:2222']
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job ps -> {localhost:2222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job worker -> {61.160.41.85:2222}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222
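For readers following along, here is a minimal sketch of how the two host flags appear to map onto the job lists echoed in the log. It is plain Python and not the training script's actual code; `build_cluster` is a hypothetical helper, and the assumption is that each flag is a comma-separated list of host:port addresses. The resulting dict has the shape that `tf.train.ClusterSpec` accepts.

```python
def build_cluster(ps_hosts, worker_hosts):
    """Split comma-separated host:port strings into a job -> address-list dict.

    Hypothetical helper for illustration; the flag format is an assumption.
    """
    return {
        "ps": ps_hosts.split(","),
        "worker": worker_hosts.split(","),
    }


# Same values as passed on the command line in the question.
cluster = build_cluster("43.254.55.221:2222", "61.160.41.85:2222")
print(cluster)
```

Both servers need to be given the same host lists; only the job name and task id differ between the ps and the worker.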

When I run sudo netstat -tunlp, the server is indeed listening on port 2222:

tcp6 0 0 :::2222 :::* LISTEN 3525/python

However, when I start the worker on server2, it still reports that it cannot connect:

E0722 10:35:01.142377237 4045 tcp_client_posix.c:191] failed to connect to 'ipv4:43.254.55.221:2222': timeout occurred
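One way to narrow this down is to test raw TCP reachability from server2 to the ps address, independently of TensorFlow. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper name, not part of TensorFlow or gRPC. If it also fails with a timeout, the cause is probably outside TensorFlow (for example a firewall or security-group rule blocking port 2222), though that is an assumption to verify rather than a diagnosis.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Run on server2, `can_connect("43.254.55.221", 2222)` would check the same address the worker is trying to reach.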

I am running the code by following the README here, and I have not changed any of the code.
