I am running a distributed TensorFlow script. When creating the cluster servers, I see the following output in the console:

E0805 20:51:03.294260965  3387 ev_epoll1_linux.c:1051] grpc epoll fd: 3
2017-08-05 20:51:03.299766: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:2222}
2017-08-05 20:51:03.299790: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}
2017-08-05 20:51:03.305220: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:2223
When training, I get only this same message and no other output:
E0805 20:52:45.889979901 3387 ev_epoll1_linux.c:1051] grpc epoll fd:3
The message is printed right after the line with tf.Session("grpc://localhost:2223") as sess:
TensorFlow version: 1.3.0-rc0, compiled with bazel; it works fine on a single machine.
Linux version: Distributor ID: Ubuntu, Description: Ubuntu 14.04.5 LTS, Release: 14.04, Codename: trusty
Active Internet connections:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN 8321/python
tcp 0 0 0.0.0.0:2223 0.0.0.0:* LISTEN 8883/python
Here is the sample code that creates the cluster server:
import tensorflow as tf

# cluster (a tf.train.ClusterSpec) and FLAGS are defined elsewhere in the script
def main(_):
    server = tf.train.Server(cluster,
                             job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)
    server.join()

if __name__ == "__main__":
    tf.app.run()
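The snippet above references a `cluster` object that is not shown. As a point of reference, a minimal cluster definition consistent with the two listening ports in the netstat output might look like the following sketch; the job names `ps`/`worker` and the port-to-job assignments are assumptions, not taken from the original post:

```python
# Hypothetical cluster definition, inferred from the netstat output above
# (ps on port 2222, worker on port 2223 -- these assignments are assumptions).
cluster_def = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223"],
}

# In the actual script this would typically be wrapped as:
#   cluster = tf.train.ClusterSpec(cluster_def)
# and then passed to tf.train.Server(...).
print(cluster_def["worker"][0])
```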
And the training code:
import numpy as np
import tensorflow as tf

train_X = np.random.rand(100).astype(np.float32)
train_Y = train_X * 0.1 + 0.3

with tf.device("/job:worker/task:0"):
    X = tf.placeholder(tf.float32)
    Y = tf.placeholder(tf.float32)
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    y = w * X + b
    loss = tf.reduce_mean(tf.square(y - Y))
    init_op = tf.global_variables_initializer()
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session("grpc://localhost:2223") as sess:
    sess.run(init_op)
    for i in range(500):
        sess.run(train_op, feed_dict={X: train_X, Y: train_Y})
        print("after sess.run train")
        if i % 50 == 0:
            print(i, sess.run(w), sess.run(b))
    print(sess.run(w))
    print(sess.run(b))
Does anyone know how to fix this? Thanks.