My training script for training a TensorFlow model, slightly modified from an online tutorial:
def train(data_set_dir, train_set_dir):
    data = data_input.read_data_sets(data_set_dir, train_set_dir)
    with tf.Graph().as_default():
        global_step = tf.Variable(0, trainable=False)
        # defines placeholders (type=tf.float32)
        images_placeholder, labels_placeholder = placeholder_inputs(batch_size, image_size, channels)
        logits = model.inference(images_placeholder, num_classes)
        loss = loss(logits, labels_placeholder, num_classes)
        train_op = training(loss, global_step, batch_size)
        saver = tf.train.Saver(tf.all_variables())
        summary_op = tf.merge_all_summaries()
        init = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init)
        summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = fill_feed_dict(data, images_placeholder, labels_placeholder, batch_size)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            # ... continue to print loss_value, run summaries and save checkpoints
The placeholder_inputs function called above is:
def placeholder_inputs(batch_size, img_size, channels):
    images_pl = tf.placeholder(tf.float32,
                               shape=(batch_size, img_size, img_size, channels), name='images')
    labels_pl = tf.placeholder(tf.float32,
                               shape=(batch_size, img_size, img_size), name='labels')
    return images_pl, labels_pl
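For reference, with batch_size=1, img_size=750 and channels=3 (values implied by the error message and the printed feed_dict further down, assumed here), these placeholders demand exactly the following feed shapes, batch dimension included. A trivial sketch:

```python
# Values implied by the error message and the printed feed_dict (assumed here)
batch_size, img_size, channels = 1, 750, 3

# The exact shapes every fed array must have, batch dimension included:
images_shape = (batch_size, img_size, img_size, channels)
labels_shape = (batch_size, img_size, img_size)

print(images_shape)  # (1, 750, 750, 3)
print(labels_shape)  # (1, 750, 750)
```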
To clarify, the data I'm working with is for per-pixel classification in a segmentation problem. As shown above, it's a binary classification problem.
And the fill_feed_dict function is:
def fill_feed_dict(data_set, images_pl, labels_pl, batch_size):
    images_feed, labels_feed = data_set.next_batch(batch_size)
    feed_dict = {images_pl: images_feed, labels_pl: labels_feed}
    return feed_dict
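In hindsight, a defensive variant can fail fast with a readable message before sess.run ever sees the batch. A sketch in plain Python (shape and checked_feed_dict are hypothetical helpers, not part of my script or the TF API), using a toy 2x2 batch instead of 750x750:

```python
def shape(a):
    """Return the nested-list shape of a batch, e.g. [[...]] -> (1, 2, 2)."""
    s = []
    while isinstance(a, list):
        s.append(len(a))
        a = a[0]
    return tuple(s)

def checked_feed_dict(pairs):
    """pairs: list of (name, expected_shape, value). Hypothetical helper that
    validates each value's shape before it ever reaches sess.run."""
    feed = {}
    for name, expected, value in pairs:
        got = shape(value)
        if got != expected:
            raise ValueError("placeholder %r expects shape %s, got %s"
                             % (name, expected, got))
        feed[name] = value
    return feed

# Toy batch with batch_size=1 and a 2x2 "image" instead of 750x750:
images_feed = [[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]]   # (1, 2, 2, 3)
labels_feed = [[[0.0, 0.0], [0.0, 0.0]]]               # (1, 2, 2)

feed = checked_feed_dict([
    ("images", (1, 2, 2, 3), images_feed),
    ("labels", (1, 2, 2), labels_feed),
])
print(sorted(feed))   # ['images', 'labels']
```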
Where I'm stuck:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'labels' with dtype float and shape [1,750,750]
[[Node: labels = Placeholder[dtype=DT_FLOAT, shape=[1,750,750], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The traceback shows it's caused by the 'labels' tensor from my placeholder_inputs function. Also, as far as I can tell, this error keeps alternating between the two placeholders, seemingly at random. One time it's the 'labels' [labels_pl] tensor, another time it's my 'images' [images_pl] tensor.
Error details:
File ".../script.py", line 32, in placeholder_inputs
shape=(batch_size, img_size, img_size), name='labels')
File ".../tensorflow/python/ops/array_ops.py", line 895, in placeholder
name=name)
File ".../tensorflow/python/ops/gen_array_ops.py", line 1238, in _placeholder
name=name)
File ".../tensorflow/python/ops/op_def_library.py", line 704, in apply_op
op_def=op_def)
File ".../tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
What I've tried/checked:
- Moving the feed_dict construction outside the for loop didn't help either.
- Verified there is enough data in the training data directory to satisfy the batch_size requirement.
- Tried multiple variations of specifying the placeholders' dtype, assuming 'float' was the key hint in the stack trace.
- Cross-checked the data shapes. They are exactly as specified in the placeholders.
Maybe it's much simpler than I think. Maybe there's even a small typo I just can't see here. Suggestions? I believe I'm out of ideas. Looking for someone to provide fresh insight into this problem.
I have already referred to this description of the error.
Update:
Printed feed_dict before session.run (as suggested in the comments here) and noticed that the expected values are being fed into the placeholders:
{<tf.Tensor 'images:0' shape=(1, 750, 750, 3) dtype=float32>:
array([[[[-0.1556225 , -0.13209309, -0.15954407],
[-0.15954407, -0.12032838, -0.13601466],
.....
[-0.03405387, 0.04829907, 0.09535789]]]], dtype=float32),
<tf.Tensor 'labels:0' shape=(1, 750, 750) dtype=float32>:
array([[[ 0., 0., 0., ..., 0., 0., 0.],
.....
[ 0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)}
Something I didn't mention before: the loop does run the first time. So I get the output for the first value at step = 0, and then it exits right after the print statement for loss_value at step=0.
Update 2:
I've figured out where the problem lies. It's the sess.run on summary_op. But why that causes it is beyond me. This is how I run it inside the for loop:
if step % 100 == 0:
    summary_str = sess.run(summary_op)
    summary_writer.add_summary(summary_str, step)
Commenting out this block makes it run fine. Any thoughts on why it errors?
Update 3: Solved
The answer is below. What I noticed is that the TensorFlow CIFAR-10 example does a similar sess.run without explicitly mentioning feed_dict, and it runs fine. How exactly does that work?
1 Answer
An obvious mistake. I didn't specify a feed_dict for the session run on summary_op. Explicitly passing feed_dict in that session run call fixed it. But why? The TensorFlow CIFAR-10 example performs a similar sess.run without explicitly mentioning feed_dict and runs fine.
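A likely explanation (my assumption, not confirmed above): the summaries merged into summary_op were created from tensors (like loss) that depend on the placeholders, so evaluating summary_op re-executes the graph from those placeholders, and any unfed one triggers the error; which placeholder gets reported is effectively arbitrary, which would also explain why the error alternated between 'labels' and 'images'. The CIFAR-10 example, as far as I can tell, reads its input through queue ops rather than placeholders, so there is nothing to feed. A minimal pure-Python mock of that dependency rule (hypothetical Placeholder/Op/run helpers, not the real TF API):

```python
# Mock of TF 1.x feeding semantics: running an op fails if any placeholder
# it transitively depends on is missing from feed_dict.

class Placeholder:
    def __init__(self, name):
        self.name = name

class Op:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

    def placeholders(self):
        # Collect every placeholder this op transitively depends on.
        found = set()
        stack = [self]
        while stack:
            node = stack.pop()
            for inp in node.inputs:
                if isinstance(inp, Placeholder):
                    found.add(inp)
                else:
                    stack.append(inp)
        return found

def run(op, feed_dict=None):
    feed_dict = feed_dict or {}
    for ph in op.placeholders():
        if ph not in feed_dict:
            raise ValueError(
                "You must feed a value for placeholder tensor '%s'" % ph.name)
    return "ok"

images = Placeholder("images")
labels = Placeholder("labels")
loss = Op("loss", [images, labels])
summary_op = Op("summary", [loss])   # summaries depend on loss -> placeholders

# Fails without a feed_dict; which name is reported depends on set iteration
# order, mirroring how the question's error alternated between placeholders:
try:
    run(summary_op)
except ValueError as e:
    print(e)

# Works once both placeholders are fed:
print(run(summary_op, feed_dict={images: [[0.0]], labels: [[0]]}))  # ok
```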