
Learning rate larger than 0.001 results in error


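I am trying to hack together code from the Udacity Deep Learning course (Assignment 3 - Regularization) and the Tensorflow mnist_with_summaries.py tutorial. My code appears to run fine: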

https://github.com/llevar/udacity_deep_learning/blob/master/multi-layer-net.py

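But something strange is going on. The assignments all use a learning rate of 0.5 and at some point introduce exponential decay. However, the code I put together only runs fine when I set the learning rate to 0.001 (with or without decay). If I set the initial rate to 0.1 or greater, I get the following error: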

Traceback (most recent call last):
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 175, in <module>
    summary, my_accuracy, _ = my_session.run([merged, accuracy, train_step], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 637, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 659, in _do_call
    e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Nan in summary histogram for: layer1/weights/summaries/HistogramSummary
     [[Node: layer1/weights/summaries/HistogramSummary = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](layer1/weights/summaries/HistogramSummary/tag, layer1/weights/Variable/read)]]
Caused by op u'layer1/weights/summaries/HistogramSummary', defined at:
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 106, in <module>
    layer1, weights_1 = nn_layer(x, num_features, 1024, 'layer1')
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 79, in nn_layer
    variable_summaries(weights, layer_name + '/weights')
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 65, in variable_summaries
    tf.histogram_summary(name, var)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py", line 113, in histogram_summary
    tag=tag, values=values, name=scope)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 55, in _histogram_summary
    name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
    self._traceback = _extract_stack()

If I set the rate to 0.001, the code runs to completion with a test accuracy of 0.94.

Using tensorflow 0.8 RC0 on Mac OS X.

2 Answers

  • 0

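    It looks like your training is diverging (which causes you to get infinities or NaNs). There is no simple explanation for why things diverge under some conditions but not others, but in general a higher learning rate makes divergence more likely.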

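    Edit, Apr 17: You are getting a NaN in your Histogram summary, which most likely means there is a NaN in your weights or activations. NaNs are caused by numerically improper calculations, e.g. taking the log of 0 and multiplying the result by 0. There is also a small chance of a bug in the histogram op itself; to rule that out, turn off summaries and see whether you can still train to good accuracy.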

    To turn off summaries, replace this line: merged = tf.merge_all_summaries()

    with this

    merged = tf.constant(1)
    

    and comment out this line

    test_writer.add_summary(summary)
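
The point above about higher learning rates making divergence more likely can be seen on a toy problem. The following is only a minimal sketch in plain Python, not the asker's network: it assumes a one-parameter objective f(w) = w**2, and the name `gradient_descent` is made up for illustration.

```python
import math

def gradient_descent(learning_rate, steps, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # each step multiplies w by (1 - 2 * learning_rate)
    return w

# With a small rate the factor is 0.8, so w shrinks towards the minimum at 0.
small = gradient_descent(learning_rate=0.1, steps=100)

# With a large rate the factor is -2, so |w| doubles every step, overflows
# to infinity, and the next update computes inf - inf, which is NaN.
large = gradient_descent(learning_rate=1.5, steps=2000)

print("small rate, finite result:", math.isfinite(small))
print("large rate, finite result:", math.isfinite(large))
```

A real network has no such clean threshold, but the mechanism is the same: once an update overshoots far enough, the values grow until they stop being finite, and that is the point at which a histogram summary of the weights would report a NaN.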
    
  • 5

    Your cross entropy:

    diff = y_ * tf.log(y)
    

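    may also hit the case 0 * log(0), which evaluates to NaN.

    You can change it to the following (here y_conv is the softmax output, which is y in your code):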

    cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
    

    Source: Tensorflow NaN bug?
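
The effect of the clipping suggested above can be checked without TensorFlow. This is a rough sketch in plain Python under stated assumptions: `ieee_log` and `cross_entropy` are hypothetical helpers, and `ieee_log` returns -inf at 0 to imitate floating-point tensor behaviour, since Python's own `math.log(0.0)` raises `ValueError` instead.

```python
import math

def ieee_log(p):
    """Natural log that returns -inf at 0, imitating IEEE floating-point tensors."""
    return math.log(p) if p > 0.0 else float("-inf")

def cross_entropy(labels, probs, eps=None):
    """Return -sum(label * log(prob)); clamp each prob to [eps, 1.0] if eps is given."""
    total = 0.0
    for label, prob in zip(labels, probs):
        if eps is not None:
            prob = min(max(prob, eps), 1.0)   # same idea as tf.clip_by_value
        total += label * ieee_log(prob)
    return -total

labels = [1.0, 0.0, 0.0]   # one-hot target
probs = [1.0, 0.0, 0.0]    # saturated softmax output with exact zeros

unclipped = cross_entropy(labels, probs)            # contains 0 * -inf
clipped = cross_entropy(labels, probs, eps=1e-10)   # log never sees 0

print("unclipped is NaN:", math.isnan(unclipped))
print("clipped loss:", clipped)
```

Clipping keeps the loss finite, but it does not by itself stop training from diverging, so a lower learning rate may still be needed. TensorFlow also provides tf.nn.softmax_cross_entropy_with_logits, which computes this loss from the logits directly and may be worth trying as a numerically safer alternative.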
