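Object detection training on a custom dataset (images of storefronts), for a single class (285 images in total), running locally on a CPU with 8 GB of RAM, gets killed after a few steps.

I am following this blog as a reference.

Here is the console log: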

(tensorflow) rajaram@rajaram-Lenovo-ideapad-110-15ISK:~/tensorflow/models$ python object_detection/train.py \
>     --logtostderr \
>     --pipeline_config_path=/home/rajaram/tensorflow/models/object_detection/models/sf_od_model/ssd_mobilenet_v1_sf_train.config \
>     --train_dir=/home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
WARNING:tensorflow:From /home/rajaram/tensorflow/models/object_detection/meta_architectures/ssd_meta_arch.py:607: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-09-26 22:15:08.121785: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-26 22:15:08.122313: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-26 22:15:08.123308: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-26 22:15:08.124144: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-26 22:15:08.124658: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-26 22:15:08.953929: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from /home/rajaram/tensorflow/models/object_detection/models/sf_od_model/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Saving checkpoint to path /home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train/model.ckpt
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:global_step/sec: 0.00238991
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global step 1: loss = 14.4365 (801.196 sec/step)
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:global step 2: loss = 12.9940 (173.981 sec/step)
INFO:tensorflow:Recording summary at step 2.
INFO:tensorflow:Recording summary at step 3.
INFO:tensorflow:global step 3: loss = 12.4866 (166.656 sec/step)
INFO:tensorflow:Saving checkpoint to path /home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train/model.ckpt
INFO:tensorflow:Recording summary at step 3.
INFO:tensorflow:Saving checkpoint to path /home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train/model.ckpt
INFO:tensorflow:global step 4: loss = 11.2386 (162.260 sec/step)
INFO:tensorflow:Recording summary at step 4.
INFO:tensorflow:Recording summary at step 4.
INFO:tensorflow:Recording summary at step 5.
INFO:tensorflow:global step 5: loss = 10.8210 (416.903 sec/step)
INFO:tensorflow:Recording summary at step 5.
Killed
(tensorflow) rajaram@rajaram-Lenovo-ideapad-110-15ISK:~/tensorflow/models$

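My thoughts and questions

1) Are image sizes an issue? - My images are distributed as follows: <= 400x300 (5%), between 400x300 and 640x480 (22%), between 640x480 and 800x600 (63%), and > 800x600 (22%). Images of around 400x300 are sufficient for identifying a storefront, but my dataset uses larger resolutions because the next step is text recognition on these signboards.

  • Is this reasoning correct?

  • Should I resize the images to a smaller size (if so, what size is good) and redo the annotations before restarting the whole process?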
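If resizing turns out to be the way to go, the annotations might not need to be redrawn by hand. The following is a minimal sketch, assuming the annotations are pixel-coordinate boxes in (xmin, ymin, xmax, ymax) form; the function names `fit_within` and `scale_box` and the 400x300 limit are hypothetical and only for illustration:

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits inside the limit.

    The aspect ratio is kept, and images already inside the limit
    are left at their original size (no upscaling).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def scale_box(box, old_size, new_size):
    """Scale a pixel box (xmin, ymin, xmax, ymax) to a resized image.

    old_size and new_size are (width, height) tuples. This assumes
    pixel-coordinate annotations, which is an assumption about the
    annotation format, not something stated in the question.
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    xmin, ymin, xmax, ymax = box
    return (round(xmin * sx), round(ymin * sy),
            round(xmax * sx), round(ymax * sy))


# Example: an 800x600 image limited to 400x300.
new_size = fit_within(800, 600, 400, 300)                       # (400, 300)
new_box = scale_box((100, 200, 700, 500), (800, 600), new_size)  # (50, 100, 350, 250)
```

The image itself would still have to be resized with an image library of choice; this sketch only covers the bookkeeping for the boxes.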

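I was able to train on the Oxford-IIIT Pet data (~7.9k images; it took around 13 hours) for 2000 steps (num_steps = 2000 in the train_config section of the config file) without a crash or the process being killed. So I assumed that just 285 images should be trainable on the CPU alone.

2) Is swap memory an issue? - I also checked other posts (one suggesting an increase in swap space, one with no follow-up, and another suggesting an increase in swap memory) that are along similar lines. But since I can train on the Oxford-IIIT Pet dataset with my current system setup, training on only 285 images should not get the process killed.

  • Is my thinking correct?

  • If not, and increasing swap really is a solution, then I need pointers and clear steps for doing so.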
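Before changing the swap size, it may help to see how much RAM and swap the machine actually has. The sketch below assumes a Linux system that exposes `/proc/meminfo` (as Ubuntu does); the helper names `parse_meminfo` and `summarize` are hypothetical. It only reports totals and does not by itself show why the process was killed:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts and are stored as-is.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Convert the memory and swap fields from kB to GiB."""
    def to_gib(kib):
        return kib / (1024 * 1024)

    return {
        "ram_total_gib": to_gib(info.get("MemTotal", 0)),
        "ram_available_gib": to_gib(info.get("MemAvailable", 0)),
        "swap_total_gib": to_gib(info.get("SwapTotal", 0)),
        "swap_free_gib": to_gib(info.get("SwapFree", 0)),
    }
```

On the training machine, `summarize(parse_meminfo(open("/proc/meminfo").read()))` could be run in a second terminal while training is in progress to watch available RAM and free swap shrink over time.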


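I would like to know what is going wrong and get this running locally. I hope I have provided enough information to get help. If not, please let me know what else is needed.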

---------------------------

System information

  • What is the top-level directory of the model you are using: tensorflow/models (not yet updated to the new folder structure)

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes - minimal changes (based on Dat Tran's template, adapted for my own dataset - Github)

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.3 LTS

  • TensorFlow installed from (source or binary): Binary (into a virtual environment)

  • TensorFlow version (use command below): 1.3.0

  • Bazel version (if compiling from source):

  • CUDA/cuDNN version:

  • GPU model and memory:

  • Exact command to reproduce: python object_detection/train.py --logtostderr --pipeline_config_path=/home/rajaram/tensorflow/models/object_detection/models/sf_od_model/ssd_mobilenet_v1_sf_train.config --train_dir=/home/rajaram/tensorflow/models/object_detection/models/sf_od_model/train