I am training a fully convolutional network (FCN32) for semantic segmentation on a Tesla K80 with more than 11 GB of memory.
The input images are quite large: 352x1216. The network structure is shown below. I use batch_size = 1, but I still get an out-of-memory error.
The criterion is nn.BCEWithLogitsLoss().
The network works fine when I run it on the CPU.
Layer (type) Output Shape # Param
Conv2d-1 [-1, 64, 352, 1216] 1,792
Conv2d-2 [-1, 64, 352, 1216] 36,928
MaxPool2d-3 [-1, 64, 176, 608] 0
Conv2d-4 [-1, 128, 176, 608] 73,856
Conv2d-5 [-1, 128, 176, 608] 147,584
MaxPool2d-6 [-1, 128, 88, 304] 0
Conv2d-7 [-1, 256, 88, 304] 295,168
Conv2d-8 [-1, 256, 88, 304] 590,080
Conv2d-9 [-1, 256, 88, 304] 590,080
MaxPool2d-10 [-1, 256, 44, 152] 0
Conv2d-11 [-1, 512, 44, 152] 1,180,160
Conv2d-12 [-1, 512, 44, 152] 2,359,808
Conv2d-13 [-1, 512, 44, 152] 2,359,808
MaxPool2d-14 [-1, 512, 22, 76] 0
Conv2d-15 [-1, 512, 22, 76] 2,359,808
Conv2d-16 [-1, 512, 22, 76] 2,359,808
Conv2d-17 [-1, 512, 22, 76] 2,359,808
MaxPool2d-18 [-1, 512, 11, 38] 0
Conv2d-19 [-1, 4096, 11, 38] 102,764,544
Conv2d-20 [-1, 4096, 11, 38] 16,781,312
Conv2d-21 [-1, 1, 11, 38] 4,097
ConvTranspose2d-22 [-1, 1, 352, 1216] 4,096
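As a sanity check, the "Output Shape" column above lets you tally a rough lower bound on training memory in pure Python. This is only a sketch: it assumes float32 activations and batch_size = 1, counts forward activations (which must be kept for backward) plus one gradient buffer per activation, and ignores parameters, optimizer state, and cuDNN workspace, all of which add to the real total.

```python
# Rough activation-memory lower bound for the FCN32 summary above (batch_size=1).
# Each entry is (channels, height, width) from the "Output Shape" column.
shapes = [
    (64, 352, 1216), (64, 352, 1216), (64, 176, 608),
    (128, 176, 608), (128, 176, 608), (128, 88, 304),
    (256, 88, 304), (256, 88, 304), (256, 88, 304), (256, 44, 152),
    (512, 44, 152), (512, 44, 152), (512, 44, 152), (512, 22, 76),
    (512, 22, 76), (512, 22, 76), (512, 22, 76), (512, 11, 38),
    (4096, 11, 38), (4096, 11, 38), (1, 11, 38), (1, 352, 1216),
]

BYTES_PER_FLOAT32 = 4

# Forward pass keeps every activation for backward;
# backward roughly doubles this with gradient buffers.
forward_bytes = sum(c * h * w * BYTES_PER_FLOAT32 for c, h, w in shapes)
train_bytes = 2 * forward_bytes

print(f"forward activations: {forward_bytes / 2**30:.2f} GiB")  # ~0.49 GiB
print(f"with gradients:      {train_bytes / 2**30:.2f} GiB")    # ~0.99 GiB
```

Activations alone are well under 11 GB here, which is consistent with the accepted answer below: the OOM was not caused by the model itself but by the state of that particular GPU.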
The error message:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
     36 print(loss)
     37 # torch.cuda.empty_cache()
---> 38 loss.backward()
     39 optimizer.step()
     40

/anaconda/envs/py35/lib/python3.5/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
     91                 products. Defaults to False.
     92         """
---> 93         torch.autograd.backward(self, gradient, retain_graph, create_graph)
     94
     95     def register_hook(self, hook):

/anaconda/envs/py35/lib/python3.5/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91
     92
RuntimeError: CUDA error: out of memory
2 Answers
I found the cause... it was hardware-related. I switched to another machine and the error disappeared.
Usually this happens because of limited memory on your GPU. If you have a more powerful GPU, the problem goes away (as you mentioned in your answer).
But if you don't, you can scale down your images to around
256*x
in size. This is also good practice for performance.
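One caveat when scaling down: FCN32 downsamples by a factor of 32 (five 2x2 max-pools), so the resized height and width should stay divisible by 32, or the ConvTranspose2d output will not line up with the input. Note that the original 352x1216 input is exactly 11*32 x 38*32, matching the 11x38 feature map in the table. A small helper as a sketch (the name `fit_to_stride` is mine, not from the question):

```python
def fit_to_stride(height, width, scale=0.5, stride=32):
    """Scale (height, width) by `scale`, then round each down to a
    multiple of `stride` so the 32x-downsampled feature map divides evenly."""
    new_h = max(stride, int(height * scale) // stride * stride)
    new_w = max(stride, int(width * scale) // stride * stride)
    return new_h, new_w

# The original 352x1216 input, halved per side:
print(fit_to_stride(352, 1216))  # (160, 608)
```

Since activation memory grows with height * width, halving each side cuts it by roughly 4x.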