I followed the steps described here to enable GPU support for Theano on an Ubuntu 16.04 machine. I installed the CUDA toolkit, cuDNN, and the drivers, but I still can't get it to work. When I run:

from theano.sandbox import cuda
cuda.use("gpu0")

I get the following exception:

Exception: ('The following error happened while compiling the node', GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>), '\n', 'nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -arch=sm_50 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/usr/include -I/home/...thon2.7/site-packages/numpy/core/include -I/home/.../python2.7 -I/home/.../python2.7/site-packages/theano/gof -I/home/.../python2.7/site-packages/theano/sandbox/cuda -L/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -L/home/... -o /home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpYVhcOM/544270fe7a21a748315f83abfe0913cc.so mod.cu -lcudart -lcublas -lcuda_ndarray -lpython2.7', '[GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)]')

How can I fix this error?

Here is the full error trace:

Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> from theano.sandbox import cuda
>>> cuda.use("gpu0")
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

/home/.../python2.7/site-packages/theano/sandbox/cuda/__init__.py:556: UserWarning: Theano flag device=gpu* (old gpu back-end) only support floatX=float32. You have floatX=float64. Use the new gpu back-end with device=cuda* for that value of floatX.
  warnings.warn(msg)
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '8175' (I am process '2320')


In file included from /home/.../python2.7/Python.h:8:0,
                 from mod.cu:1:
/home/.../python2.7/pyconfig.h:1190:0: warning: "_POSIX_C_SOURCE" redefined
 #define _POSIX_C_SOURCE 200112L
 ^
In file included from /usr/include/host_config.h:161:0,
                 from /usr/include/cuda_runtime.h:76,
                 from <command-line>:0:
/usr/include/features.h:228:0: note: this is the location of the previous definition
 # define _POSIX_C_SOURCE 200809L
 ^
In file included from /home/.../python2.7/Python.h:8:0,
                 from mod.cu:1:
/home/.../python2.7/pyconfig.h:1212:0: warning: "_XOPEN_SOURCE" redefined
 #define _XOPEN_SOURCE 600
 ^
In file included from /usr/include/host_config.h:161:0,
                 from /usr/include/cuda_runtime.h:76,
                 from <command-line>:0:
/usr/include/features.h:169:0: note: this is the location of the previous definition
 # define _XOPEN_SOURCE 700
 ^
In file included from /home/.../python2.7/Python.h:8:0,
                 from mod.cu:1:
/home/.../python2.7/pyconfig.h:1190:0: warning: "_POSIX_C_SOURCE" redefined
 #define _POSIX_C_SOURCE 200112L
 ^
In file included from /usr/include/host_config.h:161:0,
                 from /usr/include/cuda_runtime.h:76,
                 from <command-line>:0:
/usr/include/features.h:228:0: note: this is the location of the previous definition
 # define _POSIX_C_SOURCE 200809L
 ^
In file included from /home/.../python2.7/Python.h:8:0,
                 from mod.cu:1:
/home/.../python2.7/pyconfig.h:1212:0: warning: "_XOPEN_SOURCE" redefined
 #define _XOPEN_SOURCE 600
 ^
In file included from /usr/include/host_config.h:161:0,
                 from /usr/include/cuda_runtime.h:76,
                 from <command-line>:0:
/usr/include/features.h:169:0: note: this is the location of the previous definition
 # define _XOPEN_SOURCE 700
 ^
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
   return (char *) memcpy (__dest, __src, __n) + __n;
                                          ^
mod.cu: In member function ‘int _GLOBAL__N__38_tmpxft_00000948_00000000_9_mod_cpp1_ii_ae46f2fe::__struct_compiled_op_544270fe7a21a748315f83abfe0913cc::run()’:
mod.cu:388:172: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘size_t {aka long unsigned int}’ [-Wformat=]

['nvcc', '-shared', '-O3', '-arch=sm_50', '-m64', '-Xcompiler', '-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-I/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-I/usr/include', '-I/home/...thon2.7/site-packages/numpy/core/include', '-I/home/.../python2.7', '-I/home/.../python2.7/site-packages/theano/gof', '-I/home/.../python2.7/site-packages/theano/sandbox/cuda', '-L/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray', '-L/home/...', '-o', '/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpYVhcOM/544270fe7a21a748315f83abfe0913cc.so', 'mod.cu', '-lcudart', '-lcublas', '-lcuda_ndarray', '-lpython2.7']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/.../python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 593, in use
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/home/.../python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 32, in test_nvidia_driver1
    profile=False)
  File "/home/.../python2.7/site-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/home/.../python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/.../python2.7/site-packages/theano/compile/function_module.py", line 1795, in orig_function
    defaults)
  File "/home/.../python2.7/site-packages/theano/compile/function_module.py", line 1661, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/home/.../python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "/home/.../python2.7/site-packages/theano/gof/vm.py", line 1047, in make_all
    impl=impl))
  File "/home/.../python2.7/site-packages/theano/gof/op.py", line 935, in make_thunk
    no_recycling)
  File "/home/.../python2.7/site-packages/theano/gof/op.py", line 839, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/.../python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
    keep_lock=keep_lock)
  File "/home/.../python2.7/site-packages/theano/gof/cc.py", line 1131, in __compile__
    keep_lock=keep_lock)
  File "/home/.../python2.7/site-packages/theano/gof/cc.py", line 1586, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/.../python2.7/site-packages/theano/gof/cmodule.py", line 1159, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/.../python2.7/site-packages/theano/gof/cc.py", line 1489, in compile_cmodule
    preargs=preargs)
  File "/home/.../python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.py", line 405, in compile_str
    'for cmd', ' '.join(cmd))
Exception: ('The following error happened while compiling the node', GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>), '\n', 'nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -arch=sm_50 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/usr/include -I/home/...thon2.7/site-packages/numpy/core/include -I/home/.../python2.7 -I/home/.../python2.7/site-packages/theano/gof -I/home/.../python2.7/site-packages/theano/sandbox/cuda -L/home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -L/home/... -o /home/...ledir_Linux-4.10--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/tmpYVhcOM/544270fe7a21a748315f83abfe0913cc.so mod.cu -lcudart -lcublas -lcuda_ndarray -lpython2.7', '[GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)]')
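
Most of the compiler output above consists of warnings; as far as I can tell, the only diagnostic actually marked as an error is the `memcpy` one in /usr/include/string.h. Below is a rough sketch of how the error lines can be separated from the warnings. The helper name `find_compiler_errors` and the regular expression are my own assumptions for illustration, not part of Theano or nvcc, and the sample text is just three lines copied from the log.

```python
import re

# Assumed pattern for gcc/nvcc-style diagnostics of the form
# "<file>:<line>:<col>: error: <message>". Warnings and notes are skipped.
ERROR_LINE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def find_compiler_errors(log_text):
    """Return a list of (file, line, message) tuples, one per 'error:' line."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return errors


# Three diagnostics taken from the trace above: two warnings and one error.
sample = """\
/home/.../python2.7/pyconfig.h:1190:0: warning: "_POSIX_C_SOURCE" redefined
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
mod.cu:388:172: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘size_t {aka long unsigned int}’ [-Wformat=]
"""

errors = find_compiler_errors(sample)
```

With the sample above, `errors` holds a single entry pointing at line 652 of /usr/include/string.h, which suggests the nvcc failure comes from a system header rather than from the generated mod.cu itself.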