I was following this site to install Jupyter Notebook and PySpark and to integrate the two.
When I needed to create the "Jupyter profile", I read that Jupyter profiles no longer exist, so I went ahead with the following lines:
$ mkdir -p ~/.ipython/kernels/pyspark
$ touch ~/.ipython/kernels/pyspark/kernel.json
I opened kernel.json and wrote the following:
{
  "display_name": "pySpark",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m",
    "IPython.kernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7",
    "PYTHONPATH": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python:/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip",
    "PYTHONSTARTUP": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "pyspark-shell"
  }
}
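As a side note (not part of the original question), Jupyter can only launch a kernel whose kernel.json parses as valid JSON and whose "argv" ends with the "{connection_file}" placeholder; a quick stdlib-only sanity check of the spec above:

```python
import json

# Parse the kernel spec (env block omitted here for brevity) and verify the
# two things Jupyter needs: valid JSON, and "{connection_file}" last in argv.
spec = json.loads("""
{
  "display_name": "pySpark",
  "language": "python",
  "argv": ["/usr/bin/python", "-m", "IPython.kernel", "-f", "{connection_file}"]
}
""")
assert spec["argv"][-1] == "{connection_file}"
print(spec["display_name"])  # -> pySpark
```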
The path to Spark is correct. However, when I run `jupyter console --kernel pyspark`, I get this output:
MacBook:~ Agus$ jupyter console --kernel pyspark
/usr/bin/python: No module named IPython
Traceback (most recent call last):
File "/usr/local/bin/jupyter-console", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 595, in launch_instance
app.initialize(argv)
File "<decorator-gen-113>", line 2, in initialize
File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 74, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 137, in initialize
self.init_shell()
File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 110, in init_shell
client=self.kernel_client,
File "/usr/local/lib/python2.7/site-packages/traitlets/config/configurable.py", line 412, in instance
inst = cls(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 251, in __init__
self.init_kernel_info()
File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 305, in init_kernel_info
raise RuntimeError("Kernel didn't respond to kernel_info_request")
RuntimeError: Kernel didn't respond to kernel_info_request
2 Answers
The simplest way is to use findspark. First create an environment variable:

Then install findspark:

Then launch a jupyter notebook, and the following should work:
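The code snippets for these three steps did not survive extraction. The steps amount to setting `SPARK_HOME`, running `pip install findspark`, and calling `findspark.init()` in the notebook; since findspark may not be installed here, the stdlib-only sketch below shows roughly what `findspark.init()` does under the hood, using the Homebrew Spark path from the question:

```python
import os
import sys
from glob import glob

# Roughly what findspark.init() does: record SPARK_HOME, then put
# $SPARK_HOME/python and the bundled py4j zip on sys.path so that
# `import pyspark` can succeed afterwards.
spark_home = "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7"
os.environ["SPARK_HOME"] = spark_home
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.extend(glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))
```

After this runs in the notebook, `import pyspark` should resolve from the Spark distribution itself, with no separate pip install of pyspark needed.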
There are many ways to integrate pyspark with a jupyter notebook.

You can check the installation, and you will get an entry for the toree pyspark kernel. After that, you can install interpreters for other languages such as SparkR, Scala, and SQL if needed.
Type `pyspark` in the terminal, and it will open a jupyter notebook with a sparkcontext already initialized.

Alternatively, run `pip install pyspark`. Now you can import pyspark like any other python package.
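For the `pip install pyspark` route, a small stdlib-only check that the package now resolves like any other (the helper name `can_import` is mine, not from the answer):

```python
import importlib.util

def can_import(name):
    """True if `name` is installed and importable, without importing it."""
    return importlib.util.find_spec(name) is not None

# After `pip install pyspark`, this prints True, just as it would for
# any other installed python package.
print("pyspark installed:", can_import("pyspark"))
```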