I have run into a lot of problems with Spark on Windows, so let me explain the error.
There are plenty of tutorials for installing it and working around the various issues, but I have been trying for hours and still can't get it to work.
I have Java 8, and it is on the system Path:
C:\>java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
I also have Python 2.7 with Anaconda 4.4:
C:\Program Files (x86)\Spark\python\dist>python -V
Python 2.7.13 :: Anaconda 4.4.0 (64-bit)
Just in case, I also have Scala, SBT and GOW:
C:\>scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
C:\>gow -version
Gow 0.8.0 - The lightweight alternative to Cygwin
C:\>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
> about
[info] This is sbt 0.13.15
So, on to the installation:

- First I downloaded Spark 2.1.1 with package type "Pre-built for Apache Hadoop 2.7 and later".
- I extracted it into a folder, say C:\Programs\Spark.
- In the python folder I ran python setup.py sdist, which is supposed to produce a tgz file suitable for pip.
- Going into dist, I ran pip install NAME_OF_PACKAGE.tgz. That did install it, as conda list shows:
C:\>conda list
# packages in environment at C:\Program Files (x86)\Anaconda2:
#
...
pyspark 2.1.1+hadoop2.7 <pip>
...
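To double-check that pip really registered the package, a small generic helper like the one below can be used (just a sketch; it only tests importability, not that Spark itself works):

```python
import importlib.util

def is_installed(name):
    """Return True if the named top-level package can be found on sys.path."""
    return importlib.util.find_spec(name) is not None

print(is_installed("pyspark"))
```

If this prints True, the import machinery can at least see the package.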
I had some doubts, so I looked inside Anaconda's Scripts and site-packages folders. Both contained what I expected: Scripts has pyspark, spark-shell and so on, and the pyspark folder in site-packages contains everything from the jars folder to its own bin folder, which holds the scripts mentioned above.
- As for Hadoop, I downloaded winutils.exe and pasted it into Spark's bin folder, which means it also ended up in the python pyspark bin folder.
With that in place, I could import pyspark without problems:
C:\Users\Rolando Casanueva>python
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyspark
>>>
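For context, when a SparkContext is created, pyspark's launch_gateway runs bin\spark-submit.cmd under SPARK_HOME. A sketch of setting the relevant environment variables before creating the context (SPARK_HOME is the folder from my install above; the JAVA_HOME value is a hypothetical example, adjust it to your actual JDK path):

```python
import os

# Point pyspark at the extracted Spark distribution before creating a
# SparkContext. SPARK_HOME matches the extraction folder used above.
os.environ["SPARK_HOME"] = r"C:\Programs\Spark"
os.environ["HADOOP_HOME"] = r"C:\Programs\Spark"   # its bin\ holds winutils.exe
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_131"  # hypothetical example path

# On Windows, launch_gateway invokes %SPARK_HOME%\bin\spark-submit.cmd
submit = os.path.join(os.environ["SPARK_HOME"], "bin", "spark-submit.cmd")
print(submit)
```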
FIRST QUESTION: Do I have to paste winutils.exe also at python's Scripts Folder?
Getting to the main issue: the problem appears when actually using pyspark, which raises this exception:
C:\Users\Rolando Casanueva>python
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyspark
>>> pyspark.SparkContext()
C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark
"Files" no se reconoce como un comando interno o externo,
programa o archivo por lotes ejecutable.
Failed to find Spark jars directory.
You need to build Spark before running this program.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 259, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\java_gateway.py", line 96, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
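One possible reading of the Spanish line in that output ("Files" no se reconoce como un comando interno o externo...) is that cmd split an unquoted path at its first space, so it tried to run C:\Program and treated "Files" as a separate word. A minimal illustration of that splitting, using the path from my traceback (just a guess at the cause, not a confirmed diagnosis):

```python
# cmd splits an unquoted command line at the first space, so a path under
# "C:\Program Files (x86)" gets cut down to C:\Program, and the leftover
# word "Files" is then reported as an unknown command.
unquoted = r"C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark"
print(unquoted.split(" ", 1)[0])  # -> C:\Program
```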
- I installed Spark in local mode following the StackOverflow answer to "How to set up Spark on Windows?"
- I installed Spark following this YouTube tutorial: https://www.youtube.com/watch?v=omlwDosMGVk
- I installed Spark as a Jupyter add-on: https://mas-dse.github.io/DSE230/installation/windows/
- Finally, I tried the steps described above.

Every installation shows the same error.
SECOND QUESTION: How to solve this issue?
EXTRA QUESTION: Any other recommendation to install it?