I have the following code, taken almost verbatim from the Spark Streaming tutorial:
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
# Create a StreamingContext with a batch interval of 1 second
conf = SparkConf().setAppName("TITLE")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 1)
# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)
words = lines.flatMap(lambda line: line.split(" "))
# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)
# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()
ssc.start() # Start the computation
ssc.awaitTermination() # Wait for the computation to terminate
When I run this in cluster mode on Amazon Web Services (EMR with two slave nodes), it cannot connect to the localhost port that I opened on my master node. On the master node, I first run:
nc -lk 9999
Then, in another terminal, I SSH into the master and run the script above with:
spark-submit --num-executors 2 --executor-cores 2 test_1.py
After this, pyspark fails to connect to localhost port 9999, and I get the following error:
Restarting receiver with delay 2000ms: Error connecting to localhost:9999
Thanks!