I have the following code, taken almost directly from the Spark Streaming tutorial:

from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext

# Create a StreamingContext with a batch interval of 1 second
# (the master is supplied by spark-submit, not hardcoded here)
conf = SparkConf().setAppName("TITLE")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 1)

# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

words = lines.flatMap(lambda line: line.split(" "))

# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
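As far as I understand, `socketTextStream` just opens a plain TCP connection and reads newline-delimited text, so connectivity to the `nc` listener can be sanity-checked from any node with a short client like the sketch below (`read_lines` is a hypothetical helper name; the host and port are whatever the receiver will actually use):

```python
import socket

def read_lines(host, port, max_lines=10):
    # Open a TCP connection to host:port (roughly what the
    # socketTextStream receiver does) and read newline-delimited lines.
    with socket.create_connection((host, port), timeout=5) as conn:
        stream = conn.makefile("r", encoding="utf-8", errors="replace")
        lines = []
        for line in stream:
            lines.append(line.rstrip("\n"))
            if len(lines) >= max_lines:
                break
        return lines
```

If this raises a connection error when run on a worker node with the same host/port as the script, the problem is network reachability rather than anything Spark-specific.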

When I run this in cluster mode on Amazon Web Services (EMR with 2 slave nodes), it fails to connect to the localhost port that I opened on my master node. On the master node, I first run:

nc -lk 9999

Then, in another terminal on the master, I run the script above with this command:

spark-submit --num-executors 2 --executor-cores 2 test_1.py

After this, PySpark fails to connect to localhost port 9999, and I get the following error:

Restarting receiver with delay 2000ms: Error connecting to localhost:9999

Thanks!