I created an RDD, but the following error occurs whenever I try to use it. I have tried to print or collect() the RDD, but I still get the same error.
import json
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

print('ALS Model')
sc = SparkContext()

def parse_raw_string(raw_string):
    # Each line is a JSON object mapping one user_id to its inventory list.
    user_inventory = json.loads(raw_string)
    return list(user_inventory.items())[0]

def id_index(x):
    ((user_id, lst_inventory), index) = x
    return (index, user_id)

def create_tuple(x):
    ((user_id, lst_inventory), index) = x
    if lst_inventory is not None:
        return (index, [(i.get('appid'), 1) for i in lst_inventory
                        if str(i.get('appid')) in set_valid_game_id])
    else:
        return (index, [])

def reshape(x):
    (index, (appid, time)) = x
    return (index, appid, 1)

user_inventory_rdd = sc.textFile(path_user_inventory).map(parse_raw_string).zipWithIndex()
dic_id_index = user_inventory_rdd.map(id_index).collectAsMap()
training_rdd = user_inventory_rdd.map(create_tuple).flatMapValues(lambda x: x).map(reshape)
model = ALS.train(training_rdd, 5)
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 21, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
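As a sanity check, the per-record transformations above can be exercised without Spark. The plain-Python sketch below runs parse_raw_string, create_tuple, and reshape by hand on one fake record; the JSON shape and the contents of set_valid_game_id are assumptions for illustration, not the real data:

```python
import json

# Hypothetical valid-id set, standing in for the question's set_valid_game_id.
set_valid_game_id = {"10", "20"}

def parse_raw_string(raw_string):
    # One JSON object per line: {user_id: [ {"appid": ...}, ... ]}.
    user_inventory = json.loads(raw_string)
    return list(user_inventory.items())[0]

def create_tuple(x):
    ((user_id, lst_inventory), index) = x
    if lst_inventory is not None:
        return (index, [(i.get('appid'), 1) for i in lst_inventory
                        if str(i.get('appid')) in set_valid_game_id])
    return (index, [])

def reshape(x):
    (index, (appid, time)) = x
    return (index, appid, 1)

# Fake record: user owns appid 10 (valid) and 99 (filtered out).
raw = json.dumps({"user_1": [{"appid": 10}, {"appid": 99}]})
pair = (parse_raw_string(raw), 0)            # mimics zipWithIndex()
index, pairs = create_tuple(pair)
rows = [reshape((index, p)) for p in pairs]  # mimics flatMapValues + map
print(rows)  # [(0, 10, 1)]
```

Feeding a single record through the chain this way makes it easy to spot where a malformed line would raise inside a Spark task.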