I created an RDD, and the following error occurs whenever I try to use it. I have tried to print or collect() this RDD, but I still get the same error.

import json
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

print('ALS Model')
sc = SparkContext()

def parse_raw_string(raw_string):
    # Each input line is a JSON object; keep its single (user_id, inventory) pair.
    user_inventory = json.loads(raw_string)
    return list(user_inventory.items())[0]

def id_index(x):
    ((user_id, lst_inventory), index) = x
    return (index, user_id)

def create_tuple(x):
    # set_valid_game_id is defined earlier in my script (not shown here).
    ((user_id, lst_inventory), index) = x
    if lst_inventory is not None:
        return (index, [(i.get('appid'), 1) for i in lst_inventory if str(i.get('appid')) in set_valid_game_id])
    else:
        return (index, [])

def reshape(x):
    (index, (appid, time)) = x
    return (index, appid, 1)

# path_user_inventory points to the raw JSON input (defined earlier, not shown).
user_inventory_rdd = sc.textFile(path_user_inventory).map(parse_raw_string).zipWithIndex()
dic_id_index = user_inventory_rdd.map(id_index).collectAsMap()
training_rdd = user_inventory_rdd.map(create_tuple).flatMapValues(lambda x: x).map(reshape)
model = ALS.train(training_rdd, 5)
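The print/collect attempts mentioned above were plain actions on the RDD. The exact calls are not reproduced here, so the following is only an illustrative sketch (using training_rdd as an example); each of these raises the same Py4JJavaError shown below:

print(training_rdd.take(5))   # fails with the error below
training_rdd.collect()        # same error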

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 21, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):