This is my logistic regression. I am trying to create a PipelineModel.

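from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import RFormula

log_reg = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)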

class_formula = RFormula(formula="tipped ~ \
    pickup_hour \
    + passenger_count \
    + trip_time_in_secs \
    + trip_distance \
    + fare_amount \
    + vendor_vec \
    + payment_vec \
    + rate_vec \
    + time_bins_vec"
    , featuresCol = "features", labelCol = "label")

model = Pipeline(stages=[class_formula, log_reg]).fit(train)

I can remove [, featuresCol = "features", labelCol = "label"], but it still does not work.

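Py4JJavaError: An error occurred while calling o341.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 64 in stage 24.0 failed 4 times, most recent failure: Lost task 64.3 in stage 24.0 (TID 1040, ip-172-31-70-80.ec2.internal, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace: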
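
The error itself suggests raising spark.yarn.executor.memoryOverhead. As a rough sketch of what that means: YARN caps each container at executor memory plus overhead, and the overhead defaults to 10% of executor memory with a 384 MB floor. The helper names and the memory figures below are hypothetical and not taken from the post. A 5 GB executor with the default overhead would happen to give the 5.5 GB limit seen above, but the actual cluster settings are not stated.

```python
# Sketch only: the memory figures are assumptions, not values from the post.

def default_overhead_mb(executor_memory_mb):
    """Default YARN overhead: 10% of executor memory, at least 384 MB."""
    return max(384, executor_memory_mb // 10)


def executor_conf(executor_memory_mb, overhead_mb):
    """Build Spark settings; YARN's container limit is memory + overhead."""
    return {
        "spark.executor.memory": f"{executor_memory_mb}m",
        # Renamed to spark.executor.memoryOverhead in Spark 2.3 and later.
        "spark.yarn.executor.memoryOverhead": str(overhead_mb),
    }


executor_memory_mb = 5 * 1024  # hypothetical 5 GB executor heap
print(default_overhead_mb(executor_memory_mb))  # default overhead in MB

# Hypothetical choice: give the executor 1 GB of overhead instead.
conf = executor_conf(executor_memory_mb, overhead_mb=1024)
print(conf)
```

The resulting dictionary could be passed to SparkConf().setAll(conf.items()) before the session is created, or the same keys could be given to spark-submit with --conf. Whether more overhead is enough depends on the data and partitioning, which the post does not describe.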