Here is my logistic regression; I am trying to create a PipelineModel.
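from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import RFormula

# Elastic-net regularized logistic regression
log_reg = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)

# RFormula assembles the predictors into "features" and writes tipped to "label"
class_formula = RFormula(
    formula="tipped ~ pickup_hour + passenger_count + trip_time_in_secs "
            "+ trip_distance + fare_amount + vendor_vec + payment_vec "
            "+ rate_vec + time_bins_vec",
    featuresCol="features",
    labelCol="label",
)

# Fit both stages on the training DataFrame
model = Pipeline(stages=[class_formula, log_reg]).fit(train)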
I tried removing [, featuresCol="features", labelCol="label"], but it still fails.
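Py4JJavaError: An error occurred while calling o341.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 64 in stage 24.0 failed 4 times, most recent failure: Lost task 64.3 in stage 24.0 (TID 1040, ip-172-31-70-80.ec2.internal, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace: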
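For what it's worth, the 5.5 GB limit in that message is what you would get from a 5 GB executor plus Spark's documented default overhead (10% of executor memory, with a 384 MiB minimum). The post does not state the executor memory, so the 5 GB figure below is an assumption, and the function names are made up for illustration. The sketch is plain arithmetic, ignores any rounding YARN may apply to container sizes, and does not call Spark.

```python
# Sketch: how YARN's container limit relates to executor memory and overhead.
# Assumption: spark.executor.memory was 5g (5120 MiB); the post does not say.

MIN_OVERHEAD_MB = 384    # documented minimum overhead, in MiB
OVERHEAD_FACTOR = 0.10   # documented default fraction of executor memory


def default_overhead_mb(executor_memory_mb):
    """Default memory overhead in MiB when none is set explicitly."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))


def container_limit_mb(executor_memory_mb, overhead_mb=None):
    """Memory YARN allows the executor container: heap plus overhead."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb


executor_memory_mb = 5 * 1024  # hypothetical: spark.executor.memory=5g
limit_mb = container_limit_mb(executor_memory_mb)
print(limit_mb / 1024)  # container limit in GiB under the default overhead

# Hypothetical larger overhead of 1024 MiB, the kind of boost the error suggests.
raised_limit_mb = container_limit_mb(executor_memory_mb, overhead_mb=1024)
print(raised_limit_mb / 1024)  # container limit in GiB with the larger overhead
```

Under these assumptions the default limit comes out at 5.5 GiB, matching the message, and a 1024 MiB overhead would raise it to 6 GiB. The setting is usually passed at submit time, for example with --conf spark.yarn.executor.memoryOverhead=1024 (the value here is only an illustration; newer Spark versions name the property spark.executor.memoryOverhead).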