I am running my Spark application on a YARN cluster. No matter what I do, I cannot get any log output from inside my RDD functions. Below is a sample snippet of the kind of RDD processing function I write; I have simplified it to illustrate the syntax. When I run locally I can see the logs, but not in cluster mode. Neither System.err.println nor the Logger seems to work, although I can see all of my driver logs. I even tried logging through the root logger, but it does not work at all inside the RDD processing functions. Since I badly need to see these log messages, I finally followed a guide that declares the logger as transient (https://www.mapr.com/blog/how-log-apache-spark), but even that did not help:
class SampleFlatMapFunction implements PairFlatMapFunction<Tuple2<String, String>, String, String> {

    private static final long serialVersionUID = 6565656322667L;

    // Transient so the logger is not serialized with the closure;
    // it is re-created on each executor in readObject() below.
    transient Logger executorLogger = LogManager.getLogger("sparkExecutor");

    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        executorLogger = LogManager.getLogger("sparkExecutor");
    }

    @Override
    public Iterable<Tuple2<String, String>> call(Tuple2<String, String> tuple) throws Exception {
        executorLogger.info(" log testing from executorLogger ::");
        System.err.println(" log testing from executorLogger system error stream ");

        List<Tuple2<String, String>> updates = new ArrayList<>();
        // process the tuple, expand it, and add the results to the list
        return updates;
    }
}
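The transient-plus-readObject pattern used above can be demonstrated with plain Java serialization, independent of Spark. This is only a minimal sketch with hypothetical class and field names, not the original function; it shows why the transient field must be re-created in readObject(), since Java serialization leaves transient fields at their default (null) after deserialization:

```java
import java.io.*;

public class TransientDemo {
    static class Task implements Serializable {
        private static final long serialVersionUID = 1L;

        // Not serialized; stands in for the transient Logger field.
        transient String logger = "initialized";

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Without this line, logger would be null after deserialization.
            logger = "initialized";
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialize a Task, then deserialize it, mimicking what Spark
        // does when it ships a function object to an executor.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new Task());
        Task copy = (Task) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(copy.logger); // prints "initialized" thanks to readObject()
    }
}
```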
My log4j configuration is as follows:
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/var/log/spark/spark.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=${spark.yarn.app.container.log.dir}/spark-app.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
# By default, everything goes to console and file
log4j.rootLogger=INFO, RollingAppender, console
# My custom logging goes to another file
log4j.logger.sparkExecutor=INFO, stdout, RollingAppenderU
I have searched the YARN logs and the Spark UI logs, and nowhere can I find the log statements from the RDD processing functions. I tried the command below, but it did not help:
yarn logs -applicationId
I even checked the HDFS path below:
/tmp/logs/
I run my spark-submit command with the parameters below, but even then it does not work:
--master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
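For reference, the overall command shape I use can be sketched as below. This is only a sketch with a placeholder jar name and main class; it additionally assumes the log4j.properties file is shipped with --files, so that the relative -Dlog4j.configuration path can resolve inside each YARN container's working directory:

```shell
# Sketch only: jar name and main class are placeholders.
# --files distributes log4j.properties to every container, where the
# relative path in -Dlog4j.configuration is expected to resolve.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.SampleApp \
  sample-app.jar
```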
Can someone guide me on logging from Spark RDD and map functions? What am I missing in the steps above?