I am trying to run a Spark SQL query against SAP HANA from Java code. Invoking any action on the DataFrame object, e.g. df.count(), throws java.io.NotSerializableException. The snippet below reproduces the exception:
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SaphanaTest implements Serializable {

    private static final long serialVersionUID = 1L;

    public void call() {
        SparkConf sparkconf = new SparkConf()
                .setAppName("SaphanaTest")
                .set("spark.master", "local[*]");
        SparkContext sc = new SparkContext(sparkconf);
        HiveContext sqlContext = new HiveContext(sc);

        // Load the SAP HANA JDBC driver
        try {
            Class.forName("com.sap.db.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        Map<String, String> options = new HashMap<String, String>();
        options.put("url", "jdbc:sap://<IP>:30015/system");
        options.put("user", "SYSTEM");
        options.put("password", "Saphana123");
        options.put("dbtable", "SYSTEM.TEST1");

        DataFrame df = sqlContext.load("jdbc", options);
        df.registerTempTable("temp");
        df = sqlContext.sql("select * from temp");
        long count = df.count(); // throws java.io.NotSerializableException
        sc.stop();
    }

    public static void main(String[] args) {
        SaphanaTest test = new SaphanaTest();
        test.call();
    }
}
Error stack trace:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1766)
    at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1.org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1(RDD.scala:900)
    at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$apply$30.apply(RDD.scala:902)
    at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$apply$30.apply(RDD.scala:902)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
    at com.impetus.saphana.SaphanaTest.main(SaphanaTest.java:48)
Caused by: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
    - object not serializable (class: com.sap.db.jdbc.topology.Host, value: 172.26.52.54:30015)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [172.26.52.54:30015])
    - writeObject data (class: java.util.Hashtable)
    - object (class java.util.Properties, {dburl=jdbc:sap://172.26.52.54:30015, user=SYSTEM, password=Saphana123, url=jdbc:sap://172.26.52.54:30015/?system&user=SYSTEM&password=Saphana123, dbtable=SYSTEM.TEST1, hostlist=[172.26.52.54:30015]})
    - field (class: org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1, name: properties$1, type: class java.util.Properties)
    - object (class org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1, )
Any pointers? Searching on Google, I found a suggestion to make the connection properties serializable, but I do not know how to do that in Spark.
Any help is appreciated. Thank you in advance.
1 Answer
The comments section of this Blog post solved my problem; you can also try:
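The blog link itself did not survive here, but the stack trace above shows the HANA driver putting non-serializable com.sap.db.jdbc.topology.Host objects into the java.util.Properties that Spark's JDBCRDD captures. A minimal sketch of one commonly suggested style of workaround, a serializable driver wrapper that only ever hands the real driver a defensive copy of those properties (PropertyIsolatingDriver and all names in it are illustrative, not part of the SAP driver API or the original answer):

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

// Illustrative wrapper around a JDBC driver: it gives the wrapped driver a
// defensive copy of the connection Properties, so anything the driver adds
// (such as a non-serializable host list) stays in the copy and never reaches
// the Properties object that Spark serializes into its task closures.
public class PropertyIsolatingDriver implements Driver, Serializable {

    private static final long serialVersionUID = 1L;

    private final transient Driver delegate;

    public PropertyIsolatingDriver(Driver delegate) {
        this.delegate = delegate;
    }

    // Copy the caller's Properties so driver-side mutations cannot leak back.
    static Properties defensiveCopy(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);
        return copy;
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        return delegate.connect(url, defensiveCopy(info));
    }

    @Override
    public boolean acceptsURL(String url) throws SQLException {
        return delegate.acceptsURL(url);
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info)
            throws SQLException {
        return delegate.getPropertyInfo(url, info);
    }

    @Override
    public int getMajorVersion() { return delegate.getMajorVersion(); }

    @Override
    public int getMinorVersion() { return delegate.getMinorVersion(); }

    @Override
    public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }

    @Override
    public Logger getParentLogger() { return Logger.getGlobal(); }

    public static void main(String[] args) {
        Properties original = new Properties();
        original.setProperty("user", "SYSTEM");
        Properties handedToDriver = defensiveCopy(original);
        handedToDriver.put("hostlist", new Object()); // simulated driver mutation
        System.out.println(original.containsKey("hostlist")); // false
    }
}
```

Such a wrapper would be registered with java.sql.DriverManager in place of the bare com.sap.db.jdbc.Driver. Upgrading to a newer version of the HANA JDBC driver (ngdbc.jar), if one is available, may also avoid the problem at the source.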