I am trying to fetch records from an HBase table through a Java Spark program exposed via a Jersey REST API, and I get the error mentioned below. However, when I access the same HBase table by submitting the Spark jar directly, the code executes without errors.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, maintained by the same master.
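For context, the kind of read that triggers this error is roughly the following (a minimal sketch, not the asker's actual code; the table name, app name, and master configuration are placeholder assumptions):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadJob {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HBaseReadJob");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // "my_table" is a hypothetical table name; replace with your own.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // This call deserializes HBase classes on the executors; it fails with
        // "unread block data" when the HBase jars are missing on the executor classpath.
        JavaPairRDD<ImmutableBytesWritable, Result> rdd =
                sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
                        ImmutableBytesWritable.class, Result.class);

        System.out.println("Row count: " + rdd.count());
        sc.stop();
    }
}
```

Running this requires a live Spark and HBase cluster, so it is a sketch rather than a standalone test case.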
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2 Answers
Well, I think I know your problem, because I just ran into it myself.
The cause is most likely some missing HBase jars: while the Spark job is running, Spark needs the HBase jars to read the data, and if they are not present on the executors it throws an exception like the one above. What should you do? It is easy.
Before submitting the job, add the --jars parameter and include the following jars:
--jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
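Putting the list above together, a full submit command might look like this (the master URL, main class, and application jar are placeholder assumptions; the --jars value must be a single comma-separated string with no spaces):

```shell
spark-submit \
  --master spark://your-master:7077 \
  --class com.example.YourHBaseJob \
  --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar \
  your-application.jar
```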
If it works for you, enjoy!
I ran into the same problem in CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:
Solution 1: Use spark-submit
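The original command is not shown in this answer, but it would be along these lines (the jar paths, master, main class, and application jar are assumptions; the jar versions match those used in solution 2):

```shell
spark-submit \
  --master yarn \
  --class com.example.YourHBaseJob \
  --jars /opt/cloudera/parcels/CDH/jars/zookeeper-3.4.5-cdh5.4.0.jar,/opt/cloudera/parcels/CDH/jars/hbase-client-1.0.0-cdh5.4.0.jar,/opt/cloudera/parcels/CDH/jars/hbase-common-1.0.0-cdh5.4.0.jar,/opt/cloudera/parcels/CDH/jars/hbase-server-1.0.0-cdh5.4.0.jar,/opt/cloudera/parcels/CDH/jars/hbase-protocol-1.0.0-cdh5.4.0.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar \
  your-application.jar
```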
Solution 2: Use SparkConf in code
SparkConf conf = new SparkConf();
conf.setJars(new String[]{
    "zookeeper-3.4.5-cdh5.4.0.jar",
    "hbase-client-1.0.0-cdh5.4.0.jar",
    "hbase-common-1.0.0-cdh5.4.0.jar",
    "hbase-server-1.0.0-cdh5.4.0.jar",
    "hbase-protocol-1.0.0-cdh5.4.0.jar",
    "htrace-core-3.1.0-incubating.jar",
    // plus any custom jars that are needed on the spark executors
});
To summarize:
The problem is caused by jars missing from the Spark project. You need to add these jars to the project classpath and, in addition, use one of the 2 solutions above to distribute the jars to your Spark cluster.