I wrote a simple program to read data from HBase. It works fine on a Cloudera cluster backed by HDFS.

However, when I test it on EMR with data in S3, I get an exception.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Spark conf
    SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    // HBase conf
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "localhost");
    conf.set("hbase.zookeeper.property.client.port", "2181");

    // Submit scan into HBase conf
    // conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
    conf.set(TableInputFormat.INPUT_TABLE, "mytable");
    conf.set(TableInputFormat.SCAN_ROW_START, "startrow");
    conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow");

    // Get RDD
    JavaPairRDD<ImmutableBytesWritable, Result> source = jsc
            .newAPIHadoopRDD(conf, TableInputFormat.class,
                    ImmutableBytesWritable.class, Result.class);

    // Process RDD
    System.out.println("&&&&&&&&&&&&&&&&&&&&&&& " + source.count());
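For context, the commented-out `TableInputFormat.SCAN` line refers to a `scan` object that is not shown above. If it were used, the setup would look roughly like this (a sketch based on the HBase 1.x client API, not code from my actual job; it assumes the `hbase-server`/`hbase-mapreduce` modules on the classpath):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical reconstruction of the commented-out scan setup.
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("startrow")); // inclusive start row
    scan.setStopRow(Bytes.toBytes("endrow"));    // exclusive stop row
    // Serialize the Scan into the job configuration for TableInputFormat:
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

In my actual run I rely on `SCAN_ROW_START`/`SCAN_ROW_STOP` instead, as shown above.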

    18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
    18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry
        at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:59)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:51)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 42 more
    Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
        at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
        at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
        at HbaseScan.main(HbaseScan.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:652)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:265)
        ... 20 more

USING ALL APACHE HBASE LIBS:

    18/05/04 04:05:54 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:202)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:259)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
        at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
        at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
        at HbaseScan.main(HbaseScan.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
        ... 24 more
    Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource Is the hadoop compatibility jar on the classpath?
        at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:75)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeper.<init>(MetricsZooKeeper.java:38)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.<init>(RecoverableZooKeeper.java:130)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:143)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:181)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
        at org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.<init>(ZooKeeperKeepAliveConnection.java:43)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1737)
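The last `Caused by` explicitly asks "Is the hadoop compatibility jar on the classpath?", so this looks like a dependency/classpath problem rather than a code problem. For reference, the way I am currently shipping the HBase jars to `spark-submit` looks roughly like this (jar paths and versions are placeholders for my setup, not a verified EMR configuration):

    # Sketch: supplying the HBase client and compatibility jars to spark-submit.
    # The hbase-hadoop-compat / hbase-hadoop2-compat modules are the ones the
    # error message refers to; exact paths depend on the EMR image.
    spark-submit \
      --class HbaseScan \
      --master local[4] \
      --jars /usr/lib/hbase/hbase-client.jar,\
    /usr/lib/hbase/hbase-common.jar,\
    /usr/lib/hbase/hbase-server.jar,\
    /usr/lib/hbase/hbase-hadoop-compat.jar,\
    /usr/lib/hbase/hbase-hadoop2-compat.jar \
      myapp.jar

Is this the right set of jars, or is something else needed on EMR with S3-backed HBase?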