
Spark, Kerberos, yarn-cluster -> connection to HBase


We are facing an issue on a Kerberos-enabled Hadoop cluster.

We are trying to run a streaming job in yarn-cluster mode that interacts with Kafka (direct stream) and HBase.

Somehow we are unable to connect to HBase in cluster mode. We use a keytab to log in to HBase.

This is how we submit the job:

spark-submit --master yarn-cluster --keytab "dev.keytab" --principal "dev@IO-INT.COM"  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_executor_conf.properties -XX:+UseG1GC" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j_driver_conf.properties -XX:+UseG1GC" --conf spark.yarn.stagingDir=hdfs:///tmp/spark/ --files "job.properties,log4j_driver_conf.properties,log4j_executor_conf.properties" service-0.0.1-SNAPSHOT.jar job.properties

To connect to HBase:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkFiles

def getHbaseConnection(properties: SerializedProperties): (Connection, UserGroupInformation) = {
    val config = HBaseConfiguration.create()
    config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_VALUE)
    // Configuration.set takes String values, so the port must be quoted
    config.set("hbase.zookeeper.property.clientPort", "2181")
    config.set("hadoop.security.authentication", "kerberos")
    config.set("hbase.security.authentication", "kerberos")
    config.set("hbase.cluster.distributed", "true")
    config.set("hbase.rpc.protection", "privacy")
    config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@IO-INT.COM")
    config.set("hbase.master.kerberos.principal", "hbase/_HOST@IO-INT.COM")

    UserGroupInformation.setConfiguration(config)

    // Prefer the keytab shipped with the job (SparkFiles); fall back to the
    // path given in the properties
    val ugi: UserGroupInformation =
      if (SparkFiles.get(properties.keytab) != null
          && new java.io.File(SparkFiles.get(properties.keytab)).exists) {
        UserGroupInformation.loginUserFromKeytabAndReturnUGI(
          properties.kerberosPrincipal, SparkFiles.get(properties.keytab))
      } else {
        UserGroupInformation.loginUserFromKeytabAndReturnUGI(
          properties.kerberosPrincipal, properties.keytab)
      }

    val connection = ConnectionFactory.createConnection(config)
    (connection, ugi)
}

And we connect to HBase in:

foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // var ugi: UserGroupInformation = Utils.getHbaseConnection(properties)._2
    rdd.foreachPartition { partition =>
      val connection = Utils.getHbaseConnection(propsObj)._1
      val table = …
      partition.foreach { json =>
        // building of the Put objects for each record (elided)
      }
      table.put(puts)
      table.close()
      connection.close()
    }
  }
}
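One detail that may matter here (our own observation, not something reported by the cluster): the UGI returned by getHbaseConnection is never used in the snippet above, so the connection may not actually run its RPCs under the keytab login. A minimal sketch, assuming the config and ugi values from getHbaseConnection above, that creates the connection inside ugi.doAs:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// Sketch only: create the connection under the logged-in UGI so the HBase
// RPCs carry the Kerberos identity (config and ugi as in getHbaseConnection)
val connection = ugi.doAs(new PrivilegedExceptionAction[Connection] {
  override def run(): Connection = ConnectionFactory.createConnection(config)
})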

The keytab file is not being copied to the YARN staging/temp directory, so we do not get it in SparkFiles.get... and if we pass the keytab with --files, spark-submit fails because it is already listed in --keytab.
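One workaround we are considering (a sketch only; dev_copy.keytab is a hypothetical renamed copy, and the elided --conf flags are unchanged from the command above): ship a second copy of the keytab under a different file name via --files so it does not collide with the name already registered by --keytab, then look up that copy's name in SparkFiles.get:

cp dev.keytab dev_copy.keytab
spark-submit --master yarn-cluster --keytab "dev.keytab" --principal "dev@IO-INT.COM" --files "job.properties,dev_copy.keytab,log4j_driver_conf.properties,log4j_executor_conf.properties" ... service-0.0.1-SNAPSHOT.jar job.properties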

1 Answer


The error is:

This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:935)
