
How to connect to a Hive metastore programmatically in SparkSQL?

I'm using HiveContext with SparkSQL and I'm trying to connect to a remote Hive metastore. The only way to set the metastore seems to be to include hive-site.xml on the classpath (or copy it to /etc/spark/conf/).

Is there a way to set this parameter programmatically in Java code without including hive-site.xml? If so, which Spark configuration should be used?
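
For reference, the classpath-based approach in question boils down to a hive-site.xml along these lines (host and port are placeholders):

    <!-- hive-site.xml, placed on the classpath or in /etc/spark/conf/ -->
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://METASTORE:9083</value>
      </property>
    </configuration>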

4 Answers

  • 1

    For Spark 1.x, you can set:

    // point the session at the remote metastore before the HiveContext is created
    System.setProperty("hive.metastore.uris", "thrift://METASTORE:9083");

    final SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hiveContext = new HiveContext(sc);
    

    Or:

    final SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hiveContext = new HiveContext(sc);
    hiveContext.setConf("hive.metastore.uris", "thrift://METASTORE:9083");
    
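    Another variant sometimes used is to pass the property through SparkConf with the spark.hadoop. prefix, which Spark forwards into the underlying Hadoop/Hive configuration (a Scala sketch relying on that forwarding behavior):

    val conf = new SparkConf()
    // spark.hadoop.* keys are copied into the Hadoop configuration that Hive reads
    conf.set("spark.hadoop.hive.metastore.uris", "thrift://METASTORE:9083")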

    Update: if your Hive is Kerberized

    Try setting these before creating the HiveContext:

    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "false");
    System.setProperty("hive.metastore.kerberos.principal", hivePrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    
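    If a keytab login is also needed, a minimal sketch using Hadoop's UserGroupInformation API (principal and keytab path here are placeholders, not from the original answer):

    import org.apache.hadoop.security.UserGroupInformation
    // hypothetical principal and keytab path; replace with your own
    UserGroupInformation.loginUserFromKeytab("hive/host@EXAMPLE.COM", "/etc/security/keytabs/hive.keytab")
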
  • 26

    In Spark 2.0 it should look something like this:

    Don't forget to replace "hive.metastore.uris" with your own value. This assumes that you have already started the Hive metastore service (not HiveServer).
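    If the metastore service is not running yet, it can usually be started on the Hive host with hive --service metastore (assuming the hive launcher is on the PATH).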

    val spark = SparkSession
      .builder()
      .appName("interfacing spark sql to hive metastore without configuration file")
      .config("hive.metastore.uris", "thrift://localhost:9083") // replace with your Hive metastore service's thrift url
      .enableHiveSupport() // don't forget to enable hive support
      .getOrCreate()

    import spark.implicits._
    import spark.sql
    // create an arbitrary frame
    val frame = Seq(("one", 1), ("two", 2), ("three", 3)).toDF("word", "count")
    // see the frame created
    frame.show()
    /**
     * +-----+-----+
     * | word|count|
     * +-----+-----+
     * |  one|    1|
     * |  two|    2|
     * |three|    3|
     * +-----+-----+
     */
    // write the frame
    frame.write.mode("overwrite").saveAsTable("t4")
    
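    To confirm that the session is really talking to the metastore, read the table back; a minimal check against the t4 table created above:

    spark.sql("show tables").show()
    spark.sql("select * from t4").show()
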
  • 2

    I faced the same problem and solved it. Just follow these steps for Spark 2.0:

    Step 1: Copy the hive-site.xml file from your Hive conf folder to Spark's conf directory.

    Step 2: Edit the spark-env.sh file and configure your MySQL driver (if you are using MySQL as the Hive metastore).

    Or add the MySQL driver to Maven/SBT (if you use those), as sketched below.
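
    For the SBT route, the dependency line would look something like this (the version number is illustrative):

    libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.40" // in build.sbt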

    Step 3: Add enableHiveSupport() when creating the Spark session.

    val spark = SparkSession.builder.master("local").appName("testing").enableHiveSupport().getOrCreate()

    Sample code:

    package sparkSQL
    
    /**
      * Created by venuk on 7/12/16.
      */
    
    import org.apache.spark.sql.SparkSession
    
    object hivetable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("hivetable").enableHiveSupport().getOrCreate()
    
        spark.sql("create table hivetab (name string, age int, location string) row format delimited fields terminated by ',' stored as textfile")
        spark.sql("load data local inpath '/home/hadoop/Desktop/asl' into table hivetab").show()
        val x = spark.sql("select * from hivetab")
        x.write.saveAsTable("hivetab")
      }
    }
    


  • 12

    The code below worked for me. You can omit the hive.metastore.uris setting when using a local metastore; Spark will then create the Hive objects in a spare warehouse directory locally.

    import org.apache.spark.sql.SparkSession

    object spark_hive_support1 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession
          .builder()
          .master("yarn")
          .appName("Test Hive Support")
          // .config("hive.metastore.uris", "thrift://localhost:9083") // omitted: not needed for a local metastore
          .enableHiveSupport()
          .getOrCreate()

        import spark.implicits._

        val testdf = Seq(("Word1", 1), ("Word4", 4), ("Word8", 8)).toDF
        testdf.show()
        testdf.write.mode("overwrite").saveAsTable("WordCount")
      }
    }
    
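    If you want the local warehouse location to be explicit, it can be set on the builder as well; a minimal sketch with an assumed path (Spark 2.x defaults to ./spark-warehouse in the working directory):

    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("local metastore with explicit warehouse dir")
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse") // hypothetical path
      .enableHiveSupport()
      .getOrCreate()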
