Using SparkSession on a Glue dev endpoint to access the Data Catalog

I'm trying to set up an AWS Glue development endpoint to test a very simple ETL script, but I can't seem to access my catalog data.

I'm not using Zeppelin, just the Scala REPL.

spark.catalog.listTables.show -> empty.
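A bare spark.catalog.listTables only lists tables (and temporary views) in the current database, which is "default" unless changed, so an empty result does not by itself prove the catalog is unreachable. A minimal sketch that walks every database the session's catalog exposes is below; the object and method names are hypothetical, and it assumes that with a working Glue connection the Glue databases would appear in listDatabases().

```scala
import org.apache.spark.sql.SparkSession

object ListCatalog {
  // Returns (database, table) pairs for every database the session's catalog exposes.
  // A bare spark.catalog.listTables only covers the current database.
  def listAll(spark: SparkSession): Seq[(String, String)] = {
    val databases = spark.catalog.listDatabases().collect().map(_.name).toSeq
    databases.flatMap { db =>
      spark.catalog.listTables(db).collect().map(t => (db, t.name)).toSeq
    }
  }
}
```

In the REPL this could be used as ListCatalog.listAll(spark).foreach(println); if only "default" shows up, the session is probably not talking to the Glue catalog at all.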

When I try to create the SparkSession the same way as in my EMR steps:

import org.apache.spark.sql.SparkSession

// Same settings as on EMR: use the AWS Glue Data Catalog as the Hive metastore
val spark = SparkSession.builder()
              .config("hive.metastore.connect.retries", 5)
              .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
              .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
              .enableHiveSupport().appName("ETL")
              .getOrCreate()

I get this output, and still no tables:

18/07/24 09:08:00 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/07/24 09:08:00 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/07/24 09:08:01 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/07/24 09:08:01 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect
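The last warning says the REPL already had a SparkSession, so getOrCreate() returned that existing session instead of building a new one, and "some configuration may not take effect". One quick way to see which catalog the active session actually uses is to read spark.sql.catalogImplementation, which is "hive" when Hive support is enabled and "in-memory" otherwise. The helper below is a hypothetical sketch, not part of any Glue API.

```scala
import org.apache.spark.sql.SparkSession

object CatalogCheck {
  // "hive" means the session was built with Hive support (required for a
  // Hive-compatible metastore such as the Glue Data Catalog);
  // "in-memory" means tables exist only inside this session.
  def catalogImplementation(spark: SparkSession): String =
    spark.conf.get("spark.sql.catalogImplementation", "in-memory")

  def usesHiveCatalog(spark: SparkSession): Boolean =
    catalogImplementation(spark) == "hive"
}
```

If this reports "in-memory" on the dev endpoint, the builder settings above were not applied to the session the REPL is using.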

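What I want this script to do is actually very simple. I have a view (created in Zeppelin on a Spark EMR cluster) with columns such as year, month, day, x, y, z, and I just want to materialize a single day, e.g. year(current_timestamp), month(...), ..., as Parquet and append it to a partitioned table.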
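The daily job described above could look roughly like the following sketch. It assumes the source DataFrame has numeric year, month and day columns and that the partitioned table is laid out in Hive style (year=/month=/day=) under one path; the object name, the view name and the S3 path in the usage line are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.{col, current_timestamp, dayofmonth, month, year}

object DailyAppend {
  // Keeps only today's rows; assumes numeric year, month and day columns.
  def todaysRows(source: DataFrame): DataFrame = {
    val now = current_timestamp()
    source.where(
      col("year") === year(now) &&
        col("month") === month(now) &&
        col("day") === dayofmonth(now))
  }

  // Appends today's rows as Parquet in a Hive-style year=/month=/day= layout.
  def appendToday(source: DataFrame, path: String): Unit =
    todaysRows(source).write
      .mode(SaveMode.Append)
      .partitionBy("year", "month", "day")
      .parquet(path)
}
```

Usage (hypothetical names): DailyAppend.appendToday(spark.table("my_view"), "s3://my-bucket/my_table/"). Note that writing files this way does not by itself register new partitions in the catalog.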

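It would be even better if I had some way to insert-overwrite partitions; I was hoping Glue would simplify the scheduling. Am I misunderstanding Glue? Does it not support simply running standard SparkSession SQL scripts?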
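For overwriting only the partitions a run touches, Spark 2.3 and later offer spark.sql.sources.partitionOverwriteMode=dynamic: with it, SaveMode.Overwrite replaces only the partitions present in the DataFrame and leaves the rest of the path alone. A sketch, assuming the Spark version on the endpoint is 2.3 or newer and the same year/month/day layout as before (names hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object PartitionOverwrite {
  // Sets dynamic overwrite mode for the whole session (a side effect), then
  // overwrites only the year/month/day partitions that occur in df.
  def overwritePartitions(df: DataFrame, path: String): Unit = {
    df.sparkSession.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("year", "month", "day")
      .parquet(path)
  }
}
```

Re-running the job for the same day then replaces that day's files instead of duplicating them, which plain Append would do.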

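The data will later be read by Athena and by more complex Spark EMR jobs. This seems like a simple ETL job for Glue (it is almost entirely SQL), but it does need catalog access and a DataFrame's .write.mode("append").parquet... to work.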
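If the goal is for Athena to see new data through the catalog, one option, assuming the session really is connected to the catalog and the partitioned table already exists there, is to append through the table rather than a raw path. DataFrameWriter.insertInto matches columns by position, not by name, so the sketch below reorders the DataFrame to the table's column order (partition columns last) first. The object name is hypothetical, and for Hive-format tables dynamic partition inserts may additionally require hive.exec.dynamic.partition.mode=nonstrict.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

object CatalogAppend {
  // Appends df to an existing catalog table. insertInto is positional, so the
  // columns are first reordered to match the table's own column order.
  def appendToCatalogTable(df: DataFrame, table: String): Unit = {
    val tableColumns = df.sparkSession.table(table).columns.map(col)
    df.select(tableColumns: _*)
      .write
      .mode(SaveMode.Append)
      .insertInto(table)
  }
}
```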
