Spark SQL - Error inserting into an external Hive table - Java 学习之路

I am trying to insert data into an external Hive table through Spark SQL. My Hive table is bucketed by column. The query that creates the external Hive table is:

create external table tab1 ( col1 type,col2 type,col3 type) clustered by (col1,col2) sorted by (col1) into 8 buckets stored as parquet

Now I am trying to store data from a Parquet file (stored in HDFS) into the table. This is my code:

SparkSession session = SparkSession.builder().appName("ParquetReadWrite").
                    config("hive.exec.dynamic.partition", "true").
                    config("hive.exec.dynamic.partition.mode", "nonstrict").
                    config("hive.execution.engine","tez").
                    config("hive.exec.max.dynamic.partitions","400").
                    config("hive.exec.max.dynamic.partitions.pernode","400").
                    config("hive.enforce.bucketing","true").
                    config("optimize.sort.dynamic.partitionining","true").
                    config("hive.vectorized.execution.enabled","true").
                    config("hive.enforce.sorting","true").
                    enableHiveSupport()
                    .master(args[0]).getOrCreate();
String insertSql="insert into tab1 select * from"+"'"+parquetInput+"'";

session.sql(insertSql);

When I run the code, it throws the following error:

mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting (line 1, pos 50)

== SQL ==
insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet'
--------------------------------------------------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)

Also, what is the difference between using Tez and Spark as the Hive execution engine?

2 Answers

0

Have you tried

LOAD DATA LOCAL INPATH '/path/to/data'
OVERWRITE INTO TABLE tablename;

Answered on 2024-04-28T06:36:38+08:00
0
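When you create an external table in Hive, specify its HDFS location.
```
create external table tab1 ( col1 type,col2 type,col3 type) 
clustered by (col1,col2) sorted by (col1) into 8 buckets 
stored as parquet 
LOCATION 'hdfs://url:port/user/clsadmin/tab1'
```
Hive does not necessarily populate the data itself. The same application or another one can ingest data into that location, and Hive accesses the data by defining a schema on top of the location.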

== SQL ==
insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet'
--------------------------------------------------^^^

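parquetInput is the HDFS path of a Parquet file, not a Hive table name. The parser expects a table or view name after `from`, so the quoted path causes the error.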
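
To make the cause concrete, here is a small sketch of the string that the concatenation in the question produces. It needs no Spark dependency. The class name `InsertSqlDemo` and the method `buildInsertSql` are made up for illustration, and the path is the placeholder from the question.

```java
// Hypothetical demo class, not part of the original code.
final class InsertSqlDemo {
    // Mirrors the concatenation from the question: the path is wrapped in single quotes.
    static String buildInsertSql(String parquetInput) {
        return "insert into tab1 select * from" + "'" + parquetInput + "'";
    }

    public static void main(String[] args) {
        String sql = buildInsertSql("hdfs://url:port/user/clsadmin/somedata.parquet");
        // The FROM clause now holds a quoted string literal
        // where the parser expects a table or view name.
        System.out.println(sql);
    }
}
```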

There are two ways to solve this:
- Define an external table over "parquetInput" and use that table name in the query
- Use LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1
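
The two options could be sketched in Java roughly as follows. The sketch only builds the SQL text, so it runs without a cluster. `HiveInsertSql`, its method names, and the source table name `parquet_input` are hypothetical, and the DDL assumes the Parquet data sits in an HDFS directory whose columns match `tab1`. Each resulting string would then be passed to `session.sql(...)`.

```java
// Hypothetical helper, shown only to illustrate the two fixes; names are not from the original post.
final class HiveInsertSql {
    // Option 1, step 1: expose the Parquet directory as an external table.
    // The column list is supplied by the caller and must match the files.
    static String createSourceTable(String table, String columns, String hdfsDir) {
        return "create external table if not exists " + table + " (" + columns + ") "
                + "stored as parquet location '" + hdfsDir + "'";
    }

    // Option 1, step 2: select from the table name instead of the quoted path.
    static String insertFromTable(String target, String source) {
        return "insert into " + target + " select * from " + source;
    }

    // Option 2: move the file into the target table with LOAD DATA INPATH.
    static String loadData(String hdfsPath, String target) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + target;
    }
}
```

For example, `session.sql(HiveInsertSql.insertFromTable("tab1", "parquet_input"))` would replace the failing statement once the source table exists. Keep in mind that LOAD DATA moves files as they are, so by itself it does not bucket or sort the data.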
Answered on 2024-04-28T06:36:38+08:00
