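I am trying to INSERT OVERWRITE from an external Hive table into a partitioned internal table using the code below. The code runs "successfully", but when I run "select * from videotracking_playevent limit 10" it never returns any results.
The external table is built on a recursive folder directory containing Parquet files, and it can be queried. I have tested the regular expressions used in this example and they work as well. The Hive logs show no errors. I have a feeling the problem is in the partitioning, but I can't figure out why. Any ideas?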
set hive.mapred.supports.subdirectories=true;
set hive.input.dir.recursive=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
set hive.execution.engine=spark;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE videotracking_playevent PARTITION (source, createyear, createmonth, createday)
SELECT
    id_gigya,
    created,
    uid,
    category,
    action,
    video_id,
    program,
    device,
    url,
    video_cms,
    duration,
    position,
    version,
    slot_type,
    slot_position,
    ad_position,
    ad_duration,
    player_type,
    is_embed,
    ad_max_ads,
    ad_max_duration,
    brand,
    casting,
    ip,
    platform,
    subprofile_id,
    channel,
    episode_id,
    regexp_replace(regexp_extract(INPUT__FILE__NAME, 'source=[a-z]*', 0), 'source=', '') AS source,
    regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createyear=[0-9]*', 0), 'createyear=', '') AS createyear,
    regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createmonth=[0-9]*', 0), 'createmonth=', '') AS createmonth,
    regexp_replace(regexp_extract(INPUT__FILE__NAME, 'createday=[0-9]*', 0), 'createday=', '') AS createday
FROM
    videotracking_playevent_ext;
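
To narrow this down, checks along the following lines might help. This is only a rough sketch, not a fix: it uses the table names from the statement above and assumes the file paths look like .../source=abc/createyear=2017/... . It swaps the regexp_replace(regexp_extract(..., 0), ...) pair for a single regexp_extract with a capture group (index 1), which should yield the same values under that assumption. The alias row_count and the subquery name t are just illustrative.

```sql
-- 1. Did the insert register any partitions in the metastore?
SHOW PARTITIONS videotracking_playevent;

-- 2. Which partition values does the external table actually produce,
--    and how many rows fall into each combination?
--    Empty strings here would suggest the regex did not match the path.
SELECT source, createyear, createmonth, createday, COUNT(*) AS row_count
FROM (
    SELECT
        regexp_extract(INPUT__FILE__NAME, 'source=([a-z]*)', 1)      AS source,
        regexp_extract(INPUT__FILE__NAME, 'createyear=([0-9]*)', 1)  AS createyear,
        regexp_extract(INPUT__FILE__NAME, 'createmonth=([0-9]*)', 1) AS createmonth,
        regexp_extract(INPUT__FILE__NAME, 'createday=([0-9]*)', 1)   AS createday
    FROM videotracking_playevent_ext
) t
GROUP BY source, createyear, createmonth, createday;

-- 3. Where does the internal table store its data? The Location field
--    can then be inspected on HDFS to see whether files were written.
DESCRIBE FORMATTED videotracking_playevent;
```

If the first query lists no partitions while the second returns non-empty values with rows, the issue is more likely in the write step (for example the execution engine or dynamic partition settings) than in the regular expressions.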