I'm trying to query this JSON file (for debugging purposes it contains only one row!):
{
"appVersion": null,
"sessionIndex": "3",
"psdkLang": null,
"lamdbaAwsRequestId": "bb04330c-e1e7-4bbd-97b8-86fdb2ee0b7f",
"bundleID": "xyz",
"receiveTimestamp": "2017-03-31T01:45:30.796Z",
"type": "logEvent",
"userIdfv": null,
"osVersion": null,
"uniqueIndex": "9c6c3927-aa66-4974-adac-fd10fc83a1e5",
"userIdfa": null,
"eventName": "Rewarded Ads Ad Is Ready",
"deviceType": null,
"eventId": "shardId-000000000005:49571690399037302251611429510623174446442870333536993362",
"store1": "google",
"deviceLang": null,
"geoCode": null,
"sessionId": "34B4CEC8-9AA0-40DD-94C4-C5420F563F68",
"params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}",
"gameVersion": null,
"internetConnectionState": null,
"deviceModel": null,
"deviceTimeZone": null,
"time": "2017-03-31T10:44:50.117+0900",
"userId": "24176983"
}
I created a table in Amazon Athena:
CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
`appversion` string,
`psdklang` string,
`bundleid` string,
`receivetimestamp` string,
`type` string,
`osversion` string,
`store1` string,
`devicelang` string,
`geocode` string,
`sessionid` string,
`eventName` string,
`params` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');
When I run this query:
select eventname from RAAIR;
everything works fine.
When I try to use the nested JSON (the params element):
select params['AdIsReady'] from RAAIR;
I get an "Internal error" message.
What am I missing here?
1 Answer
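You mentioned in the comments that params contains backslashes used for escaping. That is because params is a string, not a nested object. Athena cannot create a MAP directly from a string, which is why you get the "Internal error" message. If you cannot change the data so that params is a nested object, you can change the table definition so that params is a string: Athena (Presto) lets you parse the JSON inside a string and query its values.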
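As a rough sketch of that table change, the definition from the question could be recreated with params declared as string instead of map<string,string>. This assumes the table can simply be dropped and recreated; dropping an external table in Athena removes only its metadata, not the files in S3. The location is left elided exactly as in the question, and the two statements are meant to be run one at a time.

```sql
-- Sketch only: the table from the question, with params as a plain string.
-- Run the DROP and the CREATE as two separate queries.
DROP TABLE IF EXISTS RV_QA.RAAIR;

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  `appversion` string,
  `psdklang` string,
  `bundleid` string,
  `receivetimestamp` string,
  `type` string,
  `osversion` string,
  `store1` string,
  `devicelang` string,
  `geocode` string,
  `sessionid` string,
  `eventname` string,
  `params` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');
```

With this definition, a plain select of params should return the raw JSON text, for example {"AdProvider":"AdColony","AdIsReady":"false"}, which can then be parsed at query time.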
Depending on your preference, there are at least two different ways to do this, by parsing, casting, and extracting the values: