
Amazon Athena returns an internal error when parsing nested JSON


I am trying to query this JSON file (for debugging purposes, it contains only one row!):

{
  "appVersion": null,
  "sessionIndex": "3",
  "psdkLang": null,
  "lamdbaAwsRequestId": "bb04330c-e1e7-4bbd-97b8-86fdb2ee0b7f",
  "bundleID": "xyz",
  "receiveTimestamp": "2017-03-31T01:45:30.796Z",
  "type": "logEvent",
  "userIdfv": null,
  "osVersion": null,
  "uniqueIndex": "9c6c3927-aa66-4974-adac-fd10fc83a1e5",
  "userIdfa": null,
  "eventName": "Rewarded Ads Ad Is Ready",
  "deviceType": null,
  "eventId": "shardId-000000000005:49571690399037302251611429510623174446442870333536993362",
  "store1": "google",
  "deviceLang": null,
  "geoCode": null,
  "sessionId": "34B4CEC8-9AA0-40DD-94C4-C5420F563F68",
  "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}",
  "gameVersion": null,
  "internetConnectionState": null,
  "deviceModel": null,
  "deviceTimeZone": null,
  "time": "2017-03-31T10:44:50.117+0900",
  "userId": "24176983"
}

I created a table in Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
  `appversion` string,
  `psdklang` string,
  `bundleid` string,
  `receivetimestamp` string,
  `type` string,
  `osversion` string,
  `store1` string,
  `devicelang` string,
  `geocode` string,
  `sessionid` string,
  `eventName` string,
  `params` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'  
) LOCATION 's3://...'
TBLPROPERTIES ('has_encrypted_data'='false');

When I run this query:
select eventname from RAAIR;
everything works fine.

When I try to use the nested JSON (the params element):
select params['AdIsReady'] from RAAIR;
I get an "Internal error" message.

What am I missing here?

1 Answer

  • 1

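    You mentioned in the comments that params contains backslashes used for escaping.
    That is because params is a string, not a nested object. Athena cannot create a MAP directly from a string, which is why you get the "Internal error" message.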
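
To see why the escaped value counts as a string, here is a minimal Python sketch that parses an abbreviated copy of the record from the question. The variable names are only illustrative, and Python's json module stands in for the parsing step. It is not what Athena runs internally.

```python
import json

# Abbreviated copy of the record from the question; only two fields are kept.
record_line = r'{"eventName": "Rewarded Ads Ad Is Ready", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'

record = json.loads(record_line)

# After one parse, params is still a str holding JSON text, not a dict.
params_raw = record["params"]
print(type(params_raw).__name__)

# A second parse is needed before the keys inside params can be reached.
params = json.loads(params_raw)
print(params["AdIsReady"])
```

The second parse is the step that the declared type map<string,string> cannot perform on its own, so it has to happen explicitly in the query.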

    If you cannot change the data so that params is a nested object, you can change the table definition so that params is a string:

    CREATE EXTERNAL TABLE IF NOT EXISTS RV_QA.RAAIR (
      ...
      `params` string
    )
    ...
    

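    Athena (Presto) will then let you parse the JSON inside the string and query its values.
    There are at least two ways to do this, depending on whether you prefer to parse and cast to a map or to extract a single value: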

    SELECT
      CAST(json_parse(params) as MAP(varchar, varchar))['AdIsReady'] as AdIsReady1,
      json_extract_scalar(json_parse(params), '$.AdIsReady') as AdIsReady2
    FROM RV_QA.RAAIR LIMIT 10;
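
If changing the data is an option after all, the records could be rewritten before they land in S3 so that params is a real nested object. The sketch below assumes one JSON record per line. The helper name inline_params and the sample record are hypothetical and not part of any Athena or AWS API.

```python
import json


def inline_params(line: str) -> str:
    """Return the record with params re-encoded as a nested JSON object."""
    record = json.loads(line)
    raw = record.get("params")
    if isinstance(raw, str):
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            decoded = None
        # Only replace the field when the string really held a JSON object.
        if isinstance(decoded, dict):
            record["params"] = decoded
    # Compact separators keep the whole record on a single line.
    return json.dumps(record, separators=(",", ":"))


# Hypothetical sample, shortened from the record in the question.
sample = r'{"userId": "24176983", "params": "{\"AdProvider\":\"AdColony\",\"AdIsReady\":\"false\"}"}'
converted = inline_params(sample)
print(converted)
```

With records in this shape, the original table definition with `params` map<string,string> should be able to read the field, since the values inside params are themselves strings.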
    
