Creating a DataFrame from JSON data using PySpark

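I'm trying to create a DataFrame from JSON data using the pyspark module, but I can't get it to work. I tried sqlContext.read.json, but it did not give the correct result.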

Sample JSON data:

{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani@gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani@gmail.com"
},
{
"userId":"thanks",
"jobTitleName":"Program Directory",
"firstName":"Tom",
"lastName":"Hanks",
"preferredFullName":"Tom Hanks",
"employeeCode":"E3",
"region":"CA",
"phoneNumber":"408-2222222",
"emailAddress":"tomhanks@gmail.com"
}

Expected output: a table. Can anyone help me with this?

1 Answer

    You can use a SparkSession:

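    import json

    from pyspark.sql import SparkSession

    # Reuse the active session, or create one if none exists.
    spark = SparkSession.builder.appName("json-to-dataframe").getOrCreate()

    my_json = [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani@gmail.com",
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani@gmail.com",
        },
        {
            "userId": "thanks",
            "jobTitleName": "Program Directory",
            "firstName": "Tom",
            "lastName": "Hanks",
            "preferredFullName": "Tom Hanks",
            "employeeCode": "E3",
            "region": "CA",
            "phoneNumber": "408-2222222",
            "emailAddress": "tomhanks@gmail.com",
        },
    ]

    # spark.read.json expects a path or an RDD of JSON strings, not a list of
    # dicts, so serialize each record and parallelize the strings first.
    json_rdd = spark.sparkContext.parallelize([json.dumps(record) for record in my_json])
    json_df = spark.read.json(json_rdd)
    json_df.show()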
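
A likely reason sqlContext.read.json gave an unexpected result on the sample in the question: by default Spark's JSON reader treats each line of the input as one complete JSON object, so records that span several lines usually end up in a `_corrupt_record` column. The sketch below assumes that is the cause and rewrites the records in one-object-per-line form using only the standard library. The helper names `write_json_lines` and `read_json_lines_with_spark` and the file name `employees.jsonl` are illustrative and not part of the original answer.

```python
import json
import tempfile
from pathlib import Path


def write_json_lines(records, path):
    """Write each record as one compact JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def read_json_lines_with_spark(path):
    """Load a JSON Lines file into a DataFrame (requires pyspark and a JVM)."""
    from pyspark.sql import SparkSession  # imported lazily; only needed here

    spark = SparkSession.builder.appName("json-lines-demo").getOrCreate()
    # One object per line is the layout spark.read.json parses by default.
    return spark.read.json(str(path))


# Two records taken from the question, trimmed to a few fields for brevity.
records = [
    {"userId": "rirani", "firstName": "Romin", "employeeCode": "E1"},
    {"userId": "nirani", "firstName": "Neil", "employeeCode": "E2"},
]

# Hypothetical output location; any path Spark can read would do.
jsonl_path = Path(tempfile.mkdtemp()) / "employees.jsonl"
write_json_lines(records, jsonl_path)
```

With Spark available, `read_json_lines_with_spark(jsonl_path).show()` should then print one row per record. If the file instead holds a single valid JSON array spread over several lines, passing `multiLine=True` to `spark.read.json` (Spark 2.2 and later) is the other option.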
    
