How to convert a JSON string to a JSON object in PySpark

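I have a DataFrame with a single column of type string, but it actually contains JSON objects of 4 different schemas that share only a few fields. I need to convert these strings to JSON objects.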

Here is the schema of the DataFrame:

query.printSchema()

root
 |-- test: string (nullable = true)

The values in the DataFrame look like this:

query.show(10)

+--------------------+
|                test|
+--------------------+
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"Interaction":{"...|
|{"PurchaseActivit...|
|{"Interaction":{"...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
|{"PurchaseActivit...|
+--------------------+
only showing top 10 rows

The solution I applied:

Write the column out as a text file:

query.write.format("text").mode('overwrite').save("s3://bucketname/temp/")

Read it back as JSON:

df = spark.read.json("s3a://bucketname/temp/")

Now print the schema: the JSON string in each row has been converted to a JSON object.

df.printSchema()

root
 |-- EventDate: string (nullable = true)
 |-- EventId: string (nullable = true)
 |-- EventNotificationType: long (nullable = true)
 |-- Interaction: struct (nullable = true)
 |    |-- ContextId: string (nullable = true)
 |    |-- Created: string (nullable = true)
 |    |-- Description: string (nullable = true)
 |    |-- Id: string (nullable = true)
 |    |-- ModelContextId: string (nullable = true)
 |-- PurchaseActivity: struct (nullable = true)
 |    |-- BillingCity: string (nullable = true)
 |    |-- BillingCountry: string (nullable = true)
 |    |-- ShippingAndHandlingAmount: double (nullable = true)
 |    |-- ShippingDiscountAmount: double (nullable = true)
 |    |-- SubscriberId: long (nullable = true)
 |    |-- SubscriptionOriginalEndDate: string (nullable = true)
 |-- SubscriptionChurn: struct (nullable = true)
 |    |-- PaymentTypeCode: long (nullable = true)
 |    |-- PaymentTypeName: string (nullable = true)
 |    |-- PreviousPaidAmount: double (nullable = true)
 |    |-- SubscriptionRemoved: string (nullable = true)
 |    |-- SubscriptionStartDate: string (nullable = true)
 |-- TransactionDetail: struct (nullable = true)
 |    |-- Amount: double (nullable = true)
 |    |-- OrderShipToCountry: string (nullable = true)
 |    |-- PayPalUserName: string (nullable = true)
 |    |-- PaymentSubTypeCode: long (nullable = true)
 |    |-- PaymentSubTypeName: string (nullable = true)

Is there a better way to get this output, one that does not require writing the DataFrame out as a text file and reading it back in as JSON?
