Spark DataFrame with JSON stored as a String: converting it to a nested JSON struct

I'm having trouble processing JSON data in Spark.

The DataFrame has a column that holds JSON as a String.

DataFrame schema:

root
 |-- id: string (nullable = true)
 |-- jsonString: string (nullable = true)

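Example jsonString: "{\"sample\":\"value\"}";

I want to convert this jsonString into a nested JSON object (a struct column), so that the JSON data can be read and traversed.

The target DataFrame structure I'm looking for is: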

root
 |-- id: string (nullable = true)
 |-- json: struct (nullable = true)
 |   |-- sample: string (nullable = true)

Any help is appreciated.

2 Answers

  • -1

    You can use the from_json function to parse jsonString into a struct. To do this, you need to create a schema that describes the JSON:

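    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

    // dummy data
    val data = Seq(
      ("a", "{\"sample\":\"value1\"}"),
      ("b", "{\"sample\":\"value2\"}"),
      ("c", "{\"sample\":\"value3\"}")
    ).toDF("id", "jsonString")

    // schema describing the JSON held in jsonString
    val schema = StructType(StructField("sample", StringType, true) :: Nil)

    // parse jsonString into a new struct column using the schema
    data.withColumn("newCol", from_json($"jsonString", schema))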
    

    Output schema:

    root
     |-- id: string (nullable = true)
     |-- jsonString: string (nullable = true)
     |-- newCol: struct (nullable = true)
     |    |-- sample: string (nullable = true)
    

    Output:

    +---+-------------------+--------+
    |id |jsonString         |newCol  |
    +---+-------------------+--------+
    |a  |{"sample":"value1"}|[value1]|
    |b  |{"sample":"value2"}|[value2]|
    |c  |{"sample":"value3"}|[value3]|
    +---+-------------------+--------+
    

    Hope this helps!

  • 2

    You can use the dynamic JSON parser provided by Gson to convert the JSON string into an object. See the sample Java code here:

    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class DynamicJsonParser {
        // JsonParser instance used below (newer Gson versions also offer JsonParser.parseString)
        private static final JsonParser parser = new JsonParser();

        public static Map<String, String> parseJsonInputLine(String line) {
            Map<String, String> result = new LinkedHashMap<>();

            // Example input:
            // {"sample":"value","address":[{"name":"MyName","age":20},
            // {"name":20,"age":30}]}
            // Parse the input line as a JsonObject
            JsonObject jsonObject = parser.parse(line).getAsJsonObject();
            // Get the key set of the JsonObject (sample and address for the example input)
            Iterator<String> jsonObjectItr = jsonObject.keySet().iterator();
            // Iterate over the key set
            while (jsonObjectItr.hasNext()) { // first key is sample, then address
                String jsonObjectKey = jsonObjectItr.next();
                // Get the value for the key
                JsonElement jsonElementValue = jsonObject.get(jsonObjectKey);
                // Check whether the value is itself a JSON object
                if (jsonElementValue.isJsonObject()) {
                    // Nested object: recurse on its JSON text and merge the result
                    result.putAll(parseJsonInputLine(jsonElementValue.toString()));
                } else {
                    // Primitives and arrays (such as address) land here.
                    // Do your processing; as a placeholder, store the raw JSON text.
                    result.put(jsonObjectKey, jsonElementValue.toString());
                }
            }
            return result;
        }
    }
    

    Let me know if this solves your problem.
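
The recursion in the Gson answer descends only into values that are JSON objects, so an array such as address in its example comment is not traversed. Below is a minimal sketch of one way to walk both objects and arrays. It assumes the JSON has already been parsed into plain java.util Map and List values, which stand in here for Gson's JsonObject and JsonArray so that the sketch needs no external library. The class name NestedFlattener and the dotted/indexed key format are hypothetical choices for illustration, not part of Gson or Spark.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Gson or Spark.
final class NestedFlattener {

    // Flattens nested Maps and Lists into path -> string value pairs,
    // e.g. "address[0].name" for a field inside the first array element.
    static Map<String, String> flatten(Object root) {
        Map<String, String> out = new LinkedHashMap<>();
        walk("", root, out);
        return out;
    }

    private static void walk(String path, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            // Object: recurse into each field, extending the path with the key
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String child = path.isEmpty()
                        ? String.valueOf(e.getKey())
                        : path + "." + e.getKey();
                walk(child, e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: recurse into each element, extending the path with the index
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(path + "[" + i + "]", items.get(i), out);
            }
        } else {
            // Leaf (string, number, boolean, or null): record its string form
            out.put(path, String.valueOf(node));
        }
    }
}
```

For the example input shown in the answer's comment, NestedFlattener.flatten would produce the keys sample, address[0].name, address[0].age, address[1].name and address[1].age, in that order.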
