I have a JSON data file, and I want to programmatically apply a schema to its columns.
pets.json
```json
{"id":"311","species":"canine","color":"golden","weight":"75","name":"Captain"}
{"id":"928","species":"feline","color":"gray","weight":"8","name":"Oscar"}
```
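```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession.builder().appName("SparkSQLTests").master("local[*]").getOrCreate();
DataFrameReader dataFrameReader = session.read();
// Create a DataFrame, applying the schema returned by buildSchema() (not shown)
Dataset<Row> pets = dataFrameReader.schema(buildSchema()).json("input/pets.json");
// Print the schema and the first rows
pets.printSchema();
pets.show(10);
// Equivalent SQL:
// SELECT *
// FROM pets
// WHERE species='canine'
System.out.println("=== Display Canines ===");
pets.filter(col("species").equalTo("canine")).show();
session.stop();
```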
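The question does not show `buildSchema()`. A minimal sketch consistent with the schema printed when the program runs (`id` as integer, `weight` as double, the other fields as strings, all nullable) might look like the following; the class name `PetsSchema` is hypothetical, and the field types are inferred from that printed output rather than from the author's actual code.

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PetsSchema {
    // Hypothetical reconstruction of buildSchema(): names and types mirror the
    // printSchema() output. StructType.add(name, type) marks each field nullable.
    static StructType buildSchema() {
        return new StructType()
                .add("id", DataTypes.IntegerType)
                .add("species", DataTypes.StringType)
                .add("color", DataTypes.StringType)
                .add("weight", DataTypes.DoubleType)
                .add("name", DataTypes.StringType);
    }

    public static void main(String[] args) {
        // Print the schema in the same tree form that printSchema() uses
        System.out.println(buildSchema().treeString());
    }
}
```

Note that this schema declares `id` and `weight` as numeric types, which is exactly what the quoted values in pets.json fail to match.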
When I run the program, all of my columns are null. What am I doing wrong? Thanks.
```
root
 |-- id: integer (nullable = true)
 |-- species: string (nullable = true)
 |-- color: string (nullable = true)
 |-- weight: double (nullable = true)
 |-- name: string (nullable = true)

+----+-------+-----+------+----+
|  id|species|color|weight|name|
+----+-------+-----+------+----+
|null|   null| null|  null|null|
|null|   null| null|  null|null|
+----+-------+-----+------+----+

=== Display Canines ===
+---+-------+-----+------+----+
| id|species|color|weight|name|
+---+-------+-----+------+----+
+---+-------+-----+------+----+
```
1 Answer
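It turned out that the numeric values in my JSON data were quoted, and that was causing the problem. The fix was to remove the quotes around the numbers, changing the data to: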
```json
{"id":311,"species":"canine","color":"golden","weight":75,"name":"Captain"}
{"id":928,"species":"feline","color":"gray","weight":8,"name":"Oscar"}
```
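If the input file cannot be changed, another option is to read the numeric fields as strings and cast them afterwards. Spark's JSON reader does not turn a quoted value such as `"311"` into an `IntegerType` field; in its default PERMISSIVE mode the record is returned with null fields instead of raising an error, as in the output above. The sketch below assumes the same pets.json layout; the class name `PetsCastExample` and the helpers `rawSchema()` and `readPets()` are hypothetical.

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PetsCastExample {
    // Every field is declared as a string, so quoted numbers such as "311"
    // match the schema instead of producing an all-null record.
    static StructType rawSchema() {
        return new StructType()
                .add("id", DataTypes.StringType)
                .add("species", DataTypes.StringType)
                .add("color", DataTypes.StringType)
                .add("weight", DataTypes.StringType)
                .add("name", DataTypes.StringType);
    }

    // Reads the JSON lines with the all-string schema, then casts the numeric
    // columns in place (withColumn replaces an existing column of that name).
    static Dataset<Row> readPets(SparkSession session, String path) {
        Dataset<Row> raw = session.read().schema(rawSchema()).json(path);
        return raw
                .withColumn("id", col("id").cast(DataTypes.IntegerType))
                .withColumn("weight", col("weight").cast(DataTypes.DoubleType));
    }

    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("SparkSQLTests").master("local[*]").getOrCreate();
        Dataset<Row> pets = readPets(session, "input/pets.json");
        pets.printSchema();
        pets.show(10);
        session.stop();
    }
}
```

Casting after the read keeps the type conversion explicit: with ANSI SQL mode off, a string that cannot be parsed as a number becomes null in that column only, while with ANSI mode enabled Spark raises an error instead.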