
Pyspark - saveAsTable - How to insert new data into an existing table?


How do I insert new data into an existing table?

I am trying to insert new data into an existing table using pyspark.

Here is my program:

from pyspark import SparkContext
from pyspark.sql import SQLContext, DataFrameWriter

sc = SparkContext("local[*]", "SPARK-SQL")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("people.json")
df.registerTempTable("people")

# Show old data
result = sqlContext.sql("SELECT * from people")
result.show()

# Create new data
new_data = [{"name": "Phan", "age": 22}]
df_new_data = sqlContext.createDataFrame(new_data)
# Save data to table 'people'
df_new_data.write.mode("append").saveAsTable("people")

# Show new data
result = sqlContext.sql("SELECT * from people")
result.show()

After I run it, the data in the table "people" does not change.

Old data
+---+--------+
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+
New data
+---+--------+                                                                  
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+

Please help me change the data in the table! Thank you!

2 Answers

  • 0

    I tried saveAsTable with a table name that does not exist.

    df_new_data.write.mode("append").saveAsTable("people1")
    
    # Show new data
    result = sqlContext.sql("SELECT * from people1")
    result.show()
    

    It works. I can see the new data in the table "people1":

    +---+----+
    |age|name|
    +---+----+
    |22 |Phan|
    +---+----+
    
  • 0
    >>> df_new_data.write.mode("append").saveAsTable("people")
    

    The code above writes the people table into the default database in Hive.

    So if you want to see the data from the Hive table, you need to create a HiveContext and then query the Hive table instead of the temporary table.

    >>> from pyspark.sql import HiveContext
    >>> hc = HiveContext(sc)
    >>> hc.sql("select * from default.people").show(100, False)
    

    Update:

    Appending the new data to the temporary table:

    >>> # df1 stands in for a DataFrame of new rows (here just a copy of df)
    >>> df1 = df
    >>> df2 = df.unionAll(df1)
    >>> df2.registerTempTable("people")
    >>> sqlContext.sql("select * from people").show(100, False)
    
