I am working with two dataframes, say 'df1' and 'df2', which look like this:
DF1:
+--------+--------+
| Col1   | Col2   |
+--------+--------+
| 'A'    | 1      |
+--------+--------+
| 'B'    | 2      |
+--------+--------+
| 'C'    | 3      |
+--------+--------+
DF2:
+--------+--------+
| Col1   | Col2   |
+--------+--------+
| 'A'    | -      |
+--------+--------+
| 'B'    | -      |
+--------+--------+
| 'B'    | -      |
+--------+--------+
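What I want to do is update column 'Col2' of 'df2' using the values from 'df1'. That is, for each row of 'df2', I want to set 'Col2' to the 'Col2' value of the 'df1' row that has the same 'Col1' value.
The resulting dataframe 'df2' should be: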
+--------+--------+
| Col1   | Col2   |
+--------+--------+
| 'A'    | 1      |
+--------+--------+
| 'B'    | 2      |
+--------+--------+
| 'B'    | 2      |
+--------+--------+
How can I do this with PySpark dataframes?
1 Answer
A simple left join should do it.