Python Pandas - Find the difference between two DataFrames

I have two DataFrames, df1 and df2, where df2 is a subset of df1. How do I get a new DataFrame (df3) that is the difference between the two?

In other words, a DataFrame that contains all the rows/columns of df1 that are not in df2?


3 Answers

Use drop_duplicates

pd.concat([df1,df2]).drop_duplicates(keep=False)
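
As a minimal sketch of how this one-liner behaves, assuming two small frames that contain no duplicated rows of their own (the sample data is made up for illustration): a row present in both frames appears twice after concatenation, so `keep=False` drops every copy of it and only the rows unique to one frame remain.

```python
import pandas as pd

# illustrative data, not taken from the question; neither frame has duplicates
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# rows shared by both frames occur twice after concat and are all dropped
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df3)
```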

Update:

The method above only works for DataFrames that do not already contain duplicate rows themselves. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})

It produces the following output, which is wrong because the duplicated row (3, 4) of df1 is dropped as well:

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]: 
   A  B
1  2  3

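The correct output should keep every row of df1 that is not in df2, including both copies of the duplicated row (3, 4).

How can this be achieved?

Use isin with tuple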

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
Out[657]: 
   A  B
1  2  3
2  3  4
3  3  4
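
The same row-wise membership test can also be sketched in plain Python, without `apply`. This is only an illustrative variant, not part of the original answer: it assumes every cell value is hashable, and the names `rows_in_df2` and `mask` are made up for the example.

```python
import pandas as pd

# same sample frames as above; df1 contains a duplicated row
df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# collect every row of df2 as a plain tuple for fast membership checks
rows_in_df2 = set(df2.itertuples(index=False, name=None))

# True for each row of df1 that does not occur in df2
mask = [row not in rows_in_df2 for row in df1.itertuples(index=False, name=None)]

df3 = df1[mask]
print(df3)
```

Because each row of df1 is tested on its own, duplicated rows in df1 are kept rather than collapsed.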

Answered on 2024-04-29T10:40:52+08:00

For rows, try this, with cols set to the list of columns you want to compare:
```
m = df1.merge(df2, on=cols, how='outer', suffixes=['', '_'], indicator=True)
```
For columns, try the following:
```
set(df1.columns).symmetric_difference(df2.columns)
```
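
The `merge` call in this answer only builds the combined frame `m`; the rows unique to df1 still have to be selected from it. One possible way to finish, sketched here on the sample frames from the first answer and assuming `cols` covers every column: with `indicator=True`, pandas adds a `_merge` column whose value is `'left_only'` for rows found only in the left frame.

```python
import pandas as pd

# sample frames reused from the first answer; df1 contains a duplicated row
df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

cols = ['A', 'B']  # assumption: compare on every column

m = df1.merge(df2, on=cols, how='outer', suffixes=('', '_'), indicator=True)

# keep the rows that exist only in df1, then drop the helper column
df3 = m[m['_merge'] == 'left_only'].drop(columns='_merge')
print(df3)
```

Filtering on `'right_only'` instead would give the rows that exist only in df2.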
Answered on 2024-04-29T10:40:52+08:00

import pandas as pd
# given
df1 = pd.DataFrame({'Name':['John','Mike','Smith','Wale','Marry','Tom','Menda','Bolt','Yuswa',],
    'Age':[23,45,12,34,27,44,28,39,40]})
df2 = pd.DataFrame({'Name':['John','Smith','Wale','Tom','Menda','Yuswa',],
    'Age':[23,12,34,44,28,40]})

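# find rows in df1 that are not in df2
# caveat: Name and Age are tested independently, so a df1 row whose Name and Age
# each appear in df2, but in different rows, is also treated as present in df2
df_1notin2 = df1[~(df1['Name'].isin(df2['Name']) & df1['Age'].isin(df2['Age']))].reset_index(drop=True)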

# output:
print('df1\n', df1)
print('df2\n', df2)
print('df_1notin2\n', df_1notin2)

# df1
#     Age   Name
# 0   23   John
# 1   45   Mike
# 2   12  Smith
# 3   34   Wale
# 4   27  Marry
# 5   44    Tom
# 6   28  Menda
# 7   39   Bolt
# 8   40  Yuswa
# df2
#     Age   Name
# 0   23   John
# 1   12  Smith
# 2   34   Wale
# 3   44    Tom
# 4   28  Menda
# 5   40  Yuswa
# df_1notin2
#     Age   Name
# 0   45   Mike
# 1   27  Marry
# 2   39   Bolt
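
The column-wise `isin` test in this answer checks Name and Age independently, which gives the expected result for the data shown but can drop a row whose values merely appear somewhere in df2. A hedged sketch of a row-wise alternative, using hypothetical data chosen to expose the difference and `pd.MultiIndex.from_frame` to compare whole (Name, Age) pairs:

```python
import pandas as pd

# hypothetical data: ('John', 12) is not a row of df2, although 'John' and 12
# each appear somewhere in df2
df1 = pd.DataFrame({'Name': ['John', 'John', 'Mike'], 'Age': [23, 12, 45]})
df2 = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [23, 12]})

# column-wise test: also drops ('John', 12), which is not in df2
col_wise = df1[~(df1['Name'].isin(df2['Name']) & df1['Age'].isin(df2['Age']))]

# row-wise test: a row counts as present only if the whole pair matches
in_df2 = pd.MultiIndex.from_frame(df1).isin(pd.MultiIndex.from_frame(df2))
row_wise = df1[~in_df2].reset_index(drop=True)

print(col_wise)
print(row_wise)
```

Here the column-wise filter keeps only Mike, while the row-wise filter keeps both ('John', 12) and ('Mike', 45).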

Answered on 2024-04-29T10:40:52+08:00
