
Python Pandas: compare rows of two CSVs (DataFrames) on similar values in one column and return the contents of the matching rows


I am working with two CSV files imported as DataFrames. Suppose df1 and df2 are as shown below. They have different lengths: df1 has 50,000 rows and df2 has 20,000 rows.

df1

df2

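I want to compare the 'time' of each row in df2 against df1 (iterating over the rows), find the time difference, and return the values of all columns from the row with the closest time.

For example, 66 ('time' in df1) is the closest to 67 ('time' in df2), so I want to copy the corresponding df1 values (15 for 'vel' and 25 for 'yaw') into df2 and save the result as a new CSV.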

2 Answers

  • 0

    This can be done with the help of the iterrows() function.

    Here is the code:

    Creating the first table:

    import pandas as pd

    value=[(0,11,10,20),(1,22,11,21),(2,33,12,22),(3,44,13,23),(4,55,14,24),
    (5,66,15,25),(6,77,16,26),(7,88,17,27),(8,99,18,28)]
    header=["index","time","vel","yaw"]
    df1 = pd.DataFrame.from_records(value, columns=header)
    

    df1

    Creating the second table:

    # "nan" is a placeholder string here; it is overwritten in the loop further down
    value=[(0,67,"nan","nan"),(1,75,"nan","nan"),(2,87,"nan","nan"),
    (3,99,"nan","nan")]
    header=["index","time","vel","yaw"]
    df2 = pd.DataFrame.from_records(value, columns=header)
    

    df2

    Storing the result in df2 itself:

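    for index, row in df2.iterrows():
        # Smallest time difference found so far for this df2 row
        min_diff = float("inf")
        for indexer, rows in df1.iterrows():
            diff = abs(row['time'] - rows['time'])
            if diff < min_diff:
                min_diff = diff
                # Remember the position of the closest df1 row
                pos = indexer
        # Copy the closest row's values into df2
        df2.loc[index, 'vel'] = df1.loc[pos, 'vel']
        df2.loc[index, 'yaw'] = df1.loc[pos, 'yaw']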
    

    result

  • 0

    Create a Cartesian product, then filter:

    import pandas as pd

    df1 = pd.DataFrame({'time': [11, 22, 33, 44, 55, 66, 77, 88, 99],
                        'vel': [10, 11, 12, 13, 14, 15, 16, 17, 18],
                        'yaw': [20, 21, 22, 23, 24, 25, 26, 27, 28]})
    df2 = pd.DataFrame({'time': [67, 75, 87, 99]})

    # A constant key makes the merge pair every df2 row with every df1 row
    df1['key'] = 1
    df2['key'] = 1
    df1.rename(index=str, columns={'time': 'time_x'}, inplace=True)

    df = df2.merge(df1, on='key', how='left').reset_index()
    # Absolute time difference for every pair
    df['diff'] = (df['time'] - df['time_x']).abs()
    df.sort_values(by=['time', 'diff'], inplace=True)

    # Keep the closest df1 row for each df2 time
    df = df.groupby(['time']).first().reset_index()[['time', 'vel', 'yaw']]
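
A note on scale: with 50,000 and 20,000 rows, both answers above compare every df2 row against every df1 row. As a possible alternative, here is a minimal sketch using pd.merge_asof with direction="nearest", which matches each df2 time to the closest df1 time on sorted data. The sample values come from the answers; the column name time_df1 and the file name nearest.csv mentioned in the comment are assumptions made for illustration, and the sketch writes to an in-memory buffer rather than to disk.

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    "time": [11, 22, 33, 44, 55, 66, 77, 88, 99],
    "vel": [10, 11, 12, 13, 14, 15, 16, 17, 18],
    "yaw": [20, 21, 22, 23, 24, 25, 26, 27, 28],
})
df2 = pd.DataFrame({"time": [67, 75, 87, 99]})

# merge_asof needs both frames sorted by the key column.
# df1's own time is copied to time_df1 (hypothetical name) so the
# matched value stays visible next to df2's time.
nearest = pd.merge_asof(
    df2.sort_values("time"),
    df1.assign(time_df1=df1["time"]).sort_values("time"),
    on="time",
    direction="nearest",
)

# Save as CSV. A buffer is used here; pass a path such as "nearest.csv"
# (hypothetical name) to write a real file instead.
buffer = io.StringIO()
nearest.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

On the sample data this pairs 67 in df2 with 66 in df1, as in the question's example.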
    
