
Merge multiple CSV files and remove duplicates by field


I need to match data from multiple CSV files. For example, suppose I have three CSV files.

Input 1 csv

PANYNJ LGA WEST 1,available, LGA West GarageFlushing
PANYNJ LGA WEST 4,unavailable,LGA West Garage
iPark - Tesla,unavailable,530 E 80th St

Input 2 csv

PANYNJ LGA WEST 4,unavailable,LGA West Garage
PANYNJ LGA WEST 5,available,LGA West Garage

Input 3 csv

PANYNJ LGA WEST 5,available,LGA West Garage
imPark - Tesla,unavailable,611 E 83rd St

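The first column is name, the second is status, and the last is address. I want to merge these three files into a single CSV file, keeping only one row when several rows share the same name. My desired output file looks like this:

Output csv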

PANYNJ LGA WEST 1,available, LGA West GarageFlushing
PANYNJ LGA WEST 4,unavailable,LGA West Garage
iPark - Tesla,unavailable,530 E 80th St
PANYNJ LGA WEST 5,available,LGA West Garage
imPark - Tesla,unavailable,611 E 83rd St

I am trying to solve this with pandas or the csv module, but I am not sure how to go about it.

Any help is greatly appreciated!

1 Answer

  • 1

    With pandas, you can use pd.concat to stack the files and then DataFrame.drop_duplicates to drop rows that repeat a name:

    import pandas as pd
    from io import StringIO
    
    str1 = StringIO("""PANYNJ LGA WEST 1,available, LGA West GarageFlushing
    PANYNJ LGA WEST 4,unavailable,LGA West Garage
    iPark - Tesla,unavailable,530 E 80th St""")
    
    str2 = StringIO("""PANYNJ LGA WEST 4,unavailable,LGA West Garage
    PANYNJ LGA WEST 5,available,LGA West Garage""")
    
    str3 = StringIO("""PANYNJ LGA WEST 5,available,LGA West Garage
    imPark - Tesla,unavailable,611 E 83rd St""")
    
    # To read real files, replace str1, str2, str3 with 'file1.csv', 'file2.csv', 'file3.csv'
    df1 = pd.read_csv(str1, header=None)
    df2 = pd.read_csv(str2, header=None)
    df3 = pd.read_csv(str3, header=None)
    
    # Stack the frames, then keep the first row seen for each value in column 0 (the name)
    res = pd.concat([df1, df2, df3], ignore_index=True)\
            .drop_duplicates(subset=0)
    
    print(res)
    
                       0            1                         2
    0  PANYNJ LGA WEST 1    available   LGA West GarageFlushing
    1  PANYNJ LGA WEST 4  unavailable           LGA West Garage
    2      iPark - Tesla  unavailable             530 E 80th St
    4  PANYNJ LGA WEST 5    available           LGA West Garage
    6     imPark - Tesla  unavailable             611 E 83rd St
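
The snippet above prints the merged frame but does not write the output file the question asks for. A minimal sketch of that last step follows, assuming the inputs have no header row. The helper `merge_csv_files`, the column names in `COLUMNS`, and the file names `input1.csv`, `input2.csv`, `input3.csv`, and `output.csv` are illustrative assumptions and do not come from the original post.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed column names; the source files themselves have no header row.
COLUMNS = ["name", "status", "address"]


def merge_csv_files(paths, output_path):
    """Concatenate headerless CSV files and keep the first row seen per name."""
    frames = [pd.read_csv(p, header=None, names=COLUMNS) for p in paths]
    merged = pd.concat(frames, ignore_index=True)
    merged = merged.drop_duplicates(subset="name", keep="first")
    # Write without index or header so the output matches the input layout.
    merged.to_csv(output_path, index=False, header=False)
    return merged


# Demo with the sample data from the question, written to a temporary directory.
samples = {
    "input1.csv": [
        "PANYNJ LGA WEST 1,available, LGA West GarageFlushing",
        "PANYNJ LGA WEST 4,unavailable,LGA West Garage",
        "iPark - Tesla,unavailable,530 E 80th St",
    ],
    "input2.csv": [
        "PANYNJ LGA WEST 4,unavailable,LGA West Garage",
        "PANYNJ LGA WEST 5,available,LGA West Garage",
    ],
    "input3.csv": [
        "PANYNJ LGA WEST 5,available,LGA West Garage",
        "imPark - Tesla,unavailable,611 E 83rd St",
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    paths = []
    for file_name, rows in samples.items():
        path = tmp_dir / file_name
        path.write_text("\n".join(rows))
        paths.append(path)

    result = merge_csv_files(paths, tmp_dir / "output.csv")
    output_text = (tmp_dir / "output.csv").read_text()

print(output_text)
```

With `keep="first"`, the row from the earliest file wins when a name appears more than once. If a later file should take precedence instead (for example, because it carries a newer status), `keep="last"` is the option to try.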
    
