
Python: merging multiple text files


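I'm new to Python and not a programmer. I have 40 text files that I want to combine side by side into a new CSV (a 'wide' CSV rather than a 'tall' one; that is, I don't want to append the files).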

Using Pandas (merge) I was able to get what I want, but I think there must be a simpler way. Here it is for seven of the files:


import pandas as pd

a = pd.read_csv("c:/pyTest/B01001.txt")
b = pd.read_csv("c:/pyTest/B01002.txt")
c = pd.read_csv("c:/pyTest/B01003.txt")
d = pd.read_csv("c:/pyTest/B02001.txt")
e = pd.read_csv("c:/pyTest/B05001.txt")
f = pd.read_csv("c:/pyTest/B05002.txt")
g = pd.read_csv("c:/pyTest/B05012.txt")

merged = a.merge(b.merge(c.merge(d.merge(e.merge(f.merge(g, on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID'), on='GEOID')
merged.to_csv("c:/pytest/fook.csv", index=False)

It would also be great if the shared column name (e.g. 'GEOID') were not repeated in the output file.

Any help is greatly appreciated.

1 Answer

  • 2

    You can apply merge to a list of DataFrames using reduce:

    import pandas as pd
    import functools
    
    files = ["c:/pyTest/B01001.txt", "c:/pyTest/B01002.txt", "c:/pyTest/B01003.txt",
             "c:/pyTest/B02001.txt", "c:/pyTest/B05001.txt", "c:/pyTest/B05002.txt",
             "c:/pyTest/B05012.txt",]
    dfs = [pd.read_csv(filename).set_index('GEOID') for filename in files]
    mergefunc = functools.partial(pd.merge, left_index=True, right_index=True)
    merged = functools.reduce(mergefunc, dfs)
    
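    # GEOID is now the index, so let to_csv write it (the default, index=True);
    # index=False would drop the GEOID values from the output file.
    merged.to_csv("c:/pytest/fook.csv")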
    

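    When Pandas merges two DataFrames on their indexes (rather than on columns), the resulting DataFrame uses the merged index. So you can avoid a repeated GEOID column by merging on the index.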


    For example:

    In [99]: import numpy as np
    In [100]: import pandas as pd
    In [101]: import functools
    
    In [102]: dfs = [pd.DataFrame(np.arange(6).reshape(3,2), columns=['A','B{}'.format(i)]).set_index('A') for i in range(3)]
    
    In [103]: mergefunc = functools.partial(pd.merge, left_index=True, right_index=True)    
    In [104]: merged = functools.reduce(mergefunc, dfs)
    
    In [105]: merged
    Out[105]: 
       B0  B1  B2
    A            
    0   1   1   1
    2   3   3   3
    4   5   5   5
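
With 40 files, typing out every path gets tedious. The same reduce-based merge could be wrapped in a small helper that takes any list of paths, for instance one built with glob. This is only a sketch: the helper name `merge_files_on_key`, the glob pattern, and the output filename are hypothetical, and it assumes every file has a GEOID column and that the other column names do not clash between files.

```python
import functools
import glob

import pandas as pd


def merge_files_on_key(paths, key="GEOID"):
    """Read each CSV-formatted file and inner-join them all on `key`.

    Hypothetical helper; assumes every file contains the `key` column.
    """
    # Index every frame by the key so the join happens on the index.
    frames = [pd.read_csv(path).set_index(key) for path in paths]
    merge_on_index = functools.partial(pd.merge, left_index=True, right_index=True)
    # reset_index() turns the key back into an ordinary first column.
    return functools.reduce(merge_on_index, frames).reset_index()


# Possible usage (pattern and output name are assumptions, not from the question):
# paths = sorted(glob.glob("c:/pyTest/B*.txt"))
# merge_files_on_key(paths).to_csv("c:/pyTest/merged.csv", index=False)
```

Because the helper calls reset_index() before returning, GEOID is a regular column again, so writing with index=False keeps it in the file exactly once.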
    
