
Compare the contents of two files


I have two files, test1.txt and test2.txt.

test1.txt contains:

abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt

and test2.txt contains:

12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt

I want to compare the two files in bash and get output like this:

In both:

abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt

Only in test1.txt:

abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt

Only in test2.txt:

10111.2222.txt

4 Answers

  • 0
    File1 (the question's test1.txt):
    abc.cde.ccd.eed.12345.5678.txt
    abcd.cdde.ccdd.eaed.12346.5688.txt
    aabc.cade.cacd.eaed.13345.5078.txt
    abzc.cdae.ccda.eaed.29345.1678.txt
    abac.cdae.cacd.eead.18145.2678.txt
    aabc.cdve.cncd.ened.19945.2345.txt
    
    
    File2 (the question's test2.txt):
    12345.5678.txt
    29345.1678.txt
    18145.2678.txt
    10111.2222.txt
    
    
    
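    #!/bin/bash
    # Reads File1 and File2; writes Both.txt, File1.txt and File2.txt.
    
    # Start from fresh output files (rm -f is silent if they don't exist)
    rm -f Both.txt File1.txt File2.txt
    
    while IFS= read -r f2line
    do
      found=0
      while IFS= read -r f1line
      do
        # -F: treat f2line as a fixed string, so '.' is not a regex wildcard
        if printf '%s\n' "$f1line" | grep -qF -- "$f2line"
        then
          found=1
          printf '%s\n' "$f1line" >> Both.txt
        fi
      done < File1
      if [ "$found" -eq 0 ]
      then
        printf '%s\n' "$f2line" >> File2.txt
      fi
    done < File2
    
    # Lines only in File1 = sorted File1 minus sorted (deduplicated) Both.txt
    touch Both.txt   # make sure it exists even if nothing matched
    sort -u Both.txt > s_Both.txt
    sort File1 > s_File1
    comm -23 s_File1 s_Both.txt > File1.txt
    rm s_File1 s_Both.txt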
    

    Output files: Both.txt (lines in both files), File1.txt (lines only in File1) and File2.txt (lines only in File2).
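The same nested-loop idea can be sketched in bash alone, assuming bash 4 or later (for mapfile). Instead of starting one grep process per pair of lines, it uses the [[ $line == *"$pat"* ]] substring test; because the pattern is quoted, any glob characters in it match literally (dots are never special in this test). The function name compare_by_substring is hypothetical, and unlike the script above it prints the three groups to standard output instead of writing Both.txt, File1.txt and File2.txt.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash >= 4 for mapfile): nested-loop substring comparison
# without running grep once per pair. compare_by_substring is a made-up name.
compare_by_substring() {
  local f1=$1 f2=$2 i j
  local -a lines=() pats=() line_hit=() pat_hit=()
  mapfile -t lines < "$f1"   # all lines of the first file
  mapfile -t pats < "$f2"    # all lines of the second file (the substrings)

  # Mark every line/pattern pair where the pattern occurs inside the line.
  # Quoting "${pats[i]}" inside *...* makes glob characters (* ? [) literal.
  for i in "${!pats[@]}"; do
    for j in "${!lines[@]}"; do
      if [[ ${lines[j]} == *"${pats[i]}"* ]]; then
        line_hit[j]=1
        pat_hit[i]=1
      fi
    done
  done

  # Print each group in the original file order.
  echo "In both:"
  for j in "${!lines[@]}"; do
    if [[ -n ${line_hit[j]-} ]]; then printf '%s\n' "${lines[j]}"; fi
  done
  echo "Only in $f1:"
  for j in "${!lines[@]}"; do
    if [[ -z ${line_hit[j]-} ]]; then printf '%s\n' "${lines[j]}"; fi
  done
  echo "Only in $f2:"
  for i in "${!pats[@]}"; do
    if [[ -z ${pat_hit[i]-} ]]; then printf '%s\n' "${pats[i]}"; fi
  done
}
```

Usage: compare_by_substring test1.txt test2.txt. It is still O(n·m) comparisons, but each one is a shell test rather than a new process.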

  • 0

    In both:

    grep -F -f test2.txt test1.txt
    

    Output:

    abc.cde.ccd.eed.12345.5678.txt
    abzc.cdae.ccda.eaed.29345.1678.txt
    abac.cdae.cacd.eead.18145.2678.txt
    

    Only in test1.txt:

    grep -v -F -f test2.txt test1.txt
    

    Output:

    abcd.cdde.ccdd.eaed.12346.5688.txt
    aabc.cade.cacd.eaed.13345.5078.txt
    aabc.cdve.cncd.ened.19945.2345.txt
    

    Only in test2.txt:

    grep -vxF -f <(grep -Eo '[0-9]+\.[0-9]+\.txt$' test1.txt) test2.txt
    

    Output:

    10111.2222.txt
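A quick way to see why the -F flag matters with these file names: without it, grep reads each line of test2.txt as a regular expression, where '.' matches any character. The sample line below is made up for the demonstration (its dot before 5678 is replaced with X), so it should not count as a match.

```shell
#!/usr/bin/env bash
# Why -F matters: without it, a pattern line is a regular expression and
# '.' matches ANY character. The sample line is made up ("X" replaces a dot).
pattern='12345.5678.txt'
line='abc.cde.ccd.eed.12345X5678.txt'

# Basic regular expression: '.' matches the 'X', so grep reports a match.
if printf '%s\n' "$line" | grep -q -- "$pattern"; then
  regex_result=match
else
  regex_result=no-match
fi

# Fixed string (-F): '.' must be a literal dot, so there is no match.
if printf '%s\n' "$line" | grep -qF -- "$pattern"; then
  fixed_result=match
else
  fixed_result=no-match
fi

echo "regex: $regex_result, fixed: $fixed_result"   # regex: match, fixed: no-match
```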
    
  • 0

    This can be solved with comm from GNU Coreutils:

    First, sort the second file (comm requires sorted input):

    sort -o test2.txt test2.txt;
    

    Then use these commands to print the lines:

    # only in test1.txt (prints the part after the 4th '.', not the full line)
    cut -d '.' -f 1-4 --complement test1.txt | sort | comm -23 - test2.txt
    # only in test2.txt
    cut -d '.' -f 1-4 --complement test1.txt | sort | comm -13 - test2.txt
    # in both files
    cut -d '.' -f 1-4 --complement test1.txt | sort | comm -12 - test2.txt
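The first command prints only the suffixes (for example 12346.5688.txt), while the question asks for the complete test1.txt lines. One way to map the suffixes back, sketched here under the assumption that every test1.txt line has at least five '.'-separated fields, is an awk lookup keyed on field 5 onward. The function name full_lines_only_in_first is hypothetical; it relies on GNU cut (for --complement) and bash process substitution, and it sorts the second file itself, so test2.txt does not need to be sorted beforehand.

```shell
#!/usr/bin/env bash
# Sketch: print the COMPLETE lines of the first file whose suffix (field 5
# onward, '.'-separated) is missing from the second file.
# full_lines_only_in_first is a made-up name; needs GNU cut and bash.
full_lines_only_in_first() {
  local f1=$1 f2=$2
  awk -F. '
    # First input: the suffixes reported by comm, stored as array keys.
    NR == FNR { want[$0]; next }
    # Second input: rebuild the suffix from field 5 onward and look it up.
    {
      key = $5
      for (i = 6; i <= NF; i++) key = key "." $i
      if (key in want) print
    }' <(cut -d '.' -f 1-4 --complement "$f1" | sort | comm -23 - <(sort "$f2")) "$f1"
}
```

Usage: full_lines_only_in_first test1.txt test2.txt prints the matching test1.txt lines in their original order.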
    

    Explanation

    # 1. Extract all but the first four '.'-separated fields from test1.txt
    cut -d '.' -f 1-4 --complement test1.txt
    # 2. '-' stands for standard input, i.e. the output of the previous command
    comm -3 - test2.txt
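To see what the -1, -2 and -3 flags select, here is a tiny, self-contained comm run on two made-up sorted lists. Column 1 holds lines only in the first input, column 2 lines only in the second, column 3 lines in both, and each -N flag hides that column. Like the answer, it needs sorted input; it uses bash process substitution to feed the two lists.

```shell
#!/usr/bin/env bash
# comm compares two SORTED inputs and prints up to three tab-separated columns:
#   column 1 = lines only in the first input
#   column 2 = lines only in the second input
#   column 3 = lines common to both
# -1, -2 and -3 suppress the corresponding column.
left()  { printf '%s\n' a b c; }   # made-up sorted list
right() { printf '%s\n' b c d; }   # made-up sorted list

only_left=$(comm -23 <(left) <(right))    # keep column 1 only
only_right=$(comm -13 <(left) <(right))   # keep column 2 only
in_both=$(comm -12 <(left) <(right))      # keep column 3 only

echo "only in left:  $only_left"           # a
echo "only in right: $only_right"          # d
echo "in both:       ${in_both//$'\n'/ }"  # b c
```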
    
  • 3

    The following AWK script, script.awk, also does the job:

    # First file (NR == FNR): keep every line, in input order
    NR == FNR { lines[++i] = $0; next }
    
    # Second file: keep every line as a literal substring pattern
    { patterns[++j] = $0 }
    
    END {
        # Mark every line/pattern pair where the pattern occurs in the line
        for (p = 1; p <= j; p++)
            for (l = 1; l <= i; l++)
                if (index(lines[l], patterns[p]) > 0) {
                    lines_match[l] = 1
                    patterns_match[p] = 1
                }
    
        # Numeric loops (not "for (k in arr)", whose order is unspecified)
        # keep the original line order in the output
        print "Lines only in first file:"
        for (l = 1; l <= i; l++)
            if (!(l in lines_match))
                print lines[l]
    
        print "Lines only in second file:"
        for (p = 1; p <= j; p++)
            if (!(p in patterns_match))
                print patterns[p]
    
        print "Lines in both files:"
        for (l = 1; l <= i; l++)
            if (l in lines_match)
                print lines[l]
    }
    

    Run it like this:

    awk -f script.awk test1.txt test2.txt
    

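    Note that the script makes no assumptions about the structure of the data in either file. It only assumes that the lines of test2.txt are potential substrings of the lines of test1.txt.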
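The script's use of index() is what makes each test2.txt line match as a literal substring. A small check (the sample line is made up, with X in place of a dot) contrasts it with awk's match(), which treats its second argument as a regular expression and would therefore accept the wrong line.

```shell
#!/usr/bin/env bash
# index(s, t): 1-based position of the LITERAL string t in s, or 0.
# match(s, r): treats r as a regular expression, so '.' matches any character.
result=$(awk 'BEGIN {
  line = "abc.cde.ccd.eed.12345X5678.txt"   # made-up line: "X" instead of "."
  pat  = "12345.5678.txt"
  print index(line, pat), (match(line, pat) > 0)
}')
echo "$result"   # 0 1
```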
