Comparing the contents of two files - Java 学习之路

I have two files, test1.txt and test2.txt.

test1.txt contains:

abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt

and test2.txt contains:

12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt

I want to compare these two files in bash and get output like this:

In both files:

abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt

Only in test1.txt:

abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt

Only in test2.txt:

10111.2222.txt

4 Answers

File1 (the contents of test1.txt):
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt


File2 (the contents of test2.txt):
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt



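#!/bin/bash

# Remove output files left over from a previous run
rm -f Both.txt File1.txt File2.txt

# For each line of File2, find the lines of File1 that contain it
while IFS= read -r f2line
do
  found=0
  while IFS= read -r f1line
  do
    # -F matches the File2 line literally, so '.' is not a regex wildcard
    if printf '%s\n' "$f1line" | grep -qF -- "$f2line"
    then
      found=1
      printf '%s\n' "$f1line" >> Both.txt
    fi
  done < File1
  # No line of File1 contains this line: it appears only in File2
  if [ "$found" -eq 0 ]
  then
    printf '%s\n' "$f2line" >> File2.txt
  fi
done < File2

# Lines of File1 that never matched appear only in File1
touch Both.txt   # make sure Both.txt exists even if nothing matched
sort -u Both.txt > s_Both.txt
sort File1 > s_File1
comm -23 s_File1 s_Both.txt > File1.txt
rm s_File1 s_Both.txt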

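Output files: Both.txt (lines of File1 that contain a line of File2), File1.txt (lines only in File1) and File2.txt (lines only in File2).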
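The script above starts a separate grep process for every pair of lines. A minimal sketch of the same nested-loop idea that stays inside bash, assuming bash (not plain sh) and that File2 contains no blank lines (an empty line would match everything): the substring test is `[[ $line == *"$suffix"* ]]`, where the quoted part is matched literally. The function name `split_by_substring` is hypothetical, and unlike the script above it writes File1.txt in the original line order rather than sorted.

```shell
#!/bin/bash
# Hypothetical helper: same nested-loop idea, but the substring test is bash's
# built-in [[ ... == *"..."* ]] instead of one grep process per pair.
# The quoted part of the pattern is matched literally ('.' is not special).
# Assumption: file2 has no blank lines (an empty line would match everything).
split_by_substring() {
  local file1=$1 file2=$2 line1 line2 found
  : > Both.txt; : > File1.txt; : > File2.txt   # start with empty output files

  # Each file1 line goes to Both.txt if it contains some file2 line, else to File1.txt
  while IFS= read -r line1; do
    found=0
    while IFS= read -r line2; do
      if [[ $line1 == *"$line2"* ]]; then found=1; break; fi
    done < "$file2"
    if (( found )); then
      printf '%s\n' "$line1" >> Both.txt
    else
      printf '%s\n' "$line1" >> File1.txt
    fi
  done < "$file1"

  # Each file2 line contained in no file1 line goes to File2.txt
  while IFS= read -r line2; do
    found=0
    while IFS= read -r line1; do
      if [[ $line1 == *"$line2"* ]]; then found=1; break; fi
    done < "$file1"
    (( found )) || printf '%s\n' "$line2" >> File2.txt
  done < "$file2"
}
```

Usage: run `split_by_substring File1 File2` in the directory where Both.txt, File1.txt and File2.txt should be created.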

Answered on 2024-05-05T23:36:34+08:00

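In both files (-F makes grep treat each line of test2.txt as a literal string, so the dots are not regex wildcards):

grep -F -f test2.txt test1.txt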

Output:

abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt

Only in test1.txt:

grep -v -F -f test2.txt test1.txt

Output:

abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt

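Only in test2.txt (extract the trailing number.number.txt part of every test1.txt line, then print the test2.txt lines that match none of them exactly):

grep -v -x -F -f <(grep -Eo '[0-9]+\.[0-9]+\.txt' test1.txt) test2.txt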

Output:

10111.2222.txt
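The three commands can be bundled into one function that prints all three sections, as the question asks. This is a sketch: the function name `compare_with_grep` is hypothetical, it needs bash for the `<( )` process substitution, and it assumes each test2.txt line is the trailing number.number.txt part of a test1.txt line. It uses -F so the dots are matched literally and -x so that only exact matches count in the last section.

```shell
#!/bin/bash
# Hypothetical helper that prints all three sections with grep.
# Assumption: each line of file2 is the trailing NNNNN.NNNN.txt part of a line in file1.
# grep exits with status 1 when nothing matches; "|| true" keeps that from
# looking like a failure.
compare_with_grep() {
  local file1=$1 file2=$2

  echo "In both files:"
  # -F: treat each line of file2 as a fixed string, not a regex
  grep -F -f "$file2" "$file1" || true

  echo "Only in $file1:"
  grep -v -F -f "$file2" "$file1" || true

  echo "Only in $file2:"
  # Extract the keys from file1, then keep the file2 lines matching none exactly (-x)
  grep -v -x -F -f <(grep -Eo '[0-9]+\.[0-9]+\.txt' "$file1") "$file2" || true
}
```

Usage: `compare_with_grep test1.txt test2.txt`.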

Answered on 2024-05-05T23:36:34+08:00

This can be solved with comm from GNU Coreutils:

First, sort the second file (comm requires both of its inputs to be sorted):

sort -o test2.txt test2.txt;

Then use the following commands to show the lines:

# unique to test1.txt
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -23 - test2.txt
# unique to test2.txt
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -13 - test2.txt
# that appear in both files
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -12 - test2.txt

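Explanation:

# 1. Extract all but the first four '.'-separated fields from each line of test1.txt
#    (--complement is a GNU cut option)
cut -d '.' -f 1-4 --complement test1.txt
# 2. '-' makes comm read its first input from standard input (the sorted cut output);
#    -1, -2 and -3 suppress the lines unique to the first input, the lines unique
#    to the second input, and the lines common to both, respectively
comm -3 - test2.txt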
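comm compares whole lines, so the pipeline above reports only the trailing part of each test1.txt line (for example 12346.5688.txt), not the full line the question asks for. One way to get the full lines back, sketched here as an alternative rather than as part of the original answer, is join: pair each test1.txt line with its key, sort on the key, and join against the sorted test2.txt. Assumptions: the key is everything after the first four '.'-separated fields (the same split as the cut command), lines contain no tab characters, and the hypothetical function `compare_by_key` runs in a subshell with LC_ALL=C so that sort and join agree on ordering. Output is sorted by key rather than in file order.

```shell
#!/bin/bash
# Hypothetical helper: report full file1 lines by joining on a key.
# The body runs in a subshell ( ... ) so LC_ALL=C does not leak to the caller;
# sort and join must use the same collation order.
compare_by_key() (
  export LC_ALL=C
  file1=$1 file2=$2
  tab=$'\t'

  # Print "key<TAB>full line" for each line of file1, sorted by key.
  # Assumption: key = everything after the first four '.'-separated fields.
  keyed() {
    awk -F. -v OFS='\t' '{
      key = $5
      for (k = 6; k <= NF; k++) key = key "." $k
      print key, $0
    }' "$file1" | sort -t "$tab" -k1,1
  }

  echo "In both files:"
  join -t "$tab" -o 1.2 <(keyed) <(sort "$file2")
  echo "Only in $file1:"
  join -t "$tab" -v 1 -o 1.2 <(keyed) <(sort "$file2")
  echo "Only in $file2:"
  join -t "$tab" -v 2 -o 2.1 <(keyed) <(sort "$file2")
)
```

Usage: `compare_by_key test1.txt test2.txt`. join keeps the second field of each paired line (-o 1.2), which is the original test1.txt line.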

Answered on 2024-05-05T23:36:34+08:00

The following AWK script, script.awk, also does the job:

# First file: store every line, keeping its position
NR == FNR { lines[++i] = $0; next }

# Second file: store every line as a literal substring to look for
{ patterns[++j] = $0 }

END {
    # Mark every line/pattern pair where the pattern occurs in the line
    # (index() does a literal substring search, not a regex match)
    for (p = 1; p <= j; p++)
        for (l = 1; l <= i; l++)
            if (index(lines[l], patterns[p]) > 0) {
                lines_match[l] = 1
                patterns_match[p] = 1
            }

    # Numeric loops keep the input order; for (k in array) does not
    print "Lines only in first file:"
    for (l = 1; l <= i; l++)
        if (!(l in lines_match))
            print lines[l]

    print "Lines only in second file:"
    for (p = 1; p <= j; p++)
        if (!(p in patterns_match))
            print patterns[p]

    print "Lines in both files:"
    for (l = 1; l <= i; l++)
        if (l in lines_match)
            print lines[l]
}

It can be invoked as follows:

awk -f script.awk test1.txt test2.txt

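Note that the script makes no assumptions about the structure of the data in either file; it only assumes that lines of test2.txt may occur as substrings of lines of test1.txt.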
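script.awk compares every line with every pattern, which costs O(n × m) substring searches. When the key is known in advance, a hash lookup turns this into a single linear pass. A sketch under an assumption the answer does not make: the key of each test1.txt line is its last three '.'-separated fields (such as 12345.5678.txt), every line has at least three fields, and each test2.txt line is exactly such a key. The function name `compare_by_hash` is hypothetical; it trades script.awk's generality (arbitrary substrings) for speed.

```shell
#!/bin/bash
# Hypothetical helper: hash-based variant of script.awk.
# Assumption: the key of a file1 line is its last three '.'-separated fields
# (e.g. 12345.5678.txt), and every file2 line is such a key.
compare_by_hash() {
  awk -F. '
    # First file: remember each line and its key, in input order
    NR == FNR {
      n++
      line[n] = $0
      key[n] = $(NF-2) "." $(NF-1) "." $NF
      next
    }
    # Second file: remember each key in order, and index it for O(1) lookup
    { m++; want[m] = $0; wanted[$0] = 1 }
    END {
      for (k = 1; k <= n; k++) seen[key[k]] = 1
      print "Lines only in first file:"
      for (k = 1; k <= n; k++) if (!(key[k] in wanted)) print line[k]
      print "Lines only in second file:"
      for (k = 1; k <= m; k++) if (!(want[k] in seen)) print want[k]
      print "Lines in both files:"
      for (k = 1; k <= n; k++) if (key[k] in wanted) print line[k]
    }
  ' "$1" "$2"
}
```

Usage: `compare_by_hash test1.txt test2.txt`. The output sections follow the same order as script.awk.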

Answered on 2024-05-05T23:36:34+08:00
