Unix command to find lines common to two files

145

I'm sure I once came across a Unix command that prints the lines common to two or more files. Does anyone know its name? It was much simpler than diff.

11 Answers

8
The command you are looking for is comm. For example:
```
comm -12 1.sorted.txt 2.sorted.txt
```
Here:

-1 : suppress column 1 (lines unique to 1.sorted.txt)

-2 : suppress column 2 (lines unique to 2.sorted.txt)
Answered 2024-05-12T21:02:32+08:00
3
To easily apply the comm command to unsorted files, use Bash's process substitution:
```
$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321
```
So the files abc and def have one line in common, the one containing "132". Using comm on the unsorted files:
```
$ comm abc def
123
    132
567
132
    777
    321
$ comm -12 abc def # No output! The common line is not found
$
```
The last command produced no output: the common line was not found.

Now use comm on sorted input, using process substitution to sort the files:
```
$ comm <( sort abc ) <( sort def )
123
            132
    321
567
    777
$ comm -12 <( sort abc ) <( sort def )
132
```
Now we get the 132 line!
Answered 2024-05-12T21:02:32+08:00
52

Maybe you mean comm?

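Compare sorted files FILE1 and FILE2 line by line. With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.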
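The three columns described above can be isolated one at a time by suppressing the other two. The following is a minimal sketch, not part of the original answer; the file names and their contents are hypothetical sample data, and both files are written already sorted.

```sh
#!/bin/sh
# Hypothetical sample files, created in a scratch directory.
export LC_ALL=C                      # keep ordering consistent for comm
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

only1=$(comm -23 "$dir/file1" "$dir/file2")   # keep column 1: lines unique to file1
only2=$(comm -13 "$dir/file1" "$dir/file2")   # keep column 2: lines unique to file2
both=$(comm -12 "$dir/file1" "$dir/file2")    # keep column 3: lines common to both

echo "only in file1:"; echo "$only1"
echo "only in file2:"; echo "$only2"
echo "in both:";       echo "$both"

rm -r "$dir"
```

Each digit passed to comm removes a column, so -12 leaves only the common lines.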

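The secret to finding this kind of information is the info pages. For GNU programs, they are much more detailed than the man pages. Try info coreutils and it will list all the small, useful utilities.

Answered 2024-05-12T21:02:32+08:00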
22
To complement the Perl one-liner, here is its awk equivalent:
```
awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2
```
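This reads all lines from file1 into the array arr[], and then checks each line of file2 to see whether it already exists in the array (that is, in file1). The lines that are found are printed in the order in which they appear in file2. Note that the comparison in arr uses the entire line from file2 as the index into the array, so it only reports exact matches on whole lines.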
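To make the ordering behaviour concrete, here is a small sketch that runs the same awk program on two unsorted files. The sample contents are hypothetical and chosen to show that output follows file2's order and that a line repeated in file2 is printed each time it occurs.

```sh
#!/bin/sh
# Hypothetical, unsorted sample files.
dir=$(mktemp -d)
printf '%s\n' b a c   > "$dir/file1"
printf '%s\n' c x a a > "$dir/file2"

# First pass (NR==FNR) stores file1's lines as array keys;
# second pass prints file2's lines that are present as keys.
result=$(awk 'NR==FNR{arr[$0];next} $0 in arr' "$dir/file1" "$dir/file2")
echo "$result"

rm -r "$dir"
```

No sorting is needed here, at the cost of holding every distinct line of file1 in memory.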
Answered 2024-05-12T21:02:32+08:00
2
While
```
grep -v -f 1.txt 2.txt > 3.txt
```
gives you the differences between the two files (what is in 2.txt and not in 1.txt), you could just as easily run
```
grep -f 1.txt 2.txt > 3.txt
```
to collect all the common lines, which should provide a simple solution to your problem. If you have sorted files, you should use comm instead. Regards!
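One caveat worth illustrating: plain grep -f treats every line of 1.txt as a regular expression and matches it anywhere within a line of 2.txt. The sketch below is an addition rather than part of the original answer. It contrasts that with the standard -F (fixed strings) and -x (whole-line match) options, using hypothetical sample contents.

```sh
#!/bin/sh
# Hypothetical sample files.
dir=$(mktemp -d)
printf '%s\n' 13 'a.c'         > "$dir/1.txt"
printf '%s\n' 132 abc 13 'a.c' > "$dir/2.txt"

# Patterns are regexes matched as substrings: "13" hits "132", "a.c" hits "abc".
loose=$(grep -f "$dir/1.txt" "$dir/2.txt")

# Fixed strings that must match the whole line: only identical lines are reported.
strict=$(grep -Fxf "$dir/1.txt" "$dir/2.txt")

echo "grep -f:";   echo "$loose"
echo "grep -Fxf:"; echo "$strict"

rm -r "$dir"
```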
Answered 2024-05-12T21:02:32+08:00

-2

```
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1 file2
```

Answered 2024-05-12T21:02:32+08:00

```
awk 'NR==FNR{a[$1]++;next} a[$1] ' file1 file2
```

Answered 2024-05-12T21:02:32+08:00

3
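On limited versions of Linux (like the QNAP NAS I am working on):
- comm does not exist
- grep -f file1 file2 can cause some problems, as @ChristopherSchultz said, and grep -F -f file1 file2 was very slow (more than 5 minutes without finishing, versus 2-3 seconds with the method below, on files over 20MB)
So here is what I did: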
```
sort file1 > file1.sorted
sort file2 > file2.sorted

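# Lines only in file1: keep diff's "< " lines and strip the marker
diff file1.sorted file2.sorted | grep '^< ' | sed 's/^< //' > files.diff
# Removing those from file1 leaves the lines common to both files
diff file1.sorted files.diff | grep '^< ' | sed 's/^< //' > files.same.sorted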
```
If files.same.sorted should be in the same order as the original files, add this line to get the same order as file1:

```
awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file1 > files.same
```

Or, for the same order as file2:

```
awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file2 > files.same
```
Answered 2024-05-12T21:02:32+08:00
24
If the two files are not sorted yet, you can use:
```
comm -12 <(sort a.txt) <(sort b.txt)
```
This works, and it avoids the error message comm: file 2 is not in sorted order that you get when running comm -12 a.txt b.txt.
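Since the question mentions two or more files, the same idea can be chained: the output of comm -12 on sorted input is itself sorted, so it can be fed to another comm through `-` (standard input). This is a sketch that goes beyond the original answer. The third file c.txt and all sample contents are hypothetical, and it writes temporary sorted files instead of using process substitution so that it also runs under a plain POSIX sh.

```sh
#!/bin/sh
export LC_ALL=C                      # same collation for sort and comm
dir=$(mktemp -d)
printf '%s\n' 123 567 132     > "$dir/a.txt"
printf '%s\n' 132 777 321 567 > "$dir/b.txt"
printf '%s\n' 567 999 132     > "$dir/c.txt"

# Sort each input once.
for f in a b c; do
    sort "$dir/$f.txt" > "$dir/$f.sorted"
done

# Lines common to a and b, then intersected with c via stdin ("-").
common=$(comm -12 "$dir/a.sorted" "$dir/b.sorted" | comm -12 - "$dir/c.sorted")
echo "$common"

rm -r "$dir"
```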
Answered 2024-05-12T21:02:32+08:00
16
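FYI, if someone is still looking for how to do this for multiple files, see the linked answer to Finding matching lines across many files.

Combining these two answers (ans1 and ans2), I think you can get the result you need without sorting the files: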
```
#!/bin/bash
ans="matching_lines"

for file1 in *
do 
    for file2 in *
        do 
            if  [ "$file1" != "$ans" ] && [ "$file2" != "$ans" ] && [ "$file1" != "$file2" ] ; then
                echo "Comparing: $file1 $file2 ..." >> "$ans"
                perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' "$file1" "$file2" >> "$ans"
            fi
         done 
done
```
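Simply save it, give it execute permission (chmod +x compareFiles.sh) and run it. It takes all the files present in the current working directory, makes an all-versus-all comparison, and leaves the result in the "matching_lines" file.

Things to be improved:
- Skip directories
- Avoid comparing all the files twice (file1 vs file2 and file2 vs file1).
- Maybe add the line number next to the matching string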
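One possible way to address the three improvements listed above is sketched here. It is not the original author's script: the function name compare_pairs is hypothetical, it swaps the Perl one-liner for the awk approach from another answer, it prints to standard output instead of a results file, and the line number it reports is the one in the second file of each pair.

```sh
#!/bin/sh
# compare_pairs DIR (hypothetical helper): for each unordered pair of regular
# files in DIR, print the lines of the second file that also occur in the first.
compare_pairs() {
    set -- "$1"/*                       # candidate paths become $1, $2, ...
    while [ "$#" -gt 1 ]; do
        file1=$1
        shift                           # remaining args are the files after file1
        [ -f "$file1" ] || continue     # skip directories
        for file2 in "$@"; do
            [ -f "$file2" ] || continue
            echo "Comparing: $file1 $file2 ..."
            # FNR is the line number within file2
            awk 'NR==FNR{seen[$0];next} $0 in seen {print FNR": "$0}' "$file1" "$file2"
        done
    done
}

# Demo on hypothetical sample data.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '%s\n' x y > "$dir/f1"
printf '%s\n' y z > "$dir/f2"
out=$(compare_pairs "$dir")
echo "$out"
rm -r "$dir"
```

Because each file is only paired with the files that come after it in the listing, no pair is compared twice.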
Answered 2024-05-12T21:02:32+08:00

173

```
# Start from an empty result file
rm -f file3.out

while IFS= read -r line1
do
        while IFS= read -r line2
        do
                if [[ "$line1" == "$line2" ]]; then
                        printf '%s\n' "$line1" >> file3.out
                fi
        done < file2.out
done < file1.out
```

This should do it.

Answered 2024-05-12T21:02:32+08:00
