
Is there a way to tell sed to output only the captured groups? For example, given the input:

This is a sample 123 text and some 987 numbers

and the pattern:

/([\d]+)/

can I get only 123 and 987 as output, formatted by means of back references?

8 Answers

You can use grep:
```
grep -Eow "[0-9]+" file
```
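A minimal sketch applying the same grep to the question's sample line instead of a file. `-o` prints each match on its own line, and `-w` only accepts matches that form whole words, so a run such as `as0018166df` would be skipped. Joining the matches with `paste` is an optional extra step, not part of the original answer.

```shell
string='This is a sample 123 text and some 987 numbers'

# -E: extended regex, -o: print only the matched text, -w: whole words only
echo "$string" | grep -Eow '[0-9]+'

# Optional: join the matches onto one line, separated by spaces
echo "$string" | grep -Eow '[0-9]+' | paste -sd ' ' -
```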
Answered 2024-05-05T19:37:07+08:00

Runs of digits

This answer works for any number of digit groups. Example:

$ echo 'Num123that456are7899900contained0018166intext' |
> sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166

Expanded answer

Is there a way to tell sed to output only the captured groups?

Yes. Replace all the text with the capture group:

$ echo 'Number 123 inside text' | sed 's/[^0-9]*\([0-9]\{1,\}\)[^0-9]*/\1/'
123

s/[^0-9]*                           # several non-digits
         \([0-9]\{1,\}\)            # followed by one or more digits
                        [^0-9]*     # and followed by more non-digits.
                               /\1/ # gets replaced only by the digits.

Or use the extended syntax (fewer backslashes, and it allows the use of +):

$ echo 'Number 123 in text' | sed -E 's/[^0-9]*([0-9]+)[^0-9]*/\1/'
123

To avoid printing the original text when there is no number, use:

$ echo 'Number xxx in text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1/p'

(-n) Do not print the input by default.
(/p) Print only if a replacement was done.
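To see the effect of -n and /p on more than one line, here is a small sketch that feeds three made-up lines through the same command; only lines where the substitution succeeded are printed.

```shell
# Three sample lines; the middle one contains no digits and is suppressed
printf '%s\n' 'Number 123 in text' 'Number xxx in text' 'Another 42 here' |
  sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1/p'
```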

And to match several numbers (and print them all):

$ echo 'N 123 in 456 text' | sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp'
123 456

This works for any number of digit runs:

$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166
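The /\1 /gp form leaves a trailing space after the last number. One way to drop it, sketched here as an assumption rather than part of the original answer, is to restrict the commands to lines that contain a digit and strip the final space before printing. The `{...}` grouping with `;` before `}` is intended to be accepted by both GNU and BSD sed.

```shell
str='Test Num(s) 123 456 7899900 contained as0018166df in text'

# Only act on lines that contain a digit:
#   1. keep each run of digits followed by one space
#   2. remove the single trailing space
#   3. print the result (-n suppresses everything else)
echo "$str" | sed -En '/[0-9]/{s/[^0-9]*([0-9]+)[^0-9]*/\1 /g; s/ $//; p;}'
```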

This is very similar to the grep command:

$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | grep -Po '\d+'
123
456
7899900
0018166
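-P is a GNU grep feature and is not available in every grep build. A closer-to-portable sketch uses extended regex with a bracket expression in place of \d, giving the same one-match-per-line output for this input (-o itself is supported by GNU and BSD grep, though it is not POSIX).

```shell
str='Test Num(s) 123 456 7899900 contained as0018166df in text'

# -E: extended regex, -o: print only the matching part of each line
# [0-9] replaces \d, which -E does not understand
echo "$str" | grep -Eo '[0-9]+'
```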

About \d

and the pattern: /([\d]+)/

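Sed does not recognize the '\d' (shortcut) syntax. The ASCII equivalent used above, [0-9], is not exactly equivalent. The only alternative is to use a character class: '[[:digit:]]'.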

The selected answer uses such "character classes" to build a solution:

$ str='This is a sample 123 text and some 987 numbers'
$ echo "$str" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This solution only works for (exactly) two runs of digits.
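A quick sketch of why: with a made-up input containing three runs of digits, the two-group pattern consumes only the first two runs. The match ends at the last non-digit before the third run, so that run is left in place and glued onto the output.

```shell
# Hypothetical input with three runs of digits
str='a 1 b 2 c 3'

# Prints "1 23": the trailing "3" was never part of the match
echo "$str" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
```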

Of course, since the answer runs in a shell, we can define a couple of variables to shorten it:

$ str='This is a sample 123 text and some 987 numbers'
$ d=[[:digit:]]     D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D+($d+)$D*/\1 \2/p"

However, as already explained, using the s/…/…/gp command is better:

$ str='This is 75577 a sam33ple 123 text and some 987 numbers'
$ d=[[:digit:]]     D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D*/\1 /gp"
75577 33 123 987

This covers repeated runs of digits and makes for a short(er) command.

Answered 2024-05-05T19:37:07+08:00

Try

sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

Under Cygwin I got this:

$ (echo "asdf"; \
   echo "1234"; \
   echo "asdf1234adsf1234asdf"; \
   echo "1m2m3m4m5m6m7m8m9m0m1m2m3m4m5m6m7m8m9") | \
  sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

1234
1234 1234
1 2 3 4 5 6 7 8 9
$

Answered 2024-05-05T19:37:07+08:00


Give up and use Perl

Since sed doesn't cut it, let's just throw in the towel and use Perl. At least it's in the LSB, whereas grep's GNU extensions are not :-)

Print the entire matching part; no capture groups or lookbehind needed:

cat <<EOS | perl -lane 'print m/\d+/g'
a1 b2
a34 b56
EOS

Output:

12
3456

Single match per line, typically structured data fields:

cat <<EOS | perl -lape 's/.*?a(\d+).*/$1/g'
a1 b2
a34 b56
EOS

Output:

1
34

With lookbehind:

cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/'
a1 b2
a34 b56
EOS

Multiple fields:

cat <<EOS | perl -lape 's/.*?a(\d+).*?b(\d+).*/$1 $2/g'
a1 c0 b2 c0
a34 c0 b56 c0
EOS

Output:

1 2
34 56

Multiple matches per line, typically unstructured data:

cat <<EOS | perl -lape 's/.*?a(\d+)|.*/$1 /g'
a1 b2
a34 b56 a78 b90
EOS

Output:

1 
34 78

With lookbehind:

cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/g'
a1 b2
a34 b56 a78 b90
EOS

Output:

1
3478

Answered 2024-05-05T19:37:07+08:00

The key to making this work is to tell sed what to exclude from the output as well as what you want to keep.
```
string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
```
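This says:
- don't print each line by default (-n)
- exclude zero or more non-digits
- include one or more digits
- exclude one or more non-digits
- include one or more digits
- exclude zero or more non-digits
- print the substitution (p)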
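The same command can be assembled from named pieces, one per step in the list above. The variable names skip0, skip1 and keep are hypothetical, chosen only for this sketch:

```shell
string='This is a sample 123 text and some 987 numbers'

skip0='[^[:digit:]]*'   # exclude zero or more non-digits
skip1='[^[:digit:]]+'   # exclude one or more non-digits
keep='([[:digit:]]+)'   # include (capture) one or more digits

# -r: extended regex, -n: no default printing, p: print the substituted line
echo "$string" | sed -rn "s/${skip0}${keep}${skip1}${keep}${skip0}/\1 \2/p"
```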
In general, in sed you capture groups with parentheses and output what you captured with a back reference:
```
echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'
```
will output "bar". If you use -r (-E on OS X) for extended regular expressions, you don't need to escape the parentheses:
```
echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
```
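You can have up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated: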
```
echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'
```
outputs "a bar a".
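Reordering is a common use of back references. A sketch with made-up input that swaps two space-separated words:

```shell
# \2 and \1 refer to the second and first groups, used here in swapped order
echo "John Smith" | sed -r 's/([^ ]+) ([^ ]+)/\2, \1/'
```

This prints `Smith, John`.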

If you have GNU grep (it may also work in BSD, including OS X):
```
echo "$string" | grep -Po '\d+'
```
or variations such as:
```
echo "$string" | grep -Po '(?<=\D )(\d+)'
```
The -P option enables Perl-compatible regular expressions. See man 3 pcrepattern or man 3 pcresyntax.
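PCRE also offers \K, which drops everything matched so far from the reported match; unlike the lookbehind above, the text before it is not restricted to a fixed length. A sketch, assuming GNU grep built with PCRE support, that prints only the number following the word "sample":

```shell
string='This is a sample 123 text and some 987 numbers'

# "sample " must match, but \K removes it from what -o prints
echo "$string" | grep -Po 'sample \K\d+'
```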
Answered 2024-05-05T19:37:07+08:00

Sed has up to nine remembered patterns, but you need to use escaped parentheses to remember portions of the regular expression.

See here for examples and more details.
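A minimal sketch of escaped parentheses in basic regex syntax (no -E or -r), applied to the question's input: \( \) marks each group and \1, \2 recall them. [0-9][0-9]* stands in for "one or more digits" because POSIX basic regex has no +.

```shell
string='This is a sample 123 text and some 987 numbers'

# Basic regex: \( \) create groups 1 and 2; the rest of the line is discarded
echo "$string" | sed -n 's/^[^0-9]*\([0-9][0-9]*\)[^0-9]*\([0-9][0-9]*\).*$/\1 \2/p'
```

This prints `123 987`.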

Answered 2024-05-05T19:37:07+08:00
This is not what the OP asked for (capture groups), but you can extract the numbers with:
```
S='This is a sample 123 text and some 987 numbers'
echo "$S" | sed 's/ /\n/g' | sed -r '/([0-9]+)/ !d'
```
This gives:
```
123
987
```
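\n in the replacement part of s is a GNU sed extension and may not work with other seds. A sketch of the same idea with tr doing the splitting avoids that dependency; like the !d filter, it keeps every space-separated word that contains a digit.

```shell
S='This is a sample 123 text and some 987 numbers'

# Put each space-separated word on its own line, then keep lines with a digit
echo "$S" | tr ' ' '\n' | grep '[0-9]'
```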
Answered 2024-05-05T19:37:07+08:00
I believe the pattern given in the question was only an example, and the goal was to match any pattern.

If you have a sed with the GNU extension that allows inserting a newline into the pattern space, one suggestion is:
```
> set string = "This is a sample 123 text and some 987 numbers"
>
> set pattern = "[0-9][0-9]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
123
987
> set pattern = "[a-z][a-z]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
his
is
a
sample
text
and
some
numbers
```
These examples use tcsh under CYGWIN (yes, I know it's the wrong shell). (Edit: for bash, remove set and the spaces around =.)
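Following the edit note, a bash (or other POSIX shell) version of the first example could look like this; it still assumes GNU sed, so that \n in the replacement becomes a newline:

```shell
string="This is a sample 123 text and some 987 numbers"
pattern="[0-9][0-9]*"

# Put every match on a line of its own, then print only the lines that match
echo "$string" | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
```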
Answered 2024-05-05T19:37:07+08:00

How to output only captured groups with sed?
