How do I extract a predetermined range of lines from a text file on Unix? - Java 学习之路

435

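I have a ~23000 line SQL dump containing data for several databases. I need to extract a certain section of this file (i.e. the data for a single database) and place it in a new file. I know both the start and end line numbers of the data that I want.

Does anyone know a Unix command (or series of commands) to extract all lines from a file between, say, lines 16224 and 16482, and then redirect them into a new file?

21 Answers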

1
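We can even do this from the command line, where n1 and n2 stand for the first and last line numbers:
```
cat filename|sed 'n1,n2!d' > abc.txt
```
For example:
```
cat foo.pl|sed '100,200!d' > abc.txt
```
Answered 2024-05-04T22:44:21+08:00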
0
You can use 'vi' and then the following command:
```
:16224,16482w!/tmp/some-file
```
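Alternatively:
```
cat file | head -n 16482 | tail -n 259
```
Edit: Just to add an explanation, head -n 16482 displays the first 16482 lines, and tail -n 259 then takes the last 259 lines of that output, which are lines 16224 through 16482.
Answered 2024-05-04T22:44:21+08:00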
-3
I would use:
```
awk 'FNR >= 16224 && FNR <= 16482' my_file > extracted.txt
```
FNR holds the record (line) number of the line being read from the current file.
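To make the difference between FNR and NR concrete, here is a minimal sketch. The file names a.txt and b.txt are made up for the illustration and are not part of the answer above.

```shell
#!/bin/sh
# Create two small sample files (hypothetical names) in a temporary directory.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"

# FNR restarts at 1 for each input file, so this selects line 2 of both files.
fnr_out=$(awk 'FNR == 2' "$dir/a.txt" "$dir/b.txt")

# NR keeps counting across files, so this selects only line 2 of the first file.
nr_out=$(awk 'NR == 2' "$dir/a.txt" "$dir/b.txt")

printf 'FNR selected:\n%s\n' "$fnr_out"
printf 'NR selected:\n%s\n' "$nr_out"
rm -r "$dir"
```

With a single input file, as in the question, FNR and NR behave the same.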
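Answered 2024-05-04T22:44:21+08:00
2
I think this might be a useful solution. If the table name is "person", you can use sed to get all the lines you need to restore your table.
```
sed -n -e '/DROP TABLE IF EXISTS.*`person`/,/UNLOCK TABLES/p' data.sql  > new_data.sql
```
This is based on this answer, which is missing the "DROP TABLE IF EXISTS" statement for the table you are restoring and requires you to delete a few lines from the bottom of the new file before using it, so that the next table is not dropped.

More details can also be found here
Answered 2024-05-04T22:44:21+08:00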
8
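Since we are talking about extracting lines from a text file, here is a special case where you want to extract all lines that follow a certain pattern.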
```
myfile content:
=====================
line1 not needed
line2 also discarded
[Data]
first data line
second data line
=====================
sed -n '/Data/,$p' myfile
```
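prints the [Data] line and everything after it. If you want the text from line 1 up to the pattern, type: sed -n '1,/Data/p' myfile. Furthermore, if you know two patterns (preferably unique in the text), both the start and the end line of the range can be specified with matches.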
```
sed -n '/BEGIN_MARK/,/END_MARK/p' myfile
```
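As a quick way to try the two-pattern form, the following sketch builds a throwaway sample file and runs the range over it. The sample contents are invented for the demonstration, and the second command (dropping the marker lines) is an addition that the answer above does not mention.

```shell
#!/bin/sh
# Build a sample file; BEGIN_MARK and END_MARK are placeholder markers.
sample=$(mktemp)
printf '%s\n' 'header' 'BEGIN_MARK' 'row 1' 'row 2' 'END_MARK' 'footer' > "$sample"

# Print from the first line matching BEGIN_MARK through the next line
# matching END_MARK, both marker lines included.
block=$(sed -n '/BEGIN_MARK/,/END_MARK/p' "$sample")
printf '%s\n' "$block"

# To drop the marker lines themselves, delete them inside the range
# before printing whatever is left.
inner=$(sed -n '/BEGIN_MARK/,/END_MARK/{/BEGIN_MARK/d;/END_MARK/d;p;}' "$sample")
printf '%s\n' "$inner"
rm -f "$sample"
```

If the start pattern occurs again later in the file, sed opens a new range there, which is why unique markers are preferable.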
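Answered 2024-05-04T22:44:21+08:00
13
```
cat dump.txt | head -16482 | tail -259
```
should do the trick. The downside of this approach is that you need to do the arithmetic to determine the argument for tail, and to account for whether you want 'between' to include the ending line or not.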
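The arithmetic can be left to the shell. This is only a sketch, assuming an inclusive range; the function name extract_range is hypothetical and not a standard tool.

```shell
#!/bin/sh
# extract_range FILE START END
# Hypothetical helper: prints lines START..END (inclusive) of FILE.
extract_range() {
    file=$1
    start=$2
    end=$3
    # An inclusive range holds end - start + 1 lines.
    count=$((end - start + 1))
    # Keep everything up to END, then the last COUNT lines of that.
    head -n "$end" "$file" | tail -n "$count"
}
```

For the numbers in the question, extract_range dump.txt 16224 16482 runs head -n 16482 followed by tail -n 259. The sketch assumes the file has at least END lines; with a shorter file, tail would return the wrong lines.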
Answered 2024-05-04T22:44:21+08:00

It is quite simple using head/tail:
```
head -16482 in.sql | tail -259 > out.sql
```
Using sed:
```
sed -n '16224,16482p' in.sql > out.sql
```
Using awk:
```
awk 'NR>=16224&&NR<=16482' in.sql > out.sql
```
Answered 2024-05-04T22:44:21+08:00

190

Using Ruby:
```
ruby -ne 'puts "#{$.}: #{$_}" if $. >= 32613500 && $. <= 32614500' < GND.rdf > GND.extract.rdf
```
Answered 2024-05-04T22:44:21+08:00

657

This might work for you (GNU sed):
```
sed -ne '16224,16482w newfile' -e '16482q' file
```
or taking advantage of bash:
```
sed -n $'16224,16482w newfile\n16482q' file
```
Answered 2024-05-04T22:44:21+08:00

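I wrote a small bash script that you can run from your command line, so long as you update your PATH to include its directory (or you can place it in a directory that is already contained in the PATH).

Usage: $ pinch filename start-line end-line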

```
#!/bin/bash
# Display line number ranges of a file to the terminal.
# Usage: $ pinch filename start-line end-line
# By Evan J. Coon

FILENAME=$1
START=$2
END=$3

ERROR="[PINCH ERROR]"

# Check that the number of arguments is 3
if [ $# -lt 3 ]; then
    echo "$ERROR Need three arguments: Filename Start-line End-line"
    exit 1
fi

# Check that the file exists.
if [ ! -f "$FILENAME" ]; then
    echo -e "$ERROR File does not exist. \n\t$FILENAME"
    exit 1
fi

# Check that start-line is not greater than end-line
if [ "$START" -gt "$END" ]; then
    echo -e "$ERROR Start line is greater than End line."
    exit 1
fi

# Check that start-line is positive (line numbers start at 1).
if [ "$START" -lt 1 ]; then
    echo -e "$ERROR Start line is less than 1."
    exit 1
fi

# Check that end-line is positive (line numbers start at 1).
if [ "$END" -lt 1 ]; then
    echo -e "$ERROR End line is less than 1."
    exit 1
fi

NUMOFLINES=$(wc -l < "$FILENAME")

# Check that end-line is not greater than the number of lines in the file.
if [ "$END" -gt "$NUMOFLINES" ]; then
    echo -e "$ERROR End line is greater than number of lines in file."
    exit 1
fi

# The distance from the end of the file to end-line
ENDDIFF=$(( NUMOFLINES - END ))

# For larger files, this will run more quickly. If the distance from the
# end of the file to the end-line is less than the distance from the
# start of the file to the start-line, then start pinching from the
# bottom as opposed to the top.
if [ "$START" -lt "$ENDDIFF" ]; then
    < "$FILENAME" head -n "$END" | tail -n +"$START"
else
    < "$FILENAME" tail -n +"$START" | head -n $(( END-START+1 ))
fi

# Success
exit 0
```

Answered 2024-05-04T22:44:21+08:00

20
```
sed -n '16224,16482 p' orig-data-file > new-file
```
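16224,16482 are the start and end line numbers, inclusive. The numbering is 1-indexed. -n suppresses echoing the input as output, which you clearly don't want; the numbers indicate the range of lines that the following command operates on; the command p prints out the relevant lines.
Answered 2024-05-04T22:44:21+08:00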
1
Quick and dirty:
```
head -16482 < file.in | tail -259 > file.out
```
Probably not the best way to do it, but it should work.

BTW: 259 = 16482 - 16224 + 1.
Answered 2024-05-04T22:44:21+08:00
0
```
sed -n '16224,16482p;16483q' filename > newfile
```
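From the sed manual:

- p: Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.
- n: If auto-print is not disabled, print the pattern space, then, regardless, replace it with the next line of input. If there is no more input then sed exits without processing any more commands.
- q: Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option.

and

Addresses in a sed script can take any of the following forms. number: Specifying a line number will match only that line in the input. An address range can be specified by giving two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).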
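A small experiment shows how p and q interact. This is a sketch on generated input (seq is used only to produce numbered lines), with a short range standing in for 16224,16482.

```shell
#!/bin/sh
# Lines 3 to 5 are printed; sed then quits on line 6 instead of reading
# the rest of the one million input lines.
early=$(seq 1 1000000 | sed -n '3,5p;6q')

# Quitting on the last wanted line gives the same output, because p runs
# before q and -n keeps q from printing the pattern space again.
same=$(seq 1 1000000 | sed -n '3,5p;5q')

printf '%s\n' "$early"
```

Both commands print the same three lines; the q only saves sed from scanning the remainder of a large file.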
Answered 2024-05-04T22:44:21+08:00

```
# print section of file based on line numbers
sed -n '16224,16482p'               # method 1
sed '16224,16482!d'                 # method 2
```
Answered 2024-05-04T22:44:21+08:00

2
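There is another approach with awk:
```
awk 'NR==16224, NR==16482' file
```
If the file is huge, it can be good to exit after reading the last desired line. This way, awk does not read the rest of the file unnecessarily: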
```
awk 'NR==16224, NR==16482-1; NR==16482 {print; exit}' file
```
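The same early-exit idea can be written with the line numbers passed in as awk variables. This is a sketch only; the wrapper name awk_range and the variable names first and last are made up for the example.

```shell
#!/bin/sh
# awk_range START END
# Hypothetical wrapper: reads standard input and prints lines
# START..END inclusive, then stops reading.
awk_range() {
    awk -v first="$1" -v last="$2" '
        NR >= first { print }   # print once the range has started
        NR == last  { exit }    # stop after the final wanted line
    '
}
```

A call such as awk_range 16224 16482 < file > newfile would then cover the case in the question. The print rule comes first so that the last line is emitted before awk exits.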
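Answered 2024-05-04T22:44:21+08:00
2
The -n in the accepted answers works. Here is another way, in case you are so inclined.
```
cat "$filename" | sed "${linenum}p;d";
```
This does the following:
- Pipe in the contents of a file (or feed in the text however you want).
- sed selects the given line and prints it.
- d is required to delete lines; otherwise sed assumes that every line will eventually be printed. That is, without the d you get all lines printed, with the selected line printed twice, because the ${linenum}p part asks for it to be printed. I'm pretty sure -n is basically doing the same thing as the d here.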
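The three behaviours described in the list can be compared side by side. A minimal sketch on five generated lines, where linenum=3 is an arbitrary choice for the demonstration:

```shell
#!/bin/sh
linenum=3   # arbitrary line number chosen for this demonstration

# p prints the selected line; d then discards each line before the
# automatic print can happen, so only line 3 appears.
with_d=$(seq 1 5 | sed "${linenum}p;d")

# Without d, auto-print echoes every line and line 3 shows up twice.
without_d=$(seq 1 5 | sed "${linenum}p")

# -n suppresses auto-print, giving the same result as the p;d form.
with_n=$(seq 1 5 | sed -n "${linenum}p")

printf 'p;d:\n%s\n' "$with_d"
printf 'p only:\n%s\n' "$without_d"
printf -- '-n with p:\n%s\n' "$with_n"
```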
Answered 2024-05-04T22:44:21+08:00

```
perl -ne 'print if 16224..16482' file.txt > new_file.txt
```
Answered 2024-05-04T22:44:21+08:00

77
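I wrote a Haskell program called splitter that does exactly this: have a read through my release blog post.

You can use the program as follows:
```
$ cat somefile | splitter 16224-16482
```
And that is all there is to it. You will need Haskell to install it. Just:
```
$ cabal install splitter
```
And you are done. I hope that you find this program useful.
Answered 2024-05-04T22:44:21+08:00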
3

```
sed -n '16224,16482p' < dump.sql
```
Answered 2024-05-04T22:44:21+08:00
5
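I was about to post the head/tail trick, but actually I'd probably just fire up emacs. ;-)
- esc-x goto-line ret 16224
- mark (ctrl-space)
- esc-x goto-line ret 16482
- esc-w
Open the new output file, ctl-y, save.

This lets me see what's happening.
Answered 2024-05-04T22:44:21+08:00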
1
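I wanted to do the same thing from a script using variables, and achieved it by putting quotes around the $variable to separate the variable name from the p: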
```
sed -n "$first","$count"p imagelist.txt >"$imageblock"
```
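I wanted to split a list into separate folders, and found the original question and answer a useful step. (The split command is not an option on the old OS I have to port the code to.)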
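For anyone after the same kind of list splitting, here is one way it might look with shell variables in the sed addresses. This is a sketch under my own assumptions, not the script described above: the function name split_blocks, its arguments and the PREFIX.N output naming are all hypothetical.

```shell
#!/bin/sh
# split_blocks FILE SIZE PREFIX
# Hypothetical helper: writes consecutive blocks of SIZE lines from FILE
# to PREFIX.1, PREFIX.2, and so on.
split_blocks() {
    file=$1
    size=$2
    prefix=$3
    # Arithmetic expansion strips any padding that wc may print.
    total=$(($(wc -l < "$file")))
    first=1
    n=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        # Double quotes let the shell expand the variables; the braces
        # keep each variable name separate from the p command.
        sed -n "${first},${last}p" "$file" > "$prefix.$n"
        first=$((last + 1))
        n=$((n + 1))
    done
}
```

The final block is simply shorter when the line count is not a multiple of SIZE, since sed stops at the end of the file.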
Answered 2024-05-04T22:44:21+08:00
