Perl -f check fails to recognize a file

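I have a perl script that walks a folder containing a few thousand files.

When I started writing the script, I did not know about perl's File::Find module, so to list all the files in the structure I used something like this: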

open (FILES, "$FIND $FOLDER -type f |");
while (my $line = <FILES>) {...}

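Now, however, I thought I would try doing this from within perl rather than launching an external program. (There is no reason for the change other than wanting to learn to use File::Find.)

While trying to learn the semantics of File::Find's find function, I tried a few things on the command line and compared the output with that of the find program.

Strangely, there is one file that the find program finds but that the perl function skips.

find works: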

machine:~# find /search/path -type f | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# find /search/path -type f | wc -l
    6439

Perl fails:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | wc -l
    6438

Changing the test to exclude folders rather than include files works:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" unless -d }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

The only difference between the files is their size:

machine:~# ls -l /search/path/folder/folder/UNIQ/
total 4213008
-rw-rw-r--    1 user users    4171336632 May 27  2012 movie_file_015.MOV
-rw-rw-r--    1 user users    141610616 May 27  2012 movie_file_145.MOV
-rw-rw-r--    1 user users       20992 May 27  2012 Thumbs.db

The Perl on the machine in question is old, but not ancient:

machine:~# perl -version

This is perl, v5.8.8 built for sparc-linux

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

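Is this a known bug or something?

Or am I hitting a size limit of '-f'? The file is almost 4GB and is the largest in the selection.

Or is my test (if -f) poorly chosen?

EDIT [trying to stat the file]:

The large file fails: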

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"));'

The small file works:

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"));'
$VAR1 = 65024;
$VAR2 = 19989500;
$VAR3 = 33204;
$VAR4 = 1;
$VAR5 = 1004;
$VAR6 = 100;
$VAR7 = 0;
$VAR8 = 141610616;
$VAR9 = 1349281585;
$VAR10 = 1338096718;
$VAR11 = 1352403842;
$VAR12 = 16384;
$VAR13 = 276736;

The 'stat' binary works for both files:

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_015.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_015.MOV"
  Size: 4171336632  Blocks: 8149216    IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989499    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:11:05.000000000 +0200
Modify: 2012-05-27 07:23:34.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_145.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_145.MOV"
  Size: 141610616   Blocks: 276736     IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989500    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:26:25.000000000 +0200
Modify: 2012-05-27 07:31:58.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

Also:

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $! . "\n";'
Bad file descriptor

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $! . "\n";'
Value too large for defined data type

EDIT2

# perl -V | grep "uselargefiles\|FILE_OFFSET_BITS"
config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=sparc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Dstatic_ext=B ByteLoader GDBM_File POSIX re -Dusemymalloc -Uuselargefiles -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
useperlio=define d_sfio=undef uselargefiles=undef usesocks=undef

Problem solved:

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $!{EOVERFLOW} . "\n";'
92
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $!{EOVERFLOW} . "\n";'
0

This works:

# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f or ( $!{EOVERFLOW} > 0 and not -d) }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV 
/search/path/folder/folder/UNIQ/movie_file_145.MOV 
/search/path/folder/folder/UNIQ/Thumbs.db

1 Answer

  • 10

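    Based on some Googling, it looks like your perl interpreter has not been compiled with large file support, which causes stat (and any file tests that rely on it internally, including -f) to fail for files larger than 2GB.

    To check whether this is the case, run:

    perl -V | grep "uselargefiles\|FILE_OFFSET_BITS"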
    

    If your perl has large file support, the output should show something like uselargefiles=define and -D_FILE_OFFSET_BITS=64. If it doesn't, your perl most likely does not support large files.
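The same information can also be read from inside perl through the core Config module. The following is a minimal sketch, not part of the original answer; the helper name `largefile_summary` is made up for illustration, and it assumes the `uselargefiles` and `lseeksize` configuration keys are the ones of interest.

```perl
use strict;
use warnings;
use Config;   # core module exposing the build-time configuration as %Config

# Hypothetical helper: summarize the large-file settings of this perl.
# Config returns undef for values recorded as 'undef' at build time.
sub largefile_summary {
    my $flag  = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
    my $bytes = defined $Config{lseeksize}     ? $Config{lseeksize}     : 'unknown';
    return "uselargefiles=$flag lseeksize=$bytes";
}

print largefile_summary(), "\n";
```

On a perl built with large file support, `lseeksize` is normally 8 (a 64-bit file offset); a value of 4 points at a 32-bit off_t.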

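    It may seem a bit puzzling why large file support is needed just to stat a file. The underlying problem is that the 32-bit version of the stat(2) system call, when applied to a file larger than 2GB, does not return a bogus size but simply fails with EOVERFLOW:

    "EOVERFLOW (stat()) path refers to a file whose size cannot be represented in the type off_t. This can occur when an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bits [sic; the limit is in bytes]."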
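To make the threshold concrete, this small sketch compares the two movie file sizes from the question against the largest value a signed 32-bit off_t can hold. The helper name `fits_in_32bit_off_t` is invented for this illustration.

```perl
use strict;
use warnings;

# Largest size representable in a signed 32-bit off_t: (1 << 31) - 1 bytes.
my $OFF_T_MAX_32 = 2**31 - 1;

# Hypothetical helper: true if a file of this size can be described
# by a 32-bit stat() call without overflowing off_t.
sub fits_in_32bit_off_t {
    my ($size) = @_;
    return $size <= $OFF_T_MAX_32 ? 1 : 0;
}

# Sizes taken from the ls -l listing in the question.
my %sizes = (
    'movie_file_015.MOV' => 4171336632,
    'movie_file_145.MOV' => 141610616,
);

for my $name (sort keys %sizes) {
    my $verdict = fits_in_32bit_off_t($sizes{$name}) ? 'fits' : 'overflows';
    print "$name: $sizes{$name} bytes $verdict a 32-bit off_t\n";
}
```

Only movie_file_015.MOV is above the limit, which matches the one file that -f skipped.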

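    Technically, receiving this error should be enough to show that the named file does exist (although I guess it could also be a really humongous directory), but perl is not smart enough to realize that. It just sees that the stat failed, and so returns nothing.

    (Edit: As ikegami correctly notes in the comments, -f returns undef rather than 0 or 1 if the stat(2) call fails, and sets $! to the error code that caused the failure. So, if you don't mind assuming that all directory entries with a size > 2GB are files, you could do something like -f $_ or (not defined -f _ and $!{EOVERFLOW}) to check for it.)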
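The workaround described above could be wrapped up for use with File::Find roughly as follows. This is a sketch under the same assumption (any entry whose stat fails with EOVERFLOW is treated as a regular file); the names `looks_like_file` and `list_files` are hypothetical and not from the original posts.

```perl
use strict;
use warnings;
use File::Find;
use Errno;   # backs the %! hash used below

# Hypothetical helper: true for regular files, and also for entries whose
# stat() failed with EOVERFLOW (assumed here to be files too large to stat).
sub looks_like_file {
    my ($path) = @_;
    my $is_file = -f $path;
    return 1 if $is_file;
    # -f yields undef (not merely false) when the underlying stat() failed,
    # and $! then holds the reason for the failure.
    return 1 if !defined $is_file && $!{EOVERFLOW};
    return 0;
}

# Hypothetical helper: collect every path under $root that looks like a file.
sub list_files {
    my ($root) = @_;
    my @found;
    find(sub { push @found, $File::Find::name if looks_like_file($_) }, $root);
    return @found;
}

# Print the files under the directory given on the command line, if any.
if (@ARGV) {
    print "$_\n" for list_files($ARGV[0]);
}
```

Saving $! handling inside one function keeps the check close to the -f call, so nothing else gets a chance to overwrite the error code in between. A perl rebuilt with large file support would make the extra branch unnecessary.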
