
Why would a call to read() block forever?


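I'm facing a puzzling problem that only started recently.

I have a program that uses one thread to write to a file and another thread to read from that same file. The two threads use different file descriptors: the writer thread opens the file with the O_WRONLY flag, and the reader thread opens it in O_RDONLY mode. As far as the logic goes, the reader thread has no knowledge of what the writer thread is doing, and the two could just as well be working on different files.

The writer thread continuously writes to the file at regular intervals (the data comes from a device stream, at up to 20 Mbit/s).

The reader thread also reads from the file at regular intervals.

Here is the reader loop: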

while (tot < sz)
{
    LOG(VB_FILE, LOG_DEBUG, LOC +
        QString("read(%1) -- begin").arg(sz-tot));
    ret = read(fd2, (char *)data + tot, sz - tot);
    LOG(VB_FILE, LOG_DEBUG, LOC +
        QString("read(%1) -> %2 end").arg(sz).arg(ret));

    if ((sz - tot) != ret)
    {
        LOG(VB_FILE, LOG_DEBUG, LOC + QString("errno = %1").arg(errno));
    }

    if (ret < 0)
    {
        if (errno == EAGAIN)
        {
            LOG(VB_FILE, LOG_DEBUG, LOC +
                QString("read(%1) -> %2 EAGAIN").arg(sz).arg(ret));
            usleep(1000);
            continue;
        }

        LOG(VB_GENERAL, LOG_ERR,
            LOC + "File I/O problem in 'safe_read()'" + ENO);

        errcnt++;
        numfailures++;
        if (errcnt == 3)
            break;
    }
    else if (ret > 0)
    {
        tot += ret;
    }
    [...snipped...]
}

As you can see, I log right before calling read() and again immediately after it returns. Every now and then, read() is called and never returns...

2014-02-19 11:24:10.156417 D  TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 5076
2014-02-19 11:24:10.156466 D  TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 26934760 bytes
2014-02-19 11:24:10.156514 D  FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -- begin
2014-02-19 11:24:10.190769 D  FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -> 60968 end
2014-02-19 11:24:10.190781 I  RingBuf(/external/recordings/1001_20140219002351.mpg): safe_read(...@1698944, 65536) -> 65536, took 60 ms (8.73813Mbps)
2014-02-19 11:24:10.190786 D  RingBuf(/external/recordings/1001_20140219002351.mpg): total read so far: 26930304 bytes
2014-02-19 11:24:10.190795 I  FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -- begin
2014-02-19 11:24:10.195917 D  FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(65536) -> 4456 end
2014-02-19 11:24:10.195927 D  FileRingBuf(/external/recordings/1001_20140219002351.mpg): errno = 0
2014-02-19 11:24:10.206445 D  TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 1692
2014-02-19 11:24:10.206489 D  TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27000184 bytes
2014-02-19 11:24:10.256103 D  FileRingBuf(/external/recordings/1001_20140219002351.mpg): read(61080) -- begin
2014-02-19 11:24:10.256499 D  TFW(/external/recordings/1001_20140219002351.mpg:64): write(47376) cnt 1 total 40984
2014-02-19 11:24:10.262073 D  TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27047560 bytes
2014-02-19 11:24:10.273385 D  TFW(/external/recordings/1001_20140219002351.mpg:64): write(65424) cnt 1 total 940
2014-02-19 11:24:10.385495 D  TFW(/external/recordings/1001_20140219002351.mpg:64): total written so far: 27112984 bytes

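You can see here that the writer has written 26934760 bytes to disk so far. The reader has read 26930304 bytes so far, so we are 4456 bytes from EOF. A 64kB read is then attempted, and it returns almost immediately with 4456 bytes. So far so good. Another read, of 61080 bytes (65536-4456), is attempted immediately.

Shortly after, the writer thread writes to the file again. The pending read (the remainder of the 64kB request) is now waiting, and will not return for another 30 seconds.

So, any particular idea as to why read() would suddenly block?

Edit: From observing the behavior, the blocking seems to always occur once a read has reached EOF and returned early, if the read is retried immediately, before a new write has taken place. In that case, the read does not return for several seconds (typically 20 seconds).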
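For comparison, here is a minimal sketch of the same access pattern (one write-only descriptor, one independent read-only descriptor, a read that hits EOF and is retried) on a local temporary file. It does not reproduce the stall; it only shows the return values one would normally expect from read() on a regular file. The function name `eof_sequence` and the `/tmp` path template are assumptions made for this illustration, not part of the program above.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo, not taken from the program in the question.
// Replays the sequence from the log on a local temporary file:
// write 4456 bytes, ask for 64kB, ask again at EOF, write again, read again.
// The three read() results are stored in out[0..2].
bool eof_sequence(ssize_t out[3])
{
    char path[] = "/tmp/eof_demo_XXXXXX";   // assumed scratch location
    int tmpfd = mkstemp(path);              // only used to create the file
    if (tmpfd < 0)
        return false;
    close(tmpfd);

    int wfd = open(path, O_WRONLY);         // writer descriptor
    int rfd = open(path, O_RDONLY);         // independent reader descriptor
    unlink(path);                           // file goes away once both are closed
    if (wfd < 0 || rfd < 0)
    {
        if (wfd >= 0) close(wfd);
        if (rfd >= 0) close(rfd);
        return false;
    }

    static char buf[65536];
    static const char chunk[4456] = {};

    bool ok = write(wfd, chunk, sizeof chunk) == static_cast<ssize_t>(sizeof chunk);
    out[0] = read(rfd, buf, sizeof buf);    // short read: only 4456 bytes exist
    out[1] = read(rfd, buf, sizeof buf);    // at EOF: expected to return 0
    ok = ok && write(wfd, chunk, 100) == 100;
    out[2] = read(rfd, buf, sizeof buf);    // picks up the 100 new bytes

    close(rfd);
    close(wfd);
    return ok;
}
```

Calling `eof_sequence(r)` from a small `main()` and printing `r[0]`, `r[1]` and `r[2]` should show a short read, then 0 at EOF, then the newly appended bytes. If the same sequence behaves differently on the file system holding the recordings, that would point at the storage layer rather than at the loop itself.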

1 Answer

  • 0

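    OK...

    I found the problem and how to work around it.

    As mentioned in the original question, the blocking occurs once a read has reached EOF, returned early, and is retried immediately (before a new write to the file has taken place).

    In that case, read() does not return for several seconds (typically more than 20 seconds).

    So the workaround is to keep track of how many bytes we have read so far, so that we know our position in the file, and to call fstat() to check the file's size. From there, we make sure never to call read() when we are already at the end of the file, and never to ask read() for more bytes than the file currently holds.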

    struct stat sb;
    off_t current_pos = internalreadpos;
    
    while (tot < sz)
    {
        off_t toread = sz - tot;
        bool read_ok = true;
    
        // check that we have some data to read,
        // so we never attempt to read past the end of file
        // if fstat errored or isn't a regular file, default to previous behavior
        ret = fstat(fd2, &sb);
        if (ret == 0 && S_ISREG(sb.st_mode))
        {
            if (current_pos >= sb.st_size)
            {
                // We're at the end, don't attempt to read
                read_ok = false;
                LOG(VB_FILE, LOG_DEBUG, LOC + "not reading, reached EOF");
            }
            else
            {
                toread = min(sb.st_size - current_pos, toread);
                if (toread < (sz-tot))
                {
                    LOG(VB_FILE, LOG_DEBUG,
                        LOC + QString("About to reach EOF, reading %1 wanted %2")
                        .arg(toread).arg(sz-tot));
                }
            }
        }
    
        if (read_ok)
        {
            LOG(VB_FILE, LOG_DEBUG, LOC +
                QString("read(%1) -- begin").arg(toread));
            ret = read(fd2, (char *)data + tot, toread);
            LOG(VB_FILE, LOG_DEBUG, LOC +
                QString("read(%1) -> %2 end").arg(toread).arg(ret));
        }
        if (ret < 0)
        {
            if (errno == EAGAIN)
                continue;
    
            LOG(VB_GENERAL, LOG_ERR,
                LOC + "File I/O problem in 'safe_read()'" + ENO);
    
            errcnt++;
            numfailures++;
            if (errcnt == 3)
                break;
        }
        else if (ret > 0)
        {
            tot += ret;
            current_pos += ret;
        }
    
        if (oldfile)
            break;
    
        if (ret == 0) // EOF returns 0
        {
            if (tot > 0)
                break;
    
            zerocnt++;
    
            // 0.36 second timeout for livetvchain with usleep(60000),
            // or 2.4 seconds if it's a new file less than 30 minutes old.
            if (zerocnt >= (livetvchain ? 6 : 40))
            {
                break;
            }
        }
        [...snipped...]
    }

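The size check in the loop above can also be pulled out into a small standalone helper. The following is only a sketch of that idea without the Qt logging and retry logic; the names `clamp_to_file_size`, `read_below_eof` and `demo_read_below_eof` are hypothetical, as is the `/tmp` path. Like the loop above, it assumes the caller's tracked position matches the descriptor's actual file offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: number of bytes that can be requested at offset pos
// without asking read() to go to or past the current end of file.
size_t clamp_to_file_size(int fd, off_t pos, size_t want)
{
    struct stat sb;
    if (fstat(fd, &sb) != 0 || !S_ISREG(sb.st_mode))
        return want;                    // size unknown: keep the old behavior
    if (pos >= sb.st_size)
        return 0;                       // at EOF: the caller should not read
    const unsigned long long avail =
        static_cast<unsigned long long>(sb.st_size - pos);
    return avail < want ? static_cast<size_t>(avail) : want;
}

// Hypothetical wrapper: reads through the clamp and advances the tracked
// position. Returns bytes read, 0 when nothing is available yet, -1 on error.
ssize_t read_below_eof(int fd, off_t &pos, void *buf, size_t want)
{
    const size_t n = clamp_to_file_size(fd, pos, want);
    if (n == 0)
        return 0;                       // read() is never called at EOF
    const ssize_t ret = read(fd, buf, n);
    if (ret > 0)
        pos += ret;
    return ret;
}

// Usage sketch on a 1000-byte temporary file (the path is an assumption).
bool demo_read_below_eof()
{
    char path[] = "/tmp/clamp_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return false;
    unlink(path);

    static char buf[65536] = {};
    bool ok = write(fd, buf, 1000) == 1000 && lseek(fd, 0, SEEK_SET) == 0;

    off_t pos = 0;
    ok = ok && read_below_eof(fd, pos, buf, sizeof buf) == 1000;  // clamped
    ok = ok && pos == 1000;
    ok = ok && read_below_eof(fd, pos, buf, sizeof buf) == 0;     // read() skipped
    close(fd);
    return ok;
}
```

A caller would still need to sleep or otherwise wait when the wrapper returns 0, as the loop above does through its zero-count timeout, since a zero here only means that no new data has been written yet.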