Java NIO: scanning a ByteBuffer for certain bytes and working with the sections between them

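OK, so I'm trying to do something that seems like it should be fairly simple, but these new NIO interfaces are confusing me! Here's what I'm trying to do: I need to scan a file as bytes until I hit certain bytes. When I hit those particular bytes, I need to grab that section of data, do something with it, and then move on and do it again. I'd have thought that with all the marks, positions and limits in ByteBuffer I'd be able to do this, but I can't seem to make it work! Here's what I have so far...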

test.txt:

this is a line of text a
this is line 2b
line 3
line 4
line etc.etc.etc.

Test.java:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = Charset.forName("UTF-8");
    public static final byte[] NEWLINE_BYTE = {0x0A, 0x0D};

    public Test() {

        String pathString = "test.txt";

        //the path to the file
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path, 
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {            
            if (fc.size() > 0) {
                int n;
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {                    
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                int pos = 0;
                System.out.println("FILE LOADED: |" + new String(buffer.array(), ENCODING) + "|");
                do {
                    byte b = buffer.get();
                    if (b == NEWLINE_BYTE[0] || b == NEWLINE_BYTE[1]) {
                        System.out.println("POS: " + pos);
                        System.out.println("POSITION: " + buffer.position());
                        System.out.println("LENGTH: " + Integer.toString(buffer.position() - pos));
                        ByteBuffer lineBuffer = ByteBuffer.wrap(buffer.array(), pos + 1, buffer.position() - pos);
                        System.out.println("LINE: |" + new String(lineBuffer.array(), ENCODING) + "|");
                        pos = buffer.position();
                    }
                } while (buffer.hasRemaining());
            } 
        } catch (IOException ioe) {
           ioe.printStackTrace();
        }
    }
    public static void main(String args[]) {
        Test t = new Test();
    }
}

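So the first part works: the fc.read(buffer) call runs only once and pulls the entire file into the ByteBuffer. Then, in the second do loop, I can iterate byte by byte, and the if statement fires whenever it hits \n (or \r\n). But I can't figure out how to get the section of bytes I just looked at into a separate byte array to work with! I've tried slice() and all kinds of flips, and I've tried what's shown in the code above, but I can't make it work: whether I slice or wrap, both buffers always contain the whole file!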
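The behaviour described above is expected. ByteBuffer.wrap(array, offset, length) copies nothing. It returns a buffer over the same array with position = offset and limit = offset + length, and array() always returns that whole backing array.

Below is a minimal sketch of one fix. It assumes a heap buffer from ByteBuffer.allocate, so array() is available and arrayOffset() is 0. It decodes only the wanted byte range with new String(bytes, offset, length, charset). WrapDemo and section are illustrative names, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WrapDemo {
    // Decodes bytes [start, end) of the buffer's backing array as UTF-8.
    // Assumes an array-backed buffer with arrayOffset() == 0 (true for ByteBuffer.allocate/wrap(byte[])).
    public static String section(ByteBuffer buffer, int start, int end) {
        byte[] all = buffer.array();
        // Only `end - start` bytes starting at `start` are decoded, not the whole array
        return new String(all, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(data);

        // wrap(array, offset, length) only sets position = 7 and limit = 13 ...
        ByteBuffer view = ByteBuffer.wrap(data, 7, 6);
        // ... while array() still returns the entire 14-byte backing array
        System.out.println(view.array().length);                   // 14
        System.out.println(view.position() + " " + view.limit());  // 7 13

        System.out.println("|" + section(buffer, 7, 13) + "|");    // |line 2|
    }
}
```

slice() behaves the same way with respect to array(). The slice shares the backing array, and arrayOffset() reports where the slice starts within it.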

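I just need to loop through the file byte by byte, looking at one section at a time. My end goal is that, once I've found the right spot, I want to insert some data at that position! I need the lineBuffer printed after "LINE:" to contain only the bytes I've looped over so far! Help, thanks!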
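This sketch assumes "insert" means adding bytes into the file at the offset found by the scan. A file cannot grow in the middle, so everything after the insertion point has to be read and written back further along.

The sketch uses FileChannel's positional read(ByteBuffer, long) and write(ByteBuffer, long), which do not move the channel's own position. It assumes the tail of the file fits in memory. InsertDemo and insert are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InsertDemo {
    // Inserts `data` at byte offset `pos`, shifting the rest of the file to the right.
    // Assumes the tail (size - pos bytes) fits in a single heap buffer.
    public static void insert(Path path, long pos, byte[] data) throws IOException {
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = fc.size();
            if (pos < 0 || pos > size) {
                throw new IllegalArgumentException("pos out of range: " + pos);
            }
            // Save everything after the insertion point
            ByteBuffer tail = ByteBuffer.allocate((int) (size - pos));
            while (tail.hasRemaining()) {
                if (fc.read(tail, pos + tail.position()) == -1) break;
            }
            tail.flip();
            // Write the new bytes, then the saved tail right after them (the file grows as needed)
            ByteBuffer head = ByteBuffer.wrap(data);
            long writeAt = pos;
            while (head.hasRemaining()) writeAt += fc.write(head, writeAt);
            while (tail.hasRemaining()) writeAt += fc.write(tail, writeAt);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("insert", ".txt");
        Files.write(path, "line 1\nline 3\n".getBytes(StandardCharsets.UTF_8));
        insert(path, 7, "line 2\n".getBytes(StandardCharsets.UTF_8)); // offset 7 = start of "line 3"
        System.out.print(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
        Files.delete(path);
    }
}
```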

3 Answers

  • 1

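    Setting the I/O aside, once you have the content in a ByteBuffer it is much simpler to turn it into a CharBuffer. CharBuffer implements CharSequence, which gives you many of the String and regex methods. Note that asCharBuffer() reinterprets every two bytes as one UTF-16 char. For UTF-8 text like the file above, use Charset.decode(buffer) instead, which also returns a CharBuffer.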
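Below is a short sketch of this idea. It assumes the file is UTF-8, as in the question, and that \r\n, \r and \n all count as line terminators. The bytes are decoded with Charset.decode rather than asCharBuffer(). The resulting CharBuffer is a CharSequence, so it can be passed straight to Pattern.split. CharBufferLines is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class CharBufferLines {
    // Matches a Windows (\r\n), old Mac (\r) or Unix (\n) line terminator; \r\n is tried first
    private static final Pattern NEWLINE = Pattern.compile("\\r\\n|\\r|\\n");

    public static String[] lines(ByteBuffer bytes) {
        // Decode the UTF-8 bytes into chars (asCharBuffer() would treat byte pairs as UTF-16)
        CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
        // CharBuffer is a CharSequence, so regex APIs accept it directly
        return NEWLINE.split(chars);
    }

    public static void main(String[] args) {
        ByteBuffer bytes = ByteBuffer.wrap(
                "this is line 1\r\nline 2\nline 3".getBytes(StandardCharsets.UTF_8));
        for (String line : lines(bytes)) {
            System.out.println("LINE: |" + line + "|");
        }
    }
}
```

Pattern.split drops trailing empty strings, so a file that ends with a newline does not produce an extra empty line at the end.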

  • 0

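    This is the solution I ended up with. It uses ByteBuffer's bulk relative get method to grab each chunk. I figured I'd use the mark() functionality, although I ended up using an extra variable (pos) for bookkeeping, because I couldn't find a method in ByteBuffer that returns the position of the mark itself. I also check explicitly for \r, \n, or both in sequence. Keep in mind this code only works for UTF-8 encoded data. I hope this helps someone else.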

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    
    public class Test {
        public static final Charset ENCODING = Charset.forName("UTF-8");
        public static final byte[] NEWLINE_BYTES = {0x0A, 0x0D}; // {'\n', '\r'}
    
        public Test() {
            //test text file: a sequence of arbitrary strings, each followed by a newline
            String pathString = "test.txt";
            Path path = Paths.get(pathString);
    
            try (FileChannel fc = FileChannel.open(path, 
                    StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    
                if (fc.size() > 0) {
                    int n;
                    ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                    do {                    
                        n = fc.read(buffer);
                    } while (n != -1 && buffer.hasRemaining());
                    buffer.flip();
                    //mark the start of the first line
                    buffer.mark();
                    while (buffer.hasRemaining()) {
                        //get one byte at a time
                        byte b = buffer.get();
    
                        if (b == NEWLINE_BYTES[0] || b == NEWLINE_BYTES[1]) {
                            int newlineByteCount = 1;
                            //a '\r' directly followed by '\n' is one "\r\n" terminator;
                            //peek with the absolute get so a newline at end of file cannot underflow
                            if (b == NEWLINE_BYTES[1] && buffer.hasRemaining()
                                    && buffer.get(buffer.position()) == NEWLINE_BYTES[0]) {
                                buffer.get();
                                newlineByteCount++;
                            }
    
                            int pos = buffer.position();
                            //reset the buffer back to the mark() position (start of the line)
                            buffer.reset();
                            //create an array of just the right length and bulk-get the line's bytes
                            int length = pos - buffer.position() - newlineByteCount;
                            byte[] lineBytes = new byte[length];
                            buffer.get(lineBytes, 0, length);
    
                            String lineString = new String(lineBytes, ENCODING);
                            System.out.println("LINE: " + lineString);
    
                            //skip the terminator and mark the start of the next line
                            buffer.position(buffer.position() + newlineByteCount);
                            buffer.mark();
                        }
                    }
                    //print a final line that is not followed by a newline
                    buffer.reset();
                    if (buffer.hasRemaining()) {
                        byte[] lineBytes = new byte[buffer.remaining()];
                        buffer.get(lineBytes);
                        System.out.println("LINE: " + new String(lineBytes, ENCODING));
                    }
                } 
            } catch (IOException ioe) { ioe.printStackTrace(); }
        }
        public static void main(String[] args) { new Test(); }
    }
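If copying each line into a new byte[] is not needed, a zero-copy alternative is a view over just the line's bytes. The sketch below assumes start and end are byte offsets found by a scan like the one in this answer. duplicate() gives an independent position/limit/mark over the same content, and slice() turns the [position, limit) window into a buffer of its own. SliceDemo and section are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SliceDemo {
    // Returns a view of bytes [start, end) that shares content with `buffer`
    // but has its own position/limit; `buffer` itself is left untouched.
    public static ByteBuffer section(ByteBuffer buffer, int start, int end) {
        ByteBuffer dup = buffer.duplicate(); // same content, independent position/limit/mark
        dup.limit(end);                      // set the limit first so position(start) is valid
        dup.position(start);
        return dup.slice();                  // position 0, remaining() == end - start
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap("line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        ByteBuffer line2 = section(buffer, 7, 13);
        System.out.println(line2.remaining());                               // 6
        System.out.println(StandardCharsets.UTF_8.decode(line2).toString()); // line 2
        System.out.println(buffer.position());                               // 0
    }
}
```

On Java 13 and later, buffer.slice(start, end - start) does the same in one call. The view's array() still returns the full backing array, so read the view through get() or a Charset rather than through array().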
    
  • 0

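    I needed something similar but more general than splitting a single buffer. In my case I have multiple buffers. In fact, my code is a modification of Spring's StringDecoder that converts a Flux<DataBuffer> into a Flux<String>.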

    https://stackoverflow.com/a/48111196/839733
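The core idea of splitting lines across multiple buffers can be sketched without Spring: keep the bytes of an unfinished line between chunks, and emit a line each time a '\n' arrives. This is a hypothetical helper, not Spring's StringDecoder API. It assumes UTF-8 input with \n or \r\n terminators; a lone \r is not treated as a line break. Bytes are buffered until the newline arrives, so a multi-byte UTF-8 character split across two chunks is still decoded correctly. The byte 0x0A never occurs inside a multi-byte UTF-8 sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits '\n'-terminated UTF-8 lines out of a sequence of ByteBuffer chunks.
public class ChunkedLineSplitter {
    // Bytes of the current, not yet terminated line (may span several chunks)
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    // Consumes one chunk and returns every line it completes (terminators stripped).
    public List<String> feed(ByteBuffer chunk) {
        List<String> lines = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == '\n') {
                lines.add(takePending());
            } else {
                pending.write(b);
            }
        }
        return lines;
    }

    // Call once after the last chunk: returns the unterminated final line, if any.
    public List<String> finish() {
        List<String> lines = new ArrayList<>();
        if (pending.size() > 0) lines.add(takePending());
        return lines;
    }

    private String takePending() {
        byte[] bytes = pending.toByteArray();
        pending.reset();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--; // drop the '\r' of a "\r\n" pair
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ChunkedLineSplitter splitter = new ChunkedLineSplitter();
        String[] chunks = {"line 1\r\nli", "ne 2\nline", " 3"};
        for (String c : chunks) {
            for (String line : splitter.feed(ByteBuffer.wrap(c.getBytes(StandardCharsets.UTF_8)))) {
                System.out.println("LINE: " + line);
            }
        }
        for (String line : splitter.finish()) {
            System.out.println("LINE: " + line);
        }
    }
}
```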
