Converting a byte array that ends with a single null byte to a UTF-16 encoded string

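I receive a byte array containing a string encoded in UCS-2LE. Normally the null terminator of a UCS-2LE string is encoded as two null bytes (00 00), but sometimes there is only one, as in the following: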

import java.nio.charset.Charset;
import java.util.Arrays;

class Ucs {
    public static void main(String[] args) {
        byte[] b = new byte[] {87, 0, 105, 0, 110, 0, 0}; 
        String s = new String(b, Charset.forName("UTF-16LE"));
        System.out.println(Arrays.toString(s.getBytes()));
        System.out.println(s);
    }   
}

This program prints:

[87, 105, 110, -17, -65, -67]
Win�

I don't understand why the string's internal byte array grows, or where the unknown Unicode character comes from. How can I get rid of it?

2 Answers

Would a hack that ignores the last byte of an odd-length array help?

int bytesToUse = b.length%2 == 0 ? b.length : b.length - 1;
String s = new String(b, 0, bytesToUse, Charset.forName("UTF-16LE"));
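For context, the three extra bytes in the question (-17, -65, -67, i.e. EF BF BD) are the UTF-8 encoding of U+FFFD, the replacement character that `new String(...)` substitutes for the dangling odd byte. The hack above avoids that, but it would still leave a NUL character in the string when the terminator arrives as a full 00 00 pair. A minimal sketch that handles both cases might look like the following; the class name `Ucs2Terminated` and the helper `decode` are hypothetical names chosen for illustration, and the sketch assumes the terminator is always aligned on a two-byte boundary.

```java
import java.nio.charset.StandardCharsets;

public class Ucs2Terminated {
    /**
     * Decodes UTF-16LE bytes up to the first aligned 00 00 code unit,
     * ignoring a dangling odd byte at the end of the array.
     */
    static String decode(byte[] b) {
        // Drop a dangling odd byte so only whole 2-byte code units remain.
        int end = b.length - (b.length % 2);
        // Stop at the first aligned NUL code unit, if there is one.
        for (int i = 0; i + 1 < end; i += 2) {
            if (b[i] == 0 && b[i + 1] == 0) {
                end = i;
                break;
            }
        }
        return new String(b, 0, end, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Single trailing null byte, as in the question.
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0}));
        // Full two-byte terminator.
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0, 0}));
    }
}
```

Both calls in `main` should print `Win`. Scanning in two-byte steps matters here: a byte-wise search for 0 would stop at the high byte of `W` (87, 0).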

Answered on 2024-04-24T15:07:53+08:00

Use an InputStreamReader with the proper Charset, or with a custom CharsetDecoder.

Reader reader = new InputStreamReader(
   new ByteArrayInputStream(new byte[]{87, 105, 110, -17, -65, -67,0,0}),
   Charset.forName("UTF-16LE"));

Reader reader = new InputStreamReader(
   new ByteArrayInputStream(new byte[]{87, 105, 110, -17, -65, -67,0,0}),
   new CharsetDecoder(Charset.forName("UTF-16LE"), 1, 2) {
      @Override
      protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        // detect trailing zero(s) to skip them
        // maybe employ the first version to do actual conversion
        // placeholder so the sketch compiles until the logic above is written
        throw new UnsupportedOperationException("decodeLoop not implemented");
      }
   });
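Subclassing CharsetDecoder may be more than is needed. As a hedged alternative (not the answerer's method), the stock UTF-16LE decoder can be asked to ignore malformed input, which should drop the dangling odd byte instead of replacing it with U+FFFD. The sketch below assumes the whole array is available in memory; `LenientUtf16` and `decode` are hypothetical names. Note that `CodingErrorAction.IGNORE` also silently drops any other malformed sequence, such as an unpaired surrogate, which may or may not be acceptable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientUtf16 {
    /** Decodes UTF-16LE, skipping malformed bytes and cutting at the first NUL character. */
    static String decode(byte[] b) {
        CharsetDecoder decoder = StandardCharsets.UTF_16LE.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)       // skip a dangling odd byte
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        String s;
        try {
            s = decoder.decode(ByteBuffer.wrap(b)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE; wrapped so callers need no checked exception.
            throw new IllegalStateException(e);
        }
        // A complete 00 00 terminator decodes to '\0'; cut the string there.
        int nul = s.indexOf('\0');
        return nul >= 0 ? s.substring(0, nul) : s;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0}));
        System.out.println(decode(new byte[] {87, 0, 105, 0, 110, 0, 0, 0}));
    }
}
```

The same configured decoder can also be passed to `new InputStreamReader(inputStream, decoder)` when the bytes come from a stream, although the NUL trimming would then have to be done on the characters that are read.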

Answered on 2024-04-24T15:07:53+08:00
