Bytes of a string in Java?

Question

In Java, if I have a String x, how can I calculate the number of bytes in that string?


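#1 Top answer (235 votes)

A string is a list of characters (i.e. code points). The number of bytes used to represent the string depends entirely on which encoding you use to turn it into bytes.

That said, you can convert the string into a byte array and then look at its size, as follows: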

// The input string for this test
final String string = "Hello World";

// Check length, in characters
System.out.println(string.length()); // prints "11"

// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"

final byte[] utf16Bytes = string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"

final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"

final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"

final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"

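So you see, even a simple "ASCII" string can have a different number of bytes in its representation, depending on which encoding is used. Pass whichever character set is relevant to your case as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, because that's not true: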

final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms

// Check length, in characters
System.out.println(interesting.length()); // prints "4"

// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"

final byte[] utf16Bytes = interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"

final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"

final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")
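The same caution applies to length() itself: it counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two. Here is a minimal sketch of that case. The class name SupplementaryDemo and its helper methods are made up for illustration, and the sample character U+1F600 is just one convenient choice.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 is stored in a Java String as the surrogate pair D83D DE00.
        String s = "\uD83D\uDE00";
        System.out.println(codeUnits(s));  // prints "2"
        System.out.println(codePoints(s)); // prints "1"
        System.out.println(utf8Bytes(s));  // prints "4"
    }
}
```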

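(Note that if you don't provide a character set argument, the platform's default character set is used. That might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)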
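One way to follow that advice, sketched under the assumption that you are on Java 7 or later, is to pass a Charset constant from java.nio.charset.StandardCharsets instead of a charset name. The getBytes(Charset) overload does not throw the checked UnsupportedEncodingException that getBytes(String) declares. The class name EncodedLength and the helper encodedLength are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {
    // Returns how many bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // prints "11"
        // UTF_16 prepends a 2-byte byte-order mark; UTF_16BE does not.
        System.out.println(encodedLength(s, StandardCharsets.UTF_16));     // prints "24"
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // prints "22"
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // prints "11"
    }
}
```

StandardCharsets only covers the charsets every Java platform must support, so something like CP1252 would still need Charset.forName("windows-1252") or the string overload.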


#2 Top answer (44 votes)

If you're running with 64-bit references:

sizeof(string) = 
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 // cached hash code

In other words:

sizeof(string) = 36 + string.length() * 2

On a 32-bit VM, or on a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), references are 4 bytes. So the total would be:

sizeof(string) = 32 + string.length() * 2

This does not take into account the reference to the string object itself.
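The two formulas above can be written as a small helper for rough estimates. This is only a transcription of this answer's arithmetic, not a measurement: the class name StringSizeEstimate and the method estimate are invented for illustration, the constants 36 and 32 come straight from the formulas above, and the result ignores any alignment padding the VM adds and any difference in field layout between JVM versions.

```java
public class StringSizeEstimate {
    // Rough in-memory size in bytes, following the answer's formulas:
    //   36 + 2 * length  with 64-bit references
    //   32 + 2 * length  with 4-byte (32-bit or compressed) references
    static long estimate(int length, boolean fourByteReferences) {
        long fixedOverhead = fourByteReferences ? 32 : 36;
        return fixedOverhead + 2L * length;
    }

    public static void main(String[] args) {
        String s = "Hello World"; // 11 chars
        System.out.println(estimate(s.length(), false)); // prints "58"
        System.out.println(estimate(s.length(), true));  // prints "54"
    }
}
```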


#3 Top answer (15 votes)

According to How to convert Strings to and from UTF8 byte arrays in Java:

String s = "some text here";
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);