`char[] value` array (which stores the characters of the string) and an `int hash` field (which caches the hash code the first time it is computed, consistent with the `hashCode` method contract).

Until Java 7 update 6, there were two additional fields, `offset` and `count`, which allowed a string to reuse a portion of an existing array without creating a new one (useful for string builders and substrings). They were removed to reduce memory consumption: each string no longer carries two extra `int` fields, and a short substring no longer keeps a large backing array alive.

Initially, all strings were stored in UTF-16 encoding, where each character occupied 2 bytes, fitting into a
`char`. However, it was discovered that most strings in practice contain only ASCII characters, which need just 1 byte each and fit within the Latin-1 (ISO-8859-1) encoding. This meant that the upper byte of most `char` values remained unused, so strings were effectively half empty. Meanwhile, strings take up a large portion of an application's memory (around a quarter).

In Java 6, an experimental feature called Compressed Strings was introduced, allowing strings containing only Latin-1 characters to be stored in a `byte[]` instead of a `char[]`. However, due to several issues, this feature was later reverted.

String compression returned in Java 9 with the introduction of Compact Strings, which are enabled by default. A new
`coder` field of type `byte` was added to the `String` class to record the encoding (Latin-1 or UTF-16), and the type of the `value` field was changed from `char[]` to `byte[]`. A static `COMPACT_STRINGS` flag, which can be switched off with the `-XX:-CompactStrings` JVM option, allows the feature to be turned off entirely.
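
To make the layout concrete, here is a minimal sketch of the same idea: a `byte[]` payload, a one-byte `coder`, and a lazily cached hash. It is not the JDK source. The class name `CompactText`, the helper `byteSize()`, and the big-endian byte order used for the UTF-16 branch are assumptions made for illustration only.

```java
// Illustrative sketch only; CompactText is a hypothetical class, not java.lang.String.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    private final byte[] value;   // encoded characters
    private final byte coder;     // tells how to interpret 'value'
    private int hash;             // cached hash code; 0 means "not computed yet"

    CompactText(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { latin1 = false; break; }
        }
        if (latin1) {
            // Every char fits in one byte, so drop the unused upper byte.
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
        } else {
            // At least one char needs two bytes, so the whole string uses UTF-16.
            // Big-endian order is an assumption of this sketch.
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
        }
    }

    byte coder() { return coder; }

    int byteSize() { return value.length; }

    int length() { return coder == LATIN1 ? value.length : value.length / 2; }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int i = index * 2;
        return (char) (((value[i] & 0xFF) << 8) | (value[i + 1] & 0xFF));
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            // Same polynomial as String.hashCode: s[0]*31^(n-1) + ... + s[n-1]
            for (int i = 0; i < length(); i++) {
                h = 31 * h + charAt(i);
            }
            hash = h; // computed once, reused on later calls
        }
        return h;
    }
}
```

With this sketch, `new CompactText("hello")` occupies 5 payload bytes, while a six-character Cyrillic word occupies 12, because a single character outside Latin-1 forces the whole string into the two-byte encoding. The `hash` field is filled in on the first `hashCode()` call and reused afterwards. A string whose hash happens to be 0 would be rehashed on every call in this simplified version.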