How to iterate over a string?

In Java, a String is exposed as a sequence of UTF-16 char values. To understand string iteration fully, you need to grasp how UTF-16 works, specifically surrogate pairs and Unicode planes. In simpler terms, what we usually think of as a single character (a code point) is represented by one char value if it lies in the Basic Multilingual Plane (U+0000 to U+FFFF), or by two char values, a surrogate pair, if it lies above U+FFFF. So, to answer how to iterate over a string correctly, we first need to clarify whether we mean iterating over char values or over Unicode code points.
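As a small illustration, the sketch below encodes U+1F600 (an emoji outside the Basic Multilingual Plane) and the letter A with the standard Character.toChars() method. The class name SurrogateDemo is just a placeholder for this example.

```java
// Sketch: how one code point above U+FFFF becomes two UTF-16 char values.
class SurrogateDemo {
    // Encode a code point into its UTF-16 char sequence (one or two chars).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int grinning = 0x1F600; // U+1F600, outside the Basic Multilingual Plane
        char[] units = encode(grinning);

        System.out.println(units.length); // 2: a surrogate pair
        System.out.printf("high=%04X low=%04X%n", (int) units[0], (int) units[1]); // high=D83D low=DE00
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true

        // 'A' (U+0041) is in the BMP and fits in a single char.
        System.out.println(encode('A').length); // 1
    }
}
```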

This also means that the length of a string, and therefore the upper bound of an iteration, can be ambiguous. The length() method returns not the number of Unicode code points but the number of char values. To get the number of code points, use codePointCount(0, s.length()). Keep in mind that codePointCount() has to scan the char values in that range to detect surrogate pairs, so unlike length() it takes time proportional to the length of the range.
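A minimal comparison of the two counts, assuming a test string made of "a", U+1F600 and "b" (the class name LengthDemo is illustrative only):

```java
// Sketch: length() counts char values, codePointCount() counts code points.
class LengthDemo {
    // "a", U+1F600 (written as its surrogate pair), "b"
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        // 1 + 2 + 1 char values
        System.out.println(TEXT.length()); // 4
        // The surrogate pair is counted once
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 3
    }
}
```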

You can iterate over char values either by reading one at a time with charAt(), or by converting the string to a char[] with toCharArray(). The latter allocates a new array and copies the string's contents into it, which costs extra time and memory but lets you modify the copy without affecting the original string. Either way, surrogate pairs must be handled manually, but the Character class provides helper methods for this, such as isHighSurrogate(), isLowSurrogate() and toCodePoint().
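The sketch below shows both approaches: a charAt() loop that joins surrogate pairs by hand using the Character helpers, and a toCharArray() copy that is modified without touching the original string. The class and method names are hypothetical, chosen for this example; unpaired surrogates are simply passed through as-is, which is one possible policy among several.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: iterating over char values and handling surrogate pairs manually.
class CharIterationDemo {
    // Walk the string one char at a time, combining surrogate pairs into code points.
    static List<Integer> codePointsViaCharAt(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Combine the pair into one code point and skip the low surrogate.
                result.add(Character.toCodePoint(c, s.charAt(i + 1)));
                i++;
            } else {
                // BMP character, or an unpaired surrogate kept as-is.
                result.add((int) c);
            }
        }
        return result;
    }

    // toCharArray() returns a new array, so writes to it never reach the String.
    static String withFirstCharReplaced(String s, char replacement) {
        char[] copy = s.toCharArray();
        if (copy.length > 0) {
            copy[0] = replacement;
        }
        return new String(copy);
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b";
        System.out.println(codePointsViaCharAt(text)); // [97, 128512, 98]

        String original = "abc";
        System.out.println(withFirstCharReplaced(original, 'X')); // Xbc
        System.out.println(original); // abc: the original is unchanged
    }
}
```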

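Alternatively, you can use codePointAt() and codePointBefore() to work with code points directly. These methods take char indices but return code points as int values, so after reading a code point you advance (or retreat) the index by Character.charCount() of that code point, which is 1 or 2.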
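A minimal sketch of forward iteration with codePointAt() and backward iteration with codePointBefore(), using Character.charCount() to step over surrogate pairs. The class name CodePointIndexDemo is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: iterating over code points using char indices.
class CodePointIndexDemo {
    // Forward: read the code point at i, then advance by its width in chars (1 or 2).
    static List<Integer> forward(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp);
        }
        return result;
    }

    // Backward: codePointBefore(i) reads the code point that ends just before index i.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            result.add(cp);
            i -= Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b";
        System.out.println(forward(text));  // [97, 128512, 98]
        System.out.println(backward(text)); // [98, 128512, 97]
    }
}
```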

Java 8 introduced a more convenient way of iterating over strings: the chars() and codePoints() methods. These return an IntStream of char values (widened to int) and of code points, respectively. Importantly, these streams read the string's contents directly, avoiding the copy that toCharArray() makes.
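For comparison, here is a sketch of the stream-based approach on the same test string; note that chars() still yields the two halves of a surrogate pair separately, while codePoints() combines them. The class StreamDemo and its helper methods are illustrative names, not part of any API.

```java
import java.util.stream.Collectors;

// Sketch: stream-based iteration with chars() and codePoints().
class StreamDemo {
    // chars(): one int per UTF-16 char value, so a surrogate pair yields two elements.
    static long charStreamCount(String s) {
        return s.chars().count();
    }

    // codePoints(): one int per code point, with surrogate pairs already combined.
    static long codePointStreamCount(String s) {
        return s.codePoints().count();
    }

    // Render each code point in the usual U+XXXX notation.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b";
        System.out.println(charStreamCount(text));      // 4
        System.out.println(codePointStreamCount(text)); // 3
        System.out.println(describe(text));             // U+0061 U+1F600 U+0062
    }
}
```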