char
values. So, to correctly answer how to iterate over a string, we first need to clarify whether we're iterating over char values or Unicode code points.

This also means that the length of a string, and with it the upper bound of the iteration, can be ambiguous. The
length() method returns the number of char values, not the number of Unicode symbols. To get the logical length (the number of code points), use codePointCount(0, length()). Keep in mind that codePointCount() has to iterate over the char values internally to detect surrogate pairs.

You can iterate over
char values either by retrieving one character at a time with charAt(), or by converting the string to a char[] array with toCharArray(). The latter creates a copy of the string's characters, which is slower but allows you to mutate the copy. With either approach, handling surrogate pairs requires manual work, but the Character class provides helper methods for it, such as isHighSurrogate(), isLowSurrogate(), and toCodePoint().

Alternatively, you can use
codePointAt() and codePointBefore() to work with code points directly. These methods take indices into the sequence of char values, but return code points as int values.

Java 8 introduced a more convenient way of iterating over strings: the
chars() and codePoints() methods. These return an IntStream of char values (each widened to an int) and of code points, respectively. Importantly, these streams read directly from the string's internal storage, avoiding the need for a copy.
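
The approaches described above can be compared in a minimal sketch. The class name StringIterationSketch, the helper names charsOf and codePointsOf, and the sample string are illustrative assumptions, not part of any library; the sample string contains one character outside the Basic Multilingual Plane so that the char count and the code point count differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names here are hypothetical.
final class StringIterationSketch {

    // Iterates over UTF-16 code units (char values) one at a time with charAt().
    static List<Character> charsOf(String s) {
        List<Character> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(s.charAt(i));
        }
        return out;
    }

    // Iterates over code points with codePointAt(). The index still counts
    // char values, so it advances by 1 or 2 depending on the code point.
    static List<Integer> codePointsOf(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                      // 4 char values
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        System.out.println(charsOf(s).size());  // 4
        System.out.println(codePointsOf(s));    // [97, 128512, 98]

        // toCharArray() returns a copy, so mutating it leaves s unchanged.
        char[] copy = s.toCharArray();
        copy[0] = 'z';
        System.out.println(s.charAt(0));        // a

        // Java 8 streams: chars() yields char values, codePoints() yields code points.
        System.out.println(s.chars().count());       // 4
        System.out.println(s.codePoints().count());  // 3
    }
}
```

For text that may contain characters outside the Basic Multilingual Plane, the codePointAt() loop or codePoints() is usually the safer choice, since a charAt() loop would see each surrogate half as a separate element.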