As previously mentioned, the primary functionality for working with regular expressions in Java is provided by the Matcher class. Let's delve deeper into its features.
First, a matcher operates not on the entire string but within a specified "region". Initially the region covers the whole string, but it can be narrowed or changed during processing. The methods regionStart and regionEnd return the current boundaries (start inclusive, end exclusive), while region sets new ones and resets the matcher.
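As a minimal sketch of how a region narrows a search, the example below looks for the first run of digits inside a given range. The class name RegionDemo, the helper firstNumberIn and the sample string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Hypothetical helper: first run of digits inside [start, end) of input, or null.
    static String firstNumberIn(String input, int start, int end) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        m.region(start, end);              // restrict the search to [start, end)
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "id=42; code=7788";
        Matcher m = Pattern.compile("\\d+").matcher(s);
        // By default the region covers the whole string.
        System.out.println(m.regionStart() + ".." + m.regionEnd()); // 0..16
        System.out.println(firstNumberIn(s, 0, s.length()));        // 42
        System.out.println(firstNumberIn(s, 6, s.length()));        // 7788
        // The region end cuts the match short: only "id=4" is visible.
        System.out.println(firstNumberIn(s, 0, 4));                 // 4
    }
}
```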
A matcher's transparentBounds property (set with useTransparentBounds, read with hasTransparentBounds) lets lookahead, lookbehind and boundary constructs see text beyond the region's boundaries, while the matched substring itself must still lie within the region. It is off by default. Turning off the anchoringBounds property (useAnchoringBounds, hasAnchoringBounds), which is on by default, stops the region's boundaries from being treated as string boundaries by ^ and $ in the expression.
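The effect of both properties can be sketched on a region that covers only "bar" inside "foobarbaz". The class BoundsDemo and the helper findInRegion are illustrative names, not part of the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundsDemo {
    // Hypothetical helper: runs find() inside [start, end) with the given bound settings.
    static boolean findInRegion(String regex, String input, int start, int end,
                                boolean transparent, boolean anchoring) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, end)
         .useTransparentBounds(transparent)
         .useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        String s = "foobarbaz";   // the region [3, 6) is "bar"
        // Opaque bounds (default): the lookbehind cannot see "foo" before the region.
        System.out.println(findInRegion("(?<=foo)bar", s, 3, 6, false, true)); // false
        // Transparent bounds: the lookbehind may look outside the region.
        System.out.println(findInRegion("(?<=foo)bar", s, 3, 6, true, true));  // true
        // Anchoring bounds (default): ^ and $ match at the region's edges.
        System.out.println(findInRegion("^bar$", s, 3, 6, false, true));       // true
        // Non-anchoring bounds: ^ only matches at the real start of the string.
        System.out.println(findInRegion("^bar$", s, 3, 6, false, false));      // false
    }
}
```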
Regular expressions are used for two tasks: searching and replacing. Let's start with searching.
The matches method checks whether the entire region satisfies the expression, while lookingAt checks only that the region starts with a match. The find method is similar to an iterator's next: it moves sequentially through the string, finding each subsequent match. The search can be restarted from a specific string position by passing the position as a parameter; this overload resets the matcher first.
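The three search methods can be compared side by side in a short sketch. SearchDemo and its helpers findAll and firstFrom are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchDemo {
    // Hypothetical helper: collects every match by calling find() until it fails.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Hypothetical helper: first match at or after the given index, or null.
    static String firstFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(digits.matcher("123").matches());      // true
        System.out.println(digits.matcher("123abc").matches());   // false: "abc" is left over
        System.out.println(digits.matcher("123abc").lookingAt()); // true: the start matches
        System.out.println(digits.matcher("abc123").lookingAt()); // false
        System.out.println(findAll("\\d+", "a1b22c333"));         // [1, 22, 333]
        System.out.println(firstFrom("\\d+", "a1b22c333", 3));    // 22
    }
}
```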
Matcher implements the MatchResult interface, which provides information about the last successful search (by any of the aforementioned methods). If you need to preserve this information, toMatchResult() copies it into a separate immutable object. If you want to handle the sequence of all matches as a stream, the results() method can help.
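A minimal sketch of both approaches follows, assuming Java 9 or later for results(). ResultsDemo, allMatches and snapshots are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsDemo {
    // Hypothetical helper: all matched substrings via the results() stream (Java 9+).
    static List<String> allMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    // Hypothetical helper: an immutable snapshot of each match, taken while iterating.
    static List<MatchResult> snapshots(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.toMatchResult()); // stays valid after the next find()
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\w+", "one two three")); // [one, two, three]
        MatchResult first = snapshots("\\w+", "one two three").get(0);
        System.out.println(first.group() + " " + first.start() + ".." + first.end()); // one 0..3
    }
}
```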
The MatchResult interface provides the methods group, start and end, which give the content of the matched substring and its position in the string. If you pass the number or name of a capturing group to these methods, they return the same information for that group within the match. The number of capturing groups is returned by groupCount; group 0, which stands for the whole match, is not counted.
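The sketch below reads the whole match and two named groups from a key=value pair. It calls the methods on Matcher itself, where lookup by name is available for group since Java 7 and for start and end since Java 8. The class GroupDemo, the helper describe and the pattern are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: summarizes the first key=value pair found in input.
    static String describe(String input) {
        Matcher m = Pattern.compile("(?<key>\\w+)=(?<value>\\d+)").matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group() + " at [" + m.start() + "," + m.end() + ")"  // whole match
                + ", key=" + m.group("key") + " at " + m.start(1)     // by name and by number
                + ", value=" + m.group(2) + " at " + m.start("value")
                + ", groups=" + m.groupCount();                        // group 0 is not counted
    }

    public static void main(String[] args) {
        System.out.println(describe("x: width=640"));
        // width=640 at [3,12), key=width at 3, value=640 at 9, groups=2
    }
}
```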
There are a couple more properties related to the last search that are not part of the interface: hitEnd and requireEnd. hitEnd indicates whether the last search reached the end of the region; it is meaningful after a failed search too, because in that case more input could change the result. requireEnd tells whether a successful match depended on the end of the region, so that extending the region could turn the match into a failure; it has no meaning if no match was found.
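A rough sketch of how the two flags behave after find(). EndDemo and its helpers are hypothetical names, and the patterns are chosen only to make the difference visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndDemo {
    // Hypothetical helper: value of hitEnd() after a single find().
    static boolean hitEndAfterFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.find();
        return m.hitEnd();
    }

    // Hypothetical helper: value of requireEnd() after a single find().
    static boolean requireEndAfterFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.find();
        return m.requireEnd();
    }

    public static void main(String[] args) {
        // The greedy \d+ runs into the end of the input: more digits could extend the match.
        System.out.println(hitEndAfterFind("\\d+", "123"));      // true
        // The match stops at 'a', well before the end.
        System.out.println(hitEndAfterFind("\\d+", "123a"));     // false
        // $ matched only because the input ended: more input could break the match.
        System.out.println(requireEndAfterFind("\\d+$", "123")); // true
        // Without $, more input may lengthen the match but cannot lose it.
        System.out.println(requireEndAfterFind("\\d+", "123"));  // false
    }
}
```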
The reset method discards all of the current search state and restores the region to the whole string. By passing a parameter to it, you can also replace the string you are working with. The regular expression can also be changed with the usePattern method; in that case the position in the string is kept, but the group information from the last match is lost.
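The difference between switching the pattern and resetting can be sketched by recording what each find() returns. ResetDemo and trace are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Hypothetical helper: records the match found after each state change.
    static List<String> trace() {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d").matcher("a1 b2 c3");

        m.find();
        out.add(m.group());                     // "1"

        m.usePattern(Pattern.compile("[a-z]")); // new pattern, same position in the string
        m.find();
        out.add(m.group());                     // "b": the search continues after "1"

        m.reset();                              // back to the start, pattern unchanged
        m.find();
        out.add(m.group());                     // "a"

        m.reset("z9");                          // new input, pattern unchanged
        m.find();
        out.add(m.group());                     // "z"
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [1, b, a, z]
    }
}
```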
For replacing substrings that match the regular expression, there are the replaceFirst and replaceAll methods. They accept either a replacement string or a callback (a Function<MatchResult, String>, available since Java 9) that calculates it on the fly. Both methods reset the matcher before they start.
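Both forms can be sketched as follows, assuming Java 9 or later for the callback overload. ReplaceDemo and its helpers are hypothetical names.

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Replaces only the first number with "#".
    static String maskFirst(String s) {
        return DIGITS.matcher(s).replaceFirst("#");
    }

    // Replaces every number with "#".
    static String maskAll(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    // Callback form (Java 9+): the replacement is computed from each MatchResult.
    static String doubleAll(String s) {
        return DIGITS.matcher(s)
                .replaceAll(r -> String.valueOf(Integer.parseInt(r.group()) * 2));
    }

    public static void main(String[] args) {
        System.out.println(maskFirst("a1b22")); // a#b22
        System.out.println(maskAll("a1b22"));   // a#b#
        System.out.println(doubleAll("a1b22")); // a2b44
    }
}
```

Note that the string returned by the callback is processed like any other replacement string, so $ and \ in it keep their special meaning.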
In replacement strings, $ refers to capturing groups ($1 by number, ${name} by name), and the \ character is used for escaping. If you need these characters to be interpreted literally, wrap the replacement string in a call to the static method Matcher.quoteReplacement.
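A short sketch of a group reference and of a literal replacement. QuoteDemo, swapAroundAt and replaceLiterally are made-up names, and using Pattern.quote for the search text is an extra choice of this example, not something the text above requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // $1 and $2 in the replacement refer to the first and second capturing groups.
    static String swapAroundAt(String s) {
        return Pattern.compile("(\\w+)@(\\w+)").matcher(s).replaceAll("$2@$1");
    }

    // Hypothetical helper: replaces target with replacement, both taken literally.
    static String replaceLiterally(String input, String target, String replacement) {
        return Pattern.compile(Pattern.quote(target))
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(swapAroundAt("user@host"));                      // host@user
        // Unquoted, "$5" would be read as a reference to group 5 and fail.
        System.out.println(replaceLiterally("cost: PRICE", "PRICE", "$5")); // cost: $5
        System.out.println(replaceLiterally("dir=PATH", "PATH", "C:\\temp")); // dir=C:\temp
    }
}
```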
There is a more flexible way to replace. Matcher lets you perform the search manually (with the same methods) and then append the skipped segment of the string, followed by the replacement for the match, to a StringBuilder/StringBuffer using the appendReplacement method. The remaining unprocessed tail is appended with appendTail. Thus, the sequence of calls m.find(); m.appendReplacement(); m.appendTail(); is equivalent to calling m.replaceFirst(), and while(m.find()) m.appendReplacement(); m.appendTail(); is the same as m.replaceAll().
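The manual loop might look like the sketch below, which assumes Java 9 or later for the StringBuilder overloads (earlier versions accept only StringBuffer). AppendDemo, upperCaseMatches and manualReplaceFirst are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Hypothetical helper: upper-cases every match, building the result by hand.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Appends the text skipped since the previous match, then the replacement.
            // quoteReplacement keeps any $ or \ in the computed text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // whatever is left after the last match
        return sb.toString();
    }

    // Hand-written counterpart of replaceFirst: one find(), one replacement, then the tail.
    static String manualReplaceFirst(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        if (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("\\b[a-z]{4,}\\b", "one three five")); // one THREE FIVE
        System.out.println(manualReplaceFirst("\\d+", "a1b22", "#"));              // a#b22
    }
}
```

The loop form is useful when the replacement depends on state outside the match, or when some matches should be left untouched.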