Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.
A regular expression is a special sequence of characters that matches or finds other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
The java.util.regex package mainly consists of the following three classes:
Pattern class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you call one of its public static compile() methods, which returns a Pattern object. These methods accept a regular expression as the first argument.
Matcher class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
PatternSyntaxException class: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
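How the three classes fit together can be sketched roughly as follows. This is only an illustration: the class name RegexBasics and the helper isValidRegex are hypothetical names chosen for this sketch, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexBasics {
    // Hypothetical helper: reports whether a string compiles as a regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception thrown by compile() for a malformed pattern.
            return false;
        }
    }

    public static void main(String[] args) {
        // No public constructor: a Pattern comes from the static compile() method.
        Pattern pattern = Pattern.compile("\\d+");
        // A Matcher comes from calling matcher() on the Pattern.
        Matcher matcher = pattern.matcher("Order 42");
        System.out.println(matcher.find());        // true
        System.out.println(isValidRegex("(abc"));  // false: the group is never closed
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch is optional. It is mainly useful when the pattern comes from user input rather than from a literal in the source code.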
Capture groups are a way to treat multiple characters as a single unit. They are created by enclosing the characters to be grouped within a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters d, o, and g.
Capture groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
((A)(B(C)))
(A)
(B(C))
(C)
To find out how many groups are present in an expression, call the groupCount() method on a Matcher object. The groupCount() method returns an int giving the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount().
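The numbering rule can be checked with a small sketch. The class name GroupNumbering and the helper captures are hypothetical; the pattern is the ((A)(B(C))) expression discussed above, matched against the assumed input "ABC".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Hypothetical helper: returns the text captured by each group, group 0 first.
    static String[] captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so allocate one extra slot for it.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] groups = captures("((A)(B(C)))", "ABC");
        // Prints: ABC, ABC, A, BC, C for groups 0 through 4.
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + ": " + groups[i]);
        }
    }
}
```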
Example
The following example shows how to find a digit string in a given alphanumeric string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
Running the above code produces the following result:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
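Group 2 holds only the final 0 because the leading (.*) is greedy: it consumes as much of the input as it can and leaves (\d+) the single digit it needs to succeed. If the goal is the whole number, one option is a reluctant quantifier. This is a sketch of that variation, not the original example; ReluctantMatch and firstNumber are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantMatch {
    // Hypothetical helper: returns the first run of digits in the input, or null.
    static String firstNumber(String input) {
        // (.*?) is reluctant: it takes as few characters as possible,
        // which leaves the whole run of digits for (\d+).
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("This order was placed for QT3000! OK?")); // 3000
    }
}
```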
The following table lists the regular expression metacharacter syntax available in Java:
Sub expression | Matches |
---|---|
^ | Matches the beginning of the line. |
$ | Matches the end of the line. |
. | Matches any single character except a line terminator. Enabling the DOTALL flag (Pattern.DOTALL or the embedded flag (?s)) allows it to match line terminators as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
\A | Beginning of the entire string. |
\z | End of the entire string. |
\Z | End of the entire string except allowable final line terminator. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a| b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches the independent pattern without backtracking. |
\w | Matches the word characters. |
\W | Matches the nonword characters. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches the non white space. |
\d | Matches the digits. Equivalent to [0-9]. |
\D | Matches the non digits. |
\G | Matches the point where the last match finished. |
\n | Back-reference to capture group number n, where n is a digit (for example, \1). |
\b | Matches the word boundaries when outside the brackets. Matches the backspace (0x08) when inside the brackets. |
\B | Matches the non word boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\Q | Escape (quote) all characters up to \E. |
\E | Ends quoting begun with \Q. |
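A few of the table entries can be tried out with a short sketch. The class name MetaCharacters and the helper found are hypothetical, and the inputs are made up for illustration. Note that each backslash in a regular expression must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class MetaCharacters {
    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \b(\w+) \1\b : the back-reference \1 must repeat what group 1 captured.
        System.out.println(found("\\b(\\w+) \\1\\b", "it is is fine")); // true
        // \Q...\E quotes metacharacters, so '.' and '+' are literal here.
        System.out.println(found("\\Q1.5+2\\E", "1.5+2"));             // true
        System.out.println(found("\\Q1.5+2\\E", "1x5552"));            // false
        // ^\d{3}$ : exactly three digits between the start and end of the input.
        System.out.println(found("^\\d{3}$", "123"));                  // true
        System.out.println(found("^\\d{3}$", "1234"));                 // false
    }
}
```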
The following example counts the number of times the word cat appears in the input string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        // Each successful find() advances to the next match.
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
Running the above code produces the following result:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
This example uses word boundaries to ensure that the letters c, a, t are not merely a substring of a longer word. It also gives some useful information about where in the input string each match occurred.
The start() method returns the index of the first character of the previous match, and end() returns the index of the last character matched, plus one.
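Because end() is one past the last matched character, the two offsets can be passed straight to String.substring(). A minimal sketch, with a hypothetical class MatchOffsets, a hypothetical helper firstSpan, and a made-up input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchOffsets {
    // Hypothetical helper: returns {start, end} of the first match, or null.
    static int[] firstSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String input = "a cat sat";
        int[] span = firstSpan("\\bcat\\b", input);
        System.out.println(span[0] + ".." + span[1]);          // 2..5
        // end() is exclusive, so it plugs directly into substring().
        System.out.println(input.substring(span[0], span[1])); // cat
    }
}
```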
Both the matches() and lookingAt() methods attempt to match an input sequence against a pattern. The difference is that matches() requires the entire input sequence to match, whereas lookingAt() does not.
Both methods always start at the beginning of the input string. The following example illustrates both methods:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        // lookingAt() needs only a matching prefix; matches() needs the whole input.
        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}
Running the above code produces the following result:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
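For comparison, the find() method used in the earlier examples does not have to start at the beginning of the input. The sketch below puts the three methods side by side; MatchModes and its helper names are hypothetical, and the inputs are made up for illustration.

```java
import java.util.regex.Pattern;

public class MatchModes {
    static final Pattern FOO = Pattern.compile("foo");

    // Hypothetical helpers wrapping the three Matcher methods.
    static boolean whole(String s)    { return FOO.matcher(s).matches(); }
    static boolean prefix(String s)   { return FOO.matcher(s).lookingAt(); }
    static boolean anywhere(String s) { return FOO.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] { "foo", "foobar", "barfoo" }) {
            // foo:    matches=true  lookingAt=true  find=true
            // foobar: matches=false lookingAt=true  find=true
            // barfoo: matches=false lookingAt=false find=true
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
    }
}
```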
The replaceAll() method replaces every subsequence of the input that matches the pattern with the given replacement string. The following example illustrates it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
Running the above code produces the following result:
The cat says meow. All cats say meow.
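Two related variations are worth a sketch: Matcher.replaceFirst() replaces only the first match, and a replacement string may refer to capture groups as $1, $2, and so on. Neither appears in the example above, so treat this as a supplementary illustration; ReplaceVariants, first, and swap are hypothetical names.

```java
import java.util.regex.Pattern;

public class ReplaceVariants {
    // Hypothetical helper: replaces only the first "dog" with "cat".
    static String first(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    // Hypothetical helper: swaps the words on either side of '@' using $1 and $2.
    static String swap(String input) {
        return Pattern.compile("(\\w+)@(\\w+)").matcher(input).replaceAll("$2@$1");
    }

    public static void main(String[] args) {
        System.out.println(first("The dog says meow. All dogs say meow."));
        // The cat says meow. All dogs say meow.
        System.out.println(swap("user@host")); // host@user
    }
}
```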
appendReplacement and appendTail methods
The Matcher class also provides the appendReplacement() and appendTail() methods for text replacement.
The following example uses both methods:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();

        // Copy the text before each match, then the replacement, into sb.
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        // Copy whatever input remains after the last match.
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
Output
-foo-foo-foo-
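With a fixed replacement string, the loop above gives the same result as replaceAll(). The append methods become more useful when the replacement has to be computed from each match. A sketch under that assumption, with a hypothetical class ComputedReplace and helper doubleNumbers:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComputedReplace {
    // Hypothetical helper: doubles every integer found in the input.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // quoteReplacement() guards against '$' or '\' in the replacement text.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        // Copy the input that follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```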