Java Regular Expressions

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.

A regular expression is a special sequence of characters that can be used to match or find other strings or sets of strings using specialized syntax in patterns. They can be used to search, edit or manipulate text and data.

The java.util.regex package mainly consists of the following three classes −

  • Pattern class - A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you call one of its public static compile() methods, which return a Pattern object. These methods accept a regular expression as the first argument.

  • Matcher class - A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. To obtain a Matcher object, you invoke the matcher() method on a Pattern object.

  • PatternSyntaxException - A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
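
The three classes described above can be seen working together in a short sketch. The class name RegexClassesDemo and the helper methods containsMatch and isValidRegex are hypothetical names chosen for illustration; only Pattern.compile(), Pattern.matcher(), Matcher.find() and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {

    // Returns true if the regex matches somewhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // no public constructor, so use compile()
        Matcher matcher = pattern.matcher(input);   // a Matcher is obtained from the Pattern
        return matcher.find();
    }

    // Returns true if the regex compiles, false if it has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // The exception is unchecked, so catching it is optional.
            System.out.println("Bad pattern: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "QT3000"));  // true
        System.out.println(isValidRegex("(unclosed"));         // false
    }
}
```

Because PatternSyntaxException is unchecked, a try/catch like this is mainly useful when the pattern comes from user input rather than from a constant in the source code.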

 

Capturing Groups

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters d, o, and g.

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups −

  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

To find out how many groups are present in an expression, call the groupCount() method on a Matcher object. The groupCount() method returns an int showing the number of capturing groups present in the matcher's pattern.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount().
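
The numbering rule can be checked against the ((A)(B(C))) expression with a small sketch. GroupCountDemo and its groups helper are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {

    // Returns group 0 followed by every numbered group,
    // or an empty list if the regex does not match the whole input.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            // groupCount() excludes group 0, hence the <= in the loop condition.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        System.out.println(m.groupCount());                 // 4
        System.out.println(groups("((A)(B(C)))", "ABC"));   // [ABC, ABC, A, BC, C]
    }
}
```

The first two entries are identical here because group 1 happens to enclose the whole expression, just as group 0 does.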

Example

The following example shows how to find a numeric string from a given alphanumeric string −

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   public static void main( String args[] ) {
      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      if (m.find()) {
         System.out.println("Found value: " + m.group(0));
         System.out.println("Found value: " + m.group(1));
         System.out.println("Found value: " + m.group(2));
      } else {
         System.out.println("NO MATCH");
      }
   }
}

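Running the above code produces the following output. Because the first (.*) is greedy, group 1 takes as much as it can and leaves only the final digit for group 2: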

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
 

Regular Expression Syntax

The following table lists the regular expression metacharacter syntax commonly used in Java −

Subexpression   Matches
^               Matches the beginning of a line.
$               Matches the end of a line.
.               Matches any single character except a line terminator. With the DOTALL flag (?s) it matches line terminators as well.
[...]           Matches any single character in the brackets.
[^...]          Matches any single character not in the brackets.
\A              Matches the beginning of the entire input.
\z              Matches the end of the entire input.
\Z              Matches the end of the entire input, or just before a final line terminator if there is one.
re*             Matches 0 or more occurrences of the preceding expression.
re+             Matches 1 or more occurrences of the preceding expression.
re?             Matches 0 or 1 occurrence of the preceding expression.
re{n}           Matches exactly n occurrences of the preceding expression.
re{n,}          Matches n or more occurrences of the preceding expression.
re{n,m}         Matches at least n and at most m occurrences of the preceding expression.
a|b             Matches either a or b.
(re)            Groups regular expressions and remembers the matched text.
(?:re)          Groups regular expressions without remembering the matched text.
(?>re)          Matches the independent pattern without backtracking.
\w              Matches a word character. Equivalent to [a-zA-Z_0-9].
\W              Matches a non-word character.
\s              Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S              Matches a non-whitespace character.
\d              Matches a digit. Equivalent to [0-9].
\D              Matches a non-digit.
\G              Matches the point where the last match finished.
\1, \2, etc.    Back-reference to the capturing group with that number.
\b              Matches a word boundary.
\B              Matches a non-word boundary.
\n, \t, etc.    Matches newlines, carriage returns, tabs, etc.
\Q              Escapes (quotes) all characters up to \E.
\E              Ends quoting begun with \Q.
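
A few rows of the table can be tried out directly. The sketch below uses the static Pattern.matches() convenience method, which compiles a regex and tests it against the whole input; SyntaxDemo and fullMatch are hypothetical names for illustration. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class SyntaxDemo {

    // Returns true only if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(\\w)\\1", "oo"));          // true: back-reference repeats group 1
        System.out.println(fullMatch("(?:ab)+c", "ababc"));       // true: non-capturing group with +
        System.out.println(fullMatch("a{2,3}", "aaaa"));          // false: at most three a's allowed
        System.out.println(fullMatch("\\Q1+1\\E=2", "1+1=2"));    // true: + is quoted, so it is literal
        System.out.println(fullMatch("1+1=2", "1+1=2"));          // false: unquoted + is a quantifier
        System.out.println(fullMatch("\\d+\\s\\w+", "42 cats"));  // true: digits, whitespace, word characters
    }
}
```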

 

start() and end() methods

The following example counts the occurrences of the word cat in a string and reports where each match starts and ends −

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "\\bcat\\b";
   private static final String INPUT = "cat cat cat cattie cat";

   public static void main( String args[] ) {
      Pattern p = Pattern.compile(REGEX);
      Matcher m = p.matcher(INPUT);   // get a matcher object
      int count = 0;

      while(m.find()) {
         count++;
         System.out.println("Match number "+count);
         System.out.println("start(): "+m.start());
         System.out.println("end(): "+m.end());
      }
   }
}

Output

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
 

This example uses word boundaries to ensure that the letters c, a, t are not merely a substring of a longer word, which is why the "cat" inside "cattie" is not counted. It also shows where in the input string each match occurred.

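The start() method returns the start index of the previous match, and start(int group) returns the start index of the subsequence captured by the given group during that match. The end() method returns the offset just past the last character matched, that is, the index of the last matched character plus one.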
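
Since end() is exclusive, the pair of indices can be passed straight to String.substring(). The following sketch illustrates this for the whole match and for individual groups; StartEndDemo, its span helper and the sample input are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {

    // Returns {start, end} of the given group in the first match, or null if there is no match.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "order QT3000 shipped";
        String regex = "([A-Z]+)(\\d+)";

        int[] whole = span(regex, input, 0);   // group 0 is the entire match
        int[] digits = span(regex, input, 2);  // group 2 is the run of digits

        System.out.println(whole[0] + ".." + whole[1]);                 // 6..12
        System.out.println(input.substring(whole[0], whole[1]));        // QT3000
        System.out.println(input.substring(digits[0], digits[1]));      // 3000
    }
}
```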

 

matches() and lookingAt() methods

Both the matches() and lookingAt() methods attempt to match an input sequence against a pattern. The difference is that matches() requires the entire input sequence to match, whereas lookingAt() does not.

Both methods always start at the beginning of the input string. The following example shows the two methods −

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static final String REGEX = "foo";
   private static final String INPUT = "fooooooooooooooooo";
   private static Pattern pattern;
   private static Matcher matcher;

   public static void main( String args[] ) {
      pattern = Pattern.compile(REGEX);
      matcher = pattern.matcher(INPUT);

      System.out.println("Current REGEX is: "+REGEX);
      System.out.println("Current INPUT is: "+INPUT);

      System.out.println("lookingAt(): "+matcher.lookingAt());
      System.out.println("matches(): "+matcher.matches());
   }
}
Output
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
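
For comparison, find(), which the earlier examples use, does not have to start at the beginning of the input. The sketch below puts the three methods side by side; MatchModesDemo and its modes helper are hypothetical names, and a fresh Matcher is created for each call so that no method is affected by the state left by another.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchModesDemo {

    // Returns the results of find(), lookingAt() and matches(), in that order.
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).find(),       // match anywhere in the input
            p.matcher(input).lookingAt(),  // match must start at the beginning
            p.matcher(input).matches()     // match must cover the entire input
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(modes("foo", "a foo")));   // [true, false, false]
        System.out.println(Arrays.toString(modes("foo", "foobar")));  // [true, true, false]
        System.out.println(Arrays.toString(modes("foo", "foo")));     // [true, true, true]
    }
}
```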
 

replaceFirst() and replaceAll() methods


The replaceFirst() and replaceAll() methods replace text that matches a given regular expression. As their names suggest, replaceFirst() replaces the first occurrence, and replaceAll() replaces all occurrences.

The following example uses replaceAll() −

 

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "dog";
   private static String INPUT = "The dog says meow. " + "All dogs say meow.";
   private static String REPLACE = "cat";

   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);

      // get a matcher object
      Matcher m = p.matcher(INPUT); 
      INPUT = m.replaceAll(REPLACE);
      System.out.println(INPUT);
   }
}
Output
The cat says meow. All cats say meow.
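
The example above exercises only replaceAll(). A minimal sketch contrasting it with replaceFirst() on the same input might look like this; ReplaceDemo and its two helper methods are hypothetical names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {

    // Replaces only the first match of regex in input.
    static String first(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceFirst(replacement);
    }

    // Replaces every match of regex in input.
    static String all(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(first("dog", input, "cat"));  // The cat says meow. All dogs say meow.
        System.out.println(all("dog", input, "cat"));    // The cat says meow. All cats say meow.
    }
}
```

Both methods return a new string and leave the original input unchanged.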
 

appendReplacement() and appendTail() methods

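The Matcher class also provides the appendReplacement() and appendTail() methods for text replacement. Used together inside a find() loop, they build the result step by step: appendReplacement() copies the text before each match and then the replacement, and appendTail() copies whatever remains after the last match.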

The following example uses both methods −
 

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

   private static String REGEX = "a*b";
   private static String INPUT = "aabfooaabfooabfoob";
   private static String REPLACE = "-";
   public static void main(String[] args) {

      Pattern p = Pattern.compile(REGEX);

      // get a matcher object
      Matcher m = p.matcher(INPUT);
      StringBuffer sb = new StringBuffer();
      while(m.find()) {
         m.appendReplacement(sb, REPLACE);
      }
      m.appendTail(sb);
      System.out.println(sb.toString());
   }
}

Output

-foo-foo-foo-
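
One reason to use this loop instead of replaceAll() is that the replacement can be computed separately for each match. The sketch below, under that assumption, upper-cases every match of the same a*b pattern; AppendReplacementDemo and upperCaseMatches are hypothetical names for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {

    // Returns input with every match of regex converted to upper case.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement() keeps any '$' or '\' in the computed text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);  // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("a*b", "aabfooaabfooabfoob"));  // AABfooAABfooABfooB
    }
}
```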