Subpatterns

Subpatterns are delimited by parentheses (round brackets), which can be nested.

Marking part of a pattern as a subpattern does two things:

1. It localizes a set of alternatives. For example, the pattern

cat(aract|erpillar|)

matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string.
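
As a rough illustration, the sketch below uses Python's `re` module to check which whole words each version of the pattern accepts. Python is not PCRE, but it treats parentheses and `|` the same way for these patterns. The word list and the helper name `whole_matches` are hypothetical, chosen only for this demonstration.

```python
import re

# Parentheses confine the alternatives to the text after "cat".
WITH_GROUP = re.compile(r"cat(aract|erpillar|)")
# Without them, "|" splits the whole pattern into three alternatives.
WITHOUT_GROUP = re.compile(r"cataract|erpillar|")

def whole_matches(pattern, words):
    """Return the words that the pattern matches in their entirety (hypothetical helper)."""
    return [w for w in words if pattern.fullmatch(w)]

words = ["cat", "cataract", "caterpillar", "erpillar", ""]
print(whole_matches(WITH_GROUP, words))     # ['cat', 'cataract', 'caterpillar']
print(whole_matches(WITHOUT_GROUP, words))  # ['cataract', 'erpillar', '']
```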

2. It sets up the subpattern as a capturing subpattern (as defined above). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.

For example, if the string "the red king" is matched against the pattern

the ((red|white) (king|queen))

the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
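
Python's `re` module also numbers capturing groups by counting opening parentheses from left to right, so it can serve as a stand-in to sketch the example above. This is an illustration in Python, not PCRE's own API; `PATTERN` and `match` are illustrative names.

```python
import re

# Opening parentheses, left to right: group 1 is the outer group,
# group 2 is (red|white), group 3 is (king|queen).
PATTERN = re.compile(r"the ((red|white) (king|queen))")

match = PATTERN.search("the red king")
print(PATTERN.groups)   # 3 (number of capturing groups)
print(match.groups())   # ('red king', 'red', 'king')
```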

The fact that plain parentheses fulfil two functions is not always helpful. Often a subpattern is needed only for grouping, with no capture required. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern

the ((?:red|white) (king|queen))

the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 99, and the maximum number of all subpatterns, both capturing and non-capturing, is 200.
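
The same renumbering can be sketched with Python's `re` module, which also supports the "(?:" syntax for non-capturing groups. This only illustrates the numbering; it does not reproduce the 99 and 200 limits, which belong to the PCRE build described here.

```python
import re

# (?:red|white) groups the alternatives without capturing,
# so (king|queen) becomes group 2 instead of group 3.
PATTERN = re.compile(r"the ((?:red|white) (king|queen))")

match = PATTERN.search("the white queen")
print(PATTERN.groups)   # 2 (number of capturing groups)
print(match.groups())   # ('white queen', 'queen')
```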

As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns

(?i:saturday|sunday)

(?:(?i)saturday|sunday)

match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
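
Python's `re` module (3.6 and later) accepts the scoped form "(?i:...)", so the first pattern can be sketched there. The second form is not portable to Python: Python 3.11 and later raise an error for a global flag such as "(?i)" that does not appear at the start of the whole pattern. The trailing " night" is a hypothetical addition, used only to show that the option stops applying at the end of the subpattern.

```python
import re

# The caseless (i) option applies only inside the (?i:...) group;
# " night" after the group is still matched case-sensitively.
PATTERN = re.compile(r"(?i:saturday|sunday) night")

for text in ["Saturday night", "SUNDAY night", "sunday NIGHT"]:
    print(text, "->", bool(PATTERN.fullmatch(text)))
# Saturday night -> True
# SUNDAY night -> True
# sunday NIGHT -> False
```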

Note: This topic was taken from the PCRE library manual. The PCRE library is open source software, written by Philip Hazel <ph10@cam.ac.uk>, and copyright by the University of Cambridge, England.