Metacharacters

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern

 

The quick brown fox

 

matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of meta-characters, which do not stand for themselves but instead are interpreted in some special way.
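
As a rough illustration of the paragraph above, the following sketch uses Python's built-in re module rather than the PCRE library itself. Python's syntax is assumed to agree with PCRE for these basic cases, and the subject strings are invented for the example. It shows a literal pattern matching an identical portion of the subject, followed by one alternative and one repetition.

```python
import re

subject = "See: The quick brown fox jumps."

# A pattern with no metacharacters matches only an identical substring.
literal = re.search(r"The quick brown fox", subject)
literal_span = literal.span()  # (start, end) offsets of the matched portion

# Alternatives are written with the | metacharacter.
animals = re.findall(r"fox|dog", "fox, dog, cat")

# Repetitions are written with quantifiers such as +.
runs = re.findall(r"o+", "fox foo fooo")

print(literal_span, animals, runs)
```

The search scans the subject from left to right and reports the first position at which the whole pattern matches.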

 

There are two different sets of meta-characters: those that are recognised anywhere in the pattern except within square brackets, and those that are recognised in square brackets. Outside square brackets, the meta-characters are as follows:

 

\        general escape character with several uses        

^        assert start of subject (or line, in multiline mode)        

$        assert end of subject (or line, in multiline mode)        

.        match any character except newline (by default)        

[        start character class definition        

|        start of alternative branch        

(        start subpattern        

)        end subpattern        

?        extends the meaning of (; also 0 or 1 quantifier; also quantifier minimizer

*        0 or more quantifier        

+        1 or more quantifier        

{        start min/max quantifier        
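
The meta-characters listed above can be tried out with a short sketch. It uses Python's re module as a stand-in for PCRE, so the option names (re.MULTILINE, re.DOTALL) are Python's, not PCRE's, and the sample strings are made up for illustration. The behaviour is assumed to match PCRE only for these simple cases.

```python
import re

text = "cat\ncot\ncart"

# ^ and $ anchor to the whole subject by default,
# and to each line when multiline mode is enabled.
whole_subject = re.findall(r"^c.t$", text)
each_line = re.findall(r"^c.t$", text, re.MULTILINE)

# . does not match a newline unless the "dot matches all" option is set.
dot_default = re.search(r"a.c", "a\nc")
dot_all = re.search(r"a.c", "a\nc", re.DOTALL)

# \ removes the special meaning of the meta-character that follows it.
literal_dots = re.findall(r"\.", "a.b.c")

# ( ) delimit a subpattern and | separates its alternative branches;
# (?: is one case of ? extending the meaning of (.
captured = re.fullmatch(r"gr(a|e)y", "grey").group(1)
uncaptured = re.fullmatch(r"gr(?:a|e)y", "gray").groups()

# ? as a 0 or 1 quantifier, and {2,3} as a min/max quantifier.
optional_u = [bool(re.fullmatch(r"colou?r", w))
              for w in ("color", "colour", "colouur")]
bounded = re.findall(r"a{2,3}", "a aa aaaa")

# ? placed after a quantifier makes it match as little as possible.
greedy = re.search(r"<.+>", "<a><b>").group()
minimal = re.search(r"<.+?>", "<a><b>").group()

print(each_line, bounded, greedy, minimal)
```

Each variable isolates one meta-character, so a single line can be changed and rerun to see how that character alone affects the match.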

 

Part of a pattern that is in square brackets is called a "character class". In a character class the only meta-characters are:

 

\        general escape character        

^        negate the class, but only if it is the first character

-        indicates character range        

]        terminates the character class        
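
A small sketch of the character class rules above, again using Python's re module as an approximation of PCRE (an assumption, since the two engines differ in other areas). The subject strings are hypothetical.

```python
import re

# ^ negates the class only when it is the first character.
negated = re.findall(r"[^abc]", "abcd^")
caret_literal = re.findall(r"[a^c]", "abcd^")

# - between two characters indicates a range; at the end of the class
# it stands for itself.
in_range = re.findall(r"[a-c]", "a-d")
hyphen_literal = re.findall(r"[ac-]", "a-d")

# \ escapes ] (and \ itself), so neither terminates the class.
escaped = re.findall(r"[\]\\]", r"a]b\c")

# Characters such as . * + ? are not meta-characters inside a class.
plain = re.findall(r"[.*+?]", "a.b*c")

print(negated, caret_literal, in_range, hyphen_literal, escaped, plain)
```

The last case shows why there are two separate sets of meta-characters: inside square brackets, the characters that are special elsewhere in the pattern simply match themselves.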

 

The following sections describe the use of each of the meta-characters.

 

Note: This topic was taken from the PCRE library manual. The PCRE library is open source software, written by Philip Hazel <ph10@cam.ac.uk>, and copyright by the University of Cambridge, England.