A regular expression (regex or regexp for short) is a special text string for describing a search pattern. For example, to extract the scheme (protocol) of a URL we can use `/^[a-z]+:/`, which matches `http:` in `http://github.com/`.
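The same pattern can be tried from code. This minimal sketch assumes Python 3 and its standard `re` module as a stand-in for PCRE; `re` is Perl-like but not PCRE itself, though for this pattern both behave the same. The surrounding slashes in `/^[a-z]+:/` are Perl-style delimiters and are not part of the pattern.

```python
import re

# Assumption: Python's re module stands in for PCRE; for this pattern they behave the same.
pattern = re.compile(r'^[a-z]+:')  # the slashes of /^[a-z]+:/ are delimiters, not part of the pattern

match = pattern.search('http://github.com/')
scheme = match.group() if match else None
print(scheme)  # http:

# A string that does not start with "letters followed by ':'" does not match at all.
no_scheme = pattern.search('github.com/')
print(no_scheme)  # None
```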
PCRE (Perl Compatible Regular Expressions) is a C library implementing regular expressions. It was written in 1997, when Perl was the de facto choice for complex text processing, and its pattern syntax closely resembles Perl's. PCRE syntax is used in many major projects, including PHP, Apache and R.
There are two different sets of metacharacters:
- Those that are recognized anywhere in the pattern except within square brackets:

```
\      general escape character with several uses
^      assert start of string (or line, in multiline mode)
$      assert end of string (or line, in multiline mode)
.      match any character except newline (by default)
[      start character class definition
|      start of alternative branch
(      start subpattern
)      end subpattern
?      extends the meaning of (
       also 0 or 1 quantifier
       also quantifier minimizer
*      0 or more quantifier
+      1 or more quantifier
       also "possessive quantifier"
{      start min/max quantifier
```
- Those that are recognized inside square brackets, that is, within a character class such as `[a-z]`:

```
\      general escape character
^      negate the class, but only if the first character
-      indicates character range
[      POSIX character class (only if followed by POSIX syntax)
]      terminates the character class
```
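The metacharacters in the two lists above can be tried out quickly from a script. This sketch assumes Python 3 and its standard `re` module, whose syntax is close to PCRE but not identical. For example, `re` does not support POSIX classes such as `[[:digit:]]`, and it only accepts possessive quantifiers like `a++` from Python 3.11, so neither appears here. The sample text is the tail of the Apache log line used in the examples.

```python
import re

# Assumption: Python's re module stands in for PCRE; the constructs used below behave alike in both.
text = '[18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"'

# --- Outside square brackets ---
# '*' is greedy: '".*"' runs from the first quote to the LAST one.
greedy = re.search(r'".*"', text).group()
# '?' after a quantifier is the minimizer: '".*?"' stops at the FIRST closing quote.
lazy = re.search(r'".*?"', text).group()
# '?' extends '(' into a non-capturing group '(?:...)'; '|' separates alternatives.
path = re.search(r'"(?:GET|POST|HEAD) (\S+)', text).group(1)
# '{' starts a min/max quantifier: \d{3} means exactly three digits.
status = re.search(r' (\d{3}) ', text).group(1)
# '^' and '$' anchor the match to the start and end of the string.
starts_with_bracket = re.search(r'^\[', text) is not None
ends_with_quote = re.search(r'"$', text) is not None

# --- Inside square brackets ---
# A leading '^' negates the class and '\]' escapes the closing bracket:
# '[^\]]+' means "one or more characters that are not ']'".
timestamp = re.search(r'\[([^\]]+)\]', text).group(1)
# '-' between two characters is a range: [A-Z][a-z]+ is a capitalised word.
month = re.search(r'/([A-Z][a-z]+)/', text).group(1)
# A '^' that is not first is an ordinary character: [+^] matches '+' or '^'.
sign = re.search(r'[+^]', text).group()

for name, value in [('greedy', greedy), ('lazy', lazy), ('path', path),
                    ('status', status), ('timestamp', timestamp),
                    ('month', month), ('sign', sign)]:
    print(f'{name:>9}: {value}')
print(starts_with_bracket, ends_with_quote)
```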
PCRE also provides some generic character types, often called shorthand character classes:
```
\d     any decimal digit
\D     any character that is not a decimal digit
\h     any horizontal white space character
\H     any character that is not a horizontal white space character
\s     any white space character
\S     any character that is not a white space character
\v     any vertical white space character
\V     any character that is not a vertical white space character
\w     any "word" character
\W     any "non-word" character
```
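Here is a short sketch of the most common generic types, again assuming Python 3's `re` module as a stand-in for PCRE. There are two caveats. First, `re` has no `\h`/`\H` and rejects `\h` as a bad escape. Second, in `re` the escape `\v` means the single vertical-tab character rather than any vertical white space. Those four types are therefore left out. For `str` patterns, `re` also matches Unicode digits, spaces and letters with `\d`, `\s` and `\w` by default. The `re.ASCII` flag used below restricts them to ASCII, which is closer to how PCRE behaves by default.

```python
import re

# Assumption: Python's re module (with re.ASCII) stands in for PCRE's default, non-Unicode behaviour.
request = 'GET /robots.txt HTTP/1.0'

# \d+ : runs of decimal digits.
digits = re.findall(r'\d+', '18/Sep/2004', flags=re.ASCII)
# \D : any non-digit; here every non-digit is deleted.
offset = re.sub(r'\D', '', '+1000', flags=re.ASCII)
# \w+ : runs of "word" characters (letters, digits, underscore).
words = re.findall(r'\w+', request, flags=re.ASCII)
# \S+ : runs of non-whitespace, i.e. the space-separated fields.
fields = re.findall(r'\S+', request, flags=re.ASCII)
# \s+ : one or more whitespace characters, used here as a separator.
parts = re.split(r'\s+', '200   468\t"-"', flags=re.ASCII)
# \W : any "non-word" character.
punctuation = re.findall(r'\W', 'robots.txt', flags=re.ASCII)

print(digits)       # ['18', '2004']
print(offset)       # 1000
print(words)        # ['GET', 'robots', 'txt', 'HTTP', '1', '0']
print(fields)       # ['GET', '/robots.txt', 'HTTP/1.0']
print(parts)        # ['200', '468', '"-"']
print(punctuation)  # ['.']
```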
Examples
We will test our examples on the following string, a typical line from an Apache access log: `66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"`
Regex | Result | Comment |
---|---|---|
`GET` | GET | `GET` matches the characters GET literally (case sensitive) |
`\d+\.\d+\.\d+\.\d+` | 66.249.64.13 | `\d+` matches a digit `[0-9]` one or more times, as defined by the `+` quantifier; `\.` matches `.` literally |
`(\d+\.){3}\d+` | 66.249.64.13 | `(\d+\.){3}` matches the group `\d+\.` exactly three times, then `\d+` matches the last number |
`\[.+\]` | [18/Sep/2004:11:07:48 +1000] | `\[` and `\]` match literal square brackets; `.+` matches one or more of any character except newline |
`^\S+` | 66.249.64.13 | `^` anchors the match at the start of the string; `\S+` matches one or more non-whitespace characters |
`\+[0-9]+` | +1000 | `\+` matches the character `+` literally; the character class `[0-9]` matches a single digit. The same can be achieved with `\+\d+` |
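To check the examples table programmatically, the sketch below runs each pattern against the log line with Python's `re.search`, which returns the first match. It assumes Python 3 and uses `re` as a stand-in for PCRE, since these particular patterns behave the same in both. The patterns are written as raw strings (`r'...'`) so that escapes such as `\.` and `\[` reach the regex engine unchanged.

```python
import re

# Assumption: Python's re module stands in for PCRE; these patterns behave the same in both.
log = ('66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] '
       '"GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"')

# (pattern, expected first match) for each row of the examples table.
examples = [
    (r'GET', 'GET'),
    (r'\d+\.\d+\.\d+\.\d+', '66.249.64.13'),
    (r'(\d+\.){3}\d+', '66.249.64.13'),
    (r'\[.+\]', '[18/Sep/2004:11:07:48 +1000]'),
    (r'^\S+', '66.249.64.13'),
    (r'\+[0-9]+', '+1000'),
]

results = []
for pattern, expected in examples:
    match = re.search(pattern, log)
    # group(0) is the whole match, even when the pattern contains a capturing group.
    found = match.group(0) if match else None
    results.append((pattern, found, found == expected))
    print(f'{pattern:<20} {found!s:<30} {"ok" if found == expected else "MISMATCH"}')
```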
All these examples can be tried at https://regex101.com/:
- Copy the example string into the TEST STRING section.
- Copy the regex into the Regular Expression section.
- The web application will show the matching result.