RE2 regular expression syntax reference
---------------------------------------

Single characters:
.           any character, possibly including newline (s=true)
[xyz]       character class
[^xyz]      negated character class
\d          Perl character class
\D          negated Perl character class
[:alpha:]   ASCII character class
[:^alpha:]  negated ASCII character class
\pN         Unicode character class (one-letter name)
\p{Greek}   Unicode character class
\PN         negated Unicode character class (one-letter name)
\P{Greek}   negated Unicode character class

Composites:
xy   «x» followed by «y»
x|y  «x» or «y» (prefer «x»)
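
The "prefer «x»" rule for alternation can be seen in a small sketch. It uses Python's built-in re module, which is a backtracking engine and not RE2; the assumption here is only that both apply the same leftmost-first rule to a plain alternation.

```python
import re

# In "a|ab" the left branch is preferred, so the match stops after "a"
# even though the right branch could consume more input.
first = re.match(r"a|ab", "ab").group()

# Reordering the branches changes which text is matched.
second = re.match(r"ab|a", "ab").group()

print(first, second)  # a ab
```

Putting the longer alternative first is the usual way to get the longer match without changing engine options.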

Repetitions:
x*       zero or more «x», prefer more
x+       one or more «x», prefer more
x?       zero or one «x», prefer one
x{n,m}   «n» or «n»+1 or ... or «m» «x», prefer more
x{n,}    «n» or more «x», prefer more
x{n}     exactly «n» «x»
x*?      zero or more «x», prefer fewer
x+?      one or more «x», prefer fewer
x??      zero or one «x», prefer zero
x{n,m}?  «n» or «n»+1 or ... or «m» «x», prefer fewer
x{n,}?   «n» or more «x», prefer fewer
x{n}?    exactly «n» «x»
x{}      (== x*) NOT SUPPORTED vim
x{-}     (== x*?) NOT SUPPORTED vim
x{-n}    (== x{n}?) NOT SUPPORTED vim
x=       (== x?) NOT SUPPORTED vim
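
A short sketch of "prefer more" versus "prefer fewer" for the supported repetition forms. Python's re module is used as a stand-in for RE2; the sample strings are made up for illustration.

```python
import re

text = "<b>bold</b>"

# x+ prefers more: the match runs to the last ">" in the text.
greedy = re.search(r"<.+>", text).group()

# x+? prefers fewer: the match stops at the first ">".
lazy = re.search(r"<.+?>", text).group()

# x{n,m} prefers more but never exceeds m; x{n,m}? settles for n.
bounded = re.search(r"a{2,3}", "aaaa").group()
bounded_lazy = re.search(r"a{2,3}?", "aaaa").group()

print(greedy, lazy, bounded, bounded_lazy)  # <b>bold</b> <b> aaa aa
```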

Possessive repetitions:
x*+      zero or more «x», possessive NOT SUPPORTED
x++      one or more «x», possessive NOT SUPPORTED
x?+      zero or one «x», possessive NOT SUPPORTED
x{n,m}+  «n» or ... or «m» «x», possessive NOT SUPPORTED
x{n,}+   «n» or more «x», possessive NOT SUPPORTED
x{n}+    exactly «n» «x», possessive NOT SUPPORTED

Grouping:
(re)          numbered capturing group
(?P<name>re)  named & numbered capturing group
(?<name>re)   named & numbered capturing group NOT SUPPORTED
(?'name're)   named & numbered capturing group NOT SUPPORTED
(?:re)        non-capturing group
(?flags)      set flags within current group; non-capturing
(?flags:re)   set flags during re; non-capturing
(?#text)      comment NOT SUPPORTED
(?|x|y|z)     branch numbering reset NOT SUPPORTED
(?>re)        possessive match of «re» NOT SUPPORTED
re@>          possessive match of «re» NOT SUPPORTED vim
%(re)         non-capturing group NOT SUPPORTED vim
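
The supported group forms can be sketched as follows. Python's re module stands in for RE2 here, on the assumption that (re), (?P<name>re) and (?:re) number their groups the same way in both; the date string is illustrative.

```python
import re

# One named group, one non-capturing group, one plain numbered group.
m = re.match(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "2024-12-18")

by_name = m.group("year")   # the named group is reachable by name...
by_number = m.group(1)      # ...and also by its number
day = m.group(2)            # (?:re) did not consume a group number
all_groups = m.groups()

print(by_name, by_number, day, all_groups)
```

The non-capturing group keeps the month out of the numbering, so the day is group 2 rather than group 3.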

Flags:
i  case-insensitive (default false)
m  multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false)
s  let «.» match «\n» (default false)
U  ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc (default false)
Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»).
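
A minimal sketch of the i, m and s flags in their scoped (?flags:re) form, again using Python's re module as a stand-in for RE2. Two assumptions to note: recent Python versions accept the bare (?flags) form only at the very start of a pattern, so the scoped form is used throughout, and the U flag has no Python counterpart, so it is left out.

```python
import re

# i: case-insensitive inside the group only.
ci = re.fullmatch(r"(?i:hello) world", "HeLLo world") is not None
ci_outside = re.fullmatch(r"(?i:hello) world", "hello WORLD") is not None

# m: "^" also matches at the start of each line.
line_starts = re.findall(r"(?m:^\w+)", "one\ntwo")
text_start = re.findall(r"^\w+", "one\ntwo")

# s: "." also matches a newline.
dot_newline = re.fullmatch(r"(?s:a.b)", "a\nb") is not None
dot_default = re.fullmatch(r"a.b", "a\nb") is not None

# "-i" clears a flag that an enclosing group has set.
cleared = re.fullmatch(r"(?i:a(?-i:b))", "AB") is not None

print(ci, ci_outside, line_starts, text_start, dot_newline, dot_default, cleared)
```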

Empty strings:
^        at beginning of text or line («m»=true)
$        at end of text (like «\z» not «\Z») or line («m»=true)
\A       at beginning of text
\b       at word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\B       not a word boundary
\G       at beginning of subtext being searched NOT SUPPORTED pcre
\G       at end of last match NOT SUPPORTED perl
\Z       at end of text, or before newline at end of text NOT SUPPORTED
\z       at end of text
(?=re)   before text matching «re» NOT SUPPORTED
(?!re)   before text not matching «re» NOT SUPPORTED
(?<=re)  after text matching «re» NOT SUPPORTED
(?<!re)  after text not matching «re» NOT SUPPORTED
re&      before text matching «re» NOT SUPPORTED vim
re@=     before text matching «re» NOT SUPPORTED vim
re@!     before text not matching «re» NOT SUPPORTED vim
re@<=    after text matching «re» NOT SUPPORTED vim
re@<!    after text not matching «re» NOT SUPPORTED vim
\zs      sets start of match (= \K) NOT SUPPORTED vim
\ze      sets end of match NOT SUPPORTED vim
\%^      beginning of file NOT SUPPORTED vim
\%$      end of file NOT SUPPORTED vim
\%V      on screen NOT SUPPORTED vim
\%#      cursor position NOT SUPPORTED vim
\%'m     mark «m» position NOT SUPPORTED vim
\%23l    in line 23 NOT SUPPORTED vim
\%23c    in column 23 NOT SUPPORTED vim
\%23v    in virtual column 23 NOT SUPPORTED vim
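
The supported assertions \b, \B, \A and multi-line «^» can be sketched with Python's re module as a stand-in for RE2. The sketch deliberately avoids «$»: in Python, «$» without the m flag also matches just before a trailing newline, which differs from the «\z»-like behaviour listed above.

```python
import re

text = "cat concat cat."

# \b requires a word character on one side only.
whole_words = re.findall(r"\bcat\b", text)

# \B matches where there is no boundary: the "cat" inside "concat".
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

# \A is the start of the text; "^" with m is the start of any line.
text_start = re.findall(r"\Aab", "ab\nab")
line_starts = re.findall(r"(?m:^ab)", "ab\nab")

print(whole_words, inside, text_start, line_starts)
```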

Escape sequences:
\a           bell (== \007)
\f           form feed (== \014)
\t           horizontal tab (== \011)
\n           newline (== \012)
\r           carriage return (== \015)
\v           vertical tab character (== \013)
\*           literal «*», for any punctuation character «*»
\123         octal character code (up to three digits)
\x7F         hex character code (exactly two digits)
\x{10FFFF}   hex character code
\C           match a single byte even in UTF-8 mode
\Q...\E      literal text «...» even if «...» has punctuation

\1           backreference NOT SUPPORTED
\b           backspace NOT SUPPORTED (use «\010»)
\cK          control char ^K NOT SUPPORTED (use the octal code: «\013» for ^K, «\001» for ^A, etc)
\e           escape NOT SUPPORTED (use «\033»)
\g1          backreference NOT SUPPORTED
\g{1}        backreference NOT SUPPORTED
\g{+1}       backreference NOT SUPPORTED
\g{-1}       backreference NOT SUPPORTED
\g{name}     named backreference NOT SUPPORTED
\g<name>     subroutine call NOT SUPPORTED
\g'name'     subroutine call NOT SUPPORTED
\k<name>     named backreference NOT SUPPORTED
\k'name'     named backreference NOT SUPPORTED
\lX          lowercase «X» NOT SUPPORTED
\ux          uppercase «x» NOT SUPPORTED
\L...\E      lowercase text «...» NOT SUPPORTED
\K           reset beginning of «$0» NOT SUPPORTED
\N{name}     named Unicode character NOT SUPPORTED
\R           line break NOT SUPPORTED
\U...\E      upper case text «...» NOT SUPPORTED
\X           extended Unicode sequence NOT SUPPORTED

\%d123       decimal character 123 NOT SUPPORTED vim
\%xFF        hex character FF NOT SUPPORTED vim
\%o123       octal character 123 NOT SUPPORTED vim
\%u1234      Unicode character 0x1234 NOT SUPPORTED vim
\%U12345678  Unicode character 0x12345678 NOT SUPPORTED vim
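
A small sketch of the supported octal and hex escapes, using Python's re module as a stand-in for RE2. Python's re has no «\Q...\E», so the last part uses re.escape as a Python-specific substitute for quoting literal text; that substitution is an assumption of this sketch, not part of the syntax listed above.

```python
import re

# \x41 (hex) and \101 (octal) both denote "A".
same_letter = re.fullmatch(r"\x41\101", "AA") is not None

# \t and \011 both denote a horizontal tab.
same_tab = re.fullmatch(r"\t\011", "\t\t") is not None

# Quoting punctuation so that "." and "*" are taken literally.
literal = re.escape("a.b*c")
quoted_hit = re.fullmatch(literal, "a.b*c") is not None
quoted_miss = re.fullmatch(literal, "aXbbc") is not None
unquoted_hit = re.fullmatch(r"a.b*c", "aXbbc") is not None

print(same_letter, same_tab, quoted_hit, quoted_miss, unquoted_hit)
```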

Character class elements:
x        single character
A-Z      character range (inclusive)
\d       Perl character class
[:foo:]  ASCII character class «foo»
\p{Foo}  Unicode character class «Foo»
\pF      Unicode character class «F» (one-letter name)

Named character classes as character class elements:
[\d]         digits (== \d)
[^\d]        not digits (== \D)
[\D]         not digits (== \D)
[^\D]        not not digits (== \d)
[[:name:]]   named ASCII class inside character class (== [:name:])
[^[:name:]]  named ASCII class inside negated character class (== [:^name:])
[\p{Name}]   named Unicode property inside character class (== \p{Name})
[^\p{Name}]  named Unicode property inside negated character class (== \P{Name})
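
The \d rows above, including the double negation, can be checked on a handful of characters. This is a sketch with Python's re module standing in for RE2; agree is a hypothetical helper introduced only for this example, and the [[:name:]] and \p{Name} rows are left out because Python's re does not accept them.

```python
import re

samples = ["0", "7", "a", "_", " ", "\n"]

def agree(p, q):
    """True if patterns p and q accept exactly the same sample characters."""
    return all(
        bool(re.fullmatch(p, c, re.ASCII)) == bool(re.fullmatch(q, c, re.ASCII))
        for c in samples
    )

bracketed = agree(r"[\d]", r"\d")        # [\d]  == \d
negated = agree(r"[^\d]", r"\D")         # [^\d] == \D
inner_negated = agree(r"[\D]", r"\D")    # [\D]  == \D
double_negated = agree(r"[^\D]", r"\d")  # [^\D] == \d
opposites = agree(r"\d", r"\D")          # sanity check: these must differ

print(bracketed, negated, inner_negated, double_negated, opposites)
```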

Perl character classes:
\d  digits (== [0-9])
\D  not digits (== [^0-9])
\s  whitespace (== [\t\n\f\r ])
\S  not whitespace (== [^\t\n\f\r ])
\w  word characters (== [0-9A-Za-z_])
\W  not word characters (== [^0-9A-Za-z_])

\h  horizontal space NOT SUPPORTED
\H  not horizontal space NOT SUPPORTED
\v  vertical space NOT SUPPORTED
\V  not vertical space NOT SUPPORTED
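
The expansions listed above are ASCII-only, which matters when porting patterns between engines. The sketch below compares them with Python's re module under its re.ASCII flag; members is a hypothetical helper for this example. Under that assumption \d and \w line up with the table, while Python's \s also accepts vertical tab, which the table's \s does not.

```python
import re
import string

def members(pattern):
    """Set of single ASCII characters accepted by `pattern` in ASCII mode."""
    rx = re.compile(pattern, re.ASCII)
    return {chr(i) for i in range(128) if rx.fullmatch(chr(i))}

digits_match = members(r"\d") == set(string.digits)
word_match = members(r"\w") == set(string.digits + string.ascii_letters + "_")

# The table's \s is [\t\n\f\r ]; Python's ASCII \s has one extra member.
table_space = set("\t\n\f\r ")
extra_in_python = members(r"\s") - table_space

print(digits_match, word_match, extra_in_python == {"\v"})  # True True True
```

When a pattern must behave identically in both engines, spelling out the bracket expression avoids the difference.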

ASCII character classes:
[:alnum:]   alphanumeric (== [0-9A-Za-z])
[:alpha:]   alphabetic (== [A-Za-z])
[:ascii:]   ASCII (== [\x00-\x7F])
[:blank:]   blank (== [\t ])
[:cntrl:]   control (== [\x00-\x1F\x7F])
[:digit:]   digits (== [0-9])
[:graph:]   graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[:lower:]   lower case (== [a-z])
[:print:]   printable (== [ -~] == [ [:graph:]])
[:punct:]   punctuation (== [!-/:-@[-`{-~])
[:space:]   whitespace (== [\t\n\v\f\r ])
[:upper:]   upper case (== [A-Z])
[:word:]    word characters (== [0-9A-Za-z_])
[:xdigit:]  hex digit (== [0-9A-Fa-f])
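
The range notation in the table can be cross-checked without any regex engine. The sketch below builds a few of the classes as plain character sets; span is a hypothetical helper that mimics an inclusive bracket range, and the comparison with string.punctuation relies on Python's standard library listing the same ASCII punctuation.

```python
import string

def span(first, last):
    """Characters from first to last inclusive, like a bracket range."""
    return {chr(c) for c in range(ord(first), ord(last) + 1)}

alnum = span("0", "9") | span("A", "Z") | span("a", "z")
punct = span("!", "/") | span(":", "@") | span("[", "`") | span("{", "~")
graph = span("!", "~")
printable = span(" ", "~")

# [:graph:] is [:alnum:] plus [:punct:]; [:print:] adds the space.
graph_ok = graph == alnum | punct
print_ok = printable == graph | {" "}
punct_ok = punct == set(string.punctuation)

print(graph_ok, print_ok, punct_ok)  # True True True
```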

Unicode character class names--general category:
C   other
Cc  control
Cf  format
Cn  unassigned code points NOT SUPPORTED
Co  private use
Cs  surrogate
L   letter
LC  cased letter NOT SUPPORTED
L&  cased letter NOT SUPPORTED
Ll  lowercase letter
Lm  modifier letter
Lo  other letter
Lt  titlecase letter
Lu  uppercase letter
M   mark
Mc  spacing mark
Me  enclosing mark
Mn  non-spacing mark
N   number
Nd  decimal number
Nl  letter number
No  other number
P   punctuation
Pc  connector punctuation
Pd  dash punctuation
Pe  close punctuation
Pf  final punctuation
Pi  initial punctuation
Po  other punctuation
Ps  open punctuation
S   symbol
Sc  currency symbol
Sk  modifier symbol
Sm  math symbol
So  other symbol
Z   separator
Zl  line separator
Zp  paragraph separator
Zs  space separator
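
The two-letter names above are the standard Unicode general categories, so they can be explored outside a regex engine. The sketch below uses Python's unicodedata module; in_class is a hypothetical helper that treats a one-letter name as a prefix of the two-letter ones, which is how «\pL» covers Lu, Ll, Lt, Lm and Lo. The Unicode version bundled with Python may differ from the tables compiled into RE2, so edge cases could disagree.

```python
import unicodedata

def in_class(ch, name):
    """True if ch falls in general category `name` ("L", "Lu", "Nd", ...)."""
    return unicodedata.category(ch).startswith(name)

# Category of a few ASCII samples.
samples = {ch: unicodedata.category(ch) for ch in "Aa1 $+_"}

letter = in_class("\u00e9", "L")          # e with acute accent is a letter
arabic_digit = in_class("\u0663", "Nd")   # ARABIC-INDIC DIGIT THREE
not_lower = in_class("A", "Ll")           # "A" is Lu, not Ll

print(samples, letter, arabic_digit, not_lower)
```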

Unicode character class names--scripts:
Arabic                  Arabic
Armenian                Armenian
Balinese                Balinese
Bamum                   Bamum
Batak                   Batak
Bengali                 Bengali
Bopomofo                Bopomofo
Brahmi                  Brahmi
Braille                 Braille
Buginese                Buginese
Buhid                   Buhid
Canadian_Aboriginal     Canadian Aboriginal
Carian                  Carian
Chakma                  Chakma
Cham                    Cham
Cherokee                Cherokee
Common                  characters not specific to one script
Coptic                  Coptic
Cuneiform               Cuneiform
Cypriot                 Cypriot
Cyrillic                Cyrillic
Deseret                 Deseret
Devanagari              Devanagari
Egyptian_Hieroglyphs    Egyptian Hieroglyphs
Ethiopic                Ethiopic
Georgian                Georgian
Glagolitic              Glagolitic
Gothic                  Gothic
Greek                   Greek
Gujarati                Gujarati
Gurmukhi                Gurmukhi
Han                     Han
Hangul                  Hangul
Hanunoo                 Hanunoo
Hebrew                  Hebrew
Hiragana                Hiragana
Imperial_Aramaic        Imperial Aramaic
Inherited               inherit script from previous character
Inscriptional_Pahlavi   Inscriptional Pahlavi
Inscriptional_Parthian  Inscriptional Parthian
Javanese                Javanese
Kaithi                  Kaithi
Kannada                 Kannada
Katakana                Katakana
Kayah_Li                Kayah Li
Kharoshthi              Kharoshthi
Khmer                   Khmer
Lao                     Lao
Latin                   Latin
Lepcha                  Lepcha
Limbu                   Limbu
Linear_B                Linear B
Lycian                  Lycian
Lydian                  Lydian
Malayalam               Malayalam
Mandaic                 Mandaic
Meetei_Mayek            Meetei Mayek
Meroitic_Cursive        Meroitic Cursive
Meroitic_Hieroglyphs    Meroitic Hieroglyphs
Miao                    Miao
Mongolian               Mongolian
Myanmar                 Myanmar
New_Tai_Lue             New Tai Lue (aka Simplified Tai Lue)
Nko                     Nko
Ogham                   Ogham
Ol_Chiki                Ol Chiki
Old_Italic              Old Italic
Old_Persian             Old Persian
Old_South_Arabian       Old South Arabian
Old_Turkic              Old Turkic
Oriya                   Oriya
Osmanya                 Osmanya
Phags_Pa                'Phags Pa
Phoenician              Phoenician
Rejang                  Rejang
Runic                   Runic
Saurashtra              Saurashtra
Sharada                 Sharada
Shavian                 Shavian
Sinhala                 Sinhala
Sora_Sompeng            Sora Sompeng
Sundanese               Sundanese
Syloti_Nagri            Syloti Nagri
Syriac                  Syriac
Tagalog                 Tagalog
Tagbanwa                Tagbanwa
Tai_Le                  Tai Le
Tai_Tham                Tai Tham
Tai_Viet                Tai Viet
Takri                   Takri
Tamil                   Tamil
Telugu                  Telugu
Thaana                  Thaana
Thai                    Thai
Tibetan                 Tibetan
Tifinagh                Tifinagh
Ugaritic                Ugaritic
Vai                     Vai
Yi                      Yi

Vim character classes:
\i   identifier character NOT SUPPORTED vim
\I   «\i» except digits NOT SUPPORTED vim
\k   keyword character NOT SUPPORTED vim
\K   «\k» except digits NOT SUPPORTED vim
\f   file name character NOT SUPPORTED vim
\F   «\f» except digits NOT SUPPORTED vim
\p   printable character NOT SUPPORTED vim
\P   «\p» except digits NOT SUPPORTED vim
\s   whitespace character (== [ \t]) NOT SUPPORTED vim
\S   non-whitespace character (== [^ \t]) NOT SUPPORTED vim
\d   digits (== [0-9]) vim
\D   not «\d» vim
\x   hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
\X   not «\x» NOT SUPPORTED vim
\o   octal digits (== [0-7]) NOT SUPPORTED vim
\O   not «\o» NOT SUPPORTED vim
\w   word character vim
\W   not «\w» vim
\h   head of word character NOT SUPPORTED vim
\H   not «\h» NOT SUPPORTED vim
\a   alphabetic NOT SUPPORTED vim
\A   not «\a» NOT SUPPORTED vim
\l   lowercase NOT SUPPORTED vim
\L   not lowercase NOT SUPPORTED vim
\u   uppercase NOT SUPPORTED vim
\U   not uppercase NOT SUPPORTED vim
\_x  «\x» plus newline, for any «x» NOT SUPPORTED vim

Vim flags:
\c  ignore case NOT SUPPORTED vim
\C  match case NOT SUPPORTED vim
\m  magic NOT SUPPORTED vim
\M  nomagic NOT SUPPORTED vim
\v  verymagic NOT SUPPORTED vim
\V  verynomagic NOT SUPPORTED vim
\Z  ignore differences in Unicode combining characters NOT SUPPORTED vim

Magic:
(?{code})            arbitrary Perl code NOT SUPPORTED perl
(??{code})           postponed arbitrary Perl code NOT SUPPORTED perl
(?n)                 recursive call to regexp capturing group «n» NOT SUPPORTED
(?+n)                recursive call to relative group «+n» NOT SUPPORTED
(?-n)                recursive call to relative group «-n» NOT SUPPORTED
(?C)                 PCRE callout NOT SUPPORTED pcre
(?R)                 recursive call to entire regexp (== (?0)) NOT SUPPORTED
(?&name)             recursive call to named group NOT SUPPORTED
(?P=name)            named backreference NOT SUPPORTED
(?P>name)            recursive call to named group NOT SUPPORTED
(?(cond)true|false)  conditional branch NOT SUPPORTED
(?(cond)true)        conditional branch NOT SUPPORTED
(*ACCEPT)            make regexps more like Prolog NOT SUPPORTED
(*COMMIT)            NOT SUPPORTED
(*F)                 NOT SUPPORTED
(*FAIL)              NOT SUPPORTED
(*MARK)              NOT SUPPORTED
(*PRUNE)             NOT SUPPORTED
(*SKIP)              NOT SUPPORTED
(*THEN)              NOT SUPPORTED
(*ANY)               set newline convention NOT SUPPORTED
(*ANYCRLF)           NOT SUPPORTED
(*CR)                NOT SUPPORTED
(*CRLF)              NOT SUPPORTED
(*LF)                NOT SUPPORTED
(*BSR_ANYCRLF)       set \R convention NOT SUPPORTED pcre
(*BSR_UNICODE)       NOT SUPPORTED pcre