====== Regex - Cheat Sheet ======

<code>
Cheat Sheet
Character classes
. 	any character except newline
\w \d \s 	word, digit, whitespace
\W \D \S 	not word, digit, whitespace
[abc] 	any of a, b, or c
[^abc] 	not a, b, or c
[a-g] 	character between a & g
Anchors
^abc$ 	start / end of the string
\b 	word boundary
Escaped characters
\. \* \\ 	escaped special characters
\t \n \r 	tab, linefeed, carriage return
\u00A9 	unicode escaped ©
Groups & Lookaround
(abc) 	capture group
\1 	backreference to group #1
(?:abc) 	non-capturing group
(?=abc) 	positive lookahead
(?!abc) 	negative lookahead
Quantifiers & Alternation
a* a+ a? 	0 or more, 1 or more, 0 or 1
a{5} a{2,} 	exactly five, two or more
a{1,3} 	between one & three
a+? a{2,}? 	match as few as possible
ab|cd 	match ab or cd
</code>

----

===== Basic regex =====

^Symbol^Descriptions^
|.|Matches any single character (except a line break)|
|<nowiki>^</nowiki>|Matches the start of the string|
|$|Matches the end of the string|
|*|Matches the preceding character zero or more times|
|\|Escapes a special character|
|()|Groups regular expressions|
|?|Matches the preceding character zero or one time|

----

===== Characters =====

^Character^Legend^Example^Sample Match^
|\d|Most engines: one digit from 0 to 9|file_\d\d|file_25|
|\d|.NET, Python 3: one Unicode digit in any script|file_\d\d|file_9੩|
|\w|Most engines: "word character": ASCII letter, digit or underscore|\w-\w\w\w|A-b_1|
|\w|Python 3: "word character": Unicode letter, ideogram, digit, or underscore|\w-\w\w\w|字-ま_۳|
|\w|.NET: "word character": Unicode letter, ideogram, digit, or connector|\w-\w\w\w|字-ま‿۳|
|\s|Most engines: "whitespace character": space, tab, newline, carriage return, vertical tab|a\sb\sc|a b c|
|\s|.NET, Python 3, JavaScript: "whitespace character": any Unicode separator|a\sb\sc|a b c|
|\D|One character that is not a digit as defined by your engine's \d|\D\D\D|ABC|
|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|<nowiki>*-+=)</nowiki>|
|\S|One character that is not a whitespace character as defined by your engine's \s|\S\S\S\S|Yoyo|
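
The engine-dependent rows above can be checked directly. The following is a minimal sketch using Python 3's built-in re module; the sample string and variable names are only illustrative.

```python
import re

sample = "file_9\u0a69"  # "9" followed by GURMUKHI DIGIT THREE

# For str patterns, Python 3 treats \d as "any Unicode decimal digit".
unicode_match = re.fullmatch(r"file_\d\d", sample)

# The re.ASCII flag restricts \d (and \w, \s) to their ASCII definitions.
ascii_match = re.fullmatch(r"file_\d\d", sample, flags=re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False

# \D is the complement of whatever \d means under the active flags.
non_digits = re.findall(r"\D", "a1b2")
print(non_digits)  # ['a', 'b']
```

Other engines may need a different switch (or none) to change between the ASCII and Unicode definitions.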

----

===== Quantifiers =====

^Quantifier^Legend^Example^Sample Match^
|+|One or more|Version \w-\w+|Version A-b1_1|
|{3}|Exactly three times|\D{3}|ABC|
|{2,4}|Two to four times|\d{2,4}|156|
|{3,}|Three or more times|\w{3,}|regex_tutorial|
|*|Zero or more times|A*B*C*|AAACC|
|?|Once or none|plurals?|plural|
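
As a rough illustration, the quantifiers in this table can be tried out with Python's re module. The input strings are made up for the example.

```python
import re

# + needs at least one word character after the hyphen.
version = re.fullmatch(r"Version \w-\w+", "Version A-b1_1")
print(version is not None)  # True

# {2,4} is greedy: it takes as many digits as it can, up to four.
digits = re.search(r"\d{2,4}", "id 156789").group()
print(digits)  # 1567

# * allows zero repetitions, so the missing B is not a problem.
abc = re.fullmatch(r"A*B*C*", "AAACC")
print(abc is not None)  # True

# ? makes the trailing "s" optional.
plurals = re.findall(r"plurals?", "plural plurals")
print(plurals)  # ['plural', 'plurals']
```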

----

===== Interval regex =====

Interval expressions specify how many times the preceding character must occur.

^Expression^Description^
|{n}|Matches the preceding character appearing exactly 'n' times|
|{n,m}|Matches the preceding character appearing at least 'n' times but not more than 'm' times|
|{n,}|Matches the preceding character only when it appears 'n' times or more|
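
A short sketch of the interval forms in Python's re module follows. It also shows a pitfall that applies at least to this engine: a space inside the braces stops the braces from being read as a quantifier, so the text is matched literally.

```python
import re

exactly_three = re.fullmatch(r"a{3}", "aaa")
two_to_four = re.fullmatch(r"a{2,4}", "aaaaa")  # five a's is one too many
two_or_more = re.fullmatch(r"a{2,}", "aaaaa")

print(exactly_three is not None)  # True
print(two_to_four is not None)    # False
print(two_or_more is not None)    # True

# With a space after the comma, Python's re treats "{2, }" as plain text.
with_space = re.fullmatch(r"a{2, }", "aaaaa")
literal_text = re.fullmatch(r"a{2, }", "a{2, }")
print(with_space is not None)    # False
print(literal_text is not None)  # True
```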


----

===== Extended regex =====

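Extended regular expressions combine more than one expression. In GNU basic regex (for example, grep without -E) the operators below are written with a backslash; in extended regex (grep -E) they are written unescaped, as + and ?.

^Expression^Description^
|\+|Matches one or more occurrences of the previous character|
|\?|Matches zero or one occurrence of the previous character|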

----

===== Brace expansion =====

Brace expansion is a shell feature (for example, in Bash) rather than regex syntax. It takes either a sequence or a comma-separated list of items inside curly braces "{}".

The starting and ending items in a sequence are separated by two periods "..".

^Expression^Description^
|{a,b,c,d}|Expands to each item in the braces.  Example: echo {a,b,c,d}|
|{a..z}|Expands to a through z.  Example: echo {a..z}|
|{0..9}|Expands to 0 through 9.  Example: echo {0..9}|

----

===== More Characters =====

^Character^Legend^Example^Sample Match^
|.|Any character except line break|a.c|abc|
|.|Any character except line break|.*|whatever, man.|
|\.|A period (special character: needs to be escaped by a \)|a\.c|a.c|
|\|Escapes a special character|<nowiki>\.\*\+\?    \$\^\/\\</nowiki>|<nowiki>.*+?    $^/\</nowiki>|
|\|Escapes a special character|\[\{\(\)\}\]|[{()}]|

----

===== Logic =====

^Logic^Legend^Example^Sample Match^
|<nowiki>|</nowiki>|<nowiki>Alternation / OR operator</nowiki>|<nowiki>22|33</nowiki>|33|
|( … )|Capturing group|<nowiki>A(nt|pple)</nowiki>|<nowiki>Apple (captures "pple")</nowiki>|
|\1|Contents of Group 1|<nowiki>r(\w)g\1x</nowiki>|regex|
|\2|Contents of Group 2|<nowiki>(\d\d)\+(\d\d)=\2\+\1</nowiki>|12+65=65+12|
|(?: … )|Non-capturing group|<nowiki>A(?:nt|pple)</nowiki>|Apple|

----

===== More White-Space =====

^Character^Legend^Example^Sample Match^
|\t|Tab|T\t\w{2}|T     ab|
|\r|Carriage return character|see below| |
|\n|Line feed character|see below| |
|\r\n|Line separator on Windows|AB\r\nCD|AB|
|:::|:::|:::|CD|
|\N|Perl, PCRE (C, PHP, R…): one character that is not a line break|\N+|ABC|
|\h|Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator| | |
|\H|One character that is not a horizontal whitespace| | |
|\v|.NET, JavaScript, Python, Ruby: vertical tab| | |
|\v|Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator| | |
|\V|Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace| | |
|\R|Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)| | |
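
Python's re module supports \t, \r and \n but not \h or \R, so the sketch below approximates \R with an explicit alternation. That alternation is an assumption made for illustration and covers only the three most common line breaks, not every character \R matches in PCRE.

```python
import re

# Tab followed by two word characters.
tab_match = re.fullmatch(r"T\t\w{2}", "T\tab")
print(tab_match is not None)  # True

# A Windows line separator is the two-character sequence \r\n.
windows = re.fullmatch(r"AB\r\nCD", "AB\r\nCD")
print(windows is not None)  # True

# Rough stand-in for \R: try the \r\n pair first, then the single characters.
LINE_BREAK = r"\r\n|\n|\r"
parts = re.split(LINE_BREAK, "a\r\nb\nc\rd")
print(parts)  # ['a', 'b', 'c', 'd']
```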

----

===== More Quantifiers =====

^Quantifier^Legend^Example^Sample Match^
|+|The + (one or more) is "greedy"|\d+|12345|
|?|Makes quantifiers "lazy"|\d+?|1 in 12345|
|*|The * (zero or more) is "greedy"|A*|AAA|
|?|Makes quantifiers "lazy"|A*?|empty in AAA|
|{2,4}|Two to four times, "greedy"|\w{2,4}|abcd|
|?|Makes quantifiers "lazy"|\w{2,4}?|ab in abcd|

----

===== Character Classes =====

^Character^Legend^Example^Sample Match^
|[ … ]|One of the characters in the brackets|[AEIOU]|One uppercase vowel|
|[ … ]|One of the characters in the brackets|<nowiki>T[ao]p</nowiki>|Tap or Top|
|-|Range indicator|[a-z]|One lowercase letter|
|[x-y]|One of the characters in the range from x to y|[A-Z]+|GREAT|
|[ … ]|One of the characters in the brackets|[AB1-5w-z]|One of either: A,B,1,2,3,4,5,w,x,y,z|
|[x-y]|One of the characters in the range from x to y|[ -~]+|Characters in the printable section of the ASCII table.|
|<nowiki>[^x]</nowiki>|One character that is not x|[^a-z]{3}|A1!|
|<nowiki>[^x-y]</nowiki>|One of the characters not in the range from x to y|<nowiki>[^ -~]+</nowiki>|Characters that are not in the printable section of the ASCII table.|
|[\d\D]|One character that is a digit or a non-digit|<nowiki>[\d\D]+</nowiki>|Any characters, including new lines, which the regular dot doesn't match|
|[\x41]|Matches the character at hexadecimal position 41 in the ASCII table, i.e. A|[\x41-\x45]{3}|ABE|
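
A few of the character classes above, sketched with Python's re module. The test strings are illustrative, and the accented letters are written as escapes so the example does not depend on file encoding.

```python
import re

tops = re.findall(r"T[ao]p", "Tap Tip Top")
print(tops)  # ['Tap', 'Top']

# Ranges and single characters can be mixed in one class.
mixed = re.fullmatch(r"[AB1-5w-z]", "3")
print(mixed is not None)  # True

# A leading ^ negates the class.
negated = re.fullmatch(r"[^a-z]{3}", "A1!")
print(negated is not None)  # True

# Hexadecimal escapes work as range endpoints: \x41-\x45 is A-E.
hex_range = re.fullmatch(r"[\x41-\x45]{3}", "ABE")
print(hex_range is not None)  # True

# [\d\D] matches line breaks too, unlike the plain dot.
two_lines = "line1\nline2"
everything = re.search(r"[\d\D]+", two_lines).group()
dot_only = re.search(r".+", two_lines).group()
print(everything == two_lines)  # True
print(dot_only)                 # line1

# Characters outside the printable ASCII range (space through tilde).
non_ascii = re.findall(r"[^ -~]+", "na\u00efve caf\u00e9")
print(non_ascii == ["\u00ef", "\u00e9"])  # True
```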


----

===== Anchors and Boundaries =====

^Anchor^Legend^Example^Sample Match^
|<nowiki>^</nowiki>|Start of string or start of line depending on multiline mode. (<nowiki>But when [^inside brackets]</nowiki>, it means "not")|<nowiki>^abc .*</nowiki>|abc (line start)|
|$|End of string or end of line depending on multiline mode. Many engine-dependent subtleties.|.*? the end$|this is the end|
|\A|Beginning of string (all major engines except JS)|\Aabc[\d\D]*|abc (string......start)|
|\z|Very end of the string - Not available in Python and JS|the end\z|this is...\n...the end|
|\Z|End of string or (except Python) before final line break - Not available in JS|the end\Z|this is...\n...the end\n|
|\G|Beginning of String or End of Previous Match .NET, Java, PCRE (C, PHP, R…), Perl, Ruby| | |
|\b|Word boundary - Most engines: position where one side only is an ASCII letter, digit or underscore|Bob.*\bcat\b|Bob ate the cat|
|\b|Word boundary - .NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore|Bob.*\bкошка\b|Bob ate the кошка|
|\B|Not a word boundary|c.*\Bcat\B.*|copycats|
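
Anchor behaviour differs between engines, so the sketch below shows only what Python's re module does: the effect of multiline mode on ^, the difference between $ and \Z before a final line break, and word boundaries.

```python
import re

text = "abc one\nabc two"

# Without MULTILINE, ^ matches only at the start of the string.
start_only = re.findall(r"^abc", text)
every_line = re.findall(r"^abc", text, flags=re.MULTILINE)
# \A ignores MULTILINE and always means the start of the string.
string_start = re.findall(r"\Aabc", text, flags=re.MULTILINE)
print(start_only)    # ['abc']
print(every_line)    # ['abc', 'abc']
print(string_start)  # ['abc']

# $ also matches just before a final line break; Python's \Z does not.
dollar = re.search(r"the end$", "this is the end\n")
big_z = re.search(r"the end\Z", "this is the end\n")
print(dollar is not None)  # True
print(big_z is not None)   # False

# \b needs a word character on exactly one side; \B is the opposite.
whole_word = re.search(r"\bcat\b", "Bob ate the cat")
inside_word = re.search(r"\bcat\b", "copycats")
not_boundary = re.search(r"\Bcat\B", "copycats")
print(whole_word is not None)    # True
print(inside_word is not None)   # False
print(not_boundary is not None)  # True
```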

----

===== POSIX Classes =====

^Character^Legend^Example^Sample Match^
|[:alpha:]|PCRE (C, PHP, R…): ASCII letters A-Z and a-z|[8[:alpha:]]+|WellDone88|
|[:alpha:]|Ruby 2: Unicode letter or ideogram|<nowiki>[[:alpha:]\d]+</nowiki>|кошка99|
|[:alnum:]|PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z|<nowiki>[[:alnum:]]{10}</nowiki>|ABCDE12345|
|[:alnum:]|Ruby 2: Unicode digit, letter or ideogram|<nowiki>[[:alnum:]]{10}</nowiki>|кошка90210|
|[:punct:]|PCRE (C, PHP, R…): ASCII punctuation mark|<nowiki>[[:punct:]]+</nowiki>|<nowiki>?!.,:;</nowiki>|
|[:punct:]|Ruby: Unicode punctuation mark|<nowiki>[[:punct:]]+</nowiki>|<nowiki>‽,:〽⁆</nowiki>|

----

===== Inline Modifiers =====

<WRAP info>
**NOTE:**  None of these are supported in JavaScript. In Ruby, beware of <nowiki>(?s) and (?m)</nowiki>.
</WRAP>


^Modifier^Legend^Example^Sample Match^
|(?i)|Case-insensitive mode (except JavaScript)|(?i)Monday|monDAY|
|(?s)|DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n).  Also known as "single-line mode" because the dot treats the entire input as a single line.|(?s)From A.*to Z|From A to Z|
|(?m)|Multiline mode (except Ruby and JS) <nowiki>^ and $</nowiki> match at the beginning and end of every line.|<nowiki>(?m)1\r\n^2$\r\n^3$</nowiki>|1|
|:::|:::|:::|2|
|:::|:::|:::|3|
|(?m)|In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks.|(?m)From A.*to Z|From A to Z|
|(?x)|Free-spacing mode (except JavaScript).  Also known as comment mode or whitespace mode.|(?x)abc<nowiki>[ ]</nowiki>d|abc d|
|:::|Spaces must be in brackets|abc<nowiki>[ ]</nowiki>d| |
|(?n)|.NET, PCRE 10.30+: named capture only. Turns all (parentheses) into non-capture groups. To capture, use named groups.| | |
|(?d)|Java: Unix linebreaks only. The dot and the <nowiki>^ and $</nowiki> anchors are only affected by \n| | |
|<nowiki>(?^)</nowiki>|PCRE 10.32+: unset modifiers. Unsets ismnx modifiers| | |
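
The four common modifiers can be tried in Python's re module, where a global inline modifier has to appear at the very start of the pattern. This is a sketch for that engine only; (?n), (?d) and (?^) are not available there. The example uses \n line breaks because Python's $ looks for \n, not \r\n.

```python
import re

# (?i): case-insensitive matching.
case_insensitive = re.fullmatch(r"(?i)Monday", "monDAY")
print(case_insensitive is not None)  # True

# (?s): the dot also matches line breaks.
with_dotall = re.fullmatch(r"(?s)From A.*to Z", "From A\nto Z")
without_dotall = re.fullmatch(r"From A.*to Z", "From A\nto Z")
print(with_dotall is not None)     # True
print(without_dotall is not None)  # False

# (?m): ^ and $ match at the start and end of every line.
lines = re.findall(r"(?m)^\d$", "1\n2\n3")
print(lines)  # ['1', '2', '3']

# (?x): whitespace and comments in the pattern are ignored,
# so a literal space has to be written inside brackets.
free_spacing = re.fullmatch(r"(?x) abc [ ] d  # trailing comment", "abc d")
print(free_spacing is not None)  # True
```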


----


===== Lookarounds =====

^Lookaround^Legend^Example^Sample Match^
|(?=…)|Positive lookahead|(?=\d{10})\d{5}|01234 in 0123456789|
|<nowiki>(?<=…)</nowiki>|Positive lookbehind|<nowiki>(?<=\d)cat</nowiki>|cat in 1cat|
|(?!…)|Negative lookahead|(?!theatre)the\w+|theme|
|(?<!…)|Negative lookbehind|\w{3}(?<!mon)ster|Munster|

----

===== Character Class Operations =====

^Class Operation^Legend^Example^Sample Match^
|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.|<nowiki>[a-z-[aeiou]]</nowiki>|Any lowercase consonant|
|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction.|<nowiki>[\p{IsArabic}-[\D]]</nowiki>|An Arabic character that is not a non-digit, i.e., an Arabic digit|
|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.|<nowiki>[\S&&[\D]]</nowiki>|A non-whitespace character that is a non-digit.|
|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection.|<nowiki>[\S&&[\D]&&[^a-zA-Z]]</nowiki>|A non-whitespace character that is a non-digit and not a letter.|
|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.|<nowiki>[a-z&&[^aeiou]]</nowiki>|An English lowercase letter that is not a vowel.|
|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction|<nowiki>[\p{InArabic}&&[^\p{L}\p{N}]]</nowiki>|An Arabic character that is not a letter or a number|
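
Python's built-in re module has neither subtraction nor intersection syntax. One possible workaround, shown here as an assumption rather than a direct equivalent, is to put a negative lookahead in front of the class, or to fold the conditions into a single negated class.

```python
import re

# Emulates [a-z-[aeiou]] / [a-z&&[^aeiou]]: a lowercase letter that is not a vowel.
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")
print(consonants)  # ['r', 'g', 'x']

# Emulates [\S&&[\D]]: neither whitespace nor a digit.
non_space_non_digit = re.findall(r"[^\s\d]", "a 1b")
print(non_space_non_digit)  # ['a', 'b']
```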

----

===== Other Syntax =====

^Syntax^Legend^Example^Sample Match^
|\K|Keep Out.  Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned.|prefix\K\d+|12|
|\Q…\E|Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.|\Q(C++ ?)\E|(C++ ?)|
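
Neither \K nor \Q…\E exists in Python's built-in re module. The sketch below shows common substitutes, offered as approximations: a lookbehind or a capture group in place of \K, and re.escape in place of \Q…\E.

```python
import re

# Stand-ins for prefix\K\d+: keep only the digits after "prefix".
via_lookbehind = re.search(r"(?<=prefix)\d+", "prefix12").group()
via_group = re.search(r"prefix(\d+)", "prefix12").group(1)
print(via_lookbehind)  # 12
print(via_group)       # 12

# Stand-in for \Q(C++ ?)\E: escape every metacharacter in a literal string.
quoted = re.escape("(C++ ?)")
literal = re.search(quoted, "written in (C++ ?) maybe").group()
print(literal)  # (C++ ?)
```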