Regex - Cheat Sheet

Cheat Sheet
Character classes
. 	any character except newline
\w \d \s 	word, digit, whitespace
\W \D \S 	not word, digit, whitespace
[abc] 	any of a, b, or c
[^abc] 	not a, b, or c
[a-g] 	character between a & g
Anchors
^abc$ 	start / end of the string
\b 	word boundary
Escaped characters
\. \* \\ 	escaped special characters
\t \n \r 	tab, linefeed, carriage return
\u00A9 	unicode escaped ©
Groups & Lookaround
(abc) 	capture group
\1 	backreference to group #1
(?:abc) 	non-capturing group
(?=abc) 	positive lookahead
(?!abc) 	negative lookahead
Quantifiers & Alternation
a* a+ a? 	0 or more, 1 or more, 0 or 1
a{5} a{2,} 	exactly five, two or more
a{1,3} 	between one & three
a+? a{2,}? 	match as few as possible
ab|cd 	match ab or cd

Basic regex

Symbol	Description
.	matches any single character except newline
^	matches start of string
$	matches end of string
*	matches the preceding character zero or more times
\	escapes a special character
()	groups regular expressions
?	matches the preceding character zero or one time

Characters

Character	Legend	Example	Sample Match
\d	Most engines: one digit from 0 to 9	file_\d\d	file_25
\d	.NET, Python 3: one Unicode digit in any script	file_\d\d	file_9੩
\w	Most engines: “word character”: ASCII letter, digit or underscore	\w-\w\w\w	A-b_1
\w	Python 3: “word character”: Unicode letter, ideogram, digit, or underscore	\w-\w\w\w	字-ま_۳
\w	.NET: “word character”: Unicode letter, ideogram, digit, or connector	\w-\w\w\w	字-ま‿۳
\s	Most engines: “whitespace character”: space, tab, newline, carriage return, vertical tab	a\sb\sc	a b c
\s	.NET, Python 3, JavaScript: “whitespace character”: any Unicode separator	a\sb\sc	a b c
\D	One character that is not a digit as defined by your engine's \d	\D\D\D	ABC
\W	One character that is not a word character as defined by your engine's \w	\W\W\W\W\W	*-+=)
\S	One character that is not a whitespace character as defined by your engine's \s	\S\S\S\S	Yoyo
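
The difference between ASCII-only and Unicode-aware shorthand classes can be checked directly. The following is a minimal sketch, assuming Python 3's built-in `re` module; the variable names are illustrative only.

```python
import re

# Sample string from the table: an ASCII 9 followed by Gurmukhi digit three.
sample = "file_9੩"

# Python 3 str patterns are Unicode-aware by default, so \d accepts both digits.
unicode_match = re.fullmatch(r"file_\d\d", sample)

# re.ASCII narrows \d to 0-9 (and \w, \s to their ASCII definitions).
ascii_match = re.fullmatch(r"file_\d\d", sample, flags=re.ASCII)

# \W is the complement of \w, whatever \w means in the current mode.
non_word = re.fullmatch(r"\W{5}", "*-+=)")

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
print(non_word is not None)       # True
```

Passing `re.ASCII` is one way to get the "most engines" behaviour from the table when only ASCII input should be accepted.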

Quantifiers

Quantifier	Legend	Example	Sample Match
+	One or more	Version \w-\w+	Version A-b1_1
{3}	Exactly three times	\D{3}	ABC
{2,4}	Two to four times	\d{2,4}	156
{3,}	Three or more times	\w{3,}	regex_tutorial
*	Zero or more times	A*B*C*	AAACC
?	Once or none	plurals?	plural
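
The example/sample pairs above can be verified mechanically. This is a small sketch assuming Python's `re` module; `samples`, `results` and `optional_s` are names chosen for this illustration.

```python
import re

# Pattern -> sample match, copied from the quantifier table.
samples = {
    r"Version \w-\w+": "Version A-b1_1",
    r"\D{3}": "ABC",
    r"\d{2,4}": "156",
    r"\w{3,}": "regex_tutorial",
    r"A*B*C*": "AAACC",
}

# fullmatch succeeds only if the whole sample is consumed by the pattern.
results = {p: re.fullmatch(p, s) is not None for p, s in samples.items()}

# ? makes the trailing "s" optional: zero or one occurrence, never two.
optional_s = [
    re.fullmatch(r"plurals?", word) is not None
    for word in ("plural", "plurals", "pluralss")
]

print(all(results.values()))  # True
print(optional_s)             # [True, True, False]
```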

Interval regex

These expressions tell us about the number of occurrences of a character in a string.

Expression	Description
{n}	Matches the preceding character appearing exactly n times
{n,m}	Matches the preceding character appearing at least n times but not more than m times
{n,}	Matches the preceding character only when it appears n times or more

Extended regex

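These expressions quantify the preceding character. The escaped forms shown here are the GNU basic-regex spelling (as used by plain grep). In extended syntax (grep -E) and in most other engines the quantifiers are written + and ? without a backslash, and \+ and \? match the literal characters instead.

Expression	Description
\+	Matches one or more occurrences of the previous character
\?	Matches zero or one occurrence of the previous character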
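
Escaping conventions for these two quantifiers differ between tools. As a point of comparison, here is a minimal sketch assuming Python's `re` module (not grep), where the unescaped forms quantify and the escaped forms match literal characters.

```python
import re

# Unescaped + and ? act as quantifiers in Python's re module.
one_or_more = re.fullmatch(r"ab+", "abbb")
zero_or_one = re.fullmatch(r"colou?r", "color")

# Escaped \+ and \? match the literal characters + and ?.
literal_plus = re.fullmatch(r"1\+1", "1+1")
not_a_quantifier = re.fullmatch(r"1\+1", "11")
literal_question = re.fullmatch(r"why\?", "why?")

print(one_or_more is not None)       # True
print(zero_or_one is not None)       # True
print(literal_plus is not None)      # True
print(not_a_quantifier is not None)  # False
print(literal_question is not None)  # True
```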

Brace expansion

Brace expansion is a shell feature (for example in Bash), not regex syntax: the shell generates the listed strings rather than matching against input. The syntax is either a sequence or a comma-separated list of items inside curly braces “{}”.

The starting and ending items in a sequence are separated by two periods “..”.

Expression	Description
{a,b,c,d}	Expands to each item listed in the braces. Example: echo {a,b,c,d}
{a..z}	Expands to a through z. Example: echo {a..z}
{0..9}	Expands to 0 through 9. Example: echo {0..9}

More Characters

Character	Legend	Example	Sample Match
.	Any character except line break	a.c	abc
.	Any character except line break	.*	whatever, man.
\.	A period (special character: needs to be escaped by a \)	a\.c	a.c
\	Escapes a special character	\.\*\+\? \$\^\/\\	.*+? $^/\
\	Escapes a special character	\[\{\(\)\}\]	[{()}]

Logic

Logic	Legend	Example	Sample Match
|	Alternation / OR operand	22|33	33
( … )	Capturing group	A(nt|pple)	Apple (captures "pple")
\1	Contents of Group 1	r(\w)g\1x	regex
\2	Contents of Group 2	(\d\d)\+(\d\d)=\2\+\1	12+65=65+12
(?: … )	Non-capturing group	A(?:nt|pple)	Apple
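
To see what groups and backreferences record, the rows above can be run as follows. This is a sketch assuming Python's `re` module; the variable names are illustrative.

```python
import re

# Capturing group: group(1) holds what the parentheses matched.
m = re.fullmatch(r"A(nt|pple)", "Apple")
captured = m.group(1)

# Non-capturing group: same overall match, but no group is recorded.
n = re.fullmatch(r"A(?:nt|pple)", "Apple")
group_count = n.re.groups

# Backreferences: \2 and \1 must repeat exactly what groups 2 and 1 captured.
pattern = r"(\d\d)\+(\d\d)=\2\+\1"
swapped = re.fullmatch(pattern, "12+65=65+12")
not_swapped = re.fullmatch(pattern, "12+65=12+65")

print(captured)                 # pple
print(group_count)              # 0
print(swapped is not None)      # True
print(not_swapped is not None)  # False
```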

More White-Space

Character	Legend	Example	Sample Match
\t	Tab	T\t\w{2}	T ab
\r	Carriage return character	see below
\n	Line feed character	see below
\r\n	Line separator on Windows	AB\r\nCD	AB (line break) CD
\N	Perl, PCRE (C, PHP, R…): one character that is not a line break	\N+	ABC
\h	Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator
\H	One character that is not a horizontal whitespace
\v	.NET, JavaScript, Python, Ruby: vertical tab
\v	Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator
\V	Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace
\R	Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)

More Quantifiers

Quantifier	Legend	Example	Sample Match
+	The + (one or more) is “greedy”	\d+	12345
?	Makes quantifiers “lazy”	\d+?	1 in 12345
*	The * (zero or more) is “greedy”	A*	AAA
?	Makes quantifiers “lazy”	A*?	empty in AAA
{2,4}	Two to four times, “greedy”	\w{2,4}	abcd
?	Makes quantifiers “lazy”	\w{2,4}?	ab in abcd
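
Greedy and lazy behaviour is easiest to see side by side. The sketch below assumes Python's `re` module; the HTML snippet is an invented input used only to show a common use of the lazy form.

```python
import re

# Greedy quantifiers take as much as they can; a trailing ? makes them lazy.
greedy_digits = re.match(r"\d+", "12345").group()
lazy_digits = re.match(r"\d+?", "12345").group()
lazy_star = re.match(r"A*?", "AAA").group()
lazy_range = re.match(r"\w{2,4}?", "abcd").group()

# Hypothetical input: stop at the first closing tag rather than the last one.
html = "<b>one</b> and <b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)
lazy_tags = re.findall(r"<b>.*?</b>", html)

print(greedy_digits, lazy_digits, repr(lazy_star), lazy_range)  # 12345 1 '' ab
print(greedy_tags)  # ['<b>one</b> and <b>two</b>']
print(lazy_tags)    # ['<b>one</b>', '<b>two</b>']
```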

Character Classes

Character	Legend	Example	Sample Match
[ … ]	One of the characters in the brackets	[AEIOU]	One uppercase vowel
[ … ]	One of the characters in the brackets	T[ao]p	Tap or Top
-	Range indicator	[a-z]	One lowercase letter
[x-y]	One of the characters in the range from x to y	[A-Z]+	GREAT
[ … ]	One of the characters in the brackets	[AB1-5w-z]	One of either: A,B,1,2,3,4,5,w,x,y,z
[x-y]	One of the characters in the range from x to y	[ -~]+	Characters in the printable section of the ASCII table.
[^x]	One character that is not x	[^a-z]{3}	A1!
[^x-y]	One of the characters not in the range from x to y	[^ -~]+	Characters that are not in the printable section of the ASCII table.
[\d\D]	One character that is a digit or a non-digit	[\d\D]+	Any characters, including new lines, which the regular dot doesn't match
[\x41]	Matches the character at hexadecimal position 41 in the ASCII table, i.e. A	[\x41-\x45]{3}	ABE
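
A few of these classes in action, as a minimal sketch assuming Python's `re` module. The input strings other than the table's sample matches are invented for illustration.

```python
import re

# A bracket expression matches exactly one character from the set.
vowel_swap = re.findall(r"T[ao]p", "Tap Tip Top")

# A leading ^ negates the set: three characters, none of them a-z.
negated = re.fullmatch(r"[^a-z]{3}", "A1!")

# Ranges can be written with hex escapes: \x41-\x45 is A through E.
hex_range = re.fullmatch(r"[\x41-\x45]{3}", "ABE")

# The dot stops at a line break by default, but [\d\D] matches any character.
dot_only = re.fullmatch(r".+", "a\nb")
digit_or_not = re.fullmatch(r"[\d\D]+", "a\nb")

# [^ -~] picks out everything outside printable ASCII (space through tilde).
non_printable = re.findall(r"[^ -~]+", "café au\tlait")

print(vowel_swap)                # ['Tap', 'Top']
print(negated is not None)       # True
print(hex_range is not None)     # True
print(dot_only is not None)      # False
print(digit_or_not is not None)  # True
print(non_printable)             # ['é', '\t']
```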

Anchors and Boundaries

Anchor	Legend	Example	Sample Match
^	Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means “not”)	^abc .*	abc (line start)
$	End of string or end of line depending on multiline mode. Many engine-dependent subtleties.	.*? the end$	this is the end
\A	Beginning of string (all major engines except JS)	\Aabc[\d\D]*	abc (string……start)
\z	Very end of the string - Not available in Python and JS	the end\z	this is…\n…the end
\Z	End of string or (except Python) before final line break - Not available in JS	the end\Z	this is…\n…the end\n
\G	.NET, Java, PCRE (C, PHP, R…), Perl, Ruby: beginning of string or end of previous match
\b	Word boundary - Most engines: position where one side only is an ASCII letter, digit or underscore	Bob.*\bcat\b	Bob ate the cat
\b	Word boundary - .NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore	Bob.*\bкошка\b	Bob ate the кошка
\B	Not a word boundary	c.*\Bcat\B.*	copycats

POSIX Classes

CharacterLegendExampleSample Match
[:alpha:]PCRE (C, PHP, R…): ASCII letters A-Z and a-z[8[:alpha:]]+WellDone88
[:alpha:]Ruby 2: Unicode letter or ideogram[[:alpha:]\d]+кошка99
[:alnum:]PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z[[:alnum:]]{10}ABCDE12345
[:alnum:]Ruby 2: Unicode digit, letter or ideogram[[:alnum:]]{10}кошка90210
[:punct:]PCRE (C, PHP, R…): ASCII punctuation mark[[:punct:]]+?!.,:;
[:punct:]Ruby: Unicode punctuation mark[[:punct:]]+‽,:〽⁆

Inline Modifiers

NOTE: None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m).

Modifier	Legend	Example	Sample Match
(?i)	Case-insensitive mode (except JavaScript)	(?i)Monday	monDAY
(?s)	DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also known as “single-line mode” because the dot treats the entire input as a single line.	(?s)From A.*to Z	From A to Z
(?m)	Multiline mode (except Ruby and JS). ^ and $ match at the beginning and end of every line.	(?m)1\r\n^2$\r\n^3$	1, 2 and 3, each on its own line
(?m)	In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks.	(?m)From A.*to Z	From A to Z
(?x)	Free-spacing mode (except JavaScript). Also known as comment mode or whitespace mode. Spaces must be in brackets.	(?x)abc[ ]d	abc d
(?n)	.NET, PCRE 10.30+: named capture only	Turns all (parentheses) into non-capture groups. To capture, use named groups.
(?d)	Java: Unix linebreaks only	The dot and the ^ and $ anchors are only affected by \n
(?^)	PCRE 10.32+: unset modifiers	Unsets ismnx modifiers
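
Four of these modifiers, (?i), (?s), (?m) and (?x), can be tried as below. This is a sketch assuming Python's `re` module, which accepts them at the start of a pattern; the sample inputs use \n line breaks for simplicity.

```python
import re

# (?i): case-insensitive matching.
case_insensitive = re.fullmatch(r"(?i)Monday", "monDAY")

# (?s): the dot also matches line breaks.
dot_default = re.search(r"From A.*to Z", "From A\nto Z")
dot_all = re.search(r"(?s)From A.*to Z", "From A\nto Z")

# (?m): ^ and $ match at the start and end of every line.
lines_default = re.findall(r"^\d$", "1\n2\n3")
lines_multiline = re.findall(r"(?m)^\d$", "1\n2\n3")

# (?x): whitespace and # comments in the pattern are ignored,
# so a literal space has to be written inside brackets.
free_spacing = re.fullmatch(r"(?x) abc [ ] d  # space needs brackets", "abc d")

print(case_insensitive is not None)                # True
print(dot_default is not None, dot_all is not None)  # False True
print(lines_default, lines_multiline)              # [] ['1', '2', '3']
print(free_spacing is not None)                    # True
```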

Lookarounds

Lookaround	Legend	Example	Sample Match
(?=…)	Positive lookahead	(?=\d{10})\d{5}	01234 in 0123456789
(?<=…)	Positive lookbehind	(?<=\d)cat	cat in 1cat
(?!…)	Negative lookahead	(?!theatre)the\w+	theme
(?<!…)	Negative lookbehind	\w{3}(?<!mon)ster	Munster
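
Lookarounds check the surrounding text without adding it to the match. The following sketch assumes Python's `re` module and reuses the table's examples; "monster" is an extra input added here to show the failing case.

```python
import re

# Positive lookahead: require ten digits ahead, but consume only five.
ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()

# Positive lookbehind: "cat" must be preceded by a digit, which is not returned.
behind = re.search(r"(?<=\d)cat", "1cat").group()

# Negative lookahead: words starting with "the", except "theatre".
not_theatre = re.findall(r"(?!theatre)the\w+", "theatre theme")

# Negative lookbehind: the three characters before "ster" must not be "mon".
munster = re.search(r"\w{3}(?<!mon)ster", "Munster")
monster = re.search(r"\w{3}(?<!mon)ster", "monster")

print(ahead)        # 01234
print(behind)       # cat
print(not_theatre)  # ['theme']
print(munster is not None, monster is not None)  # True False
```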

Character Class Operations

Class Operation	Legend	Example	Sample Match
[…-[…]]	.NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.	[a-z-[aeiou]]	Any lowercase consonant
[…-[…]]	.NET: character class subtraction.	[\p{IsArabic}-[\D]]	An Arabic character that is not a non-digit, i.e., an Arabic digit
[…&&[…]]	Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.	[\S&&[\D]]	A non-whitespace character that is a non-digit.
[…&&[…]]	Java, Ruby 2+: character class intersection.	[\S&&[\D]&&[^a-zA-Z]]	A non-whitespace character that is a non-digit and not a letter.
[…&&[^…]]	Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.	[a-z&&[^aeiou]]	An English lowercase letter that is not a vowel.
[…&&[^…]]	Java, Ruby 2+: character class subtraction	[\p{InArabic}&&[^\p{L}\p{N}]]	An Arabic character that is not a letter or a number

Other Syntax

Syntax	Legend	Example	Sample Match
\K	Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned.	prefix\K\d+	12
\Q…\E	Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.	\Q(C++ ?)\E	(C++ ?)
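
Python's built-in `re` module is not among the engines listed for either construct. As a rough sketch of how similar results might be obtained there, `re.escape` can stand in for \Q…\E and a capturing group for \K. This is a substitution chosen for illustration, not an equivalent syntax, and the input "prefix12" is assumed.

```python
import re

# Stand-in for \Q...\E: escape every metacharacter in a literal string.
literal = "(C++ ?)"
escaped_pattern = re.escape(literal)
literal_match = re.fullmatch(escaped_pattern, literal)

# Stand-in for prefix\K\d+: match the prefix, but return only the group.
kept = re.search(r"prefix(\d+)", "prefix12").group(1)

print(literal_match is not None)  # True
print(kept)                       # 12
```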
regex/cheat_sheet.txt · Last modified: 2021/05/20 23:54 by peter