====== Regex - Cheat Sheet ====== Cheat Sheet Character classes . any character except newline \w \d \s word, digit, whitespace \W \D \S not word, digit, whitespace [abc] any of a, b, or c [^abc] not a, b, or c [a-g] character between a & g Anchors ^abc$ start / end of the string \b word boundary Escaped characters \. \* \\ escaped special characters \t \n \r tab, linefeed, carriage return \u00A9 unicode escaped © Groups & Lookaround (abc) capture group \1 backreference to group #1 (?:abc) non-capturing group (?=abc) positive lookahead (?!abc) negative lookahead Quantifiers & Alternation a* a+ a? 0 or more, 1 or more, 0 or 1 a{5} a{2,} exactly five, two or more a{1,3} between one & three a+? a{2,}? match as few as possible ab|cd match ab or cd ---- ===== Basic regex ===== ^Symbol^Descriptions^ |.|replaces any character| |^|matches start of string| |$|matches end of string| |*|matches up zero or more times the preceding character| |\|Represent special characters| |()|Groups regular expressions| |?|Matches up exactly one character| ---- ===== Characters ===== ^Character^Legend^Example^Sample Match^ |\d|Most engines: one digit from 0 to 9|file_\d\d|file_25| |\d|.NET, Python 3: one Unicode digit in any script|file_\d\d|file_9੩| |\w|Most engines: "word character": ASCII letter, digit or underscore|\w-\w\w\w|A-b_1| |\w|.Python 3: "word character": Unicode letter, ideogram, digit, or underscore|\w-\w\w\w|字-ま_۳| |\w|.NET: "word character": Unicode letter, ideogram, digit, or connector|\w-\w\w\w|字-ま‿۳| |\s|Most engines: "whitespace character": space, tab, newline, carriage return, vertical tab|a\sb\sc|a b c| |\s|.NET, Python 3, JavaScript: "whitespace character": any Unicode separator|a\sb\sc|a b c| |\D|One character that is not a digit as defined by your engine's \d|\D\D\D|ABC| |\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|*-+=)| |\S|One character that is not a whitespace character as defined by your engine's \s|\S\S\S\S|Yoyo| ---- ===== Quantifiers ===== ^Quantifier^Legend^Example^Sample Match^ |+|One or more|Version \w-\w+|Version A-b1_1| |{3}|Exactly three times|\D{3}|ABC| |{2,4}|Two to four times|\d{2,4}|156| |{3,}|Three or more times|\w{3,}|regex_tutorial| |*|Zero or more times|A*B*C*|AAACC| |?|Once or none|plurals?|plural| ---- ===== Interval regex ===== These expressions tell us about the number of occurrences of a character in a string. ^Expression^Description^ |{n}|Matches the preceding character appearing 'n' times exactly| |{n,m}|Matches the preceding character appearing 'n' times but not more than m| |{n, }|Matches the preceding character only when it appears 'n' times or more| ---- ===== Extended regex ===== These regular expressions contain combinations of more than one expression. ^Expression^Description^ |\+|Matches one or more occurrence of the previous character| |\?|Matches zero or one occurrence of the previous character| ---- ===== Brace expansion ===== The syntax for brace expansion is either a sequence or a comma separated list of items inside curly braces "{}". The starting and ending items in a sequence are separated by two periods "..". ^Expression^Description^ |{a,b,c,d}|Matches the actual characters in the braces. Example: echo {a,b,c,d}| |{a..z}|Matches a thru z. Example: echo {a..z}| |{0..9}|Matches 0 thru 9. Example: echo {0..9}| ---- ===== More Characters ===== ^Character^Legend^Example^Sample Match^ |.|Any character except line break|a.c|abc| |.|Any character except line break|.*|whatever, man.| |\.|A period (special character: needs to be escaped by a \)|a\.c|a.c| |\|Escapes a special character|\.\*\+\? \$\^\/\\|.*+? $^/\| |\|Escapes a special character|\[\{\(\)\}\]|[{()}]| ---- ===== Logic ===== ^Logic^Legend^Example^Sample Match^ |||Alternation / OR operand|22|33|33| |( … )|Capturing group|A(nt|pple)|Apple (captures "pple")| |\1|Contents of Group 1|r(\w)g\1x|regex| |\2|Contents of Group 2|(\d\d)\+(\d\d)=\2\+\1|12+65=65+12| |(?: … )|Non-capturing group|A(?:nt|pple)|Apple| ---- ===== More White-Space ===== ^Character^Legend^Example^Sample Match^ |\t|Tab|T\t\w{2}|T ab| |\r|Carriage return character|see below| | |\n|Line feed character|see below| | |\r\n|Line separator on Windows|AB\r\nCD|AB| |:::|:::|:::|CD| |\N|Perl, PCRE (C, PHP, R…): one character that is not a line break|\N+|ABC| |\h|Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator| | | |\H|One character that is not a horizontal whitespace| | | |\v|.NET, JavaScript, Python, Ruby: vertical tab| | | |\v|Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator| | | |\V|Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace| | | |\R|Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)| | | ---- ===== More Quantifiers ===== ^Quantifier^Legend^Example^Sample Match^ |+|The + (one or more) is "greedy"|\d+|12345| |?|Makes quantifiers "lazy"|\d+?|1 in 12345| |*|The * (zero or more) is "greedy"|A*|AAA| |?|Makes quantifiers "lazy"|A*?|empty in AAA| |{2,4}|Two to four times, "greedy"|\w{2,4}|abcd| |?|Makes quantifiers "lazy"|\w{2,4}?|ab in abcd| ---- ===== Character Classes ===== ^Character^Legend^Example^Sample Match^ |[ … ]|One of the characters in the brackets|[AEIOU]|One uppercase vowel| |[ … ]|One of the characters in the brackets|T[ao]p|Tap or Top| |-|Range indicator|[a-z]|One lowercase letter| |[x-y]|One of the characters in the range from x to y|[A-Z]+|GREAT| |[ … ]|One of the characters in the brackets|[AB1-5w-z]|One of either: A,B,1,2,3,4,5,w,x,y,z| |[x-y]|One of the characters in the range from x to y|[ -~]+|Characters in the printable section of the ASCII table.| |[^x]|One character that is not x|[^a-z]{3}|A1!| |[^x-y]|One of the characters not in the range from x to y|[^ -~]+|Characters that are not in the printable section of the ASCII table.| |[\d\D]|One character that is a digit or a non-digit|[\d\D]+|Any characters, including new lines, which the regular dot doesn't match| |[\x41]|Matches the character at hexadecimal position 41 in the ASCII table, i.e. A|[\x41-\x45]{3}|ABE| ---- ===== Anchors and Boundaries ===== ^Anchor^Legend^Example^Sample Match^ |^|Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means "not")|^abc .*|abc (line start)| |$|End of string or end of line depending on multiline mode. Many engine-dependent subtleties.|.*? the end$|this is the end| |\A|Beginning of string (all major engines except JS)|\Aabc[\d\D]*|abc (string......start)| |\z|Very end of the string - Not available in Python and JS|the end\z|this is...\n...the end| |\Z|End of string or (except Python) before final line break - Not available in JS|the end\Z|this is...\n...the end\n| |\G|Beginning of String or End of Previous Match .NET, Java, PCRE (C, PHP, R…), Perl, Ruby| | | |\b|Word boundary - Most engines: position where one side only is an ASCII letter, digit or underscore|Bob.*\bcat\b|Bob ate the cat| |\b|Word boundary - .NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore|Bob.*\b\кошка\b|Bob ate the кошка| |\B|Not a word boundary|c.*\Bcat\B.*|copycats| ---- ===== POSIX Classes ===== ^Character^Legend^Example^Sample Match^ |[:alpha:]|PCRE (C, PHP, R…): ASCII letters A-Z and a-z|[8[:alpha:]]+|WellDone88| |[:alpha:]|Ruby 2: Unicode letter or ideogram|[[:alpha:]\d]+|кошка99| |[:alnum:]|PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z|[[:alnum:]]{10}|ABCDE12345| |[:alnum:]|Ruby 2: Unicode digit, letter or ideogram|[[:alnum:]]{10}|кошка90210| |[:punct:]|PCRE (C, PHP, R…): ASCII punctuation mark|[[:punct:]]+|?!.,:;| |[:punct:]|Ruby: Unicode punctuation mark|[[:punct:]]+|‽,:〽⁆| ---- ===== Inline Modifiers ===== **NOTE:** None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m). ^Modifier^Legend^Example^Sample Match^ |(?i)|Case-insensitive mode (except JavaScript)|(?i)Monday|monDAY| |(?s)|DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also known as "single-line mode" because the dot treats the entire input as a single line.|(?s)From A.*to Z|From A to Z| |(?m)|Multiline mode (except Ruby and JS) ^ and $ match at the beginning and end of every line.|(?m)1\r\n^2$\r\n^3$|1| |:::|:::|:::|2| |:::|:::|:::|3| |(?m)|In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks.|(?m)From A.*to Z|From A to Z| |(?x)|Free-Spacing Mode mode (except JavaScript). Also known as comment mode or whitespace mode.|(?x)abc[ ]d|abc d| |:::|Spaces must be in brackets|abc[ ]d| | |(?n)|.NET, PCRE 10.30+: named capture only|Turns all (parentheses) into non-capture groups. To capture, use named groups.| | |(?d)|Java: Unix linebreaks only|The dot and the ^ and $ anchors are only affected by \n| | |(?^)|PCRE 10.32+: unset modifiers|Unsets ismnx modifiers| | ---- ===== Lookarounds ===== ^Lookaround^Legend^Example^Sample Match^ |(?=…)|Positive lookahead|(?=\d{10})\d{5}|01234 in 0123456789| |(?<=…)|Positive lookbehind|(?<=\d)cat|cat in 1cat| |(?!…)|Negative lookahead|(?!theatre)the\w+|theme| |(?[…-[…]]|.NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.|[a-z-[aeiou]]|Any lowercase consonant| |[…-[…]]|.NET: character class subtraction.|[\p{IsArabic}-[\D]]|An Arabic character that is not a non-digit, i.e., an Arabic digit| |[…&&[…]]|Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.|[\S&&[\D]]|An non-whitespace character that is a non-digit.| |[…&&[…]]|Java, Ruby 2+: character class intersection.|[\S&&[\D]&&[^a-zA-Z]]|An non-whitespace character that a non-digit and not a letter.| |[…&&[^…]]|Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.|[a-z&&[^aeiou]]|An English lowercase letter that is not a vowel.| |[…&&[^…]]|Java, Ruby 2+: character class subtraction|[\p{InArabic}&&[^\p{L}\p{N}]]|An Arabic character that is not a letter or a number| ---- ===== Other Syntax ===== ^Syntax^Legend^Example^Sample Match^ |\K|Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned.|prefix\K\d+|12| |\Q…\E|Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.|\Q(C++ ?)\E|(C++ ?)|