An Introduction to Regex
Every developer has likely come across regular expressions already. If you’re like me, they look like a scary mess of jumbled syntax that doesn’t make much sense…until now! Once you get these basics under your belt, you’ll wonder why you ever avoided regex in the first place.
…a regular expression (abbreviated regex or regexp and sometimes called a rational expression) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. – Wikipedia
So here are the basics you need to know:
In many languages, such as JavaScript, Perl, and PHP, a regex pattern is written between delimiters, typically a forward slash /. For example, /regex/
Use a caret symbol ^ to match the beginning of input. For example, /^regex/ would match “regex is a pattern”, but would not match “this post is about regex”
Use a dollar symbol $ to match the end of input. For example, /regex$/ would match “This post is about regex”, but would not match “regex is at the beginning of this sentence”
A period . will match any single character (except, by default, a newline). For example, /re.ex/ will match “regex”, “re ex”, “refex”, “rebex”, etc.
A plus symbol + will match one or more of the previous character. For example, /re+gex/ will match “regex”, “reegex”, and “reeeegex”, but not “rgex”
Similarly, an asterisk * will match zero or more of the previous character. For example, /re*gex/ will match “reegex”, “reeeegex”, “regex”, and “rgex”
A question mark ? will match zero OR one of the previous character. For example, /re?gex/ will match “regex” or “rgex”, but not “reeeegex”.
You can also set your own minimum and maximum number of repetitions using curly brackets. For example, {1,2} matches the previous character at least once and at most twice, so /re{1,2}gex/ will match “regex” and “reegex”, but not “rgex” or “reeegex”
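The anchor and quantifier examples above can be checked with a short script. This is a minimal sketch in Python, chosen only for illustration: Python has no slash delimiters, so each pattern is written as a raw string and passed to the standard re module, where re.search looks for a match anywhere in the input. The helper name matches is hypothetical, not part of any library.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


# Anchors: ^ ties the match to the start of the input, $ to the end.
assert matches(r"^regex", "regex is a pattern")
assert not matches(r"^regex", "this post is about regex")
assert matches(r"regex$", "This post is about regex")
assert not matches(r"regex$", "regex is at the beginning of this sentence")

# A period matches any single character (but not a newline by default).
for word in ["regex", "re ex", "refex", "rebex"]:
    assert matches(r"re.ex", word)

# Quantifiers apply to the character just before them.
assert all(matches(r"re+gex", w) for w in ["regex", "reegex", "reeeegex"])
assert not matches(r"re+gex", "rgex")
assert all(matches(r"re*gex", w) for w in ["rgex", "regex", "reeeegex"])
assert matches(r"re?gex", "rgex") and not matches(r"re?gex", "reeeegex")
assert matches(r"re{1,2}gex", "reegex")
assert not matches(r"re{1,2}gex", "reeegex")
print("all examples behave as described")
```

If any example behaved differently, the matching assert would stop the script with an AssertionError.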
Combining the syntax above lets you build more advanced patterns:
.? will match zero or one character of any type
.+ will match one or more characters of any type
.* will match zero or more characters of any type
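To see how these three combinations differ, the sketch below (Python's re module, used only for illustration) tries each one against strings with zero, one, and three characters between “re” and “gex”. It uses re.fullmatch, which requires the whole string to fit the pattern; the names SAMPLES and which_match are hypothetical.

```python
import re

# Zero, one, and three characters between "re" and "gex".
SAMPLES = ["regex", "rexgex", "rexxxgex"]


def which_match(pattern: str) -> list[bool]:
    """For each sample, report whether the WHOLE string fits `pattern`."""
    return [re.fullmatch(pattern, s) is not None for s in SAMPLES]


for pattern in [r"re.?gex", r"re.+gex", r"re.*gex"]:
    print(pattern, which_match(pattern))
# re.?gex [True, True, False]
# re.+gex [False, True, True]
# re.*gex [True, True, True]
```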
Enclosing characters within square brackets [] will allow you to match specific characters. These can be individual characters, a range of characters, or a combination of both. Note that these are case sensitive.
To match single characters, add the characters within square brackets. For example, /[reG,]/ will match “r”, “e”, “G”, or “,”. This will not match “R”, “E”, or “g”.
Use a hyphen to match a range. For example, [a-c] will match any lowercase letter from “a” through to “c”, but will not match “A” or “d”.
You can combine single characters and ranges within the same brackets. For example, [a-zA-CZ1-4] will match lowercase “a” through to “z”, uppercase “A” through to “C”, uppercase “Z”, and “1” through to “4”. Note that this still matches only one character.
To exclude characters, add a caret ^ as the first character inside the brackets. For example, [^a1-3] will match any single character except “a”, “1”, “2”, or “3”.
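Here is a minimal Python sketch (the re module, used only for illustration) that runs each bracket example with re.findall, which lists every single-character hit in order. The sample strings and variable names are made up for the demonstration.

```python
import re

# Each set matches exactly ONE character; findall lists every hit in order.
picked = re.findall(r"[reG,]", "Regex, Go!")
ranged = re.findall(r"[a-c]", "abcdABC")
mixed = re.findall(r"[a-zA-CZ1-4]", "Az9Z3D")
excluded = re.findall(r"[^a1-3]", "a1b2c3")

print(picked)    # ['e', 'e', ',', 'G']  ("R" and "g" are skipped: case matters)
print(ranged)    # ['a', 'b', 'c']
print(mixed)     # ['A', 'z', 'Z', '3']
print(excluded)  # ['b', 'c']

# A set is still one character, so a two-character string cannot fully match it.
print(re.fullmatch(r"[a-zA-CZ1-4]", "ab"))  # None
```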
When you want to treat a specific sequence of characters, or a pattern within a pattern, as a single unit, you can use parentheses ().
To match a string of characters as a group, wrap it in parentheses to form a subpattern. For example, /(regex)/ would match “regex” exactly.
To match one of several alternatives, separate them with a pipe |. For example, /(reg|ex)/ will match either “reg” or “ex”.
You can combine this with other symbols. For example, /(regex)+/ will match “regexregexregex”
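The grouping examples can be sketched in Python's re module as well (an illustrative choice, not the only way). The last line is an added contrast, assuming the same input: without parentheses, the + in regex+ applies only to the final “x”, so it no longer repeats the whole word.

```python
import re

grouped = re.search(r"(regex)", "learn regex today").group(1)
alternatives = re.findall(r"(reg|ex)", "regex")
repeated = re.fullmatch(r"(regex)+", "regexregexregex") is not None
# Without the parentheses, + applies only to the final "x".
ungrouped = re.fullmatch(r"regex+", "regexregexregex") is not None

print(grouped, alternatives, repeated, ungrouped)
# regex ['reg', 'ex'] True False
```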
Backslash sequences act as shorthand for common characters and character classes. Here are a few useful ones:
\d will match any digit
\D will match anything that is not a digit
\s will match any whitespace character
\S will match anything that is not a whitespace character
\t will match a tab
\n will match a newline
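As an illustration, this Python sketch applies each shorthand to a made-up sample string. In a raw string, r"\t" and r"\n" are passed to re as backslash sequences, which the module interprets as tab and newline.

```python
import re

text = "Order 66\tshipped\nbox 12"

digits = re.findall(r"\d+", text)           # runs of digits
non_digits = re.findall(r"\D+", "a1b22c")   # runs of anything except digits
words = re.findall(r"\S+", text)            # runs of non-whitespace
has_tab = re.search(r"\t", text) is not None
has_newline = re.search(r"\n", text) is not None

print(digits)       # ['66', '12']
print(non_digits)   # ['a', 'b', 'c']
print(words)        # ['Order', '66', 'shipped', 'box', '12']
print(has_tab, has_newline)  # True True
```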
Once a group has captured some text, you can match that same text again later in the pattern with a backreference: \1 for the first group, \2 for the second, and so on. Groups are numbered by the position of their opening parenthesis, counting from the left. For example, if your pattern looks like /(outer-pattern(inner-pattern))/, then \1 refers to the text matched by “outer-pattern(inner-pattern)” and \2 refers to the text matched by “inner-pattern”, so /(outer-pattern(inner-pattern))\1\2/ requires both captured pieces of text to appear again, in that order. Note that a backreference repeats the captured text, not the pattern: /(a|b)\1/ matches “aa” or “bb”, but not “ab”.
Similarly, in /(reg)(ex)\1\2/, \1 refers to “reg” and \2 refers to “ex”, so the whole pattern matches “regexregex”.
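A short Python sketch (re module, illustrative only) makes the numbering and the “text, not pattern” rule concrete. The nested example uses the hypothetical pattern (ab(c)) as a stand-in for the outer and inner placeholders above, and fits is a hypothetical helper name.

```python
import re


def fits(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the WHOLE of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Group 1 = (reg), group 2 = (ex): "regex" must be followed by "regex" again.
print(fits(r"(reg)(ex)\1\2", "regexregex"))     # True

# Nested groups: group 1 (outer) captures "abc", group 2 (inner) captures "c".
print(fits(r"(ab(c))\1\2", "abcabcc"))          # True

# A backreference repeats the captured TEXT, not the pattern.
print(fits(r"(a|b)\1", "aa"), fits(r"(a|b)\1", "ab"))  # True False
print(fits(r"(a|b)(a|b)", "ab"))                 # True: the pattern written twice
```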
As in most programming languages, you can put a backslash \ before a character that normally has a special meaning in the pattern to match it literally. For example, /rege\./ will match “rege.” including the period. Without the backslash, i.e. /rege./, the period matches any character, so the pattern would match “regex”, “reget”, “rege ”, etc.
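The escaping example in Python's re module looks like this (illustrative only). Python also provides re.escape, which backslashes every special character in a literal string for you; that helper is specific to Python.

```python
import re

escaped = r"rege\."
print(re.fullmatch(escaped, "rege.") is not None)   # True
print(re.fullmatch(escaped, "regex") is not None)   # False: \. is only a literal period
print(re.fullmatch(r"rege.", "regex") is not None)  # True: . is any character

# re.escape adds the backslashes for you when the text comes from elsewhere.
print(re.escape("rege."))  # rege\.
```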
Once you understand what each symbol does, it becomes easier to read through more complicated examples and write your own regex to suit your pattern matching needs!
Additional Resources
RegExr – A tool for learning, building, and testing Regular Expressions
– Emily Fox