PMG Digital Made for Humans

An Introduction to Regex

5 MINUTE READ | June 16, 2015



Emily Fox

Emily Fox has written this article.

Every developer has likely come across regular expressions already. If you’re like me, you’ve found them a scary jumble of syntax that doesn’t make much sense… until now! Once you get these basics under your belt, you’ll wonder why you ever avoided regex in the first place.

…a regular expression (abbreviated regex or regexp and sometimes called a rational expression) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. (Wikipedia)

So here are the basics you need to know:

  1. In many languages, including JavaScript, Perl, and PHP, regex patterns are written between delimiters, typically a forward slash /. For example, /regex/

  2. Use a caret ^ to match the beginning of the input. For example, /^regex/ would match “regex is a pattern”, but would not match “this post is about regex”.

  3. Use a dollar sign $ to match the end of the input. For example, /regex$/ would match “This post is about regex”, but would not match “regex is at the beginning of this sentence”.
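As a quick way to try these out, here is a minimal sketch in TypeScript. It assumes a JavaScript runtime such as Node.js, where /…/ is a regex literal and RegExp.prototype.test() reports whether the pattern matches anywhere in a string. The variable names are only for illustration.

```typescript
// A regex literal: the pattern sits between forward-slash delimiters.
const startsWithRegex = /^regex/; // ^ anchors the match to the start of the input
const endsWithRegex = /regex$/;   // $ anchors the match to the end of the input

console.log(startsWithRegex.test("regex is a pattern"));       // true
console.log(startsWithRegex.test("this post is about regex")); // false
console.log(endsWithRegex.test("This post is about regex"));   // true
console.log(endsWithRegex.test("regex is at the beginning of this sentence")); // false

// The RegExp constructor takes the same pattern as a string, without delimiters.
const sameAsLiteral = new RegExp("^regex");
console.log(sameAsLiteral.source === startsWithRegex.source); // true
```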

  1. A period . will match any single character (except, by default, a newline). For example, /re.ex/ will match “regex”, “re ex”, “refex”, “rebex”, etc.

  2. A plus sign + will match one or more of the previous character. For example, /re+gex/ will match “regex”, “reegex”, and “reeeegex”, but not “rgex”.

  3. Similarly, an asterisk * will match zero or more of the previous character. For example, /re*gex/ will match “reegex”, “reeeegex”, “regex”, and “rgex”.

  4. A question mark ? will match zero OR one of the previous character. For example, /re?gex/ will match “regex” or “rgex”, but not “reeeegex”.

  5. You can also set your own minimum and maximum number of repetitions using curly brackets. For example, {1,2} will match at least 1 and at most 2 of the previous character, so /re{1,2}gex/ will match “regex” and “reegex”, but not “rgex” or “reeegex”.
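The list above can be checked with a small table of cases. This is a TypeScript sketch assuming a JavaScript runtime such as Node.js; quantifierCases is an illustrative name. Each entry pairs one of the patterns with strings it should and should not match.

```typescript
// [pattern, strings that should match, strings that should not match]
const quantifierCases: Array<[RegExp, string[], string[]]> = [
  [/re.ex/, ["regex", "re ex", "refex", "rebex"], ["reex"]], // . = exactly one character
  [/re+gex/, ["regex", "reegex", "reeeegex"], ["rgex"]],    // + = one or more "e"
  [/re*gex/, ["rgex", "regex", "reegex", "reeeegex"], []],  // * = zero or more "e"
  [/re?gex/, ["rgex", "regex"], ["reeeegex"]],              // ? = zero or one "e"
  [/re{1,2}gex/, ["regex", "reegex"], ["rgex", "reeegex"]], // {1,2} = one or two "e"
];

for (const [pattern, matching, nonMatching] of quantifierCases) {
  for (const s of matching) {
    console.log(`${pattern} on "${s}": ${pattern.test(s)}`); // expected: true
  }
  for (const s of nonMatching) {
    console.log(`${pattern} on "${s}": ${pattern.test(s)}`); // expected: false
  }
}
```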

Combining the character-matching syntax above lets you build more advanced patterns.

  1. .? will find zero or one character of any type

  2. .+ will find one or more characters of any type

  3. .* will find zero or more characters of any type
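Because an unanchored .* can match an empty string (and so matches almost anything), wrapping these combinations in ^ and $ makes the difference easier to see. A minimal TypeScript sketch with illustrative names, assuming a JavaScript runtime; note that in JavaScript the . does not match a newline unless the s (dotAll) flag is set.

```typescript
// ^ and $ force the whole string to fit the pattern: "a", something, then "b".
const zeroOrOneChar = /^a.?b$/;
const oneOrMoreChars = /^a.+b$/;
const zeroOrMoreChars = /^a.*b$/;

for (const s of ["ab", "axb", "axxb"]) {
  console.log(s, zeroOrOneChar.test(s), oneOrMoreChars.test(s), zeroOrMoreChars.test(s));
}
// Expected (.?, .+, .*):
//   "ab"   -> true,  false, true
//   "axb"  -> true,  true,  true
//   "axxb" -> false, true,  true
```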

Enclosing characters within square brackets [] will allow you to match specific characters. These can be individual characters, a range of characters you want to match, or a combination of both. Note that these are case sensitive.

  1. To match single characters, add the characters within square brackets. For example, /[reG,]/ will match “r”, “e”, “G”, or “,”. This will not match “R”, “E”, or “g”.

  2. Use a hyphen to match a range. For example, [a-c] will match any lowercase letter from “a” through to “c”, but will not match “A” or “d”.

  3. You can match a combination of things within these brackets. For example, [a-zA-CZ1-4] will match lowercase “a” through to “z”, uppercase “A” through to “C”, uppercase “Z”, and “1” through to “4”. Note that this still matches only one character.

  4. To exclude characters, add a caret ^ immediately after the opening bracket. For example, [^a1-3] will match any character except “a”, “1”, “2”, or “3”.
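Here is a TypeScript sketch of these sets, assuming a JavaScript runtime; the names are illustrative. Each pattern is anchored with ^ and $ so that it tests a single character on its own. The ^ outside the brackets is an anchor, while the ^ just inside the brackets negates the set.

```typescript
const specificChars = /^[reG,]$/;  // exactly one of "r", "e", "G", ","
const lowerAtoC = /^[a-c]$/;       // exactly one of "a", "b", "c"
const combined = /^[a-zA-CZ1-4]$/; // one character from a-z, A-C, Z, or 1-4
const excluded = /^[^a1-3]$/;      // one character that is NOT "a", "1", "2" or "3"

console.log(["r", "e", "G", ","].every((c) => specificChars.test(c))); // true
console.log(["R", "E", "g"].some((c) => specificChars.test(c)));       // false (case sensitive)
console.log(lowerAtoC.test("b"), lowerAtoC.test("A"), lowerAtoC.test("d")); // true false false
console.log(combined.test("Z"), combined.test("D"), combined.test("5"));    // true false false
console.log(excluded.test("b"), excluded.test("2"));                         // true false

// A set still matches only ONE character, so two characters fail the anchored pattern.
console.log(combined.test("ab")); // false
// Without anchors, a set finds one matching character anywhere in the string.
console.log(/[a-c]/.test("xyzb")); // true
```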

When you want to treat a sequence of characters as a single unit, or match one of several alternatives, use parentheses ().

  1. To match a string of characters as a group, wrap it in parentheses to form a subpattern. For example, /(regex)/ would match “regex” exactly.

  2. To find alternative strings, separate them with a pipe |. For example, /(reg|ex)/ will match either “reg” or “ex”.

  3. You can combine groups with other symbols. For example, /(regex)+/ will match one or more repetitions of “regex”, such as “regex” or “regexregexregex”.
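A short TypeScript sketch of grouping and alternation, assuming a JavaScript runtime; the names and sample strings are illustrative. String.prototype.match() returns the matched text at index [0] and each group's capture at [1], [2], and so on.

```typescript
const group = /(regex)/;
const alternation = /(reg|ex)/;
const repeatedGroup = /(regex)+/;

// Without the g flag, match() also reports where the match starts.
const groupMatch = "learning regex today".match(group);
console.log(groupMatch?.[0], groupMatch?.index); // regex 9

// The pipe tries each alternative in turn at every position.
console.log("regular".match(alternation)?.[0]); // reg
console.log("exit".match(alternation)?.[0]);    // ex

// + applies to the whole group, so the repeated word is matched as one run.
console.log("regexregexregex!".match(repeatedGroup)?.[0]); // regexregexregex
```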

Regex also provides shorthand sequences for common kinds of characters. Here are a few useful ones:

  1. \d will match any digit

  2. \D will match anything that is not a digit

  3. \s will match any whitespace character, such as a space, tab, or newline

  4. \S will match anything that is not a whitespace character

  5. \t will match a tab

  6. \n will match a newline
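To see these shorthands side by side, here is a TypeScript sketch run against one sample string that contains digits, spaces, a tab, and a newline. It assumes a JavaScript runtime; the sample text is made up for illustration. The g flag makes match() return every match instead of just the first.

```typescript
const shorthandSample = "Order 66\tshipped\nto Room 101";

console.log(shorthandSample.match(/\d+/g));     // every run of digits: "66", "101"
console.log(shorthandSample.match(/\D+/)?.[0]); // first run of non-digits: "Order "
console.log(shorthandSample.split(/\s+/));      // split on the spaces, the tab and the newline
console.log(shorthandSample.match(/\S+/g));     // the same six words, found directly
console.log(/\t/.test(shorthandSample), /\n/.test(shorthandSample)); // true true
```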

Once a group has matched some text, you can refer back to that text later in the same pattern with a backreference. Groups are numbered by the position of their opening parenthesis, counting from the left. For example, if your pattern looks like /(outer-pattern(inner-pattern))/, then \1 refers to whatever text was matched by “outer-pattern(inner-pattern)”, and \2 refers to whatever text was matched by “inner-pattern”. Written as /(outer-pattern(inner-pattern))\1\2/, the pattern requires that same text to appear again. Note that a backreference repeats the exact text that was matched, not the pattern itself: /(\d)\1/ will match “11” but not “12”, whereas /(\d)(\d)/ will match both.

Similarly, in /(reg)(ex)\1\2/, \1 references “reg” and \2 references “ex”, so the whole pattern matches “regexregex”.
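A TypeScript sketch of backreferences, assuming a JavaScript runtime (which supports \1, \2, … inside a pattern); the names and the short placeholder strings like “abc” are illustrative stand-ins for the outer and inner patterns.

```typescript
// \1 and \2 repeat the text captured by groups 1 and 2.
const twoPieces = /(reg)(ex)\1\2/;
console.log(twoPieces.test("regexregex")); // true
console.log(twoPieces.test("regexreg"));   // false

// Nested groups are numbered by their opening parenthesis:
// group 1 captures "abc" (the outer group), group 2 captures "c" (the inner group).
const nestedGroups = /^(ab(c))\1\2$/;
console.log(nestedGroups.test("abcabcc")); // true

// A backreference repeats the captured TEXT, not the pattern.
const sameDigitTwice = /^(\d)\1$/;
const anyTwoDigits = /^(\d)(\d)$/;
console.log(sameDigitTwice.test("11"), sameDigitTwice.test("12")); // true false
console.log(anyTwoDigits.test("12"));                               // true
```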

As in most programming languages, you can use a backslash \ to escape a character that normally has a special meaning in the pattern, so that it is matched literally. For example, /rege\./ will match “rege.” including the period. Without the backslash escaping the period, i.e. /rege./, the pattern would match “regex”, “reget”, “rege ”, and so on.
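A TypeScript sketch of escaping, assuming a JavaScript runtime. When a pattern is built from a string with the RegExp constructor, the backslash itself must be escaped inside the string literal, so "rege\\." becomes the pattern rege\. . escapeForRegex is a hypothetical helper, not a built-in; it uses a common idiom that puts a backslash in front of every regex metacharacter in a plain string.

```typescript
const escapedDot = /rege\./; // \. matches a literal period
const anyChar = /rege./;     // . matches any single character

console.log(escapedDot.test("rege."), escapedDot.test("regex")); // true false
console.log(anyChar.test("regex"), anyChar.test("rege "));       // true true

// In a string, "\\." is a backslash followed by a period.
const fromString = new RegExp("rege\\.");
console.log(fromString.source === escapedDot.source); // true

// Hypothetical helper (not built in): escape every metacharacter so the text matches literally.
// "$&" in the replacement inserts the matched character, prefixed here by a backslash.
function escapeForRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
console.log(escapeForRegex("rege.")); // rege\.
```

A helper like this is useful when part of a pattern comes from user input and should never be interpreted as regex syntax.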

Once you understand what each symbol does, it becomes easier to read through more complicated examples and write your own regex to suit your pattern matching needs!



– Emily Fox

