June 16, 2015

An Introduction to Regex

Posted by Emily Fox

Every developer has likely come across regular expressions already. If you’re like me, they’re a scary mess of syntax jumbled together that doesn’t make much sense… until now! Once you get these basics under your belt, you’ll wonder why you ever avoided regex in the first place.

…a regular expression (abbreviated regex or regexp and sometimes called a rational expression) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. – Wikipedia

So here are the basics you need to know:

Getting Started

  1. In many languages (JavaScript, Perl, and PHP, for example), regex patterns are written between delimiters, typically a forward slash /. For example, /regex/
  2. Use a caret symbol ^ to match the beginning of input. For example, /^regex/ would match “regex is a pattern”, but would not match “this post is about regex”
  3. Use a dollar symbol $ to match the end of input. For example, /regex$/ would match “This post is about regex”, but would not match “regex is at the beginning of this sentence”
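
The two anchors above can be tried out with a minimal sketch in Python. This assumes Python's built-in re module, which takes patterns as plain strings rather than between slash delimiters, so /^regex/ is written r"^regex". The variable names are only for illustration.

```python
import re

# Python has no /.../ literal syntax: /^regex/ becomes the raw string r"^regex".
starts_with = re.compile(r"^regex")
ends_with = re.compile(r"regex$")

# search() scans the whole string; the anchors restrict where a match may sit.
print(bool(starts_with.search("regex is a pattern")))        # True
print(bool(starts_with.search("this post is about regex")))  # False
print(bool(ends_with.search("This post is about regex")))    # True
print(bool(ends_with.search("regex is at the beginning of this sentence")))  # False
```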

Matching Characters

  1. A period . will match any single character (except, by default, a newline). For example, /re.ex/ will match “regex”, “re ex”, “refex”, “rebex”, etc.
  2. A plus symbol + will match one or more of the previous character. For example, /re+gex/ will match “regex”, “reegex”, and “reeeegex”, but not “rgex”
  3. Similarly, an asterisk * will match zero or more of the previous character. For example, /re*gex/ will match “reegex”, “reeeegex”, “regex”, and “rgex”
  4. A question mark ? will match zero OR one of the previous character. For example, /re?gex/ will match “regex” or “rgex”, but not “reeeegex”.
  5. You can also set your own minimum and maximum number of repetitions using curly brackets. {1,2} will match the previous character at least 1 and at most 2 times. For example, /re{1,2}gex/ will match “regex” and “reegex”, but not “rgex” or “reeegex”
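
Here is a small sketch of the five rules above, again assuming Python's re module. The matches helper is a hypothetical name introduced for this example. It uses re.fullmatch so that the whole string has to fit the pattern, which makes the “but not” cases easy to see.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole of `text` fits `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"re.ex", "refex"))          # True:  . stands in for any one character
print(matches(r"re+gex", "reeeegex"))      # True:  one or more "e"
print(matches(r"re+gex", "rgex"))          # False: + needs at least one "e"
print(matches(r"re*gex", "rgex"))          # True:  * accepts zero "e"
print(matches(r"re?gex", "rgex"))          # True:  ? accepts zero or one "e"
print(matches(r"re?gex", "reeeegex"))      # False: more than one "e"
print(matches(r"re{1,2}gex", "reegex"))    # True:  two "e" is within {1,2}
print(matches(r"re{1,2}gex", "reeegex"))   # False: three "e" is too many
```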

Combining Syntax

Combining the above character matching syntax will allow you to run more advanced patterns to match.

  1. .? will find zero or one character of any type
  2. .+ will find one or more characters of any type
  3. .* will find zero or more characters of any type
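
To see the three combinations side by side, here is a rough sketch in Python (assuming the built-in re module). The patterns and test strings are made up for illustration, with a literal “a” and “c” around each combination so the effect of the middle part is visible.

```python
import re

# Each entry: (pattern, text, whether the whole text should match).
cases = [
    (r"a.?c", "ac", True),     # .? allows zero characters between a and c
    (r"a.?c", "abc", True),    # ... or exactly one
    (r"a.?c", "abbc", False),  # ... but not two
    (r"a.+c", "ac", False),    # .+ needs at least one character
    (r"a.+c", "ab1 c", True),  # any characters count, including digits and spaces
    (r"a.*c", "ac", True),     # .* accepts zero characters
    (r"a.*c", "ab1 c", True),  # ... or many
]

results = [re.fullmatch(p, t) is not None for p, t, _ in cases]
for (pattern, text, expected), got in zip(cases, results):
    print(f"{pattern!r} vs {text!r}: {got} (expected {expected})")
```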

Character Classes

Enclosing characters within square brackets [] will allow you to match any one character from a set. The set can list specific characters, a range of characters, or a combination of these. Note that these are case sensitive.

  1. To match single characters, add the characters within square brackets. For example, /[reG,]/ will match “r”, “e”, “G”, or “,”. This will not match “R”, “E”, or “g”.
  2. Use a hyphen to match a range. For example, [a-c] will match any lowercase letter from a through to c, but will not match “A”, or “d”.
  3. You can match a combination of things within these brackets. For example, [a-zA-CZ1-4] will match lowercase “a” through to “z”, uppercase “A” through to “C”, uppercase “Z”, and “1” through to “4”. Note that this matches only one character.
  4. To exclude characters, add a caret ^ as the first character inside the brackets. For example, [^a1-3] will match any character, except “a”, “1”, “2”, or “3”.
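
The character classes above can be checked with a short Python sketch (assuming the built-in re module). re.findall returns every single character in the sample text that the class accepts. The sample strings and variable names are invented for this example.

```python
import re

# Specific characters, case sensitive: "R" and "g" are not in the set.
specific = re.findall(r"[reG,]", "Regex, G")
print(specific)   # ['e', 'e', ',', 'G']

# A range of lowercase letters.
ranged = re.findall(r"[a-c]", "Abcd")
print(ranged)     # ['b', 'c']

# A combination of ranges and single characters.
combined = re.findall(r"[a-zA-CZ1-4]", "aDZ59C2")
print(combined)   # ['a', 'Z', 'C', '2']

# A leading caret inverts the set.
excluded = re.findall(r"[^a1-3]", "a1b2c3d4")
print(excluded)   # ['b', 'c', 'd', '4']
```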

Subpatterns

When looking for patterns within a pattern, or for a specific sequence of characters, you can use parentheses ().

  1. To match a string of characters as a group, add it within a subpattern. For example, /(regex)/ would match “regex” exactly.
  2. To find alternative strings, you can separate them with a pipe |. For example, /(reg|ex)/ will match either “reg” or “ex”.
  3. You can combine this with other symbols. For example, /(regex)+/ will match “regexregexregex”.
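
As a quick illustration of the three subpattern uses above, here is a sketch in Python (assuming the built-in re module). The sample strings and variable names are made up for this example.

```python
import re

# A group matches its contents as one unit and remembers what it matched.
m = re.search(r"(regex)", "learn regex today")
whole_group = m.group(1)
print(whole_group)    # regex

# With a single group, findall returns what that group captured each time.
alternatives = re.findall(r"(reg|ex)", "regex")
print(alternatives)   # ['reg', 'ex']

# A quantifier after the group repeats the whole group, not just one character.
repeated = re.fullmatch(r"(regex)+", "regexregexregex") is not None
partial = re.fullmatch(r"(regex)+", "regexreg") is not None
print(repeated, partial)   # True False
```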

Escape Sequences

These are like shorthand snippets. Here are a few useful ones:

  1. \d will match any digit
  2. \D will match anything that is not a digit
  3. \s will match any whitespace character
  4. \S will match anything that is not a whitespace character
  5. \t will match a tab
  6. \n will match a newline
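
A brief sketch of these shorthands in Python (assuming the built-in re module) follows. Raw strings such as r"\d" keep the backslash intact so that it reaches the regex engine. The sample strings and variable names are invented for this example.

```python
import re

digits = re.findall(r"\d", "Room 42")
print(digits)        # ['4', '2']

non_digits = re.findall(r"\D", "a1b2")
print(non_digits)    # ['a', 'b']

# \s matches spaces, tabs and newlines alike, so it works as a splitter.
words = re.split(r"\s", "one two\tthree\nfour")
print(words)         # ['one', 'two', 'three', 'four']

# \S+ collects runs of non-whitespace characters.
chunks = re.findall(r"\S+", " hi  there ")
print(chunks)        # ['hi', 'there']

has_tab = re.search(r"\t", "a\tb") is not None
newline_count = len(re.findall(r"\n", "line1\nline2\n"))
print(has_tab, newline_count)   # True 2
```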

Back References

Once you’ve written a complicated pattern, you can refer back to what it matched rather than duplicating it. Groups are numbered by the order of their opening parentheses. For example, if your pattern looks like /(outer-pattern(inner-pattern))/, then \1 refers to the text matched by “outer-pattern(inner-pattern)” and \2 refers to the text matched by “inner-pattern”. This would look something like /(outer-pattern(inner-pattern))\1\2/. When the groups contain only literal text, this is the equivalent of writing /(outer-pattern(inner-pattern))(outer-pattern(inner-pattern))(inner-pattern)/. Note that a back reference matches the same text the group captured, not the group’s pattern again, so /(\d)\1/ will match “11” but not “12”.

Similarly, /(reg)(ex)\1\2/ could be used, where \1 would reference “reg” and \2 would reference “ex”, so the whole pattern matches “regexregex”.
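
Back references are easier to follow with something runnable, so here is a minimal sketch in Python (assuming the built-in re module, which supports \1 and \2 inside a pattern). The nested pattern uses the made-up literals “a” and “b” in place of the outer and inner patterns. The last two lines show that a back reference repeats the text a group captured rather than re-running the group's pattern.

```python
import re

# \1 repeats what (reg) captured, \2 repeats what (ex) captured.
simple = re.fullmatch(r"(reg)(ex)\1\2", "regexregex")
print(simple.groups())   # ('reg', 'ex')

# Nested groups are numbered by their opening parenthesis:
# group 1 is "ab" (outer), group 2 is "b" (inner).
nested = re.fullmatch(r"(a(b))\1\2", "ababb")
print(nested.group(1), nested.group(2))   # ab b

# (\d)\1 needs the SAME digit twice, not just any two digits.
same_digit = re.fullmatch(r"(\d)\1", "11") is not None
different_digit = re.fullmatch(r"(\d)\1", "12") is not None
print(same_digit, different_digit)   # True False
```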

Matching Regular Expression Syntax

As in most programming languages, you can use a backslash to escape a character that would otherwise be treated as syntax, so that the pattern matches it literally. For example, /rege\./ will match “rege.” including the period. Without the backslash escaping the period, i.e. /rege./, the pattern would also match “regex”, “reget”, “rege ”, etc.
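
The difference between the escaped and unescaped period can be sketched in Python as follows (assuming the built-in re module). The last line uses re.escape, a standard-library function that adds the backslashes for you when the text to match comes from somewhere else. The variable names are only for illustration.

```python
import re

# Escaped: the period must literally be a period.
escaped_hits_period = re.fullmatch(r"rege\.", "rege.") is not None
escaped_hits_x = re.fullmatch(r"rege\.", "regex") is not None
print(escaped_hits_period, escaped_hits_x)   # True False

# Unescaped: the period accepts any single character.
unescaped_hits_x = re.fullmatch(r"rege.", "regex") is not None
unescaped_hits_space = re.fullmatch(r"rege.", "rege ") is not None
print(unescaped_hits_x, unescaped_hits_space)   # True True

# re.escape builds the escaped pattern from a literal string.
auto_escaped = re.escape("rege.")
print(auto_escaped)   # rege\.
```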

–

Once you understand what each symbol does, it becomes easier to read through more complicated examples and write your own regex to suit your pattern matching needs!

Additional Resources

  • Regular Expressions Website
  • RegExr – A tool for learning, building, and testing Regular Expressions

– Emily Fox
