TLDR – Intro to Regex (Regular Expressions)
Regex is a powerful tool for pattern matching, text searching, validation, and replacement across many programming languages. Core elements include metacharacters (., ^, $, *, +, ?), character classes ([a-z], \w, \d), and quantifiers (+, *, {m,n}).
Groups and named captures ((), (?<name>...)) allow complex pattern extraction and reuse. Flags like g, i, and m modify matching behavior (e.g. global, case-insensitive, multiline). Regex can be written using literals (/pattern/flags) or the RegExp constructor (new RegExp("pattern", "flags")).
Key methods: .test() (returns boolean), .match() (returns matches), .replace() (replaces text). Bracket expressions and negated classes (e.g. [^a-z]) allow fine-grained control over matching. Common use cases include email validation, string parsing, and input sanitization.
Mastering regex improves code efficiency and helps automate complex text-processing tasks.
As a software developer, you’ve probably encountered regular expressions several times and were confused when seeing this daunting set of characters grouped together like this:
And you may have wondered what this gibberish means…
Regular expressions (regex or regexp) are extremely useful for stepping up your algorithm game and will make you a better problem solver. The structure of regular expressions can be intimidating at first, but it is so rewarding once you grasp the patterns and implement them in your work properly.
Regex syntax consists of a sequence of characters, metacharacters, and quantifiers.
Metacharacters are special characters that define specific operations or behaviors in regex. Some commonly used metacharacters include:
The dot (.) matches any character except line breaks.
The caret (^) matches the start of a string, or the start of a line in multiline mode.
The dollar sign ($) matches the end of a string, or the end of a line in multiline mode.
The asterisk (*) matches zero or more occurrences of the preceding element.
The plus sign (+) matches one or more occurrences of the preceding element.
The question mark (?) matches zero or one occurrence of the preceding element.
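To see these metacharacters side by side, here is a minimal sketch. The sample strings and the metaChecks name are made up for illustration only.

```javascript
// Each entry tests one metacharacter against a hypothetical sample string.
const metaChecks = {
  dot: /c.t/.test("cat"),               // . stands in for any one character
  caret: /^Hello/.test("Hello world"),  // ^ anchors to the start
  dollar: /world$/.test("Hello world"), // $ anchors to the end
  star: /ab*c/.test("ac"),              // * allows zero b's
  plus: /ab+c/.test("ac"),              // + needs at least one b
  optional: /colou?r/.test("color"),    // ? makes the u optional
};

console.log(metaChecks);
```

Only the plus check comes back false, because "ac" has no b between the a and the c.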
A character class is a set of characters that can be matched by a regex pattern. Character classes are defined using square brackets [] and can contain a list of characters, a range of characters, or a combination of both. For example:
[a-z] matches any lowercase letter from 'a' to 'z'.
[A-Z] matches any uppercase letter from 'A' to 'Z'.
Character classes can also be negated by using the caret symbol ^ at the beginning of the class. For instance:
[^a-zA-Z] matches any character that is not a letter (i.e., neither lowercase nor uppercase).
[^0-9] matches any character that is not a digit.
This flexibility allows you to create specific and targeted search patterns to suit various use cases.
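As a small sketch of a range and a negated class in use (the variable names and inputs below are hypothetical):

```javascript
// ^ and $ outside the brackets are anchors; ^ inside the brackets negates the class.
const allLowercase = /^[a-z]+$/;   // the whole string must be lowercase letters
const containsNonDigit = /[^0-9]/; // at least one character that is not a digit

console.log(allLowercase.test("regex"));     // true
console.log(allLowercase.test("Regex"));     // false, "R" is uppercase
console.log(containsNonDigit.test("12345")); // false, every character is a digit
console.log(containsNonDigit.test("123a5")); // true, "a" is not a digit
```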
A word character in regex includes letters, digits, and underscores (_). These characters are matched using the shorthand \w. For example, \w+ matches one or more word characters, which can be helpful for finding words, variable names, or identifiers in text.
The opposite of a word character is a non-word character, which can be matched using the shorthand \W. For example, \W matches any character that is not a letter, digit, or underscore, such as punctuation or spaces. This distinction between word and non-word characters is crucial for creating accurate and efficient regex patterns.
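Here is a quick sketch of \w and \W on two made-up strings. The g flag is used so that .match returns every match rather than just the first.

```javascript
// \w+ collects runs of letters, digits, and underscores.
const identifiers = "let user_id = 42;".match(/\w+/g);
console.log(identifiers); // [ 'let', 'user_id', '42' ]

// \W picks out everything else, one character at a time.
const separators = "hello, world!".match(/\W/g);
console.log(separators); // [ ',', ' ', '!' ]
```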
Quantifiers are used to specify how many times a pattern should appear within the text being matched. The most common quantifiers are * (zero or more), + (one or more), ? (zero or one), and {m,n} (between m and n times).
These quantifiers allow you to control the repetition of patterns in your regex, which is essential when you need to match repeating sequences or optional elements.
Groups are used to capture parts of a match for reuse or extraction. Groups are defined by enclosing a pattern in parentheses (). For example, (abc) captures the string 'abc', which can then be referenced later in the same regex using \1 (this refers to the first captured group).
Unnamed Groups: The basic parentheses (abc) will capture without a name.
Named Groups: Use the syntax (?<name>pattern) to capture a group with a name, such as (?<id>\d+), making the regex more readable and the captured content more accessible.
This capability to group and reference parts of a match adds a powerful layer of flexibility to your regex patterns, especially in complex matching scenarios.
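A minimal sketch of both ideas, using hypothetical input strings: a \1 back reference that requires the same word twice, and a named group read back through the groups property of the match result.

```javascript
// \1 must repeat exactly what the first group captured.
const repeatedWord = /(\w+) \1/;
console.log(repeatedWord.test("hello hello")); // true
console.log(repeatedWord.test("hello world")); // false

// The named group (?<id>\d+) is available as groups.id and also by position.
const orderMatch = "order id: 4821".match(/id: (?<id>\d+)/);
console.log(orderMatch.groups.id); // "4821"
console.log(orderMatch[1]);        // "4821"
```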
Flags modify the behavior of a regular expression and can be added after the closing slash or as the second parameter in the RegExp constructor. Here are the most commonly used flags:
g (global): The g flag makes the regex search for all occurrences of the pattern, not just the first one.
i (case-insensitive): The i flag makes the regex match letters regardless of their case, so a matches both 'a' and 'A'.
m (multiline): The m flag changes the behavior of the ^ and $ anchors to match the start and end of each line within the string, instead of the start and end of the entire string.
There are two ways to create a regular expression: a regex literal and the RegExp constructor.
To create a regular expression literal, you start and end with forward slashes (/) to enclose the regex pattern. Syntax: /pattern/flags. The RegExp constructor builds the expression from a string at runtime. Syntax: new RegExp("pattern", "flags").
If your regular expression is constant and does not change its value, you should use the regex literal for better performance. In cases where the pattern is dynamic (for example, built from a variable or another expression), it is best to use the RegExp constructor.
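As a sketch of the difference, the literal below is fixed, while the constructor builds its pattern from a variable. The names searchWord and dynamicRegex and the sample text are assumptions for illustration. Note that backslashes must be doubled inside a string passed to the constructor.

```javascript
// Constant pattern: a literal is the simplest choice.
const literalRegex = /\bcat\b/gi;

// Dynamic pattern: the word is only known at runtime.
const searchWord = "cat";
const dynamicRegex = new RegExp(`\\b${searchWord}\\b`, "gi");

const sampleText = "Cat catalog cat";
console.log(sampleText.match(literalRegex)); // [ 'Cat', 'cat' ]
console.log(sampleText.match(dynamicRegex)); // [ 'Cat', 'cat' ]
```

If the dynamic part can contain regex metacharacters (for example user input), it would need escaping first, which this sketch leaves out.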
There are three common Regex methods that you should be familiar with: test, match, and replace. Let's look at an example of the test method.
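A minimal sketch of .test, assuming a made-up digits-only pattern and two sample inputs:

```javascript
// The pattern requires the whole string to be one or more digits.
const digitsOnly = /^\d+$/;

const okInput = digitsOnly.test("2024");  // every character is a digit
const badInput = digitsOnly.test("20a4"); // the "a" breaks the pattern

console.log(okInput);  // true
console.log(badInput); // false
```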
In the example above, the .test method returns a boolean: true if the string contains a match for the search pattern, and false if it does not.
The .match method returns the matches it finds in a string, or null if there are none. Here is a very basic example below. Later on, you will see how .match can be a powerful tool when combining the regex with flags.
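One way such a basic example could look (the strings are hypothetical). Without the g flag, .match returns an array for the first match only, with extra details such as its index.

```javascript
const firstDigit = "I have 2 cats".match(/\d/);
console.log(firstDigit[0]);    // "2"
console.log(firstDigit.index); // 7, the position of the match in the string

// When nothing matches, the result is null rather than an empty array.
const noDigit = "no digits here".match(/\d/);
console.log(noDigit); // null
```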
The .replace method searches a string for a specified value (or regular expression) and returns a new string where the specified value is replaced.
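A short sketch of .replace with a plain string and with a regex, using a made-up sentence:

```javascript
const original = "red car, red bike";

// A plain string pattern only replaces the first occurrence.
const firstOnly = original.replace("red", "blue");
console.log(firstOnly); // "blue car, red bike"

// A regex with the g flag replaces every occurrence.
const everyMatch = original.replace(/red/g, "blue");
console.log(everyMatch); // "blue car, blue bike"

// The original string is left untouched.
console.log(original); // "red car, red bike"
```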
With .replace, you CANNOT replace multiple instances using a regular string value, because only the first occurrence is replaced, but you CAN do this with a regex that uses the g flag.

Inside bracket expressions, you can place the characters you want to match to define a character set. For example, const regex = /[A-Z]/. Notice that A-Z is inside the square brackets. This matches any single uppercase letter in the alphabet. Here are some similar search patterns:
[a-z] matches any single lowercase letter
[A-Z] matches any single uppercase letter
[abcd] matches a, b, c, or d
[a-d] is exactly the same as the previous example, so you can either specify each character or use a range
[a-gA-C0-7] matches a lowercase letter a-g, an uppercase letter A-C, or a digit 0-7
[^a-zA-Z] matches any character that is NOT a lowercase or uppercase letter*
*Inside a character set, a leading ^ means any character that is NOT in the set, here a-z or A-Z.
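To make the combined and negated sets concrete, here is a small sketch with hypothetical input strings. The g flag is added so that every matching character is returned.

```javascript
// Lowercase a-g, uppercase A-C, or digits 0-7.
const mixedSet = "Bag 8 fizz C3".match(/[a-gA-C0-7]/g);
console.log(mixedSet); // [ 'B', 'a', 'g', 'f', 'C', '3' ]

// Anything that is not a letter.
const nonLetters = "a1-b2".match(/[^a-zA-Z]/g);
console.log(nonLetters); // [ '1', '-', '2' ]
```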
After the closing slash, we can either choose one specific flag or combine several. Flags make the regex more specific about how it finds and matches the pattern. Before we go into the specific flags, keep in mind that flags are optional.
Without flags, .match returns an array holding only the first match of the pattern between the slashes. With const regex = /[A-Z]/ and a sentence that begins with a capital T, our code will return ['T'] because it found the first uppercase letter in the sentence.
The g in g flag stands for "global", which means the search continues through the entire string. In other words, it will not stop after the first match, but return ALL the occurrences that matched.
If we added the g flag after the closing slash, it would return all the uppercase characters in the string.
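The difference can be sketched like this. The sentence is a stand-in chosen only because it starts with a capital T; it is not necessarily the one used in the original example.

```javascript
const flagSentence = "The Quick Brown Fox";

// No flag: only the first uppercase letter is returned.
const withoutFlag = flagSentence.match(/[A-Z]/);
console.log(withoutFlag[0]); // "T"

// g flag: every uppercase letter is returned.
const withGlobal = flagSentence.match(/[A-Z]/g);
console.log(withGlobal); // [ 'T', 'Q', 'B', 'F' ]
```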
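Let’s say we changed the regex to be const regex = /[a-z]/m. This returns ['h'], the first lowercase letter from a-z. Note that the m flag itself has no effect here: it only changes how the ^ and $ anchors behave, and this pattern uses neither, so the result is the same as with no flag at all.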
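To see the m flag actually change a result, the pattern needs an anchor. A minimal sketch with a made-up two-line string, plus the i flag for comparison:

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ only matches at the very start of the string.
console.log(twoLines.match(/^\w+/g));  // [ 'first' ]

// With m, ^ also matches right after each line break.
console.log(twoLines.match(/^\w+/gm)); // [ 'first', 'second' ]

// The i flag ignores letter case.
console.log(/hello/i.test("HeLLo"));   // true
```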
As an additional side note, there are three other character classes that can help when using multiple character sets for pattern matching. The negations of \d, \w, and \s are \D, \W, and \S:
\D matches any non-digit character (same as [^0-9])
\W matches any non-word character (same as [^a-zA-Z0-9_])
\S matches any non-whitespace character

How to Use Quantifiers in Regular Expressions

Quantifiers are symbols in regular expressions that control how many times the previous item may repeat.
* matches the previous item zero or more times
+ matches the previous item one or more times
? matches the previous item zero or one time; makes the preceding item optional
. matches any single character (except line breaks); it is a metacharacter rather than a quantifier, but is often combined with one
{m,n} matches the previous item at least m and at most n times, where m is 0 or a positive integer and n is an integer equal to or greater than m
Let’s go through this example to demonstrate our understanding of quantifiers.
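Here is a sketch of such an example. The input string and the name quantifierText are assumptions, picked to be consistent with the results discussed next; the {m,n} lines are an extra illustration with made-up numbers.

```javascript
const quantifierText = "for if rof fi";

// + groups each run of lowercase letters into a single match.
const found = quantifierText.match(/[a-z]+/g);
console.log(found); // [ 'for', 'if', 'rof', 'fi' ]

// {2,3} accepts runs of two or three digits; \b keeps longer runs from matching partially.
const shortNumbers = "1 22 333 4444".match(/\b\d{2,3}\b/g);
console.log(shortNumbers); // [ '22', '333' ]
```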
You can see that the regular expression matches lowercase letters from a-z and uses the + symbol to match one or more of them in a row, so each run of letters comes back as a single match. So when you console log found, it will return [ 'for', 'if', 'rof', 'fi' ]. Let’s say that + symbol was not there and the regex only matched a single [a-z] character at a time.
Then it will return [ 'f', 'o', 'r', 'i', 'f', 'r', 'o', 'f', 'f', 'i' ].

Use Case: Regex for Email Address Formatting

Remember this long string of characters we saw at the beginning of this article?
Now that we have learned the basic methods and terminologies used in Regex, let’s break down this once daunting but now understandable string of characters one step at a time.
The third piece is @\w+ and this checks for the @ character in the given email, followed by the \w shorthand, which matches any word character (a letter, digit, or underscore). This covers the '@alumni' piece of the email. The + quantifier lets \w repeat, so the whole run of word characters after the @ is matched.
The following piece, ([.]?\w)+, is the same search pattern as our second piece except it’s only checking for the "." character and word characters, excluding our "-" symbol. This represents ".school" in the email. Finally, we have the $ character, which anchors the match to the end of the string.
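To try the idea end to end, the sketch below uses a widely circulated pattern of the same general shape: word characters, optional "." or "-" separated parts, an @, and a domain. Treat emailPattern, isValidEmail, and the sample addresses as assumptions for illustration; this is not necessarily the exact regex broken down in this article.

```javascript
// Assumed pattern of the same general shape, not the article's exact regex.
const emailPattern = /^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$/;

// Hypothetical helper that wraps .test for readability.
function isValidEmail(address) {
  return emailPattern.test(address);
}

console.log(isValidEmail("jane.doe@alumni.school.edu")); // true
console.log(isValidEmail("jane.doe@alumni"));            // false, no ".xx" ending
console.log(isValidEmail("not an email"));               // false, no @ at all
```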
And that’s it! Now we know how to use regex for a basic email validation. Additionally, you can implement brackets, flags, and/or quantifiers in your regex to accommodate other edge cases not considered in our regex string.

*$ means zero or more of the preceding character at the end of a string.
Regex is a tool used to search, match, and manipulate text patterns in a string, like finding specific words or validating inputs.
Start by learning basic patterns (like \d for digits, \w for word characters) and practice common use cases, such as searching, matching, and replacing text using regex tools or code.
To read a regex, break it down into parts: start by identifying literals, metacharacters, quantifiers, and groups. Learn what each piece matches to understand the overall pattern.
.+) matches one or more of any character, with the closing parenthesis usually marking the end of a group.
$1 and $2 are back references that refer to the first and second captured groups in a regex match, typically used inside a replacement string.
The tilde ~ isn’t a special character in regex, but in some languages (like Perl), it's used to delimit regex patterns.
% is not special in regex unless used in specific languages. It generally matches itself.
\+ matches the literal + character since + is normally a quantifier for "one or more".
The underscore _ is treated as a literal character and matches itself.
\s matches any whitespace character (spaces, tabs, line breaks).
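A few of these answers can be checked quickly in code. The strings and variable names below are made up for illustration.

```javascript
// $1 and $2 in the replacement string refer to the captured groups.
const swappedName = "Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1");
console.log(swappedName); // "Jane Doe"

// \+ matches a literal plus sign.
const hasPlusSign = /\+/.test("1+1");
console.log(hasPlusSign); // true

// \s+ treats any run of spaces, tabs, or line breaks as one separator.
const pieces = "one two\tthree\nfour".split(/\s+/);
console.log(pieces); // [ 'one', 'two', 'three', 'four' ]
```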
