Regex Fundamentals: The Building Blocks
A regular expression is a pattern that describes a set of strings. Every character in a regex is either a literal (matches itself) or a metacharacter (has special meaning).
Character classes match any single character from a set:
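- . matches any character except newline (with the s flag, newline too)
- \d matches a digit [0-9]; \D matches any non-digit
- \w matches a word character [a-zA-Z0-9_]; \W matches any non-word character
- \s matches whitespace (space, tab, newline, etc.); \S matches any non-whitespace character
- [aeiou] matches any lowercase vowel; [^aeiou] matches any character that is not one of those vowels
- [a-z] matches any lowercase letter; [A-Za-z0-9] matches any alphanumeric character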
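As a minimal sketch of the classes listed above, the snippet below pulls every matching character out of a sample string. The sample text and variable names are made up for illustration.

```javascript
// Hypothetical sample string, chosen to contain digits, punctuation, and a tab.
const sample = 'Order #42: 3 apples,\t7 kiwis';

const digits = sample.match(/\d/g);       // every digit
const vowels = sample.match(/[aeiou]/g);  // every lowercase vowel (the "O" is skipped)
const nonWord = sample.match(/\W/g);      // everything outside [a-zA-Z0-9_]
const whitespace = sample.match(/\s/g);   // the spaces and the tab

console.log(digits.join(''));    // "4237"
console.log(vowels.join(''));    // "eaeii"
console.log(nonWord.length);     // 8
console.log(whitespace.length);  // 5
```

Note that [aeiou] is case-sensitive, which is why the capital "O" in "Order" is not matched unless the i flag is added.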
Quantifiers specify how many times the preceding element can match:
- * matches 0 or more times (greedy)
- + matches 1 or more times (greedy)
- ? matches 0 or 1 time (makes the preceding element optional)
- {n} matches exactly n times
- {n,m} matches between n and m times (inclusive)
- *? and +? are the lazy (non-greedy) variants: they match as few characters as possible
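The difference between greedy and lazy quantifiers is easiest to see side by side. The following is a small sketch with an invented input string:

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .* grabs as much as possible, so the match runs to the last ">".
const greedy = html.match(/<.*>/)[0];
// Lazy: .*? stops at the first ">" that lets the pattern succeed.
const lazy = html.match(/<.*?>/)[0];

console.log(greedy); // "<b>bold</b> and <i>italic</i>"
console.log(lazy);   // "<b>"

// {n,m} bounds the repetition: 2 to 4 digits per match here.
const runs = '7 42 12345'.match(/\d{2,4}/g);
console.log(runs);   // ["42", "1234"]
```

In the last example the single "7" is too short to match, and the five-digit run yields only its first four digits because the leftover "5" cannot satisfy the minimum of two.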
Anchors assert position, not characters:
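- ^ matches the start of the string (or the start of each line with the m flag)
- $ matches the end of the string (or the end of each line with the m flag)
- \b matches a word boundary (between \w and \W)
- \B matches any position that is not a word boundary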
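Because anchors match positions rather than characters, their effect shows up in which occurrences are found, not in what text is returned. A short sketch, using a made-up two-line string:

```javascript
const text = 'cat concat\ncatalog cat';

// \b on both sides: "concat" and "catalog" are skipped.
const wholeWords = text.match(/\bcat\b/g);
console.log(wholeWords.length); // 2

// Without the m flag, ^ matches only at the very start of the string.
const startOnly = text.match(/^cat/g);
console.log(startOnly.length); // 1

// With the m flag, ^ also matches right after each newline.
const eachLine = text.match(/^cat/gm);
console.log(eachLine.length); // 2

// \B is the inverse of \b: this finds "cat" only when a word character precedes it.
const insideWord = text.match(/\Bcat/g);
console.log(insideWord.length); // 1 (the one in "concat")
```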
JavaScript Regex Flags: Five Common Flags Explained
This section covers five flags that alter matching behaviour: g, i, m, s, and u. (JavaScript also defines the y, d, and v flags, which are not covered here.) The five flags below can be combined freely.
| Flag | Name | Effect |
|---|---|---|
| g | global | Find all matches, not just the first |
| i | ignoreCase | Case-insensitive matching (A matches a) |
| m | multiline | ^ and $ match start/end of each line |
| s | dotAll | . matches newline characters too |
| u | unicode | Treats the pattern and input as Unicode code points rather than UTF-16 code units; needed to match emoji and other astral characters as single characters |
// g flag — find all matches
'aababc'.match(/a/g) // ['a', 'a', 'a']
// i flag — case-insensitive
'Hello'.match(/hello/i) // ['Hello']
// m flag — multiline anchors
'foo\nbar'.match(/^bar/m) // ['bar']
// s flag — dotAll
'foo\nbar'.match(/foo.bar/s) // ['foo\nbar']
// u flag — Unicode
'😀'.match(/./u) // ['😀'] (without u: matches only half of the emoji's surrogate pair)
Named Capture Groups: Cleaner, Self-Documenting Code
Numbered capture groups ((pattern)) are hard to read in complex regex patterns. Named capture groups use the syntax (?<name>pattern) and make your regex self-documenting:
// Date parsing with numbered groups — hard to read
const dateRe = /^(\d{4})-(\d{2})-(\d{2})$/;
const m = '2025-05-24'.match(dateRe);
const year = m[1]; // need to count parentheses
const month = m[2];
const day = m[3];
// Same with named groups — self-documenting
const dateNamed = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
// (year, month, and day are already declared above, so read the groups object here)
const groups = '2025-05-24'.match(dateNamed).groups;
// groups.year='2025', groups.month='05', groups.day='24'
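One practical caveat: String.prototype.match returns null when the pattern does not match, so reading .groups directly throws on bad input. The sketch below wraps the named-group pattern in a null check; parseIsoDate is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper for illustration; not part of any library.
function parseIsoDate(input) {
  const re = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
  const match = input.match(re);
  if (match === null) return null; // no match, so there is nothing to destructure
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2025-05-24')); // { year: 2025, month: 5, day: 24 }
console.log(parseIsoDate('24/05/2025')); // null
```

Returning null rather than throwing is one possible design choice; callers that prefer exceptions could throw inside the null branch instead.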
Named groups are also available in replacements via $<name>:
// Reformat date from YYYY-MM-DD to DD/MM/YYYY
'2025-05-24'.replace(
/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
'$<d>/$<m>/$<y>'
);
// '24/05/2025'
Lookaheads and Lookbehinds: Match Without Consuming
Lookahead and lookbehind assertions (collectively: lookarounds) let you assert that something does or doesn't exist at a position without including it in the match. They are zero-width — they don't consume characters.
- (?=pattern) is a positive lookahead: the position must be followed by pattern
- (?!pattern) is a negative lookahead: the position must NOT be followed by pattern
- (?<=pattern) is a positive lookbehind: the position must be preceded by pattern
- (?<!pattern) is a negative lookbehind: the position must NOT be preceded by pattern
// Find numbers followed by "px" but don't include "px" in the match
'margin: 16px; padding: 8px;'.match(/\d+(?=px)/g)
// ['16', '8']
// Find "cat" not followed by "nap" or "fish"
'catnap catfish cat'.match(/cat(?!nap|fish)/g)
// ['cat'] (only the standalone cat)
// Extract price without the $ sign
'Price: $49.99'.match(/(?<=\$)[\d.]+/)
// ['49.99']
// Password validation: 8+ chars, at least one digit, one uppercase
const passwordRe = /^(?=.*\d)(?=.*[A-Z]).{8,}$/;
passwordRe.test('Password1') // true
passwordRe.test('password1') // false (no uppercase)
12 Essential Regex Patterns
These patterns cover the most common validation and parsing tasks. All are written for JavaScript and favour readability over strict standards compliance, so test them against your own inputs.
// 1. Email (RFC 5322 simplified — not the full RFC regex which is enormous)
/^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/
// 2. URL (http and https)
/^https?:\/\/[\w\-]+(\.[\w\-]+)+([\w.,@?^=%&:/~+#\-]*[\w@?^=%&/~+#\-])?/
// 3. IPv4 address
/^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/
// 4. UUID v4
/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i
// 5. Semantic version (semver)
/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[\w.-]+)?(?:\+[\w.-]+)?$/
// 6. ISO 8601 calendar date (YYYY-MM-DD; does not check the number of days in each month)
/^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/
// 7. Hex color (3 or 6 digit)
/^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/
// 8. Credit card (Luhn check still needed separately)
/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$/
// 9. Strong password (8+ chars, 1 digit, 1 uppercase, 1 special)
/^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/
// 10. Slug (URL-friendly string)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/
// 11. JWT token structure
/^[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+$/
// 12. YYYY-MM-DD date in text (not whole string)
/\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b/g
Catastrophic Backtracking: The Performance Killer
Catastrophic backtracking occurs when a backtracking regex engine takes exponential time on certain inputs. It is one of the most dangerous regex pitfalls and has caused real-world outages, including the Cloudflare outage of July 2019, as well as ReDoS (regular expression denial of service) vulnerabilities in many npm packages.
It typically occurs when you have nested quantifiers on overlapping patterns:
// ❌ Catastrophic — nested quantifiers on overlapping classes
const bad = /^(a+)+$/;
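bad.test('aaaaaaaaaaaaaaaaab'); // false, but slow: the work roughly doubles with each extra "a",
// so a run of 30 or more can hang the browser tab
// Why: the engine tries every way to partition "aaa...a" into groups before
// concluding that the trailing "b" rules out a match.
// ✅ Fix: some engines offer atomic groups or possessive quantifiers, but JS does not.
// In JS, rewrite the pattern to remove the ambiguity: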
const good = /^a+$/;
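To see the growth for yourself, the rough sketch below times both patterns on inputs of increasing length. The helper name timeTest and the chosen lengths are assumptions for illustration; absolute timings depend on the engine and machine, so no expected output is shown. It assumes a runtime with a global performance object (modern browsers and recent Node.js versions).

```javascript
const nested = /^(a+)+$/; // ambiguous: many ways to split a run of "a"
const flat = /^a+$/;      // unambiguous rewrite that accepts the same strings

// Time a single test() call in milliseconds (hypothetical helper).
function timeTest(re, input) {
  const start = performance.now();
  const result = re.test(input);
  return { result, ms: performance.now() - start };
}

// Keep n modest: in a backtracking engine each extra "a" roughly doubles the nested time.
for (let n = 16; n <= 22; n += 2) {
  const input = 'a'.repeat(n) + 'b'; // the trailing "b" forces the match to fail
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n} nested=${slow.ms.toFixed(2)}ms flat=${fast.ms.toFixed(2)}ms`);
}
```

In a backtracking engine the nested column should climb steeply as n grows while the flat column stays near zero, even though both patterns return false for every input here.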
How to identify vulnerable patterns:
- Nested quantifiers: (a+)+, (a*)*, (a|a)+
- Alternation with overlapping branches inside a quantifier: (foo|fo)+
- Repeated optional elements: a?a?a?aaa matching "aaa"; the backtracking can become exponential with enough repetition
Mitigations: Use possessive quantifiers where your regex engine supports them; set execution timeouts for user-supplied regex; consider using a linear-time regex engine (RE2, Rust's regex crate) for untrusted input.