What is the difference between test(), match(), and exec()?

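test() returns a boolean; use it when you only need to know whether a pattern matches. match() is called on a string: without the g flag it returns the first match together with its capture groups, and with the g flag it returns an array of every full match without capture groups. It returns null when nothing matches. exec() is called on the regex object and returns one match at a time, including its capture groups and index; with the g flag it can be called repeatedly to iterate over all matches.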
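As a minimal sketch of the three methods side by side (the sample string and variable names are illustrative only):

```javascript
const text = 'cat bat rat';

// test(): boolean only
const hasAt = /\w+at/.test(text);        // true

// match() without g: first match plus its capture groups
const first = text.match(/(\w)at/);      // first[0] is 'cat', first[1] is 'c'

// match() with g: every full match, capture groups are dropped
const all = text.match(/\w+at/g);        // ['cat', 'bat', 'rat']

// exec() with g: one match per call, resuming from lastIndex
const re = /(\w)at/g;
const initials = [];
let m;
while ((m = re.exec(text)) !== null) {
  initials.push(m[1]);
}
// initials is ['c', 'b', 'r']
```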

Why does my regex with the g flag behave differently on the second call?

Regex objects with the g or y flag maintain a lastIndex property that tracks where the next search should start. If you reuse the same regex object across calls to exec() or test() without resetting lastIndex, the next search starts from the old position instead of index 0, so matches can be skipped and test() can alternate between true and false on the same input. Either create a new regex literal inside the function that uses it, or reset regex.lastIndex = 0 between uses.
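The effect is easy to reproduce. The sketch below reuses one global regex and records lastIndex after each call; containsA is a hypothetical helper showing the fresh-literal approach:

```javascript
const re = /a/g;

const firstCall = re.test('a');         // true
const indexAfterFirst = re.lastIndex;   // 1
const secondCall = re.test('a');        // false: the search started at index 1
const indexAfterSecond = re.lastIndex;  // 0: a failed match resets lastIndex
const thirdCall = re.test('a');         // true again

// Hypothetical helper: the literal is evaluated on every call,
// so each call gets a fresh regex object with lastIndex 0.
function containsA(s) {
  return /a/g.test(s);
}
```

Calling containsA('a') twice in a row returns true both times, because no state survives between calls.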

How do I match a literal dot or parenthesis in a regex?

Escape it with a backslash: \. matches a literal dot, \( matches a literal opening parenthesis. Without the backslash, . is a wildcard and ( starts a capture group. Similarly, escape *, +, ?, [, ], {, }, ^, $, |, and \.
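When the text to match comes from user input, escaping by hand is error-prone. One common approach, sketched here with a hypothetical escapeRegExp helper, is to escape every metacharacter before building the regex:

```javascript
// Hypothetical helper: prefix each regex metacharacter with a backslash.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const needle = 'price (USD): $4.99';
const needleRe = new RegExp(escapeRegExp(needle));

const exact = needleRe.test('Total price (USD): $4.99 incl. tax'); // true
const loose = needleRe.test('price (USD): $4x99');                 // false: the dot is literal
```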

What is a non-capturing group and when should I use it?

A non-capturing group (?:pattern) groups the pattern for quantifiers or alternation without creating a capture group. Use it when you need the grouping behaviour (e.g. (?:foo|bar)+ to repeat a set of alternatives) but don't need to extract the matched text. It also keeps the numbering of your real capture groups stable, and it can be marginally faster than a capturing group because the engine has nothing to record.
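A short sketch of the difference in the returned match array (the input strings are illustrative):

```javascript
// Capturing: the group adds an entry holding its last iteration
const capturing = 'foobar'.match(/(foo|bar)+/);
// capturing[0] is 'foobar', capturing[1] is 'bar'

// Non-capturing: same match, no extra entry
const nonCapturing = 'foobar'.match(/(?:foo|bar)+/);
// nonCapturing.length is 1

// Mixing both: only the part you care about gets a number
const url = 'ftp://example.com/file'.match(/(?:https?|ftp):\/\/(\S+)/);
// url[1] is 'example.com/file'
```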

Can I use regex to parse HTML or XML?

Generally no. HTML and XML are not regular languages, so a regular expression cannot describe them in general. Nested tags especially require a stack to parse correctly, which regex cannot provide. Use a proper HTML parser (DOMParser in the browser, cheerio or parse5 in Node.js, BeautifulSoup in Python) for any non-trivial HTML processing.
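To see the nesting problem concretely, here is a small sketch with made-up markup. Neither the lazy nor the greedy pattern can pair opening and closing tags correctly in every case:

```javascript
const nested = '<div>outer <div>inner</div> tail</div>';

// Lazy: stops at the first closing tag, which belongs to the inner div
const lazy = nested.match(/<div>(.*?)<\/div>/);
// lazy[1] is 'outer <div>inner'

// Greedy: happens to work for this input...
const greedy = nested.match(/<div>(.*)<\/div>/);
// greedy[1] is 'outer <div>inner</div> tail'

// ...but swallows sibling elements as if they were one
const siblings = '<div>a</div><div>b</div>'.match(/<div>(.*)<\/div>/);
// siblings[1] is 'a</div><div>b'
```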

What does the u flag do and when do I need it?

The u (unicode) flag makes the regex engine treat the pattern and the input string as sequences of Unicode code points rather than UTF-16 code units. Without it, characters outside the Basic Multilingual Plane (emoji, many CJK extensions, historic scripts) may match incorrectly because they are two UTF-16 code units wide. The flag also enables \u{...} code point escapes and \p{...} Unicode property escapes. Use it whenever your input may contain characters outside the BMP or you need those escapes.
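The following sketch shows the code unit versus code point difference, plus the two escape forms that need the flag (variable names are illustrative):

```javascript
const emoji = '😀';                          // U+1F600, outside the BMP

const codeUnits = emoji.length;               // 2 UTF-16 code units
const codePoints = [...emoji].length;         // 1 code point

const dotWithoutU = /^.$/.test(emoji);        // false: the dot sees one code unit
const dotWithU = /^.$/u.test(emoji);          // true: the dot sees one code point

// Both escape forms below require the u flag
const byCodePoint = /\u{1F600}/u.test(emoji);          // true
const lettersOnly = /^\p{L}+$/u.test('na\u00EFve');    // true: the i with diaeresis is a letter
```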

Regex Testing & Debugging: The Complete Developer Guide

Regex Fundamentals: The Building Blocks

A regular expression is a pattern that describes a set of strings. Every character in a regex is either a literal (matches itself) or a metacharacter (has special meaning).

Character classes match any single character from a set:

. — any character except newline (with s flag: including newline)
\d — digit [0-9]; \D — not a digit
\w — word character [a-zA-Z0-9_]; \W — not a word character
\s — whitespace (space, tab, newline, etc.); \S — not whitespace
[aeiou] — any vowel; [^aeiou] — any non-vowel
[a-z] — any lowercase letter; [A-Za-z0-9] — alphanumeric
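A few of the classes above in action, as a quick sketch with made-up inputs:

```javascript
const orderId = 'Order #4521 shipped'.match(/\d+/)[0];   // '4521'
const words = 'a_b-c'.match(/\w+/g);                     // ['a_b', 'c']: underscore is a word character
const fields = 'one  two\tthree'.split(/\s+/);           // ['one', 'two', 'three']
const masked = 'regex'.replace(/[aeiou]/g, '*');         // 'r*g*x'
const consonants = 'regex'.match(/[^aeiou]/g);           // ['r', 'g', 'x']
```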

Quantifiers specify how many times the preceding element can match:

* — 0 or more (greedy)
+ — 1 or more (greedy)
? — 0 or 1 (makes preceding element optional)
{n} — exactly n times
{n,m} — between n and m times (inclusive)
*? +? — lazy (non-greedy) variants — match as few characters as possible
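The difference between greedy and lazy matching is easiest to see on input with repeated delimiters. A small sketch (inputs are illustrative):

```javascript
const markup = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last closing tag
const greedyMatch = markup.match(/<b>.*<\/b>/)[0];
// '<b>one</b> and <b>two</b>'

// Lazy: .*? stops at the first closing tag
const lazyMatch = markup.match(/<b>.*?<\/b>/)[0];
// '<b>one</b>'

const exactlyThree = /^\d{3}$/.test('123');       // true
const tooMany = /^\d{3}$/.test('1234');           // false
const optionalU = /^colou?r$/.test('color');      // true
const upToThree = 'aaaa'.match(/a{2,3}/)[0];      // 'aaa': {n,m} is greedy too
```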

Anchors assert position, not characters:

^ — start of string (or start of line with m flag)
$ — end of string (or end of line with m flag)
\b — word boundary (between \w and \W)
\B — non-word boundary
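A brief sketch of how anchors change what matches, using made-up inputs:

```javascript
// \b keeps the match to whole words; "concat" is left alone
const replaced = 'cat concat cat'.replace(/\bcat\b/g, 'dog');
// 'dog concat dog'

// \B requires the match to start inside a word
const inside = 'concat'.match(/\Bcat/).index;     // 3

// Without m, ^ and $ refer to the whole string
const wholeString = /^\w$/.test('a\nb');          // false

// With m, they also match at line breaks
const perLine = 'a\nb'.match(/^\w$/gm);           // ['a', 'b']
```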

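JavaScript Regex Flags: The Five Most Common, Explained

JavaScript regular expressions support several flags that alter matching behaviour; the five below are the ones you will use most often. The remaining flags are y (sticky), d (hasIndices), and v (unicodeSets). Flags can be combined freely, except that u and v cannot be used together.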

Flag	Name	Effect
g	global	Find all matches, not just the first
i	ignoreCase	Case-insensitive matching (A matches a)
m	multiline	^ and $ match start/end of each line
s	dotAll	. matches newline characters too
u	unicode	Treats pattern and input as Unicode code points; needed to match emoji and other characters outside the Basic Multilingual Plane as single characters

// g flag — find all matches
'aababc'.match(/a/g)     // ['a', 'a', 'a']

// i flag — case-insensitive
'Hello'.match(/hello/i)  // ['Hello']

// m flag — multiline anchors
'foo\nbar'.match(/^bar/m)  // ['bar']

// s flag — dotAll
'foo\nbar'.match(/foo.bar/s)  // ['foo\nbar']

// u flag — Unicode
'😀'.match(/./u)          // ['😀']  (without u: matches half the emoji)
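The y (sticky) flag, which also uses lastIndex, is not covered by the examples above. As a rough sketch, a sticky regex only matches if the match starts exactly at lastIndex (the input string here is illustrative):

```javascript
const input = 'abc 123';
const sticky = /\d+/y;

// Start exactly where the digits begin
sticky.lastIndex = 4;
const atFour = sticky.exec(input);        // ['123']
const indexAfterMatch = sticky.lastIndex; // 7

// Start at index 0: "a" is not a digit, and sticky will not scan ahead
sticky.lastIndex = 0;
const atZero = sticky.exec(input);        // null
```

This behaviour is what makes y useful for hand-written tokenizers, where each token must begin exactly where the previous one ended.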

Named Capture Groups — Cleaner, Self-Documenting Code

Numbered capture groups ((pattern)) are hard to read in complex regex patterns. Named capture groups use the syntax (?<name>pattern) and make your regex self-documenting:

// Date parsing with numbered groups: hard to read
const dateRe = /^(\d{4})-(\d{2})-(\d{2})$/;
const m = '2025-05-24'.match(dateRe);
const y  = m[1]; // need to count parentheses
const mo = m[2];
const d  = m[3];

// Same with named groups: self-documenting
// (different variable names above, so both snippets can share one scope)
const dateNamed = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const { year, month, day } = '2025-05-24'.match(dateNamed).groups;
// year='2025', month='05', day='24'

Named groups are also available in replacements via $<name>:

// Reformat date from YYYY-MM-DD to DD/MM/YYYY
'2025-05-24'.replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  '$<d>/$<m>/$<y>'
);
// '24/05/2025'
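Two practical details are worth sketching. First, match() returns null when the input does not match, so reading .groups directly throws; optional chaining avoids that. Second, matchAll() exposes named groups for every match. The parseDate helper and the log format below are hypothetical:

```javascript
const dateNamed = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns null instead of throwing on bad input
function parseDate(text) {
  const groups = text.match(dateNamed)?.groups;
  if (!groups) return null;
  return {
    year: Number(groups.year),
    month: Number(groups.month),
    day: Number(groups.day),
  };
}

// Named groups across all matches (made-up log format)
const log = 'GET /home 200\nPOST /login 401';
const lineRe = /^(?<method>[A-Z]+) (?<path>\S+) (?<status>\d{3})$/gm;
const entries = [...log.matchAll(lineRe)].map((match) => ({ ...match.groups }));
// entries[1] is { method: 'POST', path: '/login', status: '401' }
```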

Lookaheads and Lookbehinds — Match Without Consuming

Lookahead and lookbehind assertions (collectively: lookarounds) let you assert that something does or doesn't exist at a position without including it in the match. They are zero-width — they don't consume characters.

(?=pattern) — positive lookahead: position must be followed by pattern
(?!pattern) — negative lookahead: position must NOT be followed by pattern
(?<=pattern) — positive lookbehind: position must be preceded by pattern
(?<!pattern) — negative lookbehind: position must NOT be preceded by pattern

// Find numbers followed by "px" but don't include "px" in the match
'margin: 16px; padding: 8px;'.match(/\d+(?=px)/g)
// ['16', '8']

// Find "cat" not followed by "nap" or "fish"
'catnap catfish cat'.match(/cat(?!nap|fish)/g)
// ['cat']  (only the standalone cat)

// Extract price without the $ sign
'Price: $49.99'.match(/(?<=\$)[\d.]+/)
// ['49.99']

// Password validation: 8+ chars, at least one digit, one uppercase
const passwordRe = /^(?=.*\d)(?=.*[A-Z]).{8,}$/;
passwordRe.test('Password1')  // true
passwordRe.test('password1')  // false (no uppercase)
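Two more lookaround sketches, using illustrative inputs: inserting thousands separators with a lookahead, and a negative lookbehind, which the examples above do not show:

```javascript
// Insert a comma at every position that is followed by
// groups of exactly three digits up to the end of the number
const withCommas = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
// '1,234,567'

// Numbers that are NOT preceded by a dollar sign
const nonPrices = 'cost $40, qty 3'.match(/(?<!\$)\b\d+\b/g);
// ['3']
```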

12 Essential Regex Patterns

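These patterns cover the most common validation and parsing tasks. Treat them as pragmatic starting points rather than complete validators, and test them against your own data before relying on them. All are written for JavaScript.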

// 1. Email (RFC 5322 simplified — not the full RFC regex which is enormous)
/^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/

// 2. URL (http and https)
/^https?:\/\/[\w\-]+(\.[\w\-]+)+([\w.,@?^=%&:/~+#\-]*[\w@?^=%&/~+#\-])?/

// 3. IPv4 address
/^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/

// 4. UUID v4
/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i

// 5. Semantic version (semver, simplified: pre-release and build parts are looser than the official grammar)
/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[\w.-]+)?(?:\+[\w.-]+)?$/

// 6. ISO 8601 calendar date (YYYY-MM-DD; does not check days per month, so 2025-02-31 passes)
/^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/

// 7. Hex color (3 or 6 digit)
/^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/

// 8. Credit card (Luhn check still needed separately)
/^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$/

// 9. Strong password (8+ chars, 1 digit, 1 uppercase, 1 special)
/^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/

// 10. Slug (URL-friendly string)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/

// 11. JWT token structure
/^[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+$/

// 12. YYYY-MM-DD date in text (not whole string)
/\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b/g
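Pattern 8 notes that a Luhn check is still needed. A minimal sketch of pairing the two follows; luhnValid and looksLikeCard are hypothetical helper names, and 4111111111111111 is a widely used test number, not a real card:

```javascript
const cardRe = /^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})$/;

// Luhn checksum: walking from the right, double every second digit,
// subtract 9 from any result above 9, and require the sum to end in 0.
function luhnValid(digits) {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Strip spaces and dashes, then apply shape check and checksum
function looksLikeCard(input) {
  const digits = input.replace(/[ -]/g, '');
  return cardRe.test(digits) && luhnValid(digits);
}

const ok = looksLikeCard('4111 1111 1111 1111');        // true
const badChecksum = looksLikeCard('4111111111111112');  // false
```

The regex only checks prefix and length; the checksum catches most single-digit typos. Neither says anything about whether the card actually exists.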

Catastrophic Backtracking — The Performance Killer

Catastrophic backtracking is a condition where a backtracking regex engine takes exponential time on certain inputs. It is one of the most dangerous regex pitfalls and has caused real-world outages, including the Cloudflare outage of July 2019. It also underlies the ReDoS vulnerabilities reported in many npm packages.

It typically occurs when you have nested quantifiers on overlapping patterns:

// ❌ Catastrophic — nested quantifiers on overlapping classes
const bad = /^(a+)+$/;
bad.test('aaaaaaaaaaaaaaaaab'); // false, but slowly: each extra "a" roughly doubles the work, so around 30 can freeze the tab

// Why: the engine tries every way to partition "aaa...a" before
// concluding it can't match the "b" at the end.

// ✅ Fixed: other engines offer atomic groups or possessive quantifiers,
// but JS has neither, so rewrite the pattern to remove the ambiguity:
const good = /^a+$/;

How to identify vulnerable patterns:

Nested quantifiers: (a+)+, (a*)*, (a|a)+
Alternation with overlapping branches inside a quantifier, where the same text can be split between branches in more than one way: (a|aa)+, (\d|\w)+
Stacked optional elements: a?a?a?aaa matching "aaa". With n copies of a? followed by n copies of a, matched against a string of n a's, a backtracking engine can take time exponential in n

Mitigations: Use possessive quantifiers where your regex engine supports them; set execution timeouts for user-supplied regex; consider using a linear-time regex engine (RE2, Rust's regex crate) for untrusted input.
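JavaScript has no possessive quantifiers, but two workarounds can be sketched. The first emulates an atomic group with a lookahead plus a backreference, relying on the fact that the engine does not backtrack into a lookahead once it has succeeded. The second is a plain input-length cap; testWithCap and its default limit are hypothetical choices, not a recommendation for any particular value:

```javascript
// (?=(a+))\1 captures the longest run of a's inside a lookahead,
// then consumes exactly that text. The run cannot be re-split later.
const atomicLike = /^(?:(?=(a+))\1)+$/;

const hostile = 'a'.repeat(50000) + 'b';
const rejected = atomicLike.test(hostile);   // false, without exponential backtracking
const accepted = atomicLike.test('aaaa');    // true

// Hypothetical guard: refuse to run a regex on oversized input
function testWithCap(re, input, maxLength = 10000) {
  if (input.length > maxLength) return false;
  return re.test(input);
}

const capped = testWithCap(/^(a+)+$/, hostile);  // false: too long, regex never runs
```

The lookahead trick makes patterns harder to read, so simplifying the pattern (as with /^a+$/ above) is usually the better first step when it is possible.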