New syntax

Named capture

XRegExp includes comprehensive support for named capture. Following are the details of XRegExp's named capture syntax:

Notes

Example

const repeatedWords = XRegExp.tag('gi')`\b(?<word>[a-z]+)\s+\k<word>\b`;
// Alternatively: XRegExp('\\b(?<word>[a-z]+)\\s+\\k<word>\\b', 'gi');

// Check for repeated words
repeatedWords.test('The the test data');
// -> true

// Remove any repeated words
const withoutRepeated = XRegExp.replace('The the test data', repeatedWords, '${word}');
// -> 'The test data'

const url = XRegExp(`^(?<scheme> [^:/?]+ ) ://   # aka protocol
                      (?<host>   [^/?]+  )       # domain name/IP
                      (?<path>   [^?]*   ) \\??  # optional path
                      (?<query>  .*      )       # optional query`, 'x');

// Get the URL parts
const parts = XRegExp.exec('https://google.com/path/to/file?q=1', url);
// parts -> ['https://google.com/path/to/file?q=1', 'https', 'google.com', '/path/to/file', 'q=1']
// parts.groups.scheme -> 'https'
// parts.groups.host   -> 'google.com'
// parts.groups.path   -> '/path/to/file'
// parts.groups.query  -> 'q=1'

// Named backreferences are available in replacement functions as properties of the last argument
XRegExp.replace('https://google.com/path/to/file?q=1', url, (match, ...args) => {
  const groups = args.pop();
  return match.replace(groups.host, 'xregexp.com');
});
// -> 'https://xregexp.com/path/to/file?q=1'

Regexes that use named capture work with all native methods. However, you need to use XRegExp.exec and XRegExp.replace for access to named backreferences; otherwise, only numbered backreferences are available.

Annotations

Inline comments

Inline comments use the syntax (?#comment). They are an alternative to the line comments allowed in free-spacing mode.

Comments are a do-nothing (rather than ignore-me) metasequence. This distinction is important with something like \1(?#comment)2, which is taken as \1 followed by 2, and not \12. However, quantifiers following comments apply to the preceding token, so x(?#comment)+ is equivalent to x+.

Example

const regex = XRegExp('^(?#month)\\d{1,2}/(?#day)\\d{1,2}/(?#year)(\\d{2}){1,2}', 'n');
const isDate = regex.test('04/20/2008'); // -> true

// Can still be useful when combined with free-spacing, because inline comments
// don't need to end with \n
const freeSpacingRegex = XRegExp('^ \\d{1,2}      (?#month)' +
                                 '/ \\d{1,2}      (?#day  )' +
                                 '/ (\\d{2}){1,2} (?#year )', 'nx');

Annotations

Leading mode modifier

A mode modifier uses the syntax (?imnsuxA), where imnsuxA is any combination of XRegExp flags except g, y, or d. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.

Example

const regex = XRegExp('(?im)^[a-z]+$');
regex.ignoreCase; // -> true
regex.multiline; // -> true

When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate flags argument. For instance, XRegExp('(?s).+', 's') is valid.

Flags g, y, and d cannot be included in a mode modifier, or an error is thrown. This is because g, y, and d, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. XRegExp methods provide e.g. scope, sticky, and pos arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Additionally, consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags g, y, and d only make sense when applied to the regex as a whole. Allowing g, y, and d in a mode modifier might therefore create future compatibility problems.

The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.

Annotations

Stricter error handling

XRegExp makes any escaped letters or numbers a SyntaxError unless they form a valid and complete metasequence or backreference. This helps to catch errors early, and makes it safe for future versions of ES or XRegExp to introduce new escape sequences. It also means that octal escapes are always an error in XRegExp. ES3/5 do not allow octal escapes, but browsers support them anyway for backward compatibility, which often leads to unintended behavior.

XRegExp requires all backreferences, whether written as \n, \k<n>, or \k<name>, to appear to the right of the opening parenthesis of the group they reference.

XRegExp never allows \n-style backreferences to be followed by literal numbers. To match backreference 1 followed by a literal 2 character, you can use, e.g., (a)\k<1>2, (?x)(a)\1 2, or (a)\1(?#)2.

Unicode

XRegExp supports matching Unicode categories, scripts, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}. See XRegExp Unicode addons for more details.

XRegExp additionally supports the \u{N…} syntax for matching individual code points. In ES6 this is supported natively, but only when using the u flag. XRegExp supports this syntax for code points 0–FFFF even when not using the u flag, and it supports the complete Unicode range 0–10FFFF when using u.

Replacement text

XRegExp's replacement text syntax is used by the XRegExp.replace function. It adds $0 as a synonym of $& (to refer to the entire match), and adds $<n> and ${n} for backreferences to named and numbered capturing groups (in addition to $1, etc.). When the braces syntax is used for numbered backreferences, it allows numbers with three or more digits (not possible natively) and allows separating a backreference from an immediately-following digit (not always possible natively). XRegExp uses stricter replacement text error handling than native JavaScript, to help you catch errors earlier (e.g., the use of a $ character that isn't part of a valid metasequence causes an error to be thrown).

Following are the special tokens that can be used in XRegExp replacement strings:

XRegExp behavior for $<n> and ${n}:

XRegExp behavior for $n and $nn:

For comparison, following is JavaScript's native behavior for $n and $nn:
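A few native cases can be observed with plain `String.prototype.replace`, without XRegExp. This is a small sketch of how current engines behave, not an exhaustive list; the sample strings are arbitrary.

```javascript
// No capturing groups: $1 and $01 are left as literal text
const noGroupSingle = 'abc'.replace(/b/, '$1');  // -> 'a$1c'
const noGroupPadded = 'abc'.replace(/b/, '$01'); // -> 'a$01c'

// One capturing group: $10 is read as $1 followed by a literal 0
const oneGroupTen = 'abc'.replace(/(b)/, '$10'); // -> 'ab0c'

// One capturing group: $01 is the same as $1
const oneGroupPadded = 'abc'.replace(/(b)/, '[$01]'); // -> 'a[b]c'

// $0 has no special meaning natively and stays literal
const dollarZero = 'abc'.replace(/b/, '$0'); // -> 'a$0c'
```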