New syntax

Named capture

XRegExp includes comprehensive support for named capture. Capture names can use the characters A–Z, a–z, 0–9, _, and $ only. Following are the details of XRegExp's named capture syntax:

You can compare these details to named capture in other regex flavors.

Example

var regex = XRegExp('\\b(?<word>[a-z]+)\\s+\\k<word>\\b', 'gi'),
  str = 'The the test data.',
  parts;

// Check for repeated words
regex.test(str);
// -> true

// Remove any repeated words
str = XRegExp.replace(str, regex, '${word}');
// -> 'The test data.'

regex = XRegExp('^(?<scheme> [^:/?]+ ) ://   # aka protocol   \n\
                  (?<host>   [^/?]+  )       # domain name/IP \n\
                  (?<path>   [^?]*   ) \\??  # optional path  \n\
                  (?<query>  .*      )       # optional query', 'x');
str = 'http://google.com/path/to/file?q=1';

// Get the URL parts
parts = XRegExp.exec(str, regex);
// parts -> ['http://google.com/path/to/file?q=1', 'http', 'google.com', '/path/to/file', 'q=1']
// parts.scheme -> 'http'
// parts.host   -> 'google.com'
// parts.path   -> '/path/to/file'
// parts.query  -> 'q=1'

// Named backreferences are available in replacement functions as properties of the first argument
str = XRegExp.replace(str, regex, function (match) {
  return match.replace(match.host, 'xregexp.com');
});
// -> 'http://xregexp.com/path/to/file?q=1'

Regexes that use named capture work with all native methods. However, you need to use XRegExp.exec and XRegExp.replace for access to named backreferences, otherwise only numbered backreferences are available. If you run XRegExp.install('natives') first, named backreferences are always available, even when using the native RegExp.prototype.exec, String.prototype.match, and String.prototype.replace methods.
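For comparison, engines that implement ES2018 support named groups natively, so the two examples above can also be written without XRegExp. This is a sketch that assumes such an engine (for example, a recent Node.js). Native replacement strings use $<name> rather than XRegExp's ${name}, and native replacement callbacks receive the named groups as a final groups argument rather than as properties of the first argument.

```javascript
// Native ES2018 named capture, shown only for comparison with XRegExp.
const repeatedWord = /\b(?<word>[a-z]+)\s+\k<word>\b/gi;

// Remove repeated words; native replacement text uses $<name>.
const deduped = 'The the test data.'.replace(repeatedWord, '$<word>');
// deduped -> 'The test data.'

const url = 'http://google.com/path/to/file?q=1';
const urlRegex = /^(?<scheme>[^:\/?]+):\/\/(?<host>[^\/?]+)(?<path>[^?]*)\??(?<query>.*)/;

// Named groups are exposed on the match's groups property.
const parts = urlRegex.exec(url);
// parts.groups.scheme -> 'http'
// parts.groups.host   -> 'google.com'

// Replacement callbacks receive the groups object as their last argument.
const rewritten = url.replace(urlRegex, (match, ...rest) => {
  const groups = rest[rest.length - 1];
  return match.replace(groups.host, 'xregexp.com');
});
// rewritten -> 'http://xregexp.com/path/to/file?q=1'
```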


Inline comments

Inline comments use the syntax (?#comment). They are an alternative to the line comments allowed in free-spacing mode.

Comments are a do-nothing (rather than ignore-me) metasequence. This distinction is important with something like \1(?#comment)2, which is taken as \1 followed by 2, and not \12. However, quantifiers following comments apply to the preceding token, so x(?#comment)+ is equivalent to x+.
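The do-nothing behavior can be illustrated with a small preprocessor that rewrites inline comments into native syntax. This is a hypothetical sketch, not XRegExp's actual implementation: the function name stripInlineComments is invented here, and it ignores escaped parentheses and character classes. It drops a comment that is followed by a quantifier (so the quantifier still binds to the preceding token) and otherwise replaces it with an empty noncapturing group, (?:), so that tokens such as \1 and 2 stay separate.

```javascript
// Hypothetical sketch: rewrite each (?#...) comment so that it separates the
// tokens around it without changing what a following quantifier applies to.
// Escaped parentheses and character classes are not handled.
function stripInlineComments(pattern) {
  return pattern.replace(/\(\?#[^)]*\)/g, (comment, offset, whole) => {
    // Look past any further comments to find the next real token.
    const rest = whole
      .slice(offset + comment.length)
      .replace(/^(?:\(\?#[^)]*\))+/, '');
    // Before a quantifier, drop the comment: x(?#c)+ becomes x+.
    // Otherwise leave an empty group: \1(?#c)2 becomes \1(?:)2.
    return /^[?*+{]/.test(rest) ? '' : '(?:)';
  });
}

const source = stripInlineComments('(a)\\1(?#comment)2'); // (a)\1(?:)2
const matchesLiteral2 = new RegExp(source).test('aa2');      // -> true
const quantified = stripInlineComments('x(?#comment)+');     // -> 'x+'
```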

Example

var regex = XRegExp('^(?#month)\\d{1,2}/(?#day)\\d{1,2}/(?#year)(\\d{2}){1,2}', 'n');
var isDate = regex.test('04/20/2008'); // -> true

// Can still be useful when combined with free-spacing, because inline comments
// don't need to end with \n
regex = XRegExp('^ \\d{1,2}      (?#month)' +
                '/ \\d{1,2}      (?#day  )' +
                '/ (\\d{2}){1,2} (?#year )', 'nx');


Leading mode modifier

A mode modifier uses the syntax (?imnsx), where imnsx is any combination of XRegExp flags except g or y. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.

Example

var regex = XRegExp('(?im)^[a-z]+$');
regex.ignoreCase; // -> true
regex.multiline; // -> true

When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate flags argument. For instance, XRegExp('(?s).+', 's') is perfectly valid.

Flags g and y cannot be included in a mode modifier; doing so throws an error. This is because g and y, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. XRegExp methods also provide arguments such as scope, sticky, and pos that let you control this behavior on a per-run rather than per-regex basis. Also consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags g and y only make sense when applied to the regex as a whole. Allowing g and y in a mode modifier might therefore create future compatibility problems.

The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.
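The rules above can be sketched as a small preprocessing step that moves a leading mode modifier into the flags argument. This is a hypothetical helper (applyLeadingModeModifier is not an XRegExp function), and its knownFlags parameter is an assumption standing in for whatever flags XRegExp and its addons have registered. Only i, m, and s are also understood by the native RegExp constructor used in the usage lines; n and x would need XRegExp itself.

```javascript
// Hypothetical sketch of the leading-mode-modifier rules described above.
// knownFlags defaults to XRegExp's imnsx; an addon that defined a new flag
// would simply extend this set.
function applyLeadingModeModifier(pattern, flags = '', knownFlags = 'imnsx') {
  // Only a modifier at the very start of the pattern is recognized.
  const modifier = /^\(\?([a-zA-Z]+)\)/.exec(pattern);
  if (!modifier) return { pattern, flags };

  for (const flag of modifier[1]) {
    if (flag === 'g' || flag === 'y') {
      throw new SyntaxError(`Flag ${flag} cannot be used in a mode modifier`);
    }
    if (!knownFlags.includes(flag)) {
      throw new SyntaxError(`Unknown flag ${flag} in mode modifier`);
    }
  }
  // Merge with the flags argument; repeating a flag in both places is allowed.
  const merged = [...new Set(flags + modifier[1])].join('');
  return { pattern: pattern.slice(modifier[0].length), flags: merged };
}

const { pattern, flags } = applyLeadingModeModifier('(?im)^[a-z]+$');
const regex = new RegExp(pattern, flags); // native RegExp understands i and m
// regex.ignoreCase -> true, regex.multiline -> true
```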


Stricter error handling

XRegExp treats any escaped letter or digit as a SyntaxError unless it forms a valid and complete metasequence or backreference. This helps catch errors early and makes it safe for future versions of ES or XRegExp to introduce new escape sequences. It also means that octal escapes are always an error in XRegExp. ES3 and ES5 do not allow octal escapes, but browsers support them anyway for backward compatibility, which often leads to unintended behavior.

XRegExp requires all backreferences, whether written as \n, \k<n>, or \k<name>, to appear to the right of the opening parenthesis of the group they reference.

XRegExp never allows \n-style backreferences to be followed by literal digits. To match backreference 1 followed by a literal 2, you can use, e.g., (a)\k<1>2, (?x)(a)\1 2, or (a)\1(?#)2.
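For comparison, native JavaScript without the u flag accepts the kinds of patterns XRegExp rejects here, because of the legacy web-compatibility rules in Annex B of the ECMAScript specification. Since ES2015, the native u flag brings similar strictness for unknown escapes and out-of-range backreferences (though not for backreferences that appear before their group). A sketch, assuming a modern engine such as Node.js:

```javascript
// Native behavior without the u flag (Annex B legacy rules):
const octal = /(a)\12/.test('a\n');   // \12 is read as octal for a newline -> true
const identity = /\q/.test('q');       // unknown escape just matches 'q' -> true
const forward = /\1(a)/.exec('a')[0]; // \1 before its group matches '' -> 'a'

// With the u flag, unknown escapes and out-of-range backreferences throw.
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}
const octalRejected = throwsSyntaxError('(a)\\12', 'u'); // -> true
const escapeRejected = throwsSyntaxError('\\q', 'u');    // -> true

// A native way to write backreference 1 followed by a literal 2:
const separated = /(a)\1(?:)2/.test('aa2'); // -> true
```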

Unicode properties

XRegExp supports matching Unicode categories, scripts, blocks, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}. See XRegExp Unicode addons for more details.
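For comparison, ES2018 added native \p{…} and \P{…} escapes, available only with the u flag. The native syntax has no \p{^…} form (\P{…} serves that purpose), and its property names follow the ECMAScript specification rather than the XRegExp addons, so a name accepted by one is not guaranteed to work in the other. A minimal sketch, assuming a modern engine:

```javascript
// Native Unicode property escapes require the u flag.
const letters = /^\p{L}+$/u;          // one or more Unicode letters
const nonLetters = /^\P{L}+$/u;       // one or more non-letter code points
const greek = /^\p{Script=Greek}+$/u; // Script property

const isWord = letters.test('Straße');   // -> true
const isNonWord = nonLetters.test('123'); // -> true
const isGreek = greek.test('αβγ');        // -> true

// The \p{^...} negation form is XRegExp syntax; natively it is a SyntaxError.
let caretSupported = true;
try {
  new RegExp('\\p{^L}', 'u');
} catch (e) {
  caretSupported = false;
}
// caretSupported -> false
```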

Replacement text

XRegExp adds $0 as a synonym of $& (to refer to the entire match), and adds ${n} for backreferences to named and numbered capturing groups (in addition to $1, etc.). When the curly brace syntax is used for numbered backreferences, it allows numbers with three or more digits (not possible natively) and allows separating a backreference from an immediately-following digit (not always possible natively). XRegExp uses stricter replacement text error handling than native JavaScript, to help you catch errors earlier (e.g., the use of a $ character that isn't part of a valid metasequence causes an error to be thrown).

XRegExp's replacement text syntax is used by the XRegExp.replace function. It can also be used by String.prototype.replace if you first run XRegExp.install('natives').
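To make these rules more concrete, here is a hypothetical sketch that expands a subset of XRegExp-style replacement text on top of the native replacement callback. The function name expandReplacement is invented for this example, and it is not XRegExp's implementation. It handles only $$, $&, $0, ${n}, and ${name}; other tokens, including $n, $nn, $` and $', are left out for brevity and rejected, unlike in XRegExp. As the paragraph above describes for XRegExp, it throws on any $ that does not form a recognized token.

```javascript
// Hypothetical sketch: expands $$, $&, $0, ${n} and ${name} in `template`
// for every match of `regex` in `str`. Any other use of $ throws.
function expandReplacement(str, regex, template) {
  return str.replace(regex, (...args) => {
    // Native callbacks receive (match, p1, ..., pn, offset, string[, groups]);
    // the groups object is only passed when the regex has named groups.
    const hasGroups = typeof args[args.length - 1] === 'object';
    const groups = hasGroups ? args[args.length - 1] : undefined;
    const captures = args.slice(0, hasGroups ? -3 : -2); // [match, p1, ..., pn]

    return template.replace(
      /\$(?:\{([^{}]*)\}|([$&0])|([\s\S]?))/g,
      (token, braced, simple) => {
        if (simple === '$') return '$';
        if (simple === '&' || simple === '0') return captures[0];
        if (braced !== undefined) {
          if (/^\d+$/.test(braced)) {
            // ${n}: any number of digits; digits after the brace stay literal.
            const n = Number(braced);
            if (n >= captures.length) {
              throw new SyntaxError(`Backreference to undefined group: ${token}`);
            }
            return captures[n] ?? '';
          }
          if (groups && Object.prototype.hasOwnProperty.call(groups, braced)) {
            return groups[braced] ?? '';
          }
          throw new SyntaxError(`Backreference to undefined group: ${token}`);
        }
        throw new SyntaxError(`Invalid replacement token: ${token}`);
      }
    );
  });
}

const deduped = expandReplacement(
  'The the test data.', /\b(?<word>[a-z]+)\s+\k<word>\b/gi, '${word}'
); // -> 'The test data.'
const mixed = expandReplacement('abc', /(b)/, '[$0|${1}0|$$]'); // -> 'a[b|b0|$]c'
```

Note how ${1}0 keeps the backreference separate from the literal 0 that follows it.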

Following are the special tokens that can be used in XRegExp replacement strings:

XRegExp behavior for ${n}:

XRegExp behavior for $n and $nn:

For comparison, following is JavaScript's native behavior for $n and $nn: