XRegExp includes comprehensive support for named capture. Following are the details of XRegExp's named capture syntax:
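(?<name>…) - Defines a named capturing group.
\k<name> - Backreference to a named group within the regex.
$<name> - Backreference to a named group in replacement text.
result.groups.name - Where a named group's match is stored on a match result.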
SyntaxError
RegExpIdentifierName
    const repeatedWords = XRegExp.tag('gi')`\b(?<word>[a-z]+)\s+\k<word>\b`;
    // Alternatively: XRegExp('\\b(?<word>[a-z]+)\\s+\\k<word>\\b', 'gi');

    // Check for repeated words
    repeatedWords.test('The the test data');
    // -> true

    // Remove any repeated words
    const withoutRepeated = XRegExp.replace('The the test data', repeatedWords, '${word}');
    // -> 'The test data'

    const url = XRegExp(`^(?<scheme> [^:/?]+ ) ://   # aka protocol
                          (?<host>   [^/?]+  )       # domain name/IP
                          (?<path>   [^?]*   ) \\??  # optional path
                          (?<query>  .*      )       # optional query`, 'x');

    // Get the URL parts
    const parts = XRegExp.exec('https://google.com/path/to/file?q=1', url);
    // parts -> ['https://google.com/path/to/file?q=1', 'https', 'google.com', '/path/to/file', 'q=1']
    // parts.groups.scheme -> 'https'
    // parts.groups.host -> 'google.com'
    // parts.groups.path -> '/path/to/file'
    // parts.groups.query -> 'q=1'

    // Named backreferences are available in replacement functions as properties of the last argument
    XRegExp.replace('https://google.com/path/to/file?q=1', url, (match, ...args) => {
      const groups = args.pop();
      return match.replace(groups.host, 'xregexp.com');
    });
    // -> 'https://xregexp.com/path/to/file?q=1'
Regexes that use named capture work with all native methods. However, you need to use XRegExp.exec and XRegExp.replace for access to named backreferences, otherwise only numbered backreferences are available.
SyntaxError
.lastMatch property of the global RegExp object or the RegExp.prototype.compile method, since those features were deprecated in JavaScript 1.5.

Inline comments use the syntax (?#comment). They are an alternative to the line comments allowed in free-spacing mode.
Comments are a do-nothing (rather than ignore-me) metasequence. This distinction is important with something like \1(?#comment)2, which is taken as \1 followed by 2, and not \12. However, quantifiers following comments apply to the preceding token, so x(?#comment)+ is equivalent to x+.
    const regex = XRegExp('^(?#month)\\d{1,2}/(?#day)\\d{1,2}/(?#year)(\\d{2}){1,2}', 'n');
    const isDate = regex.test('04/20/2008');
    // -> true

    // Can still be useful when combined with free-spacing, because inline comments
    // don't need to end with \n
    // (named differently from the regex above so both declarations can share a scope)
    const spacedRegex = XRegExp('^ \\d{1,2} (?#month)' +
                                '/ \\d{1,2} (?#day )' +
                                '/ (\\d{2}){1,2} (?#year )', 'nx');
A mode modifier uses the syntax (?imnsuxA), where imnsuxA is any combination of XRegExp flags except g, y, or d. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.
    const regex = XRegExp('(?im)^[a-z]+$');
    regex.ignoreCase; // -> true
    regex.multiline; // -> true
When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate flags argument. For instance, XRegExp('(?s).+', 's') is valid.
Flags g, y, and d cannot be included in a mode modifier, or an error is thrown. This is because g, y, and d, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. XRegExp methods provide e.g. scope, sticky, and pos arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Additionally, consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags g, y, and d only make sense when applied to the regex as a whole. Allowing g, y, and d in a mode modifier might therefore create future compatibility problems.
The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.
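The rules above can be sketched in plain JavaScript. This is only an illustration of the described behavior, not XRegExp's implementation: `splitModeModifier`, `MODIFIER_FLAGS`, and `WHOLE_REGEX_FLAGS` are hypothetical names, and the flag lists are taken from the text above.

```javascript
// Hypothetical helper (not part of XRegExp): peel a single leading mode
// modifier off a pattern and merge its flags with a separate flags argument.
const MODIFIER_FLAGS = 'imnsuxA'; // flags the text lists as valid in a mode modifier
const WHOLE_REGEX_FLAGS = 'gyd';  // flags the text says must be rejected

function splitModeModifier(pattern, flags = '') {
  // Only a modifier at the very beginning of the pattern is recognized.
  const match = /^\(\?([A-Za-z]+)\)/.exec(pattern);
  if (!match) {
    return { pattern, flags };
  }
  for (const flag of match[1]) {
    if (WHOLE_REGEX_FLAGS.includes(flag)) {
      throw new SyntaxError(`Flag ${flag} cannot be used in a mode modifier`);
    }
    if (!MODIFIER_FLAGS.includes(flag)) {
      throw new SyntaxError(`Unknown flag ${flag} in mode modifier`);
    }
  }
  // Flags given both ways are allowed; keep each one once.
  const merged = [...new Set(flags + match[1])].join('');
  return { pattern: pattern.slice(match[0].length), flags: merged };
}

const parsed = splitModeModifier('(?im)^[a-z]+$');
// parsed -> { pattern: '^[a-z]+$', flags: 'im' }

// i and m are also native flags, so the result can feed a native RegExp here.
const regex = new RegExp(parsed.pattern, parsed.flags);
// regex.ignoreCase -> true, regex.multiline -> true
```

The sketch only handles the built-in flag letters; it does not model flags registered by addons.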
(?-i), simultaneously setting and unsetting flags via (?i-m), and enabling flags for subpatterns only via (?i:…). XRegExp does not support these extended options.

XRegExp makes any escaped letters or numbers a SyntaxError unless they form a valid and complete metasequence or backreference. This helps to catch errors early, and makes it safe for future versions of ES or XRegExp to introduce new escape sequences. It also means that octal escapes are always an error in XRegExp. ES3/5 do not allow octal escapes, but browsers support them anyway for backward compatibility, which often leads to unintended behavior.
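For contrast, the following sketch shows the lenient native behavior that strict escaping guards against. It uses only native RegExp (no XRegExp), and `throwsSyntaxError` is a hypothetical helper introduced for the illustration.

```javascript
// Native RegExp without the u flag: an unknown escaped letter silently
// matches itself instead of being reported as a mistake.
const escapedLetterMatches = /^\j$/.test('j'); // -> true

// With no capturing groups, \1 is read as a legacy octal escape (U+0001)
// rather than as a backreference.
const octalMatches = /^\1$/.test('\u0001'); // -> true

// Hypothetical helper: reports whether compiling a pattern throws SyntaxError.
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

// The native u flag is strict about both cases.
const strictLetter = throwsSyntaxError('\\j', 'u'); // -> true
const strictOctal = throwsSyntaxError('\\1', 'u');  // -> true
```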
XRegExp requires all backreferences, whether written as \n, \k<n>, or \k<name>, to appear to the right of the opening parenthesis of the group they reference.
XRegExp never allows \n-style backreferences to be followed by literal numbers. To match backreference 1 followed by a literal 2 character, you can use, e.g., (a)\k<1>2, (?x)(a)\1 2, or (a)\1(?#)2.
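To see why these restrictions help, here is how native JavaScript (without the u flag) treats the same constructs. This is a native-only sketch; the `(?:)` separator is a native analogue of the options listed above and is not taken from the XRegExp documentation.

```javascript
// With one capturing group, native \12 is not "group 1, then 2".
// It is parsed as the legacy octal escape for U+000A (line feed).
const ambiguous = /^(a)\12$/;
const matchesLineFeed = ambiguous.test('a\n'); // -> true
const matchesAa2 = ambiguous.test('aa2');      // -> false

// An empty non-capturing group keeps the backreference and the digit apart.
const explicit = /^(a)\1(?:)2$/;
const explicitMatches = explicit.test('aa2');  // -> true

// Native engines accept a backreference placed before its group; it
// matches the empty string because the group has not captured anything yet.
const forward = /^\1(a)$/;
const forwardMatches = forward.test('a');      // -> true
```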
XRegExp supports matching Unicode categories, scripts, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}. See XRegExp Unicode addons for more details.
XRegExp additionally supports the \u{N…} syntax for matching individual code points. In ES6 this is supported natively, but only when using the u flag. XRegExp supports this syntax for code points 0–FFFF even when not using the u flag, and it supports the complete Unicode range 0–10FFFF when using u.
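The native half of that comparison can be checked directly. The sketch below uses only native RegExp; the variable names are illustrative.

```javascript
// With the u flag, \u{…} is a code point escape.
const withU = /^\u{1F600}$/u;
const emoji = String.fromCodePoint(0x1F600);
const matchesEmoji = withU.test(emoji); // -> true

// Without the u flag, native engines read \u{3} as the letter "u"
// followed by the quantifier {3}, not as code point U+0003.
const withoutU = /^\u{3}$/;
const matchesLetters = withoutU.test('uuu');      // -> true
const matchesCodePoint = withoutU.test('\u0003'); // -> false
```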
XRegExp's replacement text syntax is used by the XRegExp.replace function. It adds $0 as a synonym of $& (to refer to the entire match), and adds $<n> and ${n} for backreferences to named and numbered capturing groups (in addition to $1, etc.). When the braces syntax is used for numbered backreferences, it allows numbers with three or more digits (not possible natively) and allows separating a backreference from an immediately-following digit (not always possible natively). XRegExp uses stricter replacement text error handling than native JavaScript, to help you catch errors earlier (e.g., the use of a $ character that isn't part of a valid metasequence causes an error to be thrown).
Following are the special tokens that can be used in XRegExp replacement strings:
$$ - Inserts a literal $ character.
$&, $0 - Inserts the matched substring.
$` - Inserts the string that precedes the matched substring (left context).
$' - Inserts the string that follows the matched substring (right context).
$n, $nn - Where n/nn are digits referencing an existing capturing group, inserts backreference n/nn.
$<n>, ${n} - Where n is a name or any number of digits that reference an existent capturing group, inserts backreference n.

XRegExp behavior for $<n> and ${n}:
Numbered backreference, if n is an integer. Use 0 for the entire match. Any number of leading zeros may be used.
Named backreference n, if it exists. Does not overlap with numbered capture since XRegExp does not allow named capture to use a bare integer as the name.

XRegExp behavior for $n and $nn:
Backreferences written without braces use at most two digits. Use ${…} for more digits.
$1 is an error if there are no capturing groups.
$10 is an error if there are fewer than 10 capturing groups. Use ${1}0 instead.
$01 is equivalent to $1 if a capturing group exists, otherwise it's an error.
$0 (not followed by 1-9) and $00 are the entire match.

For comparison, following is JavaScript's native behavior for $n and $nn:
$1 is a literal $1 if there are no capturing groups.
$10 is $1 followed by a literal 0 if there are fewer than 10 capturing groups.
$01 is equivalent to $1 if a capturing group exists, otherwise it's a literal $01.
$0 is a literal $0.
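The native rules listed above can be confirmed with String.prototype.replace. This sketch uses no XRegExp calls; the brackets in the replacement strings are only there to make the inserted text easy to spot.

```javascript
// Native replacement-string handling for $n and $nn.
const noGroups = /b/;
const oneGroup = /(b)/;

const results = {
  // No capturing groups: $1 is left as literal text.
  dollar1NoGroups: 'abc'.replace(noGroups, '[$1]'),   // -> 'a[$1]c'
  // Fewer than 10 groups: $10 is $1 followed by a literal 0.
  dollar10OneGroup: 'abc'.replace(oneGroup, '[$10]'), // -> 'a[b0]c'
  // $01 is the same as $1 when group 1 exists...
  dollar01OneGroup: 'abc'.replace(oneGroup, '[$01]'), // -> 'a[b]c'
  // ...and literal text when it does not.
  dollar01NoGroups: 'abc'.replace(noGroups, '[$01]'), // -> 'a[$01]c'
  // $0 is never special natively.
  dollar0: 'abc'.replace(oneGroup, '[$0]'),           // -> 'a[$0]c'
};

console.log(results);
```

Each of these silent fallbacks corresponds to a case that the XRegExp rules above treat as an error or give a defined meaning.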