API

XRegExp(pattern, [flags])

Accepts a pattern and flags; returns a new, extended RegExp object. Differs from a native regular expression in that additional syntax and flags are supported and cross-browser regex syntax inconsistencies are ameliorated.

Parameters:
  • pattern : String or RegExp
    The regular expression pattern String, or an existing RegExp object to copy.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s and x. Flags cannot be provided when constructing one RegExp from another.
Returns:
  • RegExp
    An extended regular expression object.

Example code

var regex = XRegExp("(?<month> [0-9]+ ) [-/.\\s]  # month\n\
                     (?<day>   [0-9]+ ) [-/.\\s]  # day  \n\
                     (?<year>  [0-9]+ )           # year   ", "x");

var input = "04/20/2009";
input.replace(regex, "${year}-${month}-${day}"); // "2009-04-20"

var match = regex.exec(input);
match.month; // "04"

regex instanceof RegExp; // true
regex.constructor == RegExp; // true
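For comparison, engines that implement ES2018 support named capture natively, but not the x flag or XRegExp's property-style access. The sketch below assumes such an engine and rewrites the example above without XRegExp. Whitespace and comments are removed by hand, named groups are read from match.groups rather than from the match array itself, and the replacement string uses $<name> instead of ${name}.

```javascript
// Native ES2018+ counterpart of the XRegExp example. There is no x flag,
// so the pattern cannot contain free-spacing whitespace or comments.
const regex = /(?<month>[0-9]+)[-\/.\s](?<day>[0-9]+)[-\/.\s](?<year>[0-9]+)/;

const input = "04/20/2009";

// Native replacement strings refer to named groups as $<name>.
const reordered = input.replace(regex, "$<year>-$<month>-$<day>");

// Native named groups live on match.groups, not on the match array itself.
const match = regex.exec(input);
const month = match.groups.month;

console.log(reordered); // "2009-04-20"
console.log(month);     // "04"
```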

For more information about named capture, see New syntax: Named capture. For details about the x flag, see New flags: Free-spacing and line comments.

Regexes, strings, and backslashes

JavaScript string literals (as opposed to, e.g., user input or text extracted from the DOM) use a backslash as an escape character. The string literal "\\" therefore contains a single backslash, and its length property value is 1. However, a backslash is also an escape character in regular expression syntax, where the pattern \\ matches a single backslash. When providing string literals to the RegExp or XRegExp constructor functions, four backslashes are therefore needed to match a single backslash, e.g., XRegExp("\\\\"). Only two of those backslashes are actually passed into the constructor function. The other two are used to escape the backslashes in the string before the function ever sees the string.

The same issue is at play with the \\s sequences in the example code just shown. XRegExp receives the two characters \s, which it recognizes as the metasequence that matches a whitespace character. The \n at the end of each of the first two lines is an escape sequence in JavaScript string literals that inserts an actual line feed character into the string; these line feeds terminate the free-spacing line comments that start with #. The backslash at the very end of each of the first two lines is a line continuation, which lets the string continue onto the next line of code without concatenating multiple strings.
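The two layers of escaping can be checked directly with the native RegExp constructor, because string-literal escaping happens before any constructor sees the string. This sketch uses only built-in JavaScript; the variable names are illustrative.

```javascript
// The string literal "\\" holds a single backslash character.
const oneBackslash = "\\";

// To match one backslash, the regex engine must receive the two characters \\.
// Written as a string literal, that takes four backslashes.
const patternSource = "\\\\";
const backslashRegex = new RegExp(patternSource);

// A regex literal skips the string-literal layer, so two backslashes suffice.
const literalRegex = /\\/;

// "\\s" delivers the two characters \s (the whitespace metasequence).
// "\s" is just the letter s, because \s is not a string-literal escape.
const whitespaceRegex = new RegExp("\\s");
const letterSRegex = new RegExp("\s");

console.log(oneBackslash.length);             // 1
console.log(patternSource.length);            // 2
console.log(backslashRegex.test("C:\\temp")); // true
console.log(literalRegex.test("C:\\temp"));   // true
console.log(whitespaceRegex.test("a b"));     // true
console.log(letterSRegex.test("a b"));        // false
```

The last line is the silent failure to watch for: a single backslash before a character that has no string-literal meaning is simply dropped.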

XRegExp.addToken(regex, handler, [scope], [trigger])

Provides a means to create custom flags and extend or change the regular expression language accepted by XRegExp. This function is used internally for XRegExp's own syntax extensions and can be used to create XRegExp plugins.

This function is intended for users with advanced knowledge of JavaScript's regular expression syntax and behavior. Beginning regexers may want to stick to plugins or provided examples that take advantage of this method. To disable further changes to XRegExp syntax, run XRegExp.freezeTokens() after loading XRegExp and any plugins.

Parameters:
  • regex : RegExp
    A regular expression object that matches the token you are adding.
  • handler : Function
    A function that returns a new pattern string (limited to native JavaScript regular expression syntax) to replace the matched pattern within all future regexes built by the XRegExp constructor. handler is invoked with two arguments:
    1. The match array, which contains the matched string, backreferences, and extended properties. The match array is the same as returned by RegExp.prototype.exec, and may contain additional properties from named capture.
    2. The scope where the match was found, equal to XRegExp.INSIDE_CLASS or XRegExp.OUTSIDE_CLASS.
    In addition to the above arguments, handler has access to persistent properties of the regular expression being built, through this (see below).
  • scope : Number [optional] [default: XRegExp.OUTSIDE_CLASS]
    The scope where the token applies: XRegExp.INSIDE_CLASS or XRegExp.OUTSIDE_CLASS. These flags can be combined using bitwise OR: XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS. This would specify that the token applies everywhere (i.e., both inside and outside of character classes). The default scope is outside of character classes only.
  • trigger : Function [optional]
    A function that returns a boolean indicating whether the token should be applied (e.g., if a flag is set). trigger has access to persistent properties of the regular expression being built, through this (see below). trigger is run each time your token is found during a regex's construction, rather than when the construction process starts. As a result, it can check the value of settings changed by other tokens (e.g., one token may temporarily falsify the condition checked by the trigger of another token). If trigger returns false, the matched pattern segment can be matched by other tokens.
Returns:
  • undefined
    Does not return a value.

The handler and trigger functions have access to special properties (accessed through this) that apply to the regular expression being compiled. Any data stored on the this object persists during the XRegExp construction process. The this.hasFlag(flag) method is always available, and returns a boolean indicating whether the regex has the provided, single-character flag. It can be used, e.g., within the trigger function to add support for new flags that change the interpretation of regex syntax.

Added tokens do not cascade. If more than one token can match the same string, the one added last wins. Thus, you can add a generic token (e.g., \\p{[^}]+}), then follow it with more specific tokens that sometimes override it (e.g., \\p{L}).

Example code

// Many regex flavors support \a for matching the bell control character.
// JavaScript does not, so let's add it.
XRegExp.addToken(
    /\\a/,
    function () {return "\\x07"},
    XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS
);
XRegExp("\\a[\\a-\\n]+").test("\x07\x0A\x07"); // true

// Add support for escape sequences: \Q⋯\E and \Q⋯
XRegExp.addToken(
    /\\Q([\s\S]*?)(?:\\E|$)/,
    function (match) {return XRegExp.escape(match[1])},
    XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS
);
XRegExp("^\\Q({?*+})").test("({?*+})"); // true

// Add the U (ungreedy) flag from PCRE and RE2, which reverses greedy and lazy quantifiers
XRegExp.addToken(
    /([?*+]|{\d+(?:,\d*)?})(\??)/,
    function (match) {return match[1] + (match[2] ? "" : "?")},
    XRegExp.OUTSIDE_CLASS,
    function () {return this.hasFlag("U")}
);
XRegExp("a+").exec("aaa")[0]; // "aaa"
XRegExp("a+?").exec("aaa")[0]; // "a"
XRegExp("a+", "U").exec("aaa")[0]; // "a"
XRegExp("a+?", "U").exec("aaa")[0]; // "aaa"

// Add support for POSIX character classes (e.g., [[:alpha:]])
XRegExp.addToken(
    /\[:([a-z\d]+):]/i,
    function () {
        var posix = {
            alnum:  "A-Za-z0-9",
            alpha:  "A-Za-z",
            ascii:  "\\0-\\x7F",
            blank:  " \\t",
            cntrl:  "\\0-\\x1F\\x7F",
            digit:  "0-9",
            graph:  "\\x21-\\x7E",
            lower:  "a-z",
            print:  "\\x20-\\x7E",
            punct:  "!\"#$%&'()*+,\\-./:;<=>?@[\\\\\\]^_`{|}~",
            space:  " \\t\\r\\n\\v\\f",
            upper:  "A-Z",
            word:   "A-Za-z0-9_",
            xdigit: "A-Fa-f0-9"
        };
        return function (match) {
            if (!posix[match[1]])
                throw SyntaxError(match[1] + " is not a valid POSIX character class");
            return posix[match[1]];
        };
    }(),
    XRegExp.INSIDE_CLASS
);
XRegExp("^[[:xdigit:][:space:]]+$").test("00A9 1B7F"); // true

Tokens can be more complex than the examples just shown. See custom token examples for more ideas.
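The rule that the most recently added token wins can be illustrated with a deliberately simplified, hypothetical translator. This is not XRegExp's implementation: addTokenSketch and translate are invented names, the sketch ignores scope, triggers, and each token regex's own flags, and it copies any unmatched character through unchanged. The handlers return placeholder strings rather than real replacement patterns, so the output only shows which token won.

```javascript
// Hypothetical sketch of last-added-wins token precedence (not XRegExp's code).
const tokens = [];

function addTokenSketch(regex, handler) {
  tokens.push({ regex, handler });
}

function translate(pattern) {
  let output = "";
  let pos = 0;
  while (pos < pattern.length) {
    let matched = false;
    // Try the most recently added token first, so later tokens take priority.
    for (let i = tokens.length - 1; i >= 0 && !matched; i--) {
      // A sticky copy only matches starting exactly at `pos`.
      const sticky = new RegExp(tokens[i].regex.source, "y");
      sticky.lastIndex = pos;
      const m = sticky.exec(pattern);
      // Ignore zero-length matches so the loop always advances.
      if (m && m[0].length > 0) {
        output += tokens[i].handler(m);
        pos += m[0].length;
        matched = true;
      }
    }
    // No token matched here: copy one character through unchanged.
    if (!matched) output += pattern[pos++];
  }
  return output;
}

// Generic token first, then a more specific one that overrides it.
addTokenSketch(/\\p\{[^}]+\}/, () => "[generic]");
addTokenSketch(/\\p\{L\}/, () => "[letter]");

console.log(translate("\\p{L}\\p{N}")); // "[letter][generic]"
```

Swapping the two addTokenSketch calls would make the generic token win for \p{L} as well, which is why the specific token has to be added last.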

XRegExp.cache(pattern, [flags])

Accepts a pattern and flags; returns an extended RegExp object. If the pattern and flag combination has previously been cached, the cached copy is returned, otherwise the new object is cached.

Parameters:
  • pattern : String
    The regular expression pattern.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s and x.
Returns:
  • RegExp
    The extended regular expression object.

Example code

while (XRegExp.cache(".", "gs").test(subject)) {
	// The regex is only compiled once
	// ...
}

var regex1 = XRegExp.cache("\\b ex", "gix");
var regex2 = XRegExp.cache("\\b ex", "gix");
// regex1 and regex2 are now references to the same RegExp object
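The caching behavior can be sketched with a plain object keyed on flags and pattern. This is a hypothetical stand-in built on the native RegExp constructor, so it supports only native flags (no s or x); cachedRegExp is an invented name, not part of XRegExp.

```javascript
// Hypothetical memoizing constructor, similar in spirit to XRegExp.cache.
const regexCache = Object.create(null);

function cachedRegExp(pattern, flags = "") {
  // Flags never contain "/", so this key cannot collide across combinations.
  const key = flags + "/" + pattern;
  if (!(key in regexCache)) {
    regexCache[key] = new RegExp(pattern, flags);
  }
  return regexCache[key];
}

const regex1 = cachedRegExp("\\bex", "gi");
const regex2 = cachedRegExp("\\bex", "gi");
const regex3 = cachedRegExp("\\bex", "i");

console.log(regex1 === regex2); // true: same pattern and flags
console.log(regex1 === regex3); // false: different flags
```

Because every caller receives the same object, a cached regex with the g flag also shares its lastIndex; reset it with regex.lastIndex = 0 if a previous loop may have stopped early.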

XRegExp.copyAsGlobal(regex)

Accepts a RegExp instance; returns a copy with the g (global) flag set. This allows the regex to be used in while loops (for match iteration), etc. The copy has a fresh lastIndex (set to zero).

Note that if you want to copy a regex without forcing the global property, you can use XRegExp(regex).

Parameters:
  • regex : RegExp
    The regular expression object to copy.
Returns:
  • RegExp
    A new regular expression object with the global property set to true.

Example code

function parse (str, regex) {
	var match;
	regex = XRegExp.copyAsGlobal(regex);
	while ((match = regex.exec(str)) !== null) {
		// ...
	}
}
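With ES2015 and later, a similar copy can be made from a native regex's source and flags properties. This is a sketch that assumes plain native regexes; copyAsGlobalSketch is a hypothetical name, not an XRegExp function.

```javascript
// Hypothetical helper: copy a native regex with the g flag guaranteed.
function copyAsGlobalSketch(regex) {
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  // A freshly constructed RegExp always starts with lastIndex 0.
  return new RegExp(regex.source, flags);
}

const original = /\d+/i;
const globalCopy = copyAsGlobalSketch(original);

// The copy can drive a while loop; the original is left untouched.
const found = [];
let match;
while ((match = globalCopy.exec("a1 b22 c333")) !== null) {
  found.push(match[0]);
}

console.log(globalCopy.flags); // "gi"
console.log(original.global);  // false
console.log(found);            // ["1", "22", "333"]
```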

XRegExp.escape(string)

Accepts a string; returns the string with regex metacharacters escaped. The returned string can safely be used at any point within a regex to match the provided literal string. The escaped characters are [, ], {, }, (, ), -, *, +, ?, ., \, ^, $, |, the comma (,), #, and whitespace (see free-spacing for the list of whitespace characters).

Parameters:
  • string : String
    The string to escape.
Returns:
  • String
    The escaped string.

Example code

var str = XRegExp.escape("escaped? [x]");
var regex = XRegExp(str); // RegExp would work identically here

regex.test("escaped? [x]"); // true
regex.source == "escaped\\?\\ \\[x\\]"; // true
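A native replace with a character class covering the characters listed above produces the same kind of output. escapeSketch is a hypothetical stand-in written for illustration, not XRegExp's source.

```javascript
// Hypothetical escape helper covering the characters listed above:
// [ ] { } ( ) - * + ? . \ ^ $ | , # and whitespace.
function escapeSketch(str) {
  // "$&" in the replacement reinserts the matched character after a backslash.
  return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}

const escaped = escapeSketch("escaped? [x]");
console.log(escaped);                                  // escaped\?\ \[x\]
console.log(new RegExp(escaped).test("escaped? [x]")); // true
```

Escaping the comma, # and whitespace keeps the result safe inside free-spacing patterns. Native regexes created with the ES2015 u flag reject identity escapes such as "\ " or "\#", so output like this is meant for patterns without that flag.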

XRegExp.execAt(string, regex, [pos], [anchored])

Accepts a string to search, a regex to search with, an optional position at which to start the search within the string, and an optional boolean indicating whether matches may start at or after that position or must start exactly at it; returns a match result or null. This function ignores the lastIndex property of the provided regex.

Parameters:
  • string : String
    The string to search.
  • regex : RegExp
    The regular expression to search with.
  • pos : Number [optional] [default: 0]
    The position to start the search within the string.
  • anchored : Boolean [optional] [default: false]
    Indicates whether matches can occur at or after pos (the default) or must start at pos.
Returns:
  • A result Array or null.
    The result is the same as from RegExp.prototype.exec. Result arrays contain the matched text at key zero, and any backreferences at subsequent keys. Result arrays also contain special index and input properties, and may contain additional properties from named capture. If a match is not found, null is returned.

Example code

var str = "Result: 25.";
XRegExp.execAt(str, /\d+/); // returns ["25"] with the index property set to 8
XRegExp.execAt(str, /\d+/, 0); // returns ["25"] with the index property set to 8
XRegExp.execAt(str, /\d+/, 10); // returns null
XRegExp.execAt(str, /\d+/, 0, true); // returns null
XRegExp.execAt(str, /\d+/, 8, true); // returns ["25"] with the index property set to 8
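In engines that support the ES2015 sticky (y) flag, the two behaviors map onto native flags: g searches at or after lastIndex, and y matches only at lastIndex. execAtSketch is a hypothetical helper written under that assumption, not XRegExp's implementation.

```javascript
// Hypothetical helper mirroring the execAt examples above using native flags.
function execAtSketch(str, regex, pos = 0, anchored = false) {
  // Drop any existing g/y flag, then add y (only at pos) or g (at or after pos).
  const flags = regex.flags.replace(/[gy]/g, "") + (anchored ? "y" : "g");
  // Working on a copy means the caller's regex and its lastIndex are ignored.
  const copy = new RegExp(regex.source, flags);
  copy.lastIndex = pos;
  return copy.exec(str);
}

const str = "Result: 25.";
console.log(execAtSketch(str, /\d+/).index);          // 8
console.log(execAtSketch(str, /\d+/, 10));            // null
console.log(execAtSketch(str, /\d+/, 0, true));       // null
console.log(execAtSketch(str, /\d+/, 8, true).index); // 8
```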

XRegExp.freezeTokens()

Permanently breaks the link to XRegExp's private list of tokens, thereby preventing the addition of new syntax and flags. After freezeTokens runs, calls to XRegExp.addToken() throw an error.

This function should be run after XRegExp and any plugins are loaded.

Parameters:
  • N/A
Returns:
  • undefined
    Does not return a value.

Example code

<script src="xregexp-min.js"></script>
<script>XRegExp.freezeTokens();</script>
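The one-way nature of freezing can be modeled with a closure that keeps the token list private. createTokenRegistry is entirely hypothetical; it only illustrates a registry that refuses additions once frozen and says nothing about how XRegExp actually stores its tokens.

```javascript
// Hypothetical registry: tokens live in a closure, and freezing is permanent.
function createTokenRegistry() {
  const tokens = [];
  let frozen = false;
  return {
    add(token) {
      if (frozen) {
        throw new Error("cannot add tokens after freeze()");
      }
      tokens.push(token);
    },
    freeze() {
      frozen = true; // nothing exposed by the registry can reset this
    },
    count() {
      return tokens.length;
    },
  };
}

const registry = createTokenRegistry();
registry.add({ name: "bell" });
registry.freeze();

let threw = false;
try {
  registry.add({ name: "posix" });
} catch (e) {
  threw = true;
}
console.log(registry.count(), threw); // 1 true
```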

XRegExp.isRegExp(object)

Accepts any value; returns a boolean indicating whether the argument is a RegExp object (note that this is also true for regex literals and regexes created using the XRegExp constructor). This function works correctly with variables created in another frame, when instanceof and regex.constructor checks would fail to work as intended.

Parameters:
  • object : *
    The object or value of any type to check for being of type RegExp.
Returns:
  • Boolean
    Whether the object is a RegExp: true or false.

Example code

XRegExp.isRegExp("string"); // false
XRegExp.isRegExp(/regex/i); // true
XRegExp.isRegExp(RegExp("^", "gm")); // true
XRegExp.isRegExp(XRegExp(".", "s")); // true

function search (string, searchValue) {
	// Use regexes as-is; treat any other value as a literal string
	var regex = XRegExp.isRegExp(searchValue) ?
		searchValue :
		XRegExp(XRegExp.escape(searchValue));
	// ...
}
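A common cross-frame check relies on Object.prototype.toString, which reports an object's internal class instead of consulting one particular frame's RegExp constructor. isRegExpSketch uses that technique; it is an illustrative stand-in, not necessarily how XRegExp implements isRegExp.

```javascript
// Hypothetical cross-frame-safe check based on the internal class tag.
function isRegExpSketch(value) {
  return Object.prototype.toString.call(value) === "[object RegExp]";
}

console.log(isRegExpSketch("string"));              // false
console.log(isRegExpSketch(/regex/i));              // true
console.log(isRegExpSketch(new RegExp("^", "gm"))); // true
console.log(isRegExpSketch({ source: "a" }));       // false
```

In ES2015 and later, an object can spoof this result by defining Symbol.toStringTag, so treat the check as reliable for ordinary values rather than as tamper-proof.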

XRegExp.iterate(string, regex, callback, [context])

Executes the provided callback function once per match of regex within string. This provides a simpler and cleaner way to iterate over regex matches compared to the traditional approaches of subverting String.prototype.replace or repeatedly calling RegExp.prototype.exec within a while loop.

Parameters:
  • string : String
    The string to traverse while searching for matches.
  • regex : RegExp
    The pattern to search for.
  • callback : Function
    The function to execute for each match. callback is invoked with four arguments:
    1. The match array, which contains the matched string, backreferences, and extended properties. The match array is the same as returned by RegExp.prototype.exec, and may contain additional properties from named capture.
    2. The index of the match (a zero-based count of how many matches have been found so far).
    3. The string being traversed.
    4. The regex object being used to traverse the string. In fact, this is a persistent copy of the regex with the g flag added. This regex object may be altered by the callback function (e.g., by manipulating its lastIndex property or adding custom properties) without affecting the original regex object.
  • context : Object [optional]
    Object to use as this when executing callback.
Returns:
  • undefined
    Does not return a value.

Searches always start at the beginning of the string and continue until the end, regardless of the state of the regex's global property and initial lastIndex.

Example code

// populate an array with match objects
var matches = [];
XRegExp.iterate(str, regex, function (match) {
	matches.push(match);
});

// extract every other digit from a string
matches = [];
XRegExp.iterate("1a2345", /\d/, function (match, i) {
	if (i % 2) matches.push(+match[0]);
});
// matches: [2, 4]

// count the occurrences of each word in a string, providing an object as the context argument
var words = {};
XRegExp.iterate("Run Forrest, run.", XRegExp("(?<word>\\w+)"), function (match) {
	var word = match.word.toLowerCase();
	if (!this[word]) this[word] = 0;
	this[word]++;
}, words);
// words: {run: 2, forrest: 1}

XRegExp.matchChain(string, regexes)

Accepts a string to search and an array of regexes; returns the result of using each successive regex to search within the matches of the previous regex. The array of regexes can also contain objects with regex and backref properties, in which case the named or numbered backreferences specified are passed forward to the next regex or returned.

Parameters:
  • string : String
    The string to search.
  • regexes : Array of RegExps and/or Objects
    The list of regexes with which to successively search within the results of the previous regex.
Returns:
  • Array
    An array of match strings. If no matches are found, an empty array is returned.

Example code

var str = "1 <b>2</b> 3 <b>4 5</b>";
var result = XRegExp.matchChain(str, [/<b>.*?<\/b>/i, /\d+/]);
// result: ["2", "4", "5"]

str = '<img src="http://x.com/img.png">' +
	'<script src="http://google.com/path/file.ext">' +
	'<img src="http://google.com/path/to/img.jpg?x">' +
	'<img src="http://google.com/img2.gif"/>';
result = XRegExp.matchChain(str, [
	// match <img> tags; pass forward attributes in backref 1
	{regex: /<img\b([^>]+)>/i, backref: 1},
	// match src attributes; pass forward attribute values in backref "src"
	{regex: XRegExp('(?ix) \\s src=" (?<src> [^"]+ )'), backref: "src"},
	// match URLs with host "google.com"; pass forward URL paths in backref 1
	{regex: XRegExp("^https?://google\\.com(/[^#?]+)", "i"), backref: 1},
	// match and pass forward/return filenames (strip directory paths)
	/[^\/]+$/
]);
// result: ["img.jpg", "img2.gif"]
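The chaining logic can be sketched with String.prototype.matchAll (ES2020). matchChainSketch is a hypothetical name; it accepts the same mix of plain regexes and {regex, backref} objects, reads string backrefs from native named groups, and, as a choice made for this sketch, skips groups that did not participate in a match.

```javascript
// Hypothetical sketch of successive matching, using native regexes only.
function matchChainSketch(str, chain) {
  let values = [str];
  for (const step of chain) {
    const regex = step instanceof RegExp ? step : step.regex;
    const backref = step instanceof RegExp ? 0 : (step.backref === undefined ? 0 : step.backref);
    // matchAll requires the g flag.
    const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
    const next = [];
    for (const value of values) {
      for (const m of value.matchAll(new RegExp(regex.source, flags))) {
        // Named backrefs come from m.groups; numbered ones from the match array.
        const part = typeof backref === "string" ? (m.groups || {})[backref] : m[backref];
        if (part !== undefined) next.push(part);
      }
    }
    values = next;
  }
  return values;
}

const result = matchChainSketch("1 <b>2</b> 3 <b>4 5</b>", [/<b>.*?<\/b>/i, /\d+/]);
console.log(result); // ["2", "4", "5"]
```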

XRegExp.version

Holds the version number of XRegExp as a string containing three integers separated by dots (e.g., "1.0.0").

RegExp.prototype.apply(context, args)

Returns the result of calling RegExp.prototype.exec with the first value in the args array. This is intended to allow working generically with both functions and regular expressions.

Parameters:
  • context : Object
    The context is ignored. It is accepted for congruity with Function.prototype.apply.
  • args : Array
    The first value in the args array is passed to the RegExp.prototype.exec method.
Returns:
  • A result Array or null.

Example code

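// Return an array with the elements of the existing array for which
// the provided filtering function returns true. This replaces the native
// Array.prototype.filter, which in current engines rejects a non-callable
// regex with a TypeError instead of calling its apply method.
Array.prototype.filter = function (fn, context) {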
	var results = [];
	for (var i = 0; i < this.length; i++) {
		if (fn.apply(context, [this[i], i, this])) {
			results.push(this[i]);
		}
	}
	return results;
};

var output = ["a", "ba", "ab", "b"].filter(/^a/); // ["a", "ab"]

RegExp.prototype.call(context, string)

Returns the result of calling RegExp.prototype.exec with the provided string. This is intended to allow working generically with both functions and regular expressions.

Parameters:
  • context : Object
    The context is ignored. It is accepted for congruity with Function.prototype.call.
  • string : String
    The value to be passed to the RegExp.prototype.exec method.
Returns:
  • A result Array or null.

Example code

function validate (str, validators) {
	for (var i = 0; i < validators.length; i++) {
		if (!validators[i].call(null, str)) {
			return false;
		}
	}
	return true;
}

// Validate that the string contains at least 1 special character and has 8 or more characters.
// The length-checking function could be replaced with the regex /^[\s\S]{8}/.
validate("password!",
         [ function (str) {return str.length >= 8},
           /[\W_]/ ]); // true
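Methods with the behavior documented in the last two sections can be sketched in a few lines: each ignores its context argument and forwards to exec. This is an illustrative sketch rather than XRegExp's source, and allPass is a hypothetical helper that relies only on .call, so it accepts functions and regexes alike.

```javascript
// Sketch of the documented behavior: the context argument is ignored and the
// call is forwarded to exec, so regexes can stand in for predicate functions.
RegExp.prototype.call = function (context, str) {
  return this.exec(str);
};
RegExp.prototype.apply = function (context, args) {
  return this.exec(args[0]);
};

// Hypothetical generic helper that works with any mix of functions and regexes.
function allPass(str, checks) {
  return checks.every(function (check) {
    return Boolean(check.call(null, str));
  });
}

console.log(allPass("password!", [function (s) { return s.length >= 8; }, /[\W_]/])); // true
console.log(allPass("password", [/^[\s\S]{8}/, /[\W_]/]));                             // false
console.log(/^a/.apply(null, ["ab"])[0]);                                              // "a"
```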