XRegExp(pattern, [flags])

Accepts a pattern and flags; returns a new, extended RegExp object. Differs from a native regular expression in that additional syntax and flags are supported and cross-browser regex syntax inconsistencies are ameliorated.

Parameters:
  • pattern : String or RegExp
    The regular expression pattern String, or an existing RegExp object to copy.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s and x. Flags cannot be provided when constructing one RegExp from another.
Returns:
  • RegExp
    An extended regular expression object.

Example code

var regex = XRegExp("(?<month> [0-9]+ ) [-/.\\s]  # month\n\
                     (?<day>   [0-9]+ ) [-/.\\s]  # day  \n\
                     (?<year>  [0-9]+ )           # year   ", "x");

var input = "04/20/2009";
var output = input.replace(regex, "${year}-${month}-${day}"); // -> "2009-04-20"
var match = regex.exec(input);
match.month; // -> "04"

regex instanceof RegExp; // -> true
regex.constructor == RegExp; // -> true

Regexes, strings, and backslashes

JavaScript string literals (as opposed to, e.g., user input or text extracted via the DOM) use a backslash as an escape character. The string literal "\\" therefore contains a single backslash, and its length property value is 1. However, a backslash is also an escape character in regular expression syntax, where the pattern \\ matches a single backslash. When providing string literals to the RegExp or XRegExp constructor functions, four backslashes are needed to match a single backslash, e.g., XRegExp("\\\\"). Only two of those backslashes are actually passed into the constructor function. The other two are used to escape the backslashes in the string before the function ever sees it.

The same issue is at play with the \\s sequences in the example code just shown. XRegExp is provided with the two characters \s, which it in turn recognizes as the metasequence used to match a whitespace character. \n (used at the end of the first two lines) is another metasequence in JavaScript string literals and inserts actual line feed characters into the string, which terminate the free-spacing mode line comments that start with #. The backslashes at the very end of the first two lines allow the string to continue to the next line, which avoids the need to concatenate multiple strings when extending a string beyond one line.
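The two layers of escaping described above can be checked with the native RegExp constructor alone. The following is a minimal sketch; the variable names are illustrative only and nothing here depends on XRegExp.

```javascript
// A string literal consumes one level of backslashes before the regex engine sees anything.
var oneBackslash = "\\";
var isSingleCharacter = oneBackslash.length === 1;       // the literal holds one backslash

// Four backslashes in the literal become the two-character pattern \\ (match one backslash).
var backslashRegex = new RegExp("\\\\");
var patternLength = backslashRegex.source.length;        // 2
var findsBackslash = backslashRegex.test("a\\b");        // subject string is a, backslash, b

// "\\s" reaches the constructor as \s; "\s" reaches it as the plain letter s.
var whitespaceRegex = new RegExp("\\s");
var letterRegex = new RegExp("\s");
var letterSource = letterRegex.source;                   // "s"

// A backslash at the very end of a line continues the literal without inserting a line feed.
var continued = "first part, \
second part";
var hasLineFeed = continued.indexOf("\n") !== -1;        // false
```

Any whitespace that starts a continued line becomes part of the string, which is harmless in the earlier example only because the x flag makes the pattern ignore it.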

XRegExp.addToken(pattern, handler, [scope], [trigger])

Provides a means to create custom flags and extend or change the regular expression language accepted by XRegExp. This function is used internally for XRegExp's own syntax extensions and can be used to create XRegExp plugins.

WARNING: This function is intended for users with advanced knowledge of JavaScript's regular expression syntax and behavior. Novice regexers may prefer to stick to plugins or provided examples that take advantage of this method. To disable further changes to XRegExp syntax, run XRegExp.freezeTokens() after loading XRegExp and any plugins.

Parameters:
  • pattern : RegExp or String
    A regular expression that matches the token you are adding. If a non-regex object is provided, it is implicitly converted to a regex by using XRegExp(pattern).
  • handler : Function
    A function that returns a new pattern string (limited to native JavaScript regular expression syntax) to replace the matched pattern within all future regexes built by the XRegExp constructor. handler is invoked with two arguments:
    1. The match array, which contains the matched string, backreferences, and extended properties. The match array is the same as if returned by RegExp.prototype.exec, and may contain additional properties from named capture.
    2. The scope where the match was found, equal to XRegExp.INSIDE_CLASS or XRegExp.OUTSIDE_CLASS.
    In addition to the above arguments, handler has access to persistent properties of the regular expression being built, through this (see below).
  • scope : Number [optional] [default: XRegExp.OUTSIDE_CLASS]
    The scope where the token applies: XRegExp.INSIDE_CLASS or XRegExp.OUTSIDE_CLASS. These flags can be combined using bitwise OR: XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS. This would specify that the token applies everywhere (i.e., both inside and outside of character classes). The default scope is outside of character classes only.
  • trigger : Function [optional]
    A function that returns a boolean indicating whether the token should be applied (e.g., if a flag is set). trigger has access to persistent properties of the regular expression being built, through this (see below). trigger is run each time your token is found during a regex's construction, rather than when the construction process starts. As a result, it can check the value of settings changed by other tokens (e.g., one token may temporarily falsify the condition checked by the trigger of another token). If trigger returns false, the matched pattern segment can be matched by other tokens.
Returns:
  • undefined
    Does not return a value.

The handler and trigger functions have access to special properties (accessed through this) that apply to the regular expression being compiled. Any data stored on the this object persists during the XRegExp construction process. The this.hasFlag(flag) method is always available, and returns a boolean indicating whether the regex has the provided, single-character flag. It can be used, e.g., within the trigger function to add support for new flags that change the interpretation of regex syntax.

Added tokens do not cascade. If more than one token can match the same string, the one added last wins. Thus, you can add a generic token (e.g., \\p{[^}]+}), then follow it with more specific tokens that sometimes override it (e.g., \\p{L}).

Example code

// Many regex flavors support \a for matching the bell control character.
// JavaScript does not, so let's add it.
XRegExp.addToken(
    /\\a/,
    function () {return "\\x07"},
    XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS
);
XRegExp("\\a[\\a-\\n]+").test("\x07\x0A\x07"); // -> true

// Add support for escape sequences: \Q⋯\E and \Q⋯
XRegExp.addToken(
    /\\Q([\s\S]*?)(?:\\E|$)/,
    function (match) {return XRegExp.escape(match[1])},
    XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS
);
XRegExp("^\\Q({?*+})").test("({?*+})"); // -> true

// Add variable interpolation via $<⋯>, activated by using "$" as a flag. This is intended
// merely as a demonstration of adding new flags, and may not be a good idea in practice.
XRegExp.addToken(
    /\$<([^>]+)>/,
    function (match) {return String(eval(match[1]))},
    XRegExp.INSIDE_CLASS | XRegExp.OUTSIDE_CLASS,
    function () {return this.hasFlag("$")}
);
var str = "pattern*string";
XRegExp("$<str> ($<10+5>)+ \\$<str>$", "i$").test("PatternnnString 1515 $<str>"); // -> true
XRegExp("$<str> ($<10+5>)+ \\$<str>$", "i").test("PatternnnString 1515 $<str>"); // -> false
// Note that escaped tokens work correctly

Tokens can be more complex than the examples just shown. See Custom Token Examples for more ideas.
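To make the "last added wins, no cascading" rule more concrete, here is a toy token replacer. It is not XRegExp's implementation: addToyToken and buildToyPattern are hypothetical names, and scope and trigger handling are left out entirely. It only illustrates how a pattern string might be scanned once, with the most recently added token tried first and handler output never rescanned.

```javascript
// Toy sketch only; addToyToken and buildToyPattern are hypothetical, not XRegExp APIs.
var toyTokens = [];

function addToyToken(regex, handler) {
    // A sticky ("y") copy can only match at the exact position being scanned.
    var flags = regex.flags.replace(/[gy]/g, "") + "y";
    toyTokens.push({regex: new RegExp(regex.source, flags), handler: handler});
}

function buildToyPattern(pattern) {
    var output = "", pos = 0;
    while (pos < pattern.length) {
        var replaced = false;
        // Walk the list backwards so that the token added last wins.
        for (var i = toyTokens.length - 1; i >= 0 && !replaced; i--) {
            var token = toyTokens[i];
            token.regex.lastIndex = pos;
            var match = token.regex.exec(pattern);
            if (match && match[0].length > 0) {
                output += token.handler(match); // handler output is not rescanned
                pos += match[0].length;
                replaced = true;
            }
        }
        if (!replaced) {
            // Copy an escaped pair as one unit so that \\a is not mistaken for \a.
            var step = (pattern.charAt(pos) === "\\" && pos + 1 < pattern.length) ? 2 : 1;
            output += pattern.substr(pos, step);
            pos += step;
        }
    }
    return output;
}

addToyToken(/\\a/, function () {return "\\x07"});
addToyToken(/\\p\{[^}]+\}/, function () {return "."});     // generic token
addToyToken(/\\p\{L\}/, function () {return "[A-Za-z]"});  // specific token, added later

var built = buildToyPattern("\\p{L}\\p{X}\\a"); // specific, generic, then bell token
var bellRegex = new RegExp(buildToyPattern("\\a+"));
```

Because the specific \p{L} token was added after the generic one, it is tried first and overrides it wherever both could match.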

XRegExp.cache(pattern, [flags])

Accepts a pattern and flags; returns an extended RegExp object. If the pattern and flag combination has previously been cached, the cached copy is returned, otherwise the new object is cached.

Parameters:
  • pattern : String
    The regular expression pattern.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s and x.
Returns:
  • RegExp
    The extended regular expression object.

Example code

while (XRegExp.cache(".", "gs").test(subject)) {
	// The regex is only compiled once
	...
}

var regex1 = XRegExp.cache("\\b ex", "gix");
var regex2 = XRegExp.cache("\\b ex", "gix");
// regex1 and regex2 are now references to the same RegExp object
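The caching behavior can be approximated for native regexes with a small lookup table. This is a sketch of the general idea rather than XRegExp's own code; cachedRegExp and regexCache are hypothetical names.

```javascript
// Hypothetical helper, not part of XRegExp: memoizes native RegExp objects.
var regexCache = {};

function cachedRegExp(pattern, flags) {
    flags = flags || "";
    // "/" can never appear in a flags string, so it safely separates the two key parts.
    var key = flags + "/" + pattern;
    if (!Object.prototype.hasOwnProperty.call(regexCache, key)) {
        regexCache[key] = new RegExp(pattern, flags);
    }
    return regexCache[key];
}

var first = cachedRegExp("\\bex", "gi");
var second = cachedRegExp("\\bex", "gi");
var sameObject = first === second;                        // both names refer to one RegExp
var differentFlags = cachedRegExp("\\bex", "i") !== first; // a new flag set gets its own entry
```

One consequence of sharing a single object is that state such as lastIndex on a global regex is shared by every caller that receives the cached copy.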

XRegExp.escape(string)

Accepts a string; returns the string with regex metacharacters escaped. The returned string can safely be used at any point within a regex to match the provided literal string. The escaped characters are [, ], {, }, (, ), -, *, +, ?, ., \, ^, $, |, ,, #, and whitespace (see free-spacing for the list of whitespace characters).

Parameters:
  • string : String
    The string to escape.
Returns:
  • String
    The escaped string.

Example code

var str = XRegExp.escape("escaped? [x]");
var regex = XRegExp(str); // RegExp would work identically here

regex.test("escaped? [x]"); // -> true
regex.source == "escaped\\?\\ \\[x\\]"; // -> true
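For readers who want to see the escaping rule in isolation, the character list above can be turned into a single replace call. This is a minimal sketch under the assumption that the list is complete; escapeRegExp is a hypothetical stand-alone function, not XRegExp's source.

```javascript
// Hypothetical stand-alone equivalent of the behavior described above.
function escapeRegExp(str) {
    // Prefix every listed metacharacter, and any whitespace character, with a backslash.
    return String(str).replace(/[-[\]{}()*+?.\\^$|,#\s]/g, "\\$&");
}

var escaped = escapeRegExp("escaped? [x]");
var literalRegex = new RegExp(escapeRegExp("a.b*c"));
var matchesLiteral = literalRegex.test("a.b*c");   // the escaped text matches itself
var matchesOther = literalRegex.test("aXbbc");     // . and * no longer act as metacharacters
```

Note that escapes such as a backslash before a space or comma are rejected by native regexes compiled with the u flag, so output in this style is only suitable for patterns built without that flag.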

XRegExp.freezeTokens()

Irreversibly severs the link to XRegExp's private list of tokens, thereby preventing the addition of new syntax and flags. After freezeTokens runs, any call to XRegExp.addToken() throws an error.

This function should be run after XRegExp and any plugins are loaded.

Parameters:
  • N/A
Returns:
  • undefined
    Does not return a value.

Example code

<script src="xregexp-min.js"></script>
<script>XRegExp.freezeTokens();</script>

XRegExp.isRegExp(object)

Accepts any value; returns a boolean indicating whether the argument is a RegExp object (note that this is also true for regex literals and regexes created using the XRegExp constructor). This function works correctly with variables created in another frame, when instanceof and regex.constructor checks would fail to work as intended.

Parameters:
  • object : *
    The object or value of any type to check for being of type RegExp.
Returns:
  • Boolean
    Whether the object is a RegExp: true or false.

Example code

XRegExp.isRegExp("string"); // -> false
XRegExp.isRegExp(/regex/i); // -> true
XRegExp.isRegExp(RegExp("^", "gm")); // -> true
XRegExp.isRegExp(XRegExp(".", "s")); // -> true

function search (string, searchValue) {
	if (!XRegExp.isRegExp(searchValue)) {
		// Treat a non-regex search value as literal text to match
		searchValue = XRegExp(XRegExp.escape(searchValue));
	}
	...
}
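A common way to build a frame-independent check of this kind is to inspect the object's internal class string instead of its constructor. Whether XRegExp.isRegExp is implemented exactly this way is an assumption; isRegExpLike is a hypothetical name used only for this sketch.

```javascript
// Hypothetical helper: relies on the class string rather than on a particular
// frame's RegExp constructor, so objects from other frames are recognized too.
function isRegExpLike(value) {
    return Object.prototype.toString.call(value) === "[object RegExp]";
}

var checks = [
    isRegExpLike("string"),           // false
    isRegExpLike(/regex/i),           // true
    isRegExpLike(RegExp("^", "gm")),  // true
    isRegExpLike(null),               // false
    isRegExpLike({source: "x"})       // false: merely looks like a regex
];
```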

XRegExp.matchWithinChain(string, regexes, [detailMode])

Accepts a string to search, an array of regexes, and a boolean indicating whether an array of strings or an array of match arrays (as if returned by RegExp.prototype.exec) should be returned. The result is generated by using each successive regex to search within the matches of the previous regex only. When returning match arrays instead of strings, each match array's index property is set relative to the entire subject string.

Parameters:
  • string : String
    The string to search.
  • regexes : Array of RegExps
    The list of regexes with which to successively search within the results of the previous regex.
  • detailMode : Boolean [optional] [default: false]
    If false, an array of matched strings is returned; if true, an array of match arrays (which contain the matched strings, backreferences, and special properties including index and named backreferences) is returned.
Returns:
  • Array
    The matches found by the final regex when searching within the matches passed down by any previous regexes. The type of the values in the returned array depends on the detailMode parameter. If no matches are found, an empty array is returned.

Example code

XRegExp.matchWithinChain("1 <b>2</b> 3 <b>4 5</b>", [/<b>.*?<\/b>/i, /\d+/]);
// -> ["2", "4", "5"]

var result = XRegExp.matchWithinChain("01 <b>002 3</b>",
                                      [/<b>.*?<\/b>/i, XRegExp("0*(?<num>[1-9]\\d*)")],
                                      true);
/* ->
result: [["002", "2"], ["3", "3"]]
result[0][0]: "002"
result[0].num: "2"
result[0].index: 6
*/
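The chaining idea can be sketched with native regexes for the simple case where only matched strings are wanted (the equivalent of detailMode being false). This is an illustration of the technique, not XRegExp's implementation; matchChain is a hypothetical name, and index tracking and named capture are omitted.

```javascript
// Hypothetical sketch: each regex searches only within the matches of the previous one.
function matchChain(str, regexes) {
    var current = [String(str)];
    for (var i = 0; i < regexes.length; i++) {
        // Work on a global copy so that every match is collected and the
        // caller's regex (including its lastIndex) is left untouched.
        var flags = regexes[i].flags.indexOf("g") === -1 ? regexes[i].flags + "g" : regexes[i].flags;
        var regex = new RegExp(regexes[i].source, flags);
        var next = [];
        for (var j = 0; j < current.length; j++) {
            var found = current[j].match(regex); // array of matched strings, or null
            if (found) {
                next = next.concat(found);
            }
        }
        current = next;
    }
    return current;
}

var numbers = matchChain("1 <b>2</b> 3 <b>4 5</b>", [/<b>.*?<\/b>/i, /\d+/]);
var nothing = matchChain("no digits here", [/\d+/]);
```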

XRegExp.version

Stores the version number of XRegExp in use as a string, e.g., "1.1.0".

RegExp.prototype.addFlags(flags)

Returns a new RegExp object created by recompiling the regex with the additional flags, which may include non-native flags. The original regex object is not altered. See flags for additional details.

Parameters:
  • flags : String
    The regular expression flags to add; may include non-native flags s and x.
Returns:
  • RegExp
    An extended regular expression object.

Example code

function parse (input, regex) {
	var match;
	// regex must be global for the while loop to work correctly
	if (!regex.global) {
		regex = regex.addFlags("g");
	}
	while (match = regex.exec(input)) {
		...
	}
}

// This causes the regex literal's source to be recompiled as a new XRegExp
var regex = /\w.\w/.addFlags("gs");
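Restricted to native flags, the recompile-with-extra-flags idea looks roughly like the following. This is a sketch, not the addFlags source: withFlags is a hypothetical function, it cannot add XRegExp's x flag or extended syntax, and the s flag shown here is only accepted by engines that support it natively.

```javascript
// Hypothetical helper: returns a new native RegExp with the extra flags merged in.
function withFlags(regex, extraFlags) {
    var flags = regex.flags;
    for (var i = 0; i < extraFlags.length; i++) {
        var flag = extraFlags.charAt(i);
        if (flags.indexOf(flag) === -1) { // skip flags that are already present
            flags += flag;
        }
    }
    return new RegExp(regex.source, flags);
}

var original = /\w.\w/i;
var extended = withFlags(original, "gis");
// extended is global, case-insensitive, and dot-matches-all; original is unchanged.
```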

Regex literals and the addFlags method

XRegExp offers partial support for applying regex extensions to regex literals (e.g., /regex/) through the addFlags method, since any use of addFlags applies XRegExp's extended syntax and can be used to add non-native flags. However, syntax that causes native JavaScript regular expression parsers to throw an error cannot be used within regex literals. For example, named capture is not supported within regex literals because (?<name>⋯) will cause an "invalid quantifier" error to be thrown as soon as (?< is encountered—before the addFlags method is called.

RegExp.prototype.apply(context, args)

Returns the result of calling RegExp.prototype.exec with the first value in the args array. This is intended to allow working generically with both functions and regular expressions.

Parameters:
  • context : Object
    The context is ignored. It is accepted for congruity with Function.prototype.apply.
  • args : Array
    The first value in the args array is passed to the RegExp.prototype.exec method.
Returns:
  • Array or null
    The match array returned by RegExp.prototype.exec, or null if no match was found.

Example code

// Return an array with the elements of the existing array for which
// the provided filtering function returns true
Array.prototype.filter = function (fn, context) {
	var results = [];
	for (var i = 0; i < this.length; i++) {
		if (fn.apply(context, [this[i], i, this])) {
			results.push(this[i]);
		}
	}
	return results;
};

var output = ["a", "ba", "ab", "b"].filter(/^a/); // -> ["a", "ab"]
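Where extending RegExp.prototype or replacing Array.prototype.filter is not desirable, the same generic treatment of functions and regexes can be approximated with a small adapter. This is an alternative sketch rather than anything XRegExp provides; toPredicate is a hypothetical name.

```javascript
// Hypothetical adapter: turns a regex into a plain predicate function and
// passes real functions through untouched.
function toPredicate(fnOrRegex) {
    if (Object.prototype.toString.call(fnOrRegex) === "[object RegExp]") {
        return function (value) {
            fnOrRegex.lastIndex = 0; // keep results independent of earlier searches
            return fnOrRegex.exec(String(value)) !== null;
        };
    }
    return fnOrRegex;
}

var startsWithA = ["a", "ba", "ab", "b"].filter(toPredicate(/^a/));
var longerThanOne = ["a", "ba", "ab", "b"].filter(toPredicate(function (s) {return s.length > 1}));
```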

RegExp.prototype.call(context, string)

Returns the result of calling RegExp.prototype.exec with the provided string. This is intended to allow working generically with both functions and regular expressions.

Parameters:
  • context : Object
    The context is ignored. It is accepted for congruity with Function.prototype.call.
  • string : String
    The value to be passed to the RegExp.prototype.exec method.
Returns:
  • Array or null
    The match array returned by RegExp.prototype.exec, or null if no match was found.

Example code

function validate (string, validators) {
	for (var i = 0; i < validators.length; i++) {
		if (!validators[i].call(null, string)) {
			return false;
		}
	}
	return true;
}

// Validate that the string contains at least 1 special character and has 8 or more characters.
// The length-checking function could be replaced with the regex /^[\s\S]{8}/.
validate("password!",
         [ /[\W_]/,
           function (string) {return string.length >= 8} ]);
// -> true

RegExp.prototype.forEachExec(string, callback, [context])

Executes the provided callback function once per match within string. This provides a simpler and cleaner way to iterate over regex matches compared to the traditional approaches of subverting String.prototype.replace or repeatedly calling RegExp.prototype.exec within a while loop.

Parameters:
  • string : String
    The string to traverse while searching for matches.
  • callback : Function
    The function to execute for each match. callback is invoked with four arguments:
    1. The match array, which contains the matched string, backreferences, and extended properties. The match array is the same as if returned by RegExp.prototype.exec, and may contain additional properties from named capture.
    2. The index of the match (a zero-based count of how many matches have been found so far).
    3. The string being traversed.
    4. The regex object being used to traverse the string. In fact, this is a persistent copy of the regex with the g flag added. This regex object may be altered by the callback function (e.g., by manipulating its lastIndex property or adding custom properties) without affecting the original regex object.
  • context : Object [optional]
    Object to use as this when executing callback.
Returns:
  • undefined
    Does not return a value.

Searches always start at the beginning of the string and continue until the end, regardless of the state of the regex's global property and initial lastIndex.

Example code

// populate an array with match objects
var matches = [];
regex.forEachExec(str, function (match) {
	matches.push(match);
});

// extract every other digit from a string
matches = [];
/\d/.forEachExec("1a2345", function (match, i) {
	if (i % 2) matches.push(+match[0]);
});
// matches -> [2, 4]

// count the occurrences of each word in a string, providing an object as the context argument
var words = {};
XRegExp("(?<word>\\w+)").forEachExec("Run Forrest, run.", function (match) {
	var word = match.word.toLowerCase();
	if (!this[word]) this[word] = 0;
	this[word]++;
}, words);
// words -> {run: 2, forrest: 1}
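For comparison, the iteration that forEachExec wraps can be written out by hand with native regexes. The sketch below is an approximation of the behavior described above, not the method's source; forEachMatch is a hypothetical stand-alone function, and named capture is not available in it unless the engine supports it natively.

```javascript
// Hypothetical stand-alone version of the iteration pattern.
function forEachMatch(regex, str, callback, context) {
    // Search with a global copy so that the original regex and its lastIndex are untouched.
    var flags = regex.flags.indexOf("g") === -1 ? regex.flags + "g" : regex.flags;
    var copy = new RegExp(regex.source, flags);
    var match, count = 0;
    while ((match = copy.exec(str)) !== null) {
        callback.call(context, match, count++, str, copy);
        // A zero-length match leaves lastIndex in place; step forward to avoid looping forever.
        if (copy.lastIndex === match.index) {
            copy.lastIndex++;
        }
    }
}

var digits = [];
forEachMatch(/\d/, "1a2345", function (match, i) {
    if (i % 2) digits.push(+match[0]);
});

var counts = {};
forEachMatch(/\w+/, "Run Forrest, run.", function (match) {
    var word = match[0].toLowerCase();
    this[word] = (this[word] || 0) + 1;
}, counts);
```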

RegExp.prototype.validate(string)

Returns true only if the entire string is matched by the regex—i.e., a match is found at the beginning of the string, and that match extends to the end of the string—otherwise false is returned. When using the m flag (which causes ^ and $ metacharacters to match at the beginning and end of each line), this may differ from the result of calling RegExp.prototype.test on a regular expression that starts with ^ and ends with $, since the validate method's result is not altered by the m flag (although the regex's matches may be). Also note that this method is smart enough to avoid false negatives when the regex is capable of matching both a part of and all of the string.

Parameters:
  • string : String
    The string to validate.
Returns:
  • Boolean
    Whether the regular expression pattern matches the string entirely: true or false.

Example code

var regex = /\d+/;

regex.validate("123"); // -> true
regex.validate("123.45"); // -> false
regex.test("123.45"); // -> true
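One way to obtain whole-string matching that is unaffected by the m flag is to pin the start with the sticky flag and the end with a negative lookahead, instead of adding ^ and $. Whether validate works this way internally is an assumption; matchesEntirely is a hypothetical name, and the sketch requires an engine with sticky-flag support.

```javascript
// Hypothetical sketch of whole-string matching with native regexes.
function matchesEntirely(regex, str) {
    // "y" forces the match to begin at lastIndex (0 for a new regex);
    // (?![\s\S]) only succeeds when no character of any kind follows the match.
    var flags = regex.flags.replace(/[gy]/g, "") + "y";
    var anchored = new RegExp("(?:" + regex.source + ")(?![\\s\\S])", flags);
    return anchored.test(String(str));
}

var wholeNumber = matchesEntirely(/\d+/, "123");       // true
var partialNumber = matchesEntirely(/\d+/, "123.45");  // false
var alternation = matchesEntirely(/a|ab/, "ab");       // true: backtracks from "a" to "ab"
var multiline = matchesEntirely(/^\d+$/m, "1\n2");     // false, although test() returns true
```

The alternation case shows why the check cannot simply compare the length of the first match found with the length of the string: the regex has to be given the chance to backtrack into a longer alternative.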