
Plugins

Unicode

Download the Unicode plugin (all sizes are after minification and gzipping):

This plugin adds support for Unicode categories, scripts, and blocks. It uses the Unicode 5.2 character database, which is the latest version as of 2010-03-13. For more information about how Unicode categories, scripts, and blocks are used in regular expressions, see regexp.info/unicode.html.

Usage examples:

// categories
XRegExp("\\p{Sc}\\p{N}+"); // Sc: currency symbol, N: number

// scripts
XRegExp("\\p{Cyrillic}");
XRegExp("[^\\p{Latin}\\p{Common}]");

// blocks (use "In" prefix)
XRegExp("\\p{InLatin Extended-A}");
XRegExp("\\P{InPrivate Use Area}"); // uppercase \P for negation
XRegExp("\\p{^InMongolian}"); // alternative negation syntax

In accordance with the Unicode standard, letter case, spaces, hyphens, and underscores are ignored when comparing Unicode token names. Hence, \p{InLatinExtendedA} is equivalent to \p{InLatin_Extended-A} and \p{in latin extended a}.
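The name-matching rule above can be sketched as a small normalization step. This is only an illustration: normalizeTokenName is a hypothetical helper, not the plugin's actual code, and it assumes that stripping spaces, hyphens, and underscores and then lowercasing is enough to compare names.

```javascript
// Hypothetical helper (not part of XRegExp): reduce a token name to a
// canonical form so that equivalent spellings compare equal.
function normalizeTokenName(name) {
	// Drop spaces, hyphens, and underscores, then ignore letter case
	return name.replace(/[- _]+/g, "").toLowerCase();
}

var a = normalizeTokenName("InLatinExtendedA");
var b = normalizeTokenName("InLatin_Extended-A");
var c = normalizeTokenName("in latin extended a");
// a, b, and c are all "inlatinextendeda"
```

A lookup table keyed by the normalized form would then let all three spellings resolve to the same block.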

All categories, scripts, and blocks can be inverted by using an uppercase P. For example, \P{N} matches any code point that is not in the Number category, and \P{Arabic} matches code points that are not in Arabic script.

To activate this plugin, simply load it after loading XRegExp 1.0 or later.

<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
	var unicodeWord = XRegExp("^\\p{L}+$");

	unicodeWord.test("Русский"); // true
	unicodeWord.test("日本語"); // true
	unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
	XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>

Note that negated (uppercase \P) Unicode tokens are not currently supported within character classes; using one throws an error. That is, \p{L}, \P{L}, [\p{L}], and [^\p{L}] are all OK, but [\P{L}] and [^\P{L}] are not. If necessary, you can emulate this usage via alternation and/or lookahead, as shown in the following table:

Instead of:        Use:
[\P{N}]+           \P{N}+
[\P{N}a-z]+        (?:\P{N}|[a-z])+
[^\P{N}a-z]+       (?:(?!\P{N})[^a-z])+
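One way to convince yourself that the rewrites in the table are equivalent is to compare them against native character classes. The sketch below does not use the Unicode plugin at all. It assumes an ES2018 or later engine, where native regexes accept \p{...} and \P{...} (including inside character classes) when the u flag is set. The sample strings are arbitrary.

```javascript
// Assumption: ES2018+ engine with native Unicode property escapes (u flag).
// These are native regex literals, not XRegExp patterns.
var direct1 = /^[\P{N}a-z]+$/u;             // not allowed by the plugin
var emulated1 = /^(?:\P{N}|[a-z])+$/u;      // alternation rewrite
var direct2 = /^[^\P{N}a-z]+$/u;            // not allowed by the plugin
var emulated2 = /^(?:(?!\P{N})[^a-z])+$/u;  // lookahead rewrite

var samples = ["abc!", "ab1", "123", "12a", "٣٤٥"];

// For every sample, each rewrite should agree with its direct form
var agree = samples.every(function (s) {
	return direct1.test(s) === emulated1.test(s) &&
		direct2.test(s) === emulated2.test(s);
});
// agree -> true
```

This only checks the logic of the rewrites. Native u-mode regexes match by code point, so results for characters outside the Basic Multilingual Plane may differ from the plugin's.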

Match Recursive

Download the Match Recursive plugin (0.8 KB when minified and gzipped).

Adds the following function to the XRegExp namespace:

XRegExp.matchRecursive(string, left, right, [flags], [options])

Accepts a string to search, left and right format delimiters as regex pattern strings, optional regex flags, and optional extended options. Returns an array of matches, allowing nested instances of the left and right delimiters. Use the g flag to return all matches; otherwise, only the first is returned.

Parameters:
  • string : String
    The string to search.
  • left : String
    The left delimiter as a regex pattern.
  • right : String
    The right delimiter as a regex pattern.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s, x, and y.
  • options : Object [optional]
    • valueNames : Array [optional]
      Changes the return format from an array of matches to a two-dimensional array of identified string parts with extended position data. Must be either undefined or an array of four values, which are used as labels for the elements in the returned array. In order, the four element types are: the text between matches (including text at the beginning and end of the string), the left delimiter, the matched text, and the right delimiter. If any of the four values is set to null, all instances of that element type are omitted from the returned array. See the example code below for more information.
    • escapeChar : String [optional]
      A single-character string to be used as an escape character. Instances of the left and right delimiters escaped with this character are ignored both inside and outside of non-escaped delimiters.
      WARNING: The escapeChar option is considered experimental and might be changed or removed in future versions.
Returns:
  • Array
    The return format is determined by the valueNames option.
    • If valueNames is undefined:
      An array containing the text within each outermost delimiter pair. E.g., ["one", "two", "three"]. If there are no matches, an empty array is returned.
    • If valueNames is an array with four values:
      A two-dimensional array of identified string parts with extended position data. E.g., [["text","...",0,3], ["left","(",3,4], ["match","a(b())c",4,11], ["right",")",11,12]]. Empty "text" segments (the text before, after, and between matches) are omitted from the results.
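The idea of returning only the outermost delimiter pairs can be illustrated with a depth counter. The following is a minimal sketch, not the plugin's implementation: matchOutermost is a hypothetical function that treats the delimiters as literal strings rather than regex patterns, always behaves as if the g flag were set, and ignores the flags and options arguments entirely.

```javascript
// Hypothetical helper (not part of XRegExp): returns the text inside each
// outermost left/right pair, treating both delimiters as literal strings.
function matchOutermost(str, left, right) {
	var results = [];
	var depth = 0;   // current nesting level
	var start = -1;  // index just past the current outermost left delimiter
	var i = 0;

	while (i < str.length) {
		if (str.slice(i, i + left.length) === left) {
			if (depth === 0) {
				start = i + left.length;
			}
			depth++;
			i += left.length;
		} else if (str.slice(i, i + right.length) === right) {
			if (depth === 0) {
				throw new Error("Unbalanced right delimiter at index " + i);
			}
			depth--;
			if (depth === 0) {
				// Closed an outermost pair: record everything inside it
				results.push(str.slice(start, i));
			}
			i += right.length;
		} else {
			i++;
		}
	}
	if (depth !== 0) {
		throw new Error("Unbalanced left delimiter");
	}
	return results;
}

var inner = matchOutermost("(t((e))s)t()(ing)", "(", ")");
// -> ["t((e))s", "", "ing"]
```

Nested pairs only raise and lower the counter, so their text ends up inside the enclosing match rather than being reported separately.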

Example code

var input = "(t((e))s)t()(ing)";
var output = XRegExp.matchRecursive(input, "\\(", "\\)");
// -> ["t((e))s"]

// Global match
output = XRegExp.matchRecursive(input, "\\(", "\\)", "g");
// -> ["t((e))s", "", "ing"]

// Unbalanced delimiter on the left or right
output = XRegExp.matchRecursive("<<t>est", "<", ">", "g");
output = XRegExp.matchRecursive("<t>>est", "<", ">", "g");
// -> both calls throw an error

// Ignoring escaped delimiters
input = "t\\{e\\\\{s{t\\{i}ng}";
output = XRegExp.matchRecursive(input, "{", "}", "g", {escapeChar: "\\"});
// -> ["s{t\\{i}ng"]

// Extended information mode with valueNames
input = "HTML: <div id='x'>A <div>nested <div /></div> element.</div>";
// The left delimiter is designed to skip self-closed <div /> elements
output = XRegExp.matchRecursive(input, "<div\\b(?:[^>](?!/>))*>", "</div>", "i",
                                {valueNames: ["text", "left", "match", "right"]});
/* ->
[ ["text", "HTML: ", 0, 6],
  ["left", "<div id='x'>", 6, 18],
  ["match", "A <div>nested <div /></div> element.", 18, 54],
  ["right", "</div>", 54, 60] ]
*/

// Omitting unneeded parts with null valueNames
input = "...{1}..{function(a,b){return a+b;}}";
output = XRegExp.matchRecursive(input, "{", "}", "g",
                                {valueNames: ["literal", null, "value", null]});
/* ->
[ ["literal", "...", 0, 3],
  ["value", "1", 4, 5],
  ["literal", "..", 6, 8],
  ["value", "function(a,b){return a+b;}", 9, 35] ]
*/

/* The matchRecursive function specifically supports the y flag (sticky mode).
This mode requires the first match to appear at the beginning of the string,
with each subsequent match immediately following the last. (Outside of the
matchRecursive function, the y flag requires native browser support.) */
input = "<1><2><3>4<5>";
output = XRegExp.matchRecursive(input, "<", ">", "gy");
// -> ["1", "2", "3"]
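When valueNames is used, each element of the result has the shape [name, text, start, end], so the output is easy to post-process. The sketch below works on a copy of the result shown in the null valueNames example rather than calling XRegExp. pluck is a hypothetical helper introduced only for illustration.

```javascript
// Sample data copied from the null valueNames example above
var parts = [
	["literal", "...", 0, 3],
	["value", "1", 4, 5],
	["literal", "..", 6, 8],
	["value", "function(a,b){return a+b;}", 9, 35]
];

// Hypothetical helper: collect the text of every part with the given name
function pluck(parts, name) {
	var out = [];
	for (var i = 0; i < parts.length; i++) {
		if (parts[i][0] === name) {
			out.push(parts[i][1]);
		}
	}
	return out;
}

var values = pluck(parts, "value");
// -> ["1", "function(a,b){return a+b;}"]

// In this sample, end minus start equals the length of each part's text
var consistent = parts.every(function (p) {
	return p[3] - p[2] === p[1].length;
});
// consistent -> true
```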