Unicode

This plugin adds support for Unicode properties and blocks. It follows the Unicode 5.1 character database, which is the latest finalized version as of 2009-06-24.

The Unicode plugin enables the following Unicode properties/categories in any XRegExp:

It also enables all 136 blocks into which the code points U+0000 through U+FFFF are divided. Unicode blocks use the prefix "In", following Perl and Java (.NET uses "Is"). The supported blocks, in alphabetical order, are:

In accordance with the Unicode standard, casing, spaces, hyphens, and underscores are ignored when comparing block names. Hence, \p{InLatinExtendedA} is equivalent to \p{InLatin Extended-A} and \p{in latin extended a}.
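The loose comparison described above can be pictured as reducing every spelling to one canonical key. The following is only a sketch of that idea: normalizeBlockName is a hypothetical helper written for this illustration, not a function exposed by the plugin, and the plugin's internal approach may differ.

```javascript
// Hypothetical helper (not part of the plugin's API): reduce a block name
// to a canonical key by dropping spaces, hyphens, and underscores, then
// lowercasing what remains.
function normalizeBlockName(name) {
	return name.replace(/[ _-]+/g, "").toLowerCase();
}

var names = ["InLatinExtendedA", "InLatin Extended-A", "in latin extended a"];
var keys = names.map(normalizeBlockName);
// All three spellings collapse to the same key: "inlatinextendeda"
```

Comparing the normalized keys rather than the raw names is what makes the three spellings interchangeable.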

All properties and blocks can be inverted by using an uppercase P. For example, \P{N} matches any code point that is not in the Number category. \P{InArabic} matches code points that are not in the Arabic block.
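To see inversion in action without the plugin, the sketch below leans on the native u flag and \p{...} escapes that ECMAScript 2018 engines provide. That runtime is an assumption of this example, not something the plugin requires. Native regexes have no block names such as InArabic, so only the category case is shown.

```javascript
// Assumes an ES2018+ engine with native Unicode property escapes;
// this does not use the XRegExp plugin.
var isNumber = /^\p{N}$/u;    // a single code point in the Number category
var isNotNumber = /^\P{N}$/u; // the inverse: any single code point NOT in it

var samples = ["7", "٣", "x", "я"];
var results = samples.map(function (ch) {
	return [isNumber.test(ch), isNotNumber.test(ch)];
});
// For every sample, exactly one of the two tests succeeds.
```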

IMPORTANT: Unicode properties and blocks are not currently supported within character classes, though support is planned for a future version. In the meantime, you can emulate this usage with alternation and/or lookahead, as shown in the following table:

Instead of:            Use:
[\p{N}]                \p{N}
[\p{N}a-z~]            (?:\p{N}|[a-z~])
[\p{N}\P{Z}]           (?:\p{N}|\P{Z})
[\p{N}\P{Z}a-z~]       (?:\p{N}|\P{Z}|[a-z~])
[^\p{N}]               \P{N}
[^\p{N}a-z~]           (?:(?!\p{N})[^a-z~])
[^\p{N}\P{Z}]          (?:(?!\p{N}|\P{Z})[\s\S])
[^\p{N}\P{Z}a-z~]      (?:(?!\p{N}|\P{Z})[^a-z~])
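One way to convince yourself that these rewrites are equivalent is to compare both forms in an engine that does accept properties inside character classes. The sketch below assumes an ES2018+ runtime with the native u flag (not this plugin) and checks a handful of sample characters. It is a spot check, not a proof.

```javascript
// Assumes native Unicode property escapes (ES2018+); XRegExp is not used.
// Pairs taken from the table: [character class form, emulated form].
var pairs = [
	["[\\p{N}a-z~]", "(?:\\p{N}|[a-z~])"],
	["[^\\p{N}]", "\\P{N}"],
	["[^\\p{N}a-z~]", "(?:(?!\\p{N})[^a-z~])"],
	["[^\\p{N}\\P{Z}]", "(?:(?!\\p{N}|\\P{Z})[\\s\\S])"]
];
var samples = ["5", "a", "~", "Q", " ", "\u00A0", "я", "٣"];

// True if both forms give the same answer for every sample character.
function agree(pair) {
	var direct = new RegExp("^" + pair[0] + "$", "u");
	var emulated = new RegExp("^" + pair[1] + "$", "u");
	return samples.every(function (ch) {
		return direct.test(ch) === emulated.test(ch);
	});
}

var allAgree = pairs.every(agree);
```

The negated rows rely on the lookahead rejecting a position first, after which the trailing class (or [\s\S]) consumes exactly one character.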

Additionally, Unicode subcategories like \p{Sc} (currency symbol) and scripts like \p{Latin} are not currently supported. For more information about the use of Unicode in regular expressions, see regexp.info/unicode.html.

To activate this plugin, simply load it after loading XRegExp 1.0 or later.

<script src="xregexp.js"></script>
<script src="xregexp-unicode.js"></script>
<script>
	var unicodeWord = XRegExp("^\\p{L}+$");
	alert(unicodeWord.test("Ниндзя")); // -> true
</script>

Download the Unicode plugin (5.3 KB when minified and gzipped).

If you are still using XRegExp 0.6.x, you need an older version: Unicode plugin 0.2.1 for XRegExp 0.6.x.

Match Recursive

Adds the following function to the XRegExp namespace:

XRegExp.matchRecursive(string, left, right, [flags], [options])

Accepts a string to search, left and right format delimiters as regex pattern strings, optional regex flags, and optional extended options. Returns an array of matches, allowing nested instances of the left and right delimiters. Use the g flag to return all matches; otherwise, only the first is returned.

Parameters:
  • string : String
    The string to search.
  • left : String
    The left delimiter as a regex pattern.
  • right : String
    The right delimiter as a regex pattern.
  • flags : String [optional]
    The regular expression flags; may include non-native flags s, x, and y.
  • options : Object [optional]
    • valueNames : Array [optional]
      Changes the return format from an array of matches to a two-dimensional array of identified string parts with extended position data. Expected to be either undefined or an array of four values that name the element types in the returned array. In order, the four element types are: the text between matches and at the beginning and end of the string, the left delimiter, the matched text, and the right delimiter. If any of the four values is set to null, all instances of that element type are omitted from the returned array. See the example code below for more information.
    • escapeChar : String [optional]
      A single-character string to be used as an escape character. Instances of the left and right delimiters escaped with this character are ignored both inside and outside of non-escaped delimiters.
      WARNING: The escapeChar option is considered experimental and might be changed or removed in future versions.
Returns:
  • Array
    The return format is determined by the valueNames option.
    • If valueNames is undefined:
      An array containing the text within each outermost delimiter pair. E.g., ["one", "two", "three"]. If there are no matches, an empty array is returned.
    • If valueNames is an array with four values:
      A two-dimensional array of identified string parts with extended position data. E.g., [["text","...",0,3], ["left","(",3,4], ["match","a(b())c",4,11], ["right",")",11,12]]. Empty "text" segments (the text before, after, and between matches) are omitted from the results.
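
For intuition about how nested delimiters can be matched, here is a simplified sketch built around a depth counter. It is not the plugin's source: matchOutermost is a hypothetical name, it always behaves as if the g flag were set, and it ignores flags, valueNames, and escapeChar.

```javascript
// Simplified sketch (hypothetical, not the plugin's implementation):
// returns the text inside each outermost delimiter pair.
function matchOutermost(str, left, right) {
	// One regex finds either delimiter; group 1 is set only for a left match.
	var delim = new RegExp("(" + left + ")|" + right, "g");
	var results = [];
	var depth = 0;
	var innerStart = 0;
	var m;
	while ((m = delim.exec(str)) !== null) {
		if (m[0] === "") {
			delim.lastIndex++; // guard against zero-length delimiter matches
			continue;
		}
		if (m[1] !== undefined) {
			if (depth === 0) {
				innerStart = delim.lastIndex; // inner text starts after the opener
			}
			depth++;
		} else {
			if (depth === 0) {
				throw new Error("Unbalanced right delimiter at index " + m.index);
			}
			depth--;
			if (depth === 0) {
				results.push(str.slice(innerStart, m.index));
			}
		}
	}
	if (depth !== 0) {
		throw new Error("Unbalanced left delimiter");
	}
	return results;
}

var found = matchOutermost("(t((e))s)t()(ing)", "\\(", "\\)");
// found is ["t((e))s", "", "ing"]
```

Only the transitions between depth 0 and depth 1 produce output, which is why inner pairs are returned as part of the enclosing match rather than as separate results.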

Example code

var input = "(t((e))s)t()(ing)";
var output = XRegExp.matchRecursive(input, "\\(", "\\)");
// -> ["t((e))s"]

// Global match
output = XRegExp.matchRecursive(input, "\\(", "\\)", "g");
// -> ["t((e))s", "", "ing"]

// Unbalanced delimiter on the left or right
output = XRegExp.matchRecursive("<<t>est", "<", ">", "g");
output = XRegExp.matchRecursive("<t>>est", "<", ">", "g");
// Both calls throw an error

// Ignoring escaped delimiters
input = "t\\{e\\\\{s{t\\{i}ng}";
output = XRegExp.matchRecursive(input, "{", "}", "g", {escapeChar: "\\"});
// -> ["s{t\\{i}ng"]

// Extended information mode with valueNames
input = "HTML: <div id='x'>A <div>nested <div /></div> element.</div>";
// The left delimiter is designed to skip self-closed <div /> elements
output = XRegExp.matchRecursive(input, "<div\\b(?:[^>](?!/>))*>", "</div>", "i",
                                {valueNames: ["text", "left", "match", "right"]});
/* ->
[ ["text", "HTML: ", 0, 6],
  ["left", "<div id='x'>", 6, 18],
  ["match", "A <div>nested <div /></div> element.", 18, 54],
  ["right", "</div>", 54, 60] ]
*/

// Omitting unneeded parts with null valueNames
input = "...{1}..{function(a,b){return a+b;}}";
output = XRegExp.matchRecursive(input, "{", "}", "g",
                                {valueNames: ["literal", null, "value", null]});
/* ->
[ ["literal", "...", 0, 3],
  ["value", "1", 4, 5],
  ["literal", "..", 6, 8],
  ["value", "function(a,b){return a+b;}", 9, 35] ]
*/

/* The matchRecursive function specifically supports the y flag (sticky mode).
This mode requires the first match to appear at the beginning of the string,
with each subsequent match immediately following the last. (Outside of the
matchRecursive function, the y flag requires native browser support.) */
input = "<1><2><3>4<5>";
output = XRegExp.matchRecursive(input, "<", ">", "gy");
// -> ["1", "2", "3"]
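
Judging from the valueNames examples above, each returned entry appears to have the shape [name, text, start, end], where end is exclusive. The sketch below works from that reading. It copies the documented output of the null valueNames example as literal data instead of calling the plugin, so the parts array is an assumption taken from this page, not a computed result.

```javascript
var source = "...{1}..{function(a,b){return a+b;}}";
// Copied from the documented output above rather than computed here.
var parts = [
	["literal", "...", 0, 3],
	["value", "1", 4, 5],
	["literal", "..", 6, 8],
	["value", "function(a,b){return a+b;}", 9, 35]
];

// Check the assumed layout: slicing the source at [start, end) should
// reproduce the text stored in each entry.
var positionsMatch = parts.every(function (part) {
	return source.slice(part[2], part[3]) === part[1];
});

// Hypothetical use: keep only the text of the "value" parts.
var values = parts.filter(function (part) {
	return part[0] === "value";
}).map(function (part) {
	return part[1];
});
// values is ["1", "function(a,b){return a+b;}"]
```

The gaps in the position data (3 to 4, 5 to 6, and so on) are where the omitted left and right delimiters sit.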

Download the Match Recursive plugin (0.8 KB when minified and gzipped).