Download the Unicode plugin (all sizes are after minification and gzipping):
This plugin adds support for Unicode categories, scripts, and blocks. It uses the Unicode 5.2 character database, which is the latest version as of 2010-03-13. For more information about how Unicode categories, scripts, and blocks are used in regular expressions, see regexp.info/unicode.html.
Usage examples:
// categories
XRegExp("\\p{Sc}\\p{N}+"); // Sc: currency symbol, N: number
// scripts
XRegExp("\\p{Cyrillic}");
XRegExp("[^\\p{Latin}\\p{Common}]");
// blocks (use "In" prefix)
XRegExp("\\p{InLatin Extended-A}");
XRegExp("\\P{InPrivate Use Area}"); // uppercase \P for negation
XRegExp("\\p{^InMongolian}"); // alternative negation syntax
In accordance with the Unicode standard, letter case, spaces, hyphens, and underscores are ignored when comparing Unicode token names. Hence, \p{InLatinExtendedA} is equivalent to \p{InLatin_Extended-A} and \p{in latin extended a}.
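The name-matching rule above can be pictured as a normalization step applied to a token name before it is looked up. The following is a minimal sketch of that idea, not the plugin's actual code; `normalizeTokenName` is a hypothetical helper introduced here purely for illustration.

```javascript
// Hypothetical helper (not part of XRegExp): reduce a Unicode token name to a
// canonical key by dropping spaces, hyphens, and underscores, then lowercasing.
function normalizeTokenName(name) {
    return name.replace(/[ _-]/g, "").toLowerCase();
}

var names = ["InLatinExtendedA", "InLatin_Extended-A", "in latin extended a"];
var keys = names.map(normalizeTokenName);
// Each of the three spellings reduces to the same key: "inlatinextendeda"
```

Comparing normalized keys rather than raw names is one simple way to make all three spellings resolve to the same token.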
All categories, scripts, and blocks can be inverted by using an uppercase P. For example, \P{N} matches any code point that is not in the Number category, and \P{Arabic} matches code points that are not in Arabic script.
To activate this plugin, load it after loading XRegExp 1.0 or later:
<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");
    unicodeWord.test("Русский"); // true
    unicodeWord.test("日本語"); // true
    unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
     and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
    XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>
Note that negated (uppercase \P) Unicode tokens are not currently supported within character classes; using one there throws an error. That is, \p{L}, \P{L}, [\p{L}], and [^\p{L}] are all OK, but [\P{L}] and [^\P{L}] are not. If necessary, you can emulate this usage via alternation and/or lookahead, as shown in the following table:
| Instead of: | Use: |
|---|---|
| [\P{N}]+ | \P{N}+ |
| [\P{N}a-z]+ | (?:\P{N}\|[a-z])+ |
| [^\P{N}a-z]+ | (?:(?!\P{N})[^a-z])+ |
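One way to convince yourself that the rewrites in the table are equivalent is to compare them in an engine that does accept \P inside character classes. The sketch below does not use the plugin at all: it assumes an ES2018 or later runtime, where native regexes with the `u` flag support `\p{…}` and `\P{…}`, and checks each pair against a few sample strings. The `sameResult` helper and the sample list are made up for illustration.

```javascript
// Assumption: an ES2018+ engine, where the native "u" flag supports \p{...}
// and, unlike the plugin described here, also accepts \P{...} inside classes.
var pairs = [
    [/^[\P{N}]+$/u,     /^\P{N}+$/u],
    [/^[\P{N}a-z]+$/u,  /^(?:\P{N}|[a-z])+$/u],
    [/^[^\P{N}a-z]+$/u, /^(?:(?!\P{N})[^a-z])+$/u]
];
var samples = ["123", "abc", "a1", "٣٤", "x-y", ""];

// Hypothetical helper: true when both regexes agree on every sample string.
function sameResult(direct, emulated) {
    return samples.every(function (s) {
        return direct.test(s) === emulated.test(s);
    });
}

// True only if every emulated pattern agrees with its direct counterpart.
var allEquivalent = pairs.every(function (pair) {
    return sameResult(pair[0], pair[1]);
});
```

A handful of samples is only a spot check, not a proof, but it is a quick way to catch a mistake when adapting the table to other tokens or ranges.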
Download the Match Recursive plugin (0.8 KB when minified and gzipped).
Adds the following function to the XRegExp namespace:
XRegExp.matchRecursive(string, left, right, [flags], [options])
Accepts a string to search, left and right format delimiters as regex pattern strings, optional regex flags, and optional extended options. Returns an array of matches, allowing nested instances of the left and right delimiters. Use the g flag to return all matches; otherwise, only the first match is returned.
var input = "(t((e))s)t()(ing)";
var output = XRegExp.matchRecursive(input, "\\(", "\\)");
// -> ["t((e))s"]
// Global match
output = XRegExp.matchRecursive(input, "\\(", "\\)", "g");
// -> ["t((e))s", "", "ing"]
// Unbalanced delimiter on the left or right
output = XRegExp.matchRecursive("<<t>est", "<", ">", "g");
output = XRegExp.matchRecursive("<t>>est", "<", ">", "g");
// Both lines throw an error
// Ignoring escaped delimiters
input = "t\\{e\\\\{s{t\\{i}ng}";
output = XRegExp.matchRecursive(input, "{", "}", "g", {escapeChar: "\\"});
// -> ["s{t\\{i}ng"]
// Extended information mode with valueNames
input = "HTML: <div id='x'>A <div>nested <div /></div> element.</div>";
// The left delimiter is designed to skip self-closed <div /> elements
output = XRegExp.matchRecursive(input, "<div\\b(?:[^>](?!/>))*>", "</div>", "i",
{valueNames: ["text", "left", "match", "right"]});
/* ->
[ ["text", "HTML: ", 0, 6],
["left", "<div id='x'>", 6, 18],
["match", "A <div>nested <div /></div> element.", 18, 54],
["right", "</div>", 54, 60] ]
*/
// Omitting unneeded parts with null valueNames
input = "...{1}..{function(a,b){return a+b;}}";
output = XRegExp.matchRecursive(input, "{", "}", "g",
{valueNames: ["literal", null, "value", null]});
/* ->
[ ["literal", "...", 0, 3],
["value", "1", 4, 5],
["literal", "..", 6, 8],
["value", "function(a,b){return a+b;}", 9, 35] ]
*/
/* The matchRecursive function specifically supports the y flag (sticky mode).
This mode requires the first match to appear at the beginning of the string,
with each subsequent match immediately following the last. (Outside of the
matchRecursive function, the y flag requires native browser support.) */
input = "<1><2><3>4<5>";
output = XRegExp.matchRecursive(input, "<", ">", "gy");
// -> ["1", "2", "3"]
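To make the nesting behavior in the examples above more concrete, here is a minimal sketch of the depth-counting idea behind balanced-delimiter matching. It is a simplified illustration, not the plugin's implementation: `matchBalanced` is a hypothetical function, its delimiters are literal strings rather than regex patterns, it always returns every outermost match (as with the g flag), and it ignores escapeChar, valueNames, and sticky mode.

```javascript
// Simplified illustration only: delimiters are literal, non-empty strings,
// and flags, escapeChar, and valueNames are not handled.
function matchBalanced(str, left, right) {
    if (!left || !right) throw new Error("Delimiters must be non-empty");
    var results = [];
    var depth = 0;   // how many left delimiters are currently open
    var start = -1;  // index just past the outermost left delimiter
    var i = 0;
    while (i < str.length) {
        if (str.startsWith(left, i)) {
            if (depth === 0) start = i + left.length;
            depth++;
            i += left.length;
        } else if (str.startsWith(right, i)) {
            if (depth === 0) throw new Error("Unbalanced right delimiter at index " + i);
            depth--;
            // Closing the outermost pair completes one match
            if (depth === 0) results.push(str.slice(start, i));
            i += right.length;
        } else {
            i++;
        }
    }
    if (depth !== 0) throw new Error("Unbalanced left delimiter");
    return results;
}

var found = matchBalanced("(t((e))s)t()(ing)", "(", ")");
// found -> ["t((e))s", "", "ing"]
```

Tracking only a depth counter is enough here because a match is reported solely when the outermost pair closes; inner pairs are simply carried along as part of the matched text.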