New flags

About flags

XRegExp provides three new flags, which can be combined with native flags and arranged in any order. Unlike native flags, non-native flags do not show up as properties on regular expression objects.

Dot matches all (s)

The now-abandoned ES4 proposals called for recognizing the C1/Unicode NEL "next line" control code (U+0085) as an additional newline code point in that standard.

When using XRegExp's Unicode Properties addon, you can match any code unit without the s flag by using \p{Any}.

To make unescaped dots outside of character classes match any code point rather than code unit, you can use this tiny XRegExp addon.
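The difference between a code unit and a code point can be seen with native JavaScript alone. The following is a minimal sketch that uses the native u flag (ES2015) rather than the XRegExp addon mentioned above, and uses [\s\S] as a stand-in for "any code unit". The variable names are illustrative only.

```javascript
// U+1F600 lies outside the BMP: one code point, stored as two UTF-16 code units.
const astral = "\u{1F600}";

const lengthInCodeUnits = astral.length;

// Without the u flag, [\s\S] consumes a single code unit (half of the surrogate pair).
const matchesAsOneCodeUnit = /^[\s\S]$/.test(astral);
const matchesAsTwoCodeUnits = /^[\s\S]{2}$/.test(astral);

// With the native u flag, the same class consumes a whole code point.
const matchesAsOneCodePoint = /^[\s\S]$/u.test(astral);

console.log(lengthInCodeUnits, matchesAsOneCodeUnit, matchesAsTwoCodeUnits, matchesAsOneCodePoint);
```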

Usually, a dot does not match newlines. However, a mode in which dots match any code unit (including newlines) can be as useful as one where dots don't. The s flag allows the mode to be selected on a per-regex basis. Escaped dots (\.) and dots within character classes ([.]) are always equivalent to literal dots. The newline code points are U+000a (line feed, \n), U+000d (carriage return, \r), U+2028 (line separator), and U+2029 (paragraph separator).
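The default behavior that the s flag changes can be checked with native regexes. This is a small sketch that does not use XRegExp: it tests a plain dot against the ES5 line terminators and uses [\s\S] as a common native way to match any code unit. The variable names are illustrative only.

```javascript
// The code points that ES5 treats as line terminators.
const newlines = ["\n", "\r", "\u2028", "\u2029"];

// A native dot (without any dot-matches-all mode) rejects each of them.
const dotMatches = newlines.map((ch) => /^.$/.test(ch));

// [\s\S] accepts any single code unit, newlines included.
const anyMatches = newlines.map((ch) => /^[\s\S]$/.test(ch));

// U+0085 (NEL) is not an ES5 line terminator, so a plain dot does match it.
const dotMatchesNel = /^.$/.test("\u0085");

console.log(dotMatches, anyMatches, dotMatchesNel);
```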

Annotations

Free-spacing and line comments (x)

It might be better to think of whitespace and comments as do-nothing (rather than ignore-me) metacharacters. This distinction is important with something like \12 3, which with the x flag is taken as \12 followed by 3, and not \123. However, quantifiers following whitespace or comments apply to the preceding token, so x + is equivalent to x+.

Unicode 1.1.5–4.0.0 assigned code point U+200b (ZWSP) to the Zs (Space separator) category, which means that some browsers or regex engines might include this additional code point in those matched by \s, etc. Unicode 4.0.1 moved ZWSP to the Cf (Format) category.

Unicode 1.1.5 assigned code point U+feff (ZWNBSP) to the Zs (Space separator) category. Unicode 2.0.14 moved ZWNBSP to the Cf (Format) category. ES5 explicitly includes ZWNBSP in its list of whitespace characters, even though this does not match any version of the Unicode standard since 1996.

ES5's \s is similar but not equivalent to \p{Z} (the Separator category) from regex libraries that support Unicode categories, including XRegExp's own Unicode Categories addon. The difference is that \s includes code points U+0009–U+000d and U+feff, which are not assigned the Separator category in the Unicode character database.

ES5's \s is nearly equivalent to \p{WhiteSpace} from the Unicode Properties addon. The differences are: 1. \p{WhiteSpace} does not include U+feff (ZWNBSP). 2. \p{WhiteSpace} includes U+0085 (NEL), which is not assigned the Separator category in the Unicode character database.
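These differences can be observed with native Unicode property escapes, assuming an engine that supports them (ES2018 or later). Note that this sketch uses the native spellings \p{Z} and \p{White_Space} with the u flag, not the XRegExp addons. The classify helper and the sample names are hypothetical.

```javascript
// Code points on which \s, \p{Z}, and \p{White_Space} are expected to disagree.
const samples = { tab: "\t", nel: "\u0085", zwnbsp: "\ufeff", space: " " };

// Hypothetical helper: report which of the three tokens match a single character.
function classify(ch) {
  return {
    s: /^\s$/.test(ch),
    separator: /^\p{Z}$/u.test(ch),
    whiteSpace: /^\p{White_Space}$/u.test(ch),
  };
}

const results = {};
for (const [name, ch] of Object.entries(samples)) {
  results[name] = classify(ch);
}

console.log(results);
```

In this sketch, the tab is matched by \s and \p{White_Space} but not \p{Z}, NEL only by \p{White_Space}, and ZWNBSP only by \s.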

Aside: not all JavaScript regex syntax is Unicode-aware. According to ES3/5, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, while \d, \D, \w, \W, \b, and \B use ASCII-only interpretations of digit, word character, and word boundary (e.g., /a\b/.test("naïve") returns true). Many browsers get some of these details wrong.
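A few native checks illustrate the split between Unicode-aware and ASCII-only tokens. This is a minimal sketch for regexes without the u or v flags. The variable names are illustrative only.

```javascript
// \b is ASCII-only: "ï" is not a word character, so a boundary follows the "a".
const boundaryInsideWord = /a\b/.test("naïve");

// \w and \d are ASCII-only as well.
const accentedIsWordChar = /^\w$/.test("ï");
const arabicDigitIsDigit = /^\d$/.test("\u0663"); // ARABIC-INDIC DIGIT THREE

// \s and the dot use Unicode-based whitespace and newline definitions.
const nbspIsWhitespace = /^\s$/.test("\u00a0");
const dotMatchesLineSeparator = /^.$/.test("\u2028");

console.log(boundaryInsideWord, accentedIsWordChar, arabicDigitIsDigit, nbspIsWhitespace, dotMatchesLineSeparator);
```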

To check which code points are matched by these tokens in your browser, try the JavaScript regex Unicode compatibility test. For more details, see JavaScript, Regex, and Unicode.
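A quick local check is also possible by looping over the Basic Multilingual Plane. The sketch below is not the linked test page. The helper name matchedCodeUnits is hypothetical, and the output depends on the Unicode version your engine implements.

```javascript
// Hypothetical helper: list every BMP code unit that the given regex matches.
// The regex should be anchored and must not use the g or y flags.
function matchedCodeUnits(regex) {
  const hits = [];
  for (let code = 0; code <= 0xffff; code++) {
    if (regex.test(String.fromCharCode(code))) {
      hits.push("U+" + code.toString(16).padStart(4, "0"));
    }
  }
  return hits;
}

// Code units matched natively by \s in the current engine.
const whitespace = matchedCodeUnits(/^\s$/);
console.log(whitespace.join(" "));
```

The same helper can be pointed at other tokens, for example matchedCodeUnits(/^\w$/) or matchedCodeUnits(/^[^.]$/) to look for quirks in a particular engine.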

This flag has two complementary effects. First, it causes most whitespace to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading #. Specifically, it turns most whitespace into an "ignore me" metacharacter, and # into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are not free-format, even with x), and as with other metacharacters, you can escape whitespace and # that you want to be taken literally. Of course, you can always use \s to match whitespace.
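To make the rules concrete, here is a simplified emulation of free-spacing mode that rewrites a pattern for the native RegExp constructor. It is a sketch under stated assumptions, not XRegExp's implementation: the helper name stripFreeSpacing is hypothetical, and it handles only escapes, character classes, whitespace, and # comments. Each ignored run becomes an empty group, (?:), so that neighboring tokens stay separate, unless a quantifier follows, in which case the quantifier is left to apply to the preceding token.

```javascript
// Hypothetical helper: simplified emulation of free-spacing mode (not XRegExp's code).
function stripFreeSpacing(pattern) {
  const lineEnd = /[\n\r\u2028\u2029]/;
  let out = "";
  let inClass = false;
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      // Escapes are copied verbatim, so escaped whitespace and \# stay literal.
      out += ch + pattern[i + 1];
      i += 2;
    } else if (inClass) {
      // Character classes are not free-format.
      if (ch === "]") inClass = false;
      out += ch;
      i++;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
      i++;
    } else if (/\s/.test(ch) || ch === "#") {
      // Consume a whole run of whitespace and line comments.
      while (i < pattern.length && (/\s/.test(pattern[i]) || pattern[i] === "#")) {
        if (pattern[i] === "#") {
          while (i < pattern.length && !lineEnd.test(pattern[i])) i++;
        } else {
          i++;
        }
      }
      // Emit a do-nothing token, except at the edges or before a quantifier.
      const next = pattern[i];
      if (out !== "" && next !== undefined && !/[*+?{]/.test(next)) out += "(?:)";
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const dateSource = stripFreeSpacing(`
  (\\d{4})  # year
  -         # separator
  (\\d{2})  # month
`);
const dateMatch = new RegExp(dateSource).exec("2024-05");
console.log(dateSource, dateMatch[1], dateMatch[2]);
```

With this helper, a pattern such as \12 3 becomes \12(?:)3 rather than \123, while x + collapses to x+.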

The ignored whitespace characters are those matched natively by \s. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus U+feff. The code points that should be matched by \s according to ES5 and Unicode 4.0.1–6.1.0 are U+0009–U+000d, U+0020, U+00a0, U+1680, U+180e, U+2000–U+200a, U+2028, U+2029, U+202f, U+205f, U+3000, and U+feff.

Annotations

Explicit capture (n)

Specifies that the only valid captures are explicitly named groups of the form (?<name>…). This allows unnamed (…) parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:…).
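The effect can be approximated natively by rewriting unnamed groups as noncapturing ones. The sketch below assumes an engine with native named groups (ES2018 or later). The helper name makeUnnamedGroupsNoncapturing is hypothetical, and this is not how XRegExp implements the flag.

```javascript
// Hypothetical helper: turn every unnamed (…) group into (?:…), leaving
// escapes, character classes, and (?…) constructs untouched.
function makeUnnamedGroupsNoncapturing(pattern) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      // Copy escapes verbatim so \( stays a literal parenthesis.
      out += ch + pattern[i + 1];
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "(" && pattern[i + 1] !== "?") {
      out += "(?:";
    } else {
      out += ch;
    }
  }
  return out;
}

const explicitSource = makeUnnamedGroupsNoncapturing("(?<year>\\d{4})-(\\d{2})");
const explicitMatch = new RegExp(explicitSource).exec("2024-05");

// Only the named group is captured; the month group no longer appears in the result.
console.log(explicitSource, explicitMatch.length, explicitMatch.groups.year);
```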

Annotations