
ICgrep Syntax

ICgrep supports an extensive regular expression syntax that combines features from POSIX, PCRE, and ICU.

Regular Expression Metacharacters

\a Match a BELL, \u0007.
\b Zero-width match at a word boundary: a transition between a word (\w) and a non-word (\W) character.
\B Zero-width match if the current position is not a word boundary.
\cX Match a control-X character.
\d Match any character with the Unicode General Category Nd (Number, Decimal Digit).
\D Match any character that is not a decimal digit.
\e Match an ESCAPE, \u001B.
\f Match a FORM FEED, \u000C.
\p{Unicode property expression} Match any character with the specified Unicode property.
\P{Unicode property expression} Match any character not having the specified Unicode property.
\s Match a white space character.
\S Match a non-white space character.
\t Match a HORIZONTAL TABULATION, \u0009.
\uhhhh Match the character with the 4-digit hex code point value hhhh.
\Uhhhhhhhh Match the character with the 8-digit hex code point value hhhhhhhh.
\u{hhh} Match the character with the hex code point value hhh (1-6 hex digits).
\w Match a word character.
\W Match a non-word character.
\x{hhh} Match the character with the hex code point value hhh (1-6 hex digits).
\xhh Match the character with the 2-digit hex code point value hh.
\0ooo Match the character with the octal code point value ooo.
[class] Match one character from the character class expression.
. Match any legal character.
^ Zero-width match at the beginning of a line.
$ Zero-width match at the end of a line.
\ Escape the following metacharacter so that it matches literally.
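
Most of the escapes above differ only in how they spell a single code point. The Python sketch below decodes each such escape to the code point it denotes. decode_escape is a hypothetical helper written for illustration, not ICgrep's parser; the \cX rule (upper-case X, XOR 0x40) follows the Perl convention and the one-to-three-digit octal form follows ICU, and both are assumptions about ICgrep.

```python
import re

# Fixed escapes from the table, mapped to their code points.
SIMPLE_ESCAPES = {"a": 0x07, "e": 0x1B, "f": 0x0C, "t": 0x09}

# Escapes that spell out a code point numerically: (pattern, numeric base).
NUMERIC_ESCAPES = [
    (re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}"), 16),  # \u{hhh}
    (re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}"), 16),  # \x{hhh}
    (re.compile(r"\\u([0-9A-Fa-f]{4})"), 16),        # \uhhhh
    (re.compile(r"\\U([0-9A-Fa-f]{8})"), 16),        # \Uhhhhhhhh
    (re.compile(r"\\x([0-9A-Fa-f]{2})"), 16),        # \xhh
    (re.compile(r"\\0([0-7]{1,3})"), 8),             # \0ooo (assumed 1-3 digits)
]


def decode_escape(escape: str) -> int:
    """Return the code point denoted by a single escape sequence."""
    if len(escape) == 2 and escape[0] == "\\" and escape[1] in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[escape[1]]
    if len(escape) == 3 and escape.startswith("\\c"):
        # Assumed Perl-style rule: control-X is upper-cased X XOR 0x40.
        return ord(escape[2].upper()) ^ 0x40
    for pattern, base in NUMERIC_ESCAPES:
        m = pattern.fullmatch(escape)
        if m:
            value = int(m.group(1), base)
            if value > 0x10FFFF:
                raise ValueError(f"code point out of range: {escape}")
            return value
    raise ValueError(f"unsupported escape: {escape}")
```

For example, decode_escape(r"\u{1F600}") and decode_escape(r"\U0001F600") both return 0x1F600, so the two forms are interchangeable.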

Regular Expression Operators

| Alternation. A|B matches either A or B.
* Match the preceding item 0 or more times.
+ Match the preceding item 1 or more times.
? The preceding item is optional (equivalent to {0,1}).
{n} Match the preceding item exactly n times.
{n,} Match the preceding item at least n times.
{n,m} Match the preceding item between n and m times.
( ... ) Match the parenthesized subexpression as a group.
(?: ... ) Non-capturing group. Match the parenthesized subexpression as a group.
(?= ... ) Zero-width look-ahead assertion. Look-ahead is limited to a single Unicode character.
(?! ... ) Zero-width negative look-ahead assertion. Look-ahead is limited to a single Unicode character.
(?<= ... ) Zero-width look-behind assertion. Match if the parenthesized subexpression matches text ending at this position; the subexpression may be arbitrary.
(?<! ... ) Zero-width negative look-behind assertion. Match if the parenthesized subexpression does not match text ending at this position.
(?i: ... ) Match the parenthesized expression in case-insensitive mode.
(?i) Turn on case-insensitive mode for the rest of the current subexpression.
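
Because the look-behind subexpression may be arbitrary, it can match text of any length ending at the current position, whereas many engines, including Python's re, accept only fixed-width look-behind. The Python sketch below models the meaning of unrestricted look-behind by brute force. lookbehind_holds and negative_lookbehind_holds are hypothetical helpers that try every possible start position; they illustrate the semantics only and say nothing about how ICgrep evaluates look-behind internally.

```python
import re


def lookbehind_holds(subexpr: str, text: str, pos: int) -> bool:
    """Model of (?<=subexpr) at pos: some substring ending at pos fully matches.

    Every start position j <= pos is tried, so subexpr may match text of
    any length, unlike Python's own fixed-width look-behind.
    """
    return any(re.fullmatch(subexpr, text[j:pos]) for j in range(pos + 1))


def negative_lookbehind_holds(subexpr: str, text: str, pos: int) -> bool:
    """Model of (?<!subexpr) at pos: no substring ending at pos matches."""
    return not lookbehind_holds(subexpr, text, pos)


# Python's re rejects a variable-width look-behind such as (?<=ab+)c ...
try:
    re.compile(r"(?<=ab+)c")
except re.error as exc:
    print("re rejects it:", exc)

# ... but the model can evaluate it: positions of each 'c' preceded by ab+.
text = "xabbbc yac abc"
hits = [i for i, ch in enumerate(text)
        if ch == "c" and lookbehind_holds(r"ab+", text, i)]
print(hits)  # prints [5, 13]
```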

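ICgrep also recognizes the "non-greedy" repetition operators *?, +?, ??, and {n,m}?, treating them as semantically equivalent to the normal ("greedy") versions: greediness affects only which substring is selected as the match, not whether a line matches. Because of the underlying parallel matching method, there is no performance difference between the two forms either.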
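
Greedy and non-greedy quantifiers can change which substring a backtracking matcher reports, but never whether a match exists. The Python sketch below, which uses the standard re module rather than ICgrep, checks this on one sample line for three greedy/non-greedy pairs.

```python
import re

line = "<b>bold</b> and <i>italic</i>"
# Pairs of patterns that differ only in greediness.
pairs = [(r"<.*>", r"<.*?>"), (r"\d+", r"\d+?"), (r"o{1,3}", r"o{1,3}?")]

for greedy, lazy in pairs:
    greedy_m = re.search(greedy, line)
    lazy_m = re.search(lazy, line)
    # Whether the line matches at all never depends on greediness ...
    assert (greedy_m is None) == (lazy_m is None)
    # ... only the extent of the reported match can differ.
    print(f"{greedy:8} {greedy_m and greedy_m.group()!r}")
    print(f"{lazy:8} {lazy_m and lazy_m.group()!r}")
```

For <.*> the greedy form reports the whole line while <.*?> reports only <b>, yet both agree that the line matches.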

Character Class Expressions

[abc] Class that matches any of the characters a, b, or c.
[^abc] Negated class. Match any character except a, b, or c.
[b-e] Range. Match any character in the consecutive Unicode range from b to e.
[\p{Unicode property expression}] Match any character having the given Unicode property.
[\p{Letter}&&\p{script=cyrillic}] Character class intersection. Match the set of all Cyrillic letters.
[\p{Letter}--\p{script=latin}] Character class subtraction. Match all non-Latin letters.
[:script=Greek:] Alternative POSIX-style syntax for properties, equivalent to \p{script=Greek}. Unicode property rules apply.
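
Intersection (&&) and subtraction (--) are ordinary set operations on code points. The Python sketch below builds classes as sets with unicodedata; prop_general_category and char_range are hypothetical helpers for illustration. The standard library has no Script property, so the sketch substitutes the Cyrillic block U+0400..U+04FF for \p{script=cyrillic}, which is only an approximation (the script also includes characters outside that block), and it uses [\p{Nd}--[0-9]] as its subtraction example.

```python
import sys
import unicodedata

# Every code point; unassigned ones have category "Cn" and surrogates "Cs",
# so they never land in the letter or digit sets below.
ALL = range(sys.maxunicode + 1)


def prop_general_category(prefix: str) -> set:
    """Code points whose General Category starts with prefix, e.g. 'L' or 'Nd'."""
    return {cp for cp in ALL if unicodedata.category(chr(cp)).startswith(prefix)}


def char_range(lo: int, hi: int) -> set:
    """The class [lo-hi] as a set of code points (both ends inclusive)."""
    return set(range(lo, hi + 1))


letters = prop_general_category("L")          # \p{Letter}
digits = prop_general_category("Nd")          # \d, i.e. \p{Nd}
cyrillic_block = char_range(0x0400, 0x04FF)   # approximation of \p{script=cyrillic}

# [\p{Letter}&&[\u0400-\u04FF]]: intersection keeps only the letters in the block.
cyrillic_letters = letters & cyrillic_block

# [\p{Nd}--[0-9]]: subtraction leaves only non-ASCII decimal digits.
non_ascii_digits = digits - char_range(ord("0"), ord("9"))
```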
Last modified on Mar 2, 2015, 9:22:32 AM