Version 5 (modified by cameron, 5 years ago)

ICgrep Syntax

ICgrep supports an extensive regular expression syntax.

Regular Expression Metacharacters

\a Match a BELL, \u0007.
\b Zero-width match at a word boundary: a transition between word (\w) and non-word (\W) characters.
\B Zero-width match if the current position is not a word boundary.
\cX Match a control-X character.
\d Match any character with the Unicode General Category Nd (Number, Decimal Digit).
\D Match any character that is not a decimal digit.
\e Match an ESCAPE, \u001B.
\f Match a FORM FEED, \u000C.
\p{Unicode property expression} Match any character with the specified Unicode property.
\P{Unicode property expression} Match any character not having the specified Unicode property.
\s Match a white space character.
\S Match a non-white space character.
\t Match a HORIZONTAL TABULATION, \u0009.
\uhhhh Match the character with the 4-digit hex code point value hhhh.
\Uhhhhhhhh Match the character with the 8-digit hex code point value hhhhhhhh.
\u{hhh} Match the character with the hex code point value hhh (1-6 hex digits).
\w Match a word character.
\W Match a non-word character.
\x{hhh} Match the character with the hex code point value hhh (1-6 hex digits).
\xhh Match the character with the two-digit hex value hh.
\0ooo Match the character with the octal code point value ooo.
[class] Match one character from the character class expression.
. Match any legal character.
^ Zero-width match at the beginning of a line.
$ Zero-width match at the end of a line.
\ Escape the following metacharacter so that it is matched literally.
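
A few of the escapes above can be tried out in Python's standard `re` module, which happens to share them. This is only a sketch using Python as a stand-in, not ICgrep itself: `re` does not accept every form in the table (for example, it has no `\p{...}`), and ICgrep's behavior may differ in details.

```python
import re

# \b is zero-width: it matches only at a transition between \w and \W.
# "concat" contains "cat", but there is no boundary before it.
whole_words = re.findall(r"\bcat\b", "cat concat cat's")

# \d matches any Unicode decimal digit (General Category Nd),
# so ARABIC-INDIC DIGIT THREE (U+0663) counts as well as ASCII "1".
digits = re.findall(r"\d", "a1\u0663")

# Three ways of naming the same code point, U+0041 ("A").
same_char = [bool(re.fullmatch(p, "A"))
             for p in (r"\x41", r"\u0041", r"\U00000041")]

# A backslash makes the following metacharacter literal.
literal_dot = re.findall(r"\.", "a.b")
```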

Regular Expression Operators

| Alternation. A|B matches either A or B.
* Match 0 or more times.
+ Match 1 or more times.
? Match zero or one times. Prefer one.
{n} Match exactly n times.
{n,} Match at least n times. Match as many times as possible.
{n,m} Match between n and m times. Match as many times as possible, but not more than m.
( ... ) Grouping parentheses. Match the parenthesized subexpression.
(?: ... ) Non-capturing grouping parentheses. Match the parenthesized subexpression.
(?= ... ) Zero-width look-ahead assertion (single Unicode character lookahead).
(?! ... ) Negative look-ahead assertion (single Unicode character lookahead).
(?<= ... ) Zero-width look-behind assertion. Succeeds if the parenthesized subexpression matches text ending at this position (arbitrary subexpression).
(?<! ... ) Negative look-behind assertion. Zero-width match if the parenthesized subexpression does not match text ending at this position.
(?i: ... ) Match the parenthesized expression in case-insensitive mode.
(?i) Turn on case-insensitive mode for the rest of the current subexpression.
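
The operators above can be illustrated with Python's `re` module, which uses the same notation for these particular forms. This is a hedged sketch, not ICgrep output; the look-ahead is deliberately kept to a single character, matching the restriction stated in the table.

```python
import re

# {n,m}: between 2 and 3 repetitions of "b" after "a".
bounded = [s for s in ("ab", "abb", "abbbb") if re.fullmatch(r"ab{2,3}", s)]

# (?= ... ): match "q" only when the next character is "u".
q_before_u = [m.start() for m in re.finditer(r"q(?=u)", "quit qat")]

# (?<= ... ): match digits only when they directly follow a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $42, id 7")

# (?i: ... ): case-insensitive inside the group only; the trailing "s"
# is still matched case-sensitively.
scoped = re.findall(r"(?i:colou?r)s", "Colors COLOURs colorS")
```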

ICgrep also recognizes the "non-greedy" repetition operators *?, +?, ?? and {n,m}? as semantically equivalent to the normal ("greedy") versions. Due to the underlying parallel method, there is no performance distinction.
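
One way to see why the two forms can be treated alike in a grep-style tool: greedy and non-greedy operators differ in how much text a match spans, not in whether a line contains a match at all. The sketch below shows this with Python's `re`; the helper `matching_lines` is hypothetical and only imitates line selection, it is not how ICgrep is implemented.

```python
import re

def matching_lines(pattern, text):
    """Return the lines that contain at least one match (grep-style selection)."""
    return [line for line in text.splitlines() if re.search(pattern, line)]

SAMPLE = "<a><b>\nno tags here\n<only>\n"

# The same lines are selected by the greedy and the non-greedy pattern.
greedy_lines = matching_lines(r"<.+>", SAMPLE)
lazy_lines = matching_lines(r"<.+?>", SAMPLE)

# Only the extent of the match within a line differs.
greedy_span = re.search(r"<.+>", "<a><b>").group()
lazy_span = re.search(r"<.+?>", "<a><b>").group()
```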

Character Class Expressions

[abc] Class that matches any of the characters a, b or c.
[^abc] Negated class - match any character except a, b or c.
[b-e] Range - match any character in the consecutive Unicode range from b to e.
[\p{Unicode property expression}] Match any character having the given Unicode property.
[\p{Letter}&&\p{script=cyrillic}] Character class intersection. Match the set of all Cyrillic letters.
[\p{Letter}--\p{script=latin}] Character class subtraction. Match all non-Latin letters.
[:script=Greek:] Alternative POSIX syntax for properties. Unicode rules apply.
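
The intersection (&&) and subtraction (--) operators behave like set operations on characters. Python's `re` has neither these operators nor `\p{...}`, so the sketch below emulates them with plain predicates. The helpers `is_letter` and `in_script` are hypothetical, and `in_script` only approximates the Unicode Script property by looking at the start of the character name, which is an assumption made for illustration.

```python
import unicodedata

def is_letter(ch):
    """Rough stand-in for \\p{Letter}: General Category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")

def in_script(ch, script):
    """Approximate \\p{script=...} via the character name prefix (assumption)."""
    return unicodedata.name(ch, "").startswith(script.upper() + " ")

# a, CYRILLIC ZHE, GREEK BETA, 1, CYRILLIC DE, hyphen, Z
sample = "a\u0416\u03b21\u0434-Z"

# [\p{Letter}&&\p{script=cyrillic}]: letters AND Cyrillic.
cyrillic_letters = [ch for ch in sample
                    if is_letter(ch) and in_script(ch, "cyrillic")]

# [\p{Letter}--\p{script=latin}]: letters AND NOT Latin.
non_latin_letters = [ch for ch in sample
                     if is_letter(ch) and not in_script(ch, "latin")]
```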