Unicode Level 2 Support in icGrep

Building on the full Unicode Level 1 support of icGrep 1.0, we're actively working towards full support of the Unicode Level 2 requirements of Unicode Technical Standard #18 in icGrep 2.0.

The active development version of icGrep can be found in icGREP/icgrep-devel.

RL2.2 Extended Grapheme Clusters

From r4852, icGrep supports the following extended grapheme cluster features.

  • \X to match a single extended grapheme cluster.
  • \b{g} syntax as a zero-width assertion for extended grapheme cluster boundaries.
  • \B{g} syntax as a zero-width assertion for the internal codepoint boundaries within extended grapheme clusters.
  • (?g) syntax to enable grapheme cluster mode: regular expression elements must always match full grapheme clusters.
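
To illustrate what the cluster and boundary constructs above refer to, here is a rough Python sketch. It is not icGrep code and does not implement the full UAX #29 rules: it assumes, as a simplification, that a cluster is one base codepoint plus any following combining marks (general category M*). The function names are hypothetical.

```python
import unicodedata


def approx_grapheme_clusters(text):
    r"""Split text into approximate extended grapheme clusters.

    Assumption of this sketch: a cluster is a base codepoint followed by
    zero or more combining marks. The real \X also covers Hangul syllables,
    ZWJ sequences, regional indicator pairs, CR LF, and more.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # a combining mark extends the current cluster
        else:
            clusters.append(ch)  # any other codepoint starts a new cluster
    return clusters


def cluster_boundaries(text):
    r"""Codepoint offsets where \b{g} would match under the same assumption."""
    offsets = [0] if text else []
    for cluster in approx_grapheme_clusters(text):
        offsets.append(offsets[-1] + len(cluster))
    return offsets


def internal_positions(text):
    r"""Codepoint offsets where \B{g} would match: inside a cluster."""
    boundaries = set(cluster_boundaries(text))
    return [i for i in range(len(text) + 1) if i not in boundaries]
```

For the three-codepoint string "e\u0301a" (e, combining acute accent, a), the sketch yields two clusters, so a \X-style match consumes the accented letter as one unit, and offset 1 is the only internal position.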

RL2.3 Default Word Boundaries

TODO.

RL2.4 Default Case Conversion

TODO.

RL2.5 Name Properties

From r4852, icGrep supports `\N{}' syntax for codepoint names.

RL2.6 Wildcards in Property Values

From r4852, icGrep supports arbitrary regular expressions within `\N{}' syntax for codepoint names. For example, \N{\bSMIL(E|ING)\b} denotes the set of all Unicode codepoints having one of the words SMILE or SMILING in their names (Emoji search!).
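
The set such an expression denotes can be approximated outside icGrep with Python's standard library. The sketch below is only an illustration: it assumes the pattern may match anywhere within a name (unanchored search), the helper name is hypothetical, and the results depend on the Unicode version bundled with the Python interpreter rather than the one icGrep uses.

```python
import re
import sys
import unicodedata


def codepoints_with_name_matching(pattern):
    """Return all codepoints whose Unicode name contains a match for pattern."""
    name_re = re.compile(pattern)
    matches = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed codepoints (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if name and name_re.search(name):
            matches.append(cp)
    return matches


smiles = codepoints_with_name_matching(r"\bSMIL(E|ING)\b")
for cp in smiles[:5]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The scan walks the whole codepoint range once, which is adequate for an illustration; it says nothing about how icGrep itself evaluates name expressions.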

RL2.7 Full Properties

The development version of icGrep now provides substantially more coverage of Unicode properties than icGrep 1.0. See our separate page on Property Support in icgrep.

Last modified on Nov 1, 2015, 11:16:56 AM