Changes between Version 1 and Version 2 of IcGrepUnicodeLevel1


Timestamp:
Jan 5, 2015, 11:32:44 AM (4 years ago)
Author:
cameron
Comment:

--

  • IcGrepUnicodeLevel1

    v1  v2
    12  12  consisting of 1 or 2 hex digits following {{{\x}}} or exactly 4 hex digits following {{{\u}}}.
    13  13
    14      Also for compatibility, icgrep accepts octal notation.  An arbitrary codepoint may be
        14  Also for compatibility with legacy implementations, icgrep accepts octal notation.  An arbitrary codepoint may be
    15  15  represented by 1 to 8 octal digits enclosed in braces following the {{{\o}}} escape.
    16  16  The short form consisting of 0 to 3 octal digits following {{{\0}}} (without braces) is also recognized.
    17  17
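The two octal forms described above can be illustrated with a minimal Python sketch. The helper name `octal_escape_to_codepoint` is hypothetical and is not icgrep's parser; the upper bound check against U+10FFFF is an assumption, since the text only states the permitted digit counts.

```python
import re

# Hypothetical sketch, not icgrep's actual implementation.
# Group 1: braced form \o{...} with 1 to 8 octal digits.
# Group 2: short form \0 followed by 0 to 3 octal digits.
_OCTAL_ESCAPE = re.compile(r"\\o\{([0-7]{1,8})\}|\\0([0-7]{0,3})")

def octal_escape_to_codepoint(escape):
    """Return the codepoint denoted by a single octal escape."""
    m = _OCTAL_ESCAPE.fullmatch(escape)
    if m is None:
        raise ValueError("not a recognized octal escape: %r" % escape)
    digits = m.group(1) if m.group(1) is not None else m.group(2)
    # A bare \0 carries no further digits and denotes codepoint 0.
    value = int(digits, 8) if digits else 0
    # Assumption: values beyond the Unicode range are rejected.
    if value > 0x10FFFF:
        raise ValueError("codepoint out of range: %r" % escape)
    return value
```

For instance, `octal_escape_to_codepoint(r"\o{101}")` and `octal_escape_to_codepoint(r"\0101")` both denote U+0041.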
        18  == RL1.2 Properties ==
        19
        20  icGrep implements the full set of Unicode properties required by RL1.2, using
        21  full property names or their aliases or any variation thereof in accord with the matching rules
        22  of [http://www.unicode.org/reports/tr44/ Unicode Standard Annex #44].
        23  The following syntactic alternatives are supported.
        24
        25    - {{{\p{}}}property-name{{{}}}} for binary properties
        26    - {{{\p{}}}property-name{{{=}}}property-value{{{}}}}
        27    - {{{\p{}}}property-value{{{}}}} for values of the General_Category or Script properties.
        28
        29  Following Perl syntactic conventions, negated forms of property expressions (matching all codepoints not
        30  having the specified property) use the {{{\P}}} syntax.
        31
        32  === 1.2.1 General_Category ===
        33
        34  icGrep implements the General_Category property using full property-value names, or the standard one- or two-letter
        35  codes.  For example, the following notations all represent expressions matching any codepoint
        36  in the general category Letter: {{{\p{Letter}}}}, {{{\p{General_Category=Letter}}}}, {{{\p{L}}}}, {{{\p{generalcategory=l}}}}.
        37
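The four equivalent spellings above follow from the loose matching rule for symbolic values in UAX #44 (UAX44-LM3), which ignores case, whitespace, underscores, hyphens, and an initial "is" prefix. A minimal Python sketch of that idea follows; `loose_key` and `resolve` are hypothetical helpers, not icGrep's code, and the alias tables are tiny hand-written stand-ins for the data in PropertyAliases.txt and PropertyValueAliases.txt.

```python
import re

def loose_key(name):
    """Normalize a property name or value for loose comparison (sketch of UAX44-LM3)."""
    key = re.sub(r"[\s_\-]", "", name).lower()
    if key.startswith("is"):
        key = key[2:]
    return key

# Illustrative excerpts only; a real implementation would load the full alias files.
PROPERTY_ALIASES = {"gc": "General_Category", "General_Category": "General_Category"}
GC_VALUE_ALIASES = {"L": "Letter", "Letter": "Letter"}

_PROPS = {loose_key(k): v for k, v in PROPERTY_ALIASES.items()}
_GC_VALUES = {loose_key(k): v for k, v in GC_VALUE_ALIASES.items()}

def resolve(body):
    """Resolve the text inside a property expression to a (property, value) pair."""
    if "=" in body:
        prop, value = body.split("=", 1)
        return _PROPS[loose_key(prop)], _GC_VALUES[loose_key(value)]
    # A bare value is assumed here to name a General_Category value.
    return "General_Category", _GC_VALUES[loose_key(body)]
```

Because both the table keys and the query pass through the same normalization, `resolve("generalcategory=l")` and `resolve("Letter")` land on the same entry.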
        38  In addition, icGrep implements {{{\p{ANY}}}}, {{{\p{ASCII}}}}, and {{{\p{ASSIGNED}}}} as equivalent to
        39  {{{[\u{0}-\u{10FFFF}]}}}, {{{[\u{0}-\u{7F}]}}}, and {{{\P{GC=Unassigned}}}} respectively.
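As a rough illustration of these three classes, the following Python sketch expresses each as a predicate on a codepoint. The function names are hypothetical, and the result of `is_assigned` depends on the Unicode version bundled with the Python interpreter's `unicodedata` module, which may differ from the version icGrep uses.

```python
import unicodedata

def is_any(cp):
    """ANY: every codepoint in the range U+0000 to U+10FFFF."""
    return 0 <= cp <= 0x10FFFF

def is_ascii(cp):
    """ASCII: codepoints U+0000 to U+007F."""
    return 0 <= cp <= 0x7F

def is_assigned(cp):
    """ASSIGNED: any codepoint whose General_Category is not Unassigned (Cn)."""
    return unicodedata.category(chr(cp)) != "Cn"
```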
        40
        41  === 1.2.2 Script and Script Extensions Properties ===
        42
        43  Codepoints having particular Script property values may be specified by the script name or its 4-letter code.
        44  {{{\p{Arab}}}}, {{{\p{script=Arabic}}}}, {{{\p{sc=arab}}}} are all equivalent script designations.
        45
        46  To specify codepoints whose Script_Extensions property includes a particular value, the property name
        47  or its short form {{{scx}}} must be specified, for example {{{\p{scx=arab}}}}.
        48
        49
        50