Changes between Version 1 and Version 2 of IcGrepUnicodeLevel1

Jan 5, 2015, 11:32:44 AM



  • IcGrepUnicodeLevel1

    v1  v2
    12  12  consisting of 1 or 2 hex digits following {{{\x}}} or exactly 4 hex digits following {{{\u}}}.
    14      Also for compatibility, icgrep accepts octal notation.  An arbitrary codepoint may be
        14  Also for compatibility with legacy implementations, icgrep accepts octal notation.  An arbitrary codepoint may be
    15  15  represented by 1 to 8 octal digits enclosed in braces following the {{{\o}}} escape.
    16  16  The short form consisting of 0 to 3 octal digits following {{{\0}}} (without braces) is also recognized.
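
The escape forms described in these lines can be sketched as a small recognizer. This is only an illustration of the notation, not icgrep's own parser: the name `escape_to_codepoint` is hypothetical, and the sketch performs no range check on the resulting value.

```python
import re

# Hypothetical helper, not part of icgrep: recognizes the four escape
# forms described above and returns the codepoint each one denotes.
_ESCAPE = re.compile(
    r"""\\(?:
        x(?P<hex2>[0-9A-Fa-f]{1,2})     # x escape: 1 or 2 hex digits
      | u(?P<hex4>[0-9A-Fa-f]{4})       # u escape: exactly 4 hex digits
      | o\{(?P<obrace>[0-7]{1,8})\}     # braced o escape: 1 to 8 octal digits
      | 0(?P<oshort>[0-7]{0,3})         # short octal form: 0 to 3 more digits
    )""",
    re.VERBOSE,
)


def escape_to_codepoint(text):
    """Return the integer codepoint denoted by a single escape sequence."""
    m = _ESCAPE.fullmatch(text)
    if m is None:
        raise ValueError("not a recognized escape: %r" % text)
    if m.group("hex2") is not None:
        return int(m.group("hex2"), 16)
    if m.group("hex4") is not None:
        return int(m.group("hex4"), 16)
    if m.group("obrace") is not None:
        return int(m.group("obrace"), 8)
    # Short form: a bare backslash-zero with no further digits denotes 0.
    return int(m.group("oshort") or "0", 8)
```

For instance, `escape_to_codepoint(r"\x41")` and `escape_to_codepoint(r"\o{101}")` both denote the codepoint of the letter A.
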
        18  == RL1.2 Properties ==
        20  icGrep implements the full set of Unicode properties required by RL1.2, using
        21  full property names or their aliases or any variation thereof in accord with the matching rules
        22  of Unicode Standard Annex #44.
        23  The following syntactic alternatives are supported.
        25    - {{{\p{}}}property-name{{{}}}} for binary properties
        26    - {{{\p{}}}property-name{{{=}}}property-value{{{}}}}
        27    - {{{\p{}}}property-value{{{}}}} for values of the General_Category or Script properties.
        29  Following Perl syntactic conventions, negated forms of property expressions (matching all values not
        30  having the specified property) use the {{{\P}}} syntax.
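
The syntactic alternatives above can be illustrated with a short parsing sketch. It is not icgrep's implementation: `parse_property` and `loose_key` are hypothetical names, and the normalization is a simplified reading of the loose matching in Unicode Standard Annex #44 that ignores only case, whitespace, underscores, and hyphens.

```python
import re

# Matches \p{...} or \P{...}, with an optional "=value" part inside the braces.
_PROPERTY = re.compile(
    r"\\(?P<kind>[pP])\{(?P<name>[^=}]+)(?:=(?P<value>[^}]+))?\}"
)


def loose_key(symbol):
    """Simplified loose matching: drop whitespace, '_' and '-', fold case."""
    return re.sub(r"[\s_\-]", "", symbol).casefold()


def parse_property(expr):
    """Return (negated, name_key, value_key_or_None) for a property expression."""
    m = _PROPERTY.fullmatch(expr)
    if m is None:
        raise ValueError("not a property expression: %r" % expr)
    negated = m.group("kind") == "P"  # \P is the negated form
    value = m.group("value")
    return (
        negated,
        loose_key(m.group("name")),
        loose_key(value) if value is not None else None,
    )
```

Under this normalization, differently spelled forms of the same property name reduce to the same key, so a lookup table keyed on `loose_key` output could resolve both full names and aliases.
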
        32  === 1.2.1 General_Category ===
        34  icGrep implements the General_Category property using full property-value names, or the standard one- or two-letter
        35  codes.  For example, the following notations all represent expressions matching any codepoint
        36  in the general category Letter: {{{\p{Letter}}}}, {{{\p{General_Category=Letter}}}}, {{{\p{L}}}}, {{{\p{generalcategory=l}}}}.
        38  In addition, icGrep implements {{{\p{ANY}}}}, {{{\p{ASCII}}}}, and {{{\p{ASSIGNED}}}} as equivalent to
        39  {{{[\u{0}-\u{10FFFF}]}}}, {{{[\u{0}-\u{7F}]}}}, and {{{\P{GC=Unassigned}}}} respectively.
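
As a rough illustration of what these expressions select, the following sketch uses Python's standard `unicodedata` module. The function names are hypothetical, the results depend on the Unicode version bundled with the Python interpreter, and the prefix test covers only the standard one- and two-letter codes (not grouped values such as `LC`).

```python
import unicodedata


def has_general_category(ch, code):
    """True if ch is in the category named by a one- or two-letter code.

    A one-letter code such as "L" covers every two-letter category that
    starts with it ("Lu", "Ll", "Lt", "Lm", "Lo").
    """
    return unicodedata.category(ch).startswith(code)


def is_any(ch):
    """Counterpart of \\p{ANY}: every codepoint from 0 to 10FFFF."""
    return 0 <= ord(ch) <= 0x10FFFF


def is_ascii(ch):
    """Counterpart of \\p{ASCII}: codepoints 0 to 7F."""
    return ord(ch) <= 0x7F


def is_assigned(ch):
    """Counterpart of \\p{ASSIGNED}: anything whose category is not Cn."""
    return unicodedata.category(ch) != "Cn"
```
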
        41  === 1.2.2 Script and Script Extensions Properties ===
        43  Codepoints having particular Script property values may be specified by the script name or its 4-letter code.
        44  {{{\p{Arab}}}}, {{{\p{script=Arabic}}}}, {{{\p{sc=arab}}}} are all equivalent script designations.
        46  To specify codepoints whose Script_Extensions property includes a particular value, the property name
        47  or its short form {{{scx}}} must be specified, for example {{{\p{scx=arab}}}}.
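
The difference between the two properties can be sketched with a toy lookup. The tables below are a tiny, abbreviated excerpt chosen for illustration and should be checked against the Unicode Character Database files Scripts.txt and ScriptExtensions.txt; the function names are hypothetical and none of this reflects how icgrep stores its property data.

```python
# Illustrative excerpt only: Script values for three codepoints.
SCRIPT = {
    0x0041: "Latn",  # LATIN CAPITAL LETTER A
    0x0627: "Arab",  # ARABIC LETTER ALEF
    0x0640: "Zyyy",  # ARABIC TATWEEL has Script=Common
}

# Abbreviated Script_Extensions entry: the real set for U+0640 is longer.
SCRIPT_EXTENSIONS = {
    0x0640: {"Arab", "Mand", "Syrc"},
}


def matches_sc(cp, script):
    """True if the Script property of cp equals the given 4-letter code."""
    return SCRIPT.get(cp) == script


def matches_scx(cp, script):
    """True if the Script_Extensions set of cp includes the given code.

    Codepoints without an explicit entry fall back to their Script value.
    """
    extensions = SCRIPT_EXTENSIONS.get(cp)
    if extensions is None:
        return SCRIPT.get(cp) == script
    return script in extensions
```

With this data, U+0640 is not selected by a Script test for `Arab` but is selected by the corresponding Script_Extensions test, which is the distinction an `scx` expression is meant to capture.
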