Timestamp:
May 14, 2015, 10:06:17 PM
Author:
cameron
Message:

Refine UTF-8 equations

File:
1 edited

  • docs/Working/icGrep/unicode-re.tex

--- docs/Working/icGrep/unicode-re.tex (r4555)
+++ docs/Working/icGrep/unicode-re.tex (r4556)
@@ -26,5 +26,8 @@
 prefixes of two-, three-, or four-byte sequences, or as suffix bytes.
 In addition, we say that the {\em scope} bytes of a prefix are the
-immediately following byte positions at which a suffix byte is expected.
+immediately following byte positions at which a suffix byte is
+expected.
+Mismatches between scope expectations and occurrences of suffix
+bytes indicate errors.
 Two helper streams are also useful.
 The Initial stream marks ASCII bytes and prefixes of multibyte sequences,
@@ -33,13 +36,15 @@
 \begin{eqnarray*}
 \mbox{\rm ASCII} & = & \mbox{\rm CharClass}(\verb:[\x00-\x7F]:) \\
-\mbox{\rm Prefix2} & = & \mbox{\rm CharClass}(\verb:[\xC2-\xDF]:) \\
-\mbox{\rm Prefix3} & = & \mbox{\rm CharClass}(\verb:[\xE0-\xEF]:) \\
+\mbox{\rm Prefix} & = & \mbox{\rm CharClass}(\verb:[\xC2-\xF4]:) \\
+\mbox{\rm Prefix3or4} & = & \mbox{\rm CharClass}(\verb:[\xE0-\xF4]:) \\
 \mbox{\rm Prefix4} & = & \mbox{\rm CharClass}(\verb:[\xF0-\xF4]:) \\
 \mbox{\rm Suffix} & = & \mbox{\rm CharClass}(\verb:[\x80-\xBF]:) \\
-\mbox{\rm Scope2} & = & \mbox{\rm Advance}(\mbox{Prefix2} \vee \mbox{Prefix3} \vee \mbox{Prefix4}) \\
-\mbox{\rm Scope3} & = & \mbox{\rm Advance}(\mbox{\rm Advance}(\mbox{Prefix3} \vee \mbox{Prefix4})) \\
-\mbox{\rm Scope4} & = & \mbox{\rm Advance}(\mbox{\rm Advance}(\mbox{\rm Advance}(\mbox{Prefix4}))) \\
-\mbox{\rm Initial} & = & \mbox{\rm ASCII} \vee \mbox{Prefix2} \vee \mbox{Prefix3} \vee \mbox{Prefix4} \\
-\mbox{\rm NonFinal} & = & \mbox{\rm Prefix} \vee \mbox{\rm Advance}(\mbox{Prefix3} \vee \mbox{Prefix4}) \vee \mbox{\rm Advance}(\mbox{\rm Advance}(\mbox{Prefix4}))
+\mbox{\rm Scope} & = & \mbox{\rm Advance}(\mbox{Prefix}) \vee \mbox{\rm
+  Advance}(\mbox{Prefix3or4},2) \vee \mbox{\rm
+  Advance}(\mbox{Prefix4}, 3) \\
+\mbox{\rm Mismatch} & = & \mbox{\rm Scope} \oplus \mbox{Suffix} \\
+\mbox{\rm Initial} & = & \mbox{\rm ASCII} \vee \mbox{Prefix} \\
+\mbox{\rm NonFinal} & = & \mbox{\rm Prefix} \vee \mbox{\rm
+  Advance}(\mbox{Prefix3or4}) \vee \mbox{\rm Advance}(\mbox{Prefix4}, 2)
 \end{eqnarray*}
 
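
The r4556 equations can be checked on small inputs with a minimal sketch like the one below. It is not icGrep code. It assumes each bit stream is modelled as a Python integer in which bit i corresponds to byte position i, so that Advance(S, n) becomes a left shift by n. The names `char_class`, `advance`, `utf8_streams` and `positions` are hypothetical helpers introduced only for illustration.

```python
def char_class(data: bytes, lo: int, hi: int) -> int:
    """Return a stream whose bit i is set when lo <= data[i] <= hi."""
    stream = 0
    for i, byte in enumerate(data):
        if lo <= byte <= hi:
            stream |= 1 << i
    return stream


def advance(stream: int, n: int = 1) -> int:
    """Move every marker n byte positions forward (toward higher indices)."""
    return stream << n


def positions(stream: int) -> list:
    """List the byte positions marked in a stream (for readable output)."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


def utf8_streams(data: bytes) -> dict:
    """Compute the streams named in the r4556 equations for one input."""
    ascii_ = char_class(data, 0x00, 0x7F)
    prefix = char_class(data, 0xC2, 0xF4)
    prefix3or4 = char_class(data, 0xE0, 0xF4)
    prefix4 = char_class(data, 0xF0, 0xF4)
    suffix = char_class(data, 0x80, 0xBF)
    # Scope: positions where a suffix byte is expected after a prefix.
    scope = advance(prefix) | advance(prefix3or4, 2) | advance(prefix4, 3)
    # Mismatch: a suffix was expected but absent, or present but unexpected.
    # Bits at or beyond len(data) mean a sequence was cut off by end of input.
    mismatch = scope ^ suffix
    initial = ascii_ | prefix
    nonfinal = prefix | advance(prefix3or4) | advance(prefix4, 2)
    return {
        "Scope": scope,
        "Suffix": suffix,
        "Mismatch": mismatch,
        "Initial": initial,
        "NonFinal": nonfinal,
    }


if __name__ == "__main__":
    # One-, two-, three- and four-byte characters: 10 bytes in total.
    text = "a\u00e9\u20ac\U0001d11e".encode("utf-8")
    for name, stream in utf8_streams(text).items():
        print(name, positions(stream))
```

For the 10-byte sample in the demo, Scope and Suffix both mark positions 2, 4, 5, 7, 8 and 9, so Mismatch is empty, while Initial marks 0, 1, 3 and 6 and NonFinal marks 1, 3, 4, 6, 7 and 8. The sketch implements only the scope-versus-suffix check defined by these equations; it does not attempt full UTF-8 validation.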