Changes between Version 43 and Version 44 of WikiStart


Timestamp: Aug 10, 2016, 10:25:33 AM
Author: cameron

  • WikiStart

    v43 v44  
    2626See the [ParabixTransform Parabix Transform] page for details.
    2727
    28 See the [ParabixTechniques Parabix Techniques] page for information on how to use the Parabix transform representation for various applications.
    2928
     29== Alphabets, Character Classes, Unicode ==
     30
     31The Parabix framework contains many facilities for working with character representations of various kinds.
     32
     33A fundamental notion is the character class bitstream.   This is a stream of bits in one-to-one correspondence
     34with the code units of an input character stream, such that 1 bits indicate characters within the class and 0 bits indicate
     35characters outside of the class.  Often we use regular-expression notation to identify character classes,
     36such as {{{[abc]}}} for the class containing the three lower-case letters "a", "b", and "c", and {{{[0-9]}}}
     37as the class for decimal digits.   The following example shows an input character stream and the corresponding
     38bit streams for the {{{[abc]}}} and {{{[0-9]}}} classes, respectively.   We conventionally mark 0 bits with
     39periods (".") to make the 1 bits stand out.
     40
     41{{{
     42input:   This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class.
     43[abc]:   ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
     44[0-9]:   .............1..........................................................11..................................1...................1.1........
     45}}}
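The definition above can be sketched in a few lines of Python. This is only an illustration of the concept, not the Parabix API: the names `char_class_bits` and `render` are hypothetical, the input is assumed to be a sequence of 8-bit code units (for example UTF-8 bytes), and a Python integer stands in for the bit stream, with bit i corresponding to code unit i.

```python
def char_class_bits(data: bytes, members: bytes) -> int:
    """Return an integer whose bit i is 1 iff code unit data[i] is in the class.

    Hypothetical helper for illustration; it scans one code unit at a time
    and makes no attempt to model Parabix's parallel evaluation.
    """
    member_set = set(members)
    bits = 0
    for i, unit in enumerate(data):
        if unit in member_set:
            bits |= 1 << i
    return bits


def render(bits: int, length: int) -> str:
    """Show a bit stream in the notation used above: '1' for 1 bits, '.' for 0 bits."""
    return "".join("1" if (bits >> i) & 1 else "." for i in range(length))


if __name__ == "__main__":
    sample = b"just 1 abbreviated example"
    print("input:  ", sample.decode("ascii"))
    print("[abc]:  ", render(char_class_bits(sample, b"abc"), len(sample)))
    print("[0-9]:  ", render(char_class_bits(sample, b"0123456789"), len(sample)))
```

Because each class is held as a single integer in this sketch, combining classes reduces to bitwise operations: the OR of the two streams marks every position that is either a letter in {{{[abc]}}} or a decimal digit.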
    3046
    3147