
Version 2 (modified by cameron, 4 years ago)

--

The Parabix Character Class Compiler

Character Class Bit Streams

Given an input stream of character code units, a character class bit stream is a stream of bits in one-to-one correspondence with the input stream: one bits mark code units that belong to the class, and zero bits mark code units that do not.

For example, consider the ASCII character class expression [abc], which stands for the class of ASCII bytes with code unit values 97 (a), 98 (b), or 99 (c). The following example shows the [abc] character class bit stream aligned with an example ASCII input stream.

input:  This is an example ASCII byte stream.  ASCII is an abbreviation.
[abc]:  ........1....1...........1........1.............1..111....1.....

By convention, zero bits within a character class bit stream are marked with periods, so that the one bits (each marked with the digit 1) stand out.
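
The definition above can be illustrated with a minimal Python sketch. This is not the Parabix implementation, which works on parallel bit streams rather than one code unit at a time; the function names char_class_bitstream and render are hypothetical and chosen only for this illustration.

```python
def char_class_bitstream(data: bytes, char_class: bytes) -> list:
    """Return one bit per code unit: 1 if the unit is in the class, else 0.

    Illustrative only: a straightforward byte-at-a-time scan, not the
    bit-parallel method used by the actual compiler.
    """
    members = set(char_class)  # code unit values, e.g. {97, 98, 99} for b"abc"
    return [1 if unit in members else 0 for unit in data]


def render(bits: list) -> str:
    """Render zero bits as periods and one bits as the digit 1."""
    return "".join("1" if bit else "." for bit in bits)


text = b"This is an example ASCII byte stream.  ASCII is an abbreviation."
bits = char_class_bitstream(text, b"abc")

# Print the input and its [abc] bit stream, one aligned under the other.
print("input:  " + text.decode("ascii"))
print("[abc]:  " + render(bits))
```

The bit stream has exactly one bit per input byte, and uppercase A and C (values 65 and 67) are not marked because only the values 97, 98, and 99 belong to the class.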