source: docs/Working/multiplexed-CCs.tex @ 5365

Last change on this file since 5365 was 5365, checked in by cameron, 2 years ago

Multiplexed character classes note

File size: 2.4 KB
Line 
1\documentclass{article}
2\usepackage{amsmath}
3\usepackage{fullpage}
4\usepackage{amssymb}
5\newcommand\ceil[1]{\lceil#1\rceil}
6\DeclareMathOperator\shl{\texttt{<<}}
7
8\title{Multiplexed Character Classes}
9\author{Robert D. Cameron}
10
11\begin{document}
12\maketitle
13
14\paragraph*{Charaacter Class Multiplexing.}
15
16The process of character class multiplexing is to find a minimal set of
17character class basis streams from which all the character classes used in
18a particular regular expression can be computed.   For example, the regular expression
19\verb:[a-z]+ingly!: has seven nominal character classes: 
20\verb:[a-z]:, \verb:[i]:, \verb:[n]:, \verb:[g]:, \verb:[l]:, \verb:[y]:, and \verb:[!]:.
21By applying the multiplexing concept, however, a set of three character class bit
22streams can be computed which represent all the possibilities. 
23
24\paragraph*{Exclusive Character Class Set.}
25
26The first step in character class multiplexing is to determine a set of
27mutually-exclusive and collectively-exhaustive
28character classes that represent the nominal character classes as well as
29input characters that are not in any class.   In this case, the class
30of all characters not present in the expression is \verb:[^a-z!]:, while
31the minimal set of mutually exclusive classes that include all the characters
32in the expression comprises
33\verb:[!]:, \verb:[a-fhjkmo-xz]:, \verb:[g]:, \verb:[i]:, \verb:[l]:, \verb:[n]:, and \verb:[y]:.
34
35\paragraph*{Multiplexed Encoding.}
36
37Given a set of mutually exclusive and collectively exhaustive character classes of size $N$,
38a $\log_2 N$ bit encoding can be computed to represent the classes.
39Given the 8 classes of our running example, we can assign a 3-bit code to each class as follows.
40\begin{center}
41\begin{tabular}{c c}
42000  & \verb:[^a-z!]: \\
43001 & \verb:[!]: \\
44010 & \verb:[a-fhjkmo-xz]: \\
45011 & \verb:[g]: \\
46100 & \verb:[i]: \\
47101 & \verb:[l]: \\
48110 & \verb:[n]: \\
49111 & \verb:[y]: \\
50\end{tabular}
51\end{center}
52
53\paragraph*{Multiplexed Classes.}
54
55In order to compute the final set of multiplexed basis streams, we can determine
56the appropriate character classes for each bit of the multiplexed encoding.
57Our running example then reduces to the following 3 character classes for
58the multiplexed encoding (little-endian bit numbering).
59
60\begin{center}
61\begin{tabular}{c c}
62bit 2 & \verb:[inly]: \\
63bit 1 & \verb:[a-hjkm-z]: \\
64bit 0 & \verb:[!gly]: \\
65\end{tabular}
66\end{center}
67
68
69\end{document}
Note: See TracBrowser for help on using the repository browser.