source: docs/PACT14/compilation.tex @ 4556

Last change on this file since 4556 was 3880, checked in by cameron, 5 years ago

Various formatting items

File size: 3.8 KB
Line 
1\paragraph*{Compilation} Using the marker stream and MatchStar concept, we now
2outline our compilation algorithm.  This is implemented
3in a Java program.   First the regular expression is
4parsed and represented as an abstract syntax tree.
5Second, the various character classes used in the
6regular expression are extracted.  The character
7class compiler of the Parabix framework is invoked
8to generate the bit stream equations required for
9each character class.  Then the syntax tree is
10walked to generate code for each type of regular expression
11structure as follows.
12
13\begin{itemize}
14\item An initial marker stream $M_0$ is set to be all
15ones, indicating that every position in the input file
16is a potential  match if we have not yet examined any
17pattern elements.
18
19\item If we have a regular expression formed as an
20alternation of subexpressions, we compile each of these
21in turn, providing the current input marker stream as input
22to each of them.  The final marker streams of the
23compiled forms of each subexpression are then just combined
24using a bitwise-or to produce the overall final marker
25stream of the alternation.  That is, a match occurs at
26any position that can be reached by matching any one of the
27alternatives.
28
29\item If we have a marker stream formed as a concatenation of
30subexpressions, then we compile each of these in turn, providing
31the output marker stream of each compilation as the input
32marker stream for the compilation of the next pattern element.
33
34\item If a regular expression is a character class expression
35or a single character, then we form the bitwise-and of the
36current marker stream and the character class stream to
37filter out current marker positions that do not have
38a match to the class.   The result is then shifted forward
39one position to identify the successful matches.
40
41\item If a regular expression is an optional expression of the
42form {\tt R?} for some subexpression {\tt R}, then the
43output marker stream is simply formed as the bitwise-or
44of the input marker stream (zero occurrences of {\tt R}
45matched) and the output stream produced by compiling {\tt R}
46in the context of the current input marker stream (one
47occurrence matched).
48
49\item If a regular expression is a repetition of a character
50class of the form {\tt C*}, then the compiled form uses the MatchStar operation
51to produce the output marker stream from the input stream
52and the compiled stream for character class {\tt C}.
53
54\item If a regular expression is a repetition of a non
55character class of the form {\tt R*}, then a Pablo while loop
56is created conditioned on a control marker stream still
57having bits marking match positions to  be considered.   The body of the
58while consists of the compiled form of the expression {\tt R},
59taking as input the marker stream at the beginning of the iteration
60and producing as output all positions that can be reached from
61the input positions in a single step.   These output positions
62are candidates for further iteration, but positions that
63have already been considered are removed.  This guarantees
64termination of the loop; iteration continues only if a new
65marker position is reached that has not been previously
66considered as an output.
67The final output is the bitwise-or of matches determined in each loop
68iteration.
69
70\item If a regular expression is a bounded repetition of the form \verb:R{m,n}:,
71then it is compiled according to the equivalent form
72consisting of $m$ concatenations of \verb:R: followed
73by $n-m$ concatenations of \verb:R?:.
74
75\item If a regular expression is a bounded repetition of the form \verb:R{m,}:,
76then it is compiled according to the equivalent form
77consisting of $m$ concatenations of \verb:R: followed
78by \verb:R*:.
79\end{itemize}
80
81The output of the regular expression compiler is then fed
82as input to the Pablo compiler of the Parabix tool chain.
83The result is then compiled with a C++ compiler linked with
84the Parabix run-time libraries.
85
Note: See TracBrowser for help on using the repository browser.