source: docs/Working/icGrep/fig-compiler.tex @ 4451

Last change on this file since 4451 was 4451, checked in by lindanl, 5 years ago

Execution Diagram.

File size: 8.6 KB
Line 
1\def\RegularExpression{RegEx}
2\def\Pablo{Pablo}
3\def\CodeUnit{Code Unit}
4\def\REParser{\RegularExpression{} Parser}
5\def\CodeUnitCompiler{\CodeUnit{} Compiler}
6\def\RegularExpressionCompiler{\RegularExpression{} Compiler}
7\def\PabloCompiler{\Pablo{} Compiler}
8
9\begin{figure}[h] \label{fig:compiler}
10\begin{center}
11% Define block styles
12%\tikzstyle{decision} = [diamond, shape aspect=2, rotate=30, draw, text width=4.5em, text badly centered, inner sep=0pt, thick]
13\tikzstyle{block} = [rectangle, draw, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em]
14\tikzstyle{line} = [draw, ->, line width=1.4pt]
15\tikzstyle{seperator} = [draw, line width=0.125em, dashed]
16\tikzset{block/.append style={execute at begin node=\footnotesize}}   
17\begin{tikzpicture}[node distance=3cm, auto, >=stealth]
18
19    % Place nodes
20    \node [draw=none] (RE) {\RegularExpression{}};
21    \node [block, right of=RE, node distance=12em] (PropertyExtraction) {Property Extraction};
22    \node [block, below of=RE] (REParser) {\REParser{}};
23    \node [block, below of=REParser] (RETransform) {\RegularExpression{} Transformations};   
24    \coordinate[below of=RETransform, node distance=3em] (Point);   
25    \node [block, left of=Point, node distance=10em] (CUCompiler) {\CodeUnitCompiler{}};
26    \node [block, right of=Point, node distance=10em] (RECompiler) {\RegularExpressionCompiler{}};   
27    \node [block, below of=Point, node distance=3em] (PabloTransform) {\Pablo{} Transformations};   
28    \node [block, below of=PabloTransform] (PabloCompiler) {\PabloCompiler{}};
29    \node [block, below of=PabloCompiler] (LLVMCompiler) {LLVM Compiler};
30    \node [draw=none, below of=LLVMCompiler, node distance=3.5em] (Matcher) {Dynamically-Generated Match Function};
31   
32    % Draw edges
33    \path [line] (RE) -- (REParser);
34    \path [line] (RE) -- (PropertyExtraction);
35    \path [line] (REParser) -- (RETransform);
36    \path [line] (RETransform) -| (CUCompiler);
37    \path [line] (RETransform) -| (RECompiler);
38    \path [line] (CUCompiler) |- (PabloTransform);
39    \path [line] (RECompiler) |- (PabloTransform);
40    \path [line] (PabloTransform) -- (PabloCompiler);
41    \path [line] (PabloCompiler) -- (LLVMCompiler);
42    \path [line] (LLVMCompiler) -- (Matcher);
43   
44    % Draw seperators
45    \coordinate[right of=REParser, node distance=20em] (SR);
46    \coordinate[left of=REParser, node distance=20em] (SL);
47    \path [seperator] (SL) -- (REParser);
48    \path [seperator] (REParser) -- (SR);
49   
50    \coordinate[left of=Point, node distance=20em] (PL);
51    \coordinate[right of=Point, node distance=20em] (PR);
52    \path [seperator] (PL) -- (CUCompiler);
53    \path [seperator] (CUCompiler) -- (RECompiler);
54    \path [seperator] (RECompiler) -- (PR);
55
56    \coordinate[right of=PabloCompiler, node distance=20em] (LR);
57    \coordinate[left of=PabloCompiler, node distance=20em] (LL);
58    \path [seperator] (LL) -- (PabloCompiler);
59    \path [seperator] (PabloCompiler) -- (LR);   
60   
61    \coordinate[right of=LLVMCompiler, node distance=20em] (OR);
62    \coordinate[left of=LLVMCompiler, node distance=20em] (OL);
63    \path [seperator] (OL) -- (LLVMCompiler);
64    \path [seperator] (LLVMCompiler) -- (OR);       
65   
66    % Seperator text
67    \node [draw=none,anchor=west] at ($(SL)!0.5!(PL)$) {1)~\RegularExpression{}};
68    \node [draw=none,anchor=west] at ($(PL)!0.5!(LL)$) {2)~\Pablo{}};
69    \node [draw=none,anchor=west] at ($(LL)!0.5!(OL)$) {3)~LLVM};
70   
71   
72\end{tikzpicture}
73
74\end{center}
75\caption{icGrep Architectural Diagram.}
76\end{figure} 
77
78As show in Figure \ref{fig:compiler},
79icGrep is composed of three logical layers: \RegularExpression{}, \Pablo{} and the LLVM layer, each with their own intermediate representation
80(IR), transformation and compilation modules.
81%
82As we traverse the layers, the IR becomes significantly more complex as it begins to mirror the final machine code.
83%
84The \REParser{} validates and transforms the input \RegularExpression{} into an abstract syntax tree (AST).
85%
86The AST is a minimalistic representation that, unlike traditional \RegularExpression{}, is not converted into a NFA or DFA for further processing.
87%
88Instead, icGrep passes the AST into the transformation module, which includes a set of \RegularExpression{} specific optimization passes.
89%
90The initial \emph{Nullable} pass, determines whether the \RegularExpression{} contains any prefixes or suffixes that may be removed or
91modified whilst still providing the same number of matches as the original expression.
92%
93For example, ``\verb|a*bc+|'' is equivalent to ``\verb|bc|'' because the Kleene Star (Plus) operator matches zero (one) or more instances of a
94specific character.
95%
96The \emph{toUTF8} pass converts the characters in the input \RegularExpression{} into the equivalent expression(s) that represent the sequences
97of 8-bit code units necessary to identify the presence of a particular character.
98%
99Since some characters have multiple logically equivalent representations, such as \textcolor{red}{\textbf{????}}, this may produce nested sequences or alternations.
100%
101This is described in more detail in \S\ref{sec:Unicode:toUTF8}.
102%
103To alleviate this, the final \emph{Simplification} pass flattens nested sequences and alternations into their simplest legal form.
104%
105For example, ``\verb`a(b((c|d)|e))`'' would become ``\verb`ab(c|d|e)`'' and ``\verb`([0-9]{3,5}){3,5}`'', ``\verb`[0-9]{9,25}`''.
106%
107
108%% DISCUSS ANALYSIS MODULE?
109
110
111The \RegularExpression{} layer has two compilers: the \CodeUnit{} and \RegularExpressionCompiler{}, both of which produce \Pablo{} IR.
112%
113Recall that the \Pablo{} layer assumes a transposed view of the input data.
114%
115The \emph{\CodeUnitCompiler{}} transforms the input code unit classes, either extracted from the \RegularExpression{} or produced by the
116\emph{toUTF8} transformation, into a series of bit stream equations.
117%
118The \emph{\RegularExpressionCompiler{}} assumes that these have been calculated and transforms the \RegularExpression{} AST into
119a sequence of instructions.
120%
121For instance, it would convert any alternations into a sequence of calculations that are merged with \verb|OR|s.
122%
123The results of these passes are combined and transformed through a series of typical optimization passes, including dead code elimination
124(DCE), common subexpression elimination (CSE), and constant folding.
125%
126These are necessary at this stage because the \RegularExpression{} AST may include common subsequences that are costly to recognize in
127that form.
128%
129Similarly, to keep the \CodeUnitCompiler{} a linear time function, it may introduce redundant IR instructions as it applies traditional Boolean
130algebra transformations, such as de Morgan's law, to the computed streams.
131%
132An intended side-effect of these passes is that they eliminate the need to analyze the data-dependencies inherent in the carry-bit logic,
133which is necessary for some \Pablo{} instructions but problematic for optimizers to reason about non-conservatively.
134%
135The \PabloCompiler{} then converts the \Pablo{} IR into LLVM IR.
136%
137This is a relatively straightforward conversion:
138%
139the only complexities it introduces is the generation of Phi nodes, linking of statically-compiled functions, and assignment of carry variables.
140%
141It produces the dynamically-generated match function used by the icGrep.
142
143\begin{figure}[h] \label{fig:execution}
144\begin{center}
145\tikzstyle{block} = [rectangle, draw, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em]
146\tikzstyle{line} = [draw, ->, line width=1.4pt]
147\tikzstyle{seperator} = [draw, line width=0.125em, dashed]
148\tikzset{block/.append style={execute at begin node=\footnotesize}}   
149\begin{tikzpicture}[node distance=3cm, auto, >=stealth]
150
151    % Place nodes
152    \node [draw=none] (InputData) {Input Data};
153    \node [block, below of=InputData] (S2P) {S2P};
154    \node [block, below of=S2P] (RequiredStreamsGenerator) {Required Streams Generator};
155    \node [block, below of=RequiredStreamsGenerator] (JITFunction) {JIT Function};
156    \node [block, right of=JITFunction, node distance=20em] (NamedPropertyLibaray) {Named Property Library};
157    \node [block, below of=JITFunction] (MatchScanner) {Match Scanner};
158    \node [draw=none, below of=MatchScanner, node distance=3.5em] (OutputResult) {Output Result};
159   
160    % Draw edges
161    \path [line] (InputData) -- (S2P);
162    \path [line] (S2P) -- (RequiredStreamsGenerator);
163    \path [line] (RequiredStreamsGenerator) -- (JITFunction);
164    \path [line] (NamedPropertyLibaray) -- (JITFunction);
165    \path [line] (JITFunction) -- (MatchScanner);
166    \path [line] (MatchScanner) -- (OutputResult);
167   
168\end{tikzpicture}
169
170\end{center}
171\caption{icGrep Execution Diagram.}
172\end{figure} 
Note: See TracBrowser for help on using the repository browser.