source: docs/ICA3PP2015/architecture.tex @ 5924

Last change on this file since 5924 was 4782, checked in by cameron, 4 years ago

Figure placement according to Springer

File size: 3.9 KB
Line 
1\section{Architecture}\label{sec:architecture}
2
3\paragraph{Regular Expression Preprocessing.}
4As shown in Fig.~\ref{fig:compiler},
5compilation in \icGrep{} comprises three logical layers: \RegularExpression{},
6\Pablo{} and the LLVM layer, each with their own intermediate representation
7(IR), transformation and compilation modules.
8As we traverse the layers, the IR becomes more complex as it begins to mirror the final machine code.
9The layering enables further optimization based on information available at each stage.
10The \REParser{} validates and transforms the input \RegularExpression{} into an abstract syntax tree (AST).
11Successive \RegularExpression{} Transformations exploit domain
12knowledge to optimize the regular expressions.
13The aforementioned \texttt{toUTF8} transformation also applies during this phase to generate code unit classes.
14
15\input{fig-compiler}
16
17
18The next layer transforms this AST into the instructions in the \Pablo{} IR.
19Recall that the \Pablo{} layer assumes a transposed view of the input data.
20The \emph{\RegularExpressionCompiler{}} first transforms all input code unit classes,
21analogous to non-Unicode character classes,
22into a series of equations over these transposed bitstreams.
23It next transforms the AST into \Pablo{} instructions that use the
24results of these equations.
25For instance, it converts alternations into a sequence of calculations that
26are merged with \verb|OR|s.
27The results of these passes are combined and transformed through typical
28optimization passes including dead code elimination
29(DCE), common subexpression elimination (CSE) and constant folding.
30These optimizations exploit redundancies that are harder to recognize in
31the \RegularExpression{} AST itself.
32
33The \PabloCompiler{} then directly converts the \Pablo{} IR into LLVM IR.
34The LLVM Compiler framework provides flexible APIs for compilation and linking.
35Using these, \icGrep{} dynamically generates a match function for identifying
36occurrences of the \RegularExpression{}.
37
38\paragraph{Dynamic Grep Engine.}
39Figure~\ref{fig:execution} shows the structure of the \icGrep{} matching engine.
40The input data is transposed into 8 parallel bit streams through the Transposition module.
41Using the 8 basis bits streams, the Required Streams Generator computes the
42line break streams, UTF-8 validation streams and the Initial and NonFinal streams
43needed to support ScanThru and MatchStar with UTF-8 data.
44The Dynamic Matcher, dynamically compiled via LLVM, retrieves the 8 basis bits and the required streams from their memory addresses and starts the matching process.
45During the matching process, any references to named Unicode properties generate calls to the appropriate routine in the Named Property Library.
46The Dynamic Matcher returns one bitstream that marks all the positions that fully match the compiled regular expression.
47Finally, a Match Scanner scans through the returned bitstream to select the matching lines and generate the normal grep output.
48
49\input{fig-executor}
50
51We can also apply a pipeline parallelism strategy to further speed up the process of \icGrep{}.
52Transposition and the Required Streams Generator can be performed in a separate thread which can start even before the dynamic compilation starts.
53The output of Transposition and the Required Streams Generator, that is the 8 basis bits streams and the required streams,
54are stored in a shared memory buffer for subsequent processing by the Dynamic Matcher once compilation is complete.
55A single thread performs both compilation and matching using the computed basis and required streams.
56To avoid L2 cache contention, we allocate only a limited amount of space for the shared data in a circular buffer.
57The performance is dependent on the slowest thread.   
58In the case that the cost of transposition and required stream generation is more than the matching process,
59we can further divide up the work and assign two threads for Transposition and the Required Streams Generator.
60
61
Note: See TracBrowser for help on using the repository browser.