# Changeset 4466

Ignore:
Timestamp:
Feb 6, 2015, 9:37:37 AM (4 years ago)
Message:

Evaluation of optimization; diagram tweaks

Location:
docs/Working/icGrep
Files:
4 edited

### Legend:

Unmodified
 r4446 \section{Architecture}\label{sec:architecture} \include{fig-compiler} \subsection{Regular Expression Preprocessing} \input{fig-compiler} As show in Figure \ref{fig:compiler}, icGrep is composed of three logical layers: \RegularExpression{}, \Pablo{} and the LLVM layer, each with their own intermediate representation (IR), transformation and compilation modules. % As we traverse the layers, the IR becomes significantly more complex as it begins to mirror the final machine code. % The \REParser{} validates and transforms the input \RegularExpression{} into an abstract syntax tree (AST). % The AST is a minimalistic representation that, unlike traditional \RegularExpression{}, is not converted into a NFA or DFA for further processing. % Instead, icGrep passes the AST into the transformation module, which includes a set of \RegularExpression{} specific optimization passes. % The initial \emph{Nullable} pass, determines whether the \RegularExpression{} contains any prefixes or suffixes that may be removed or modified whilst still providing the same number of matches as the original expression. % For example, \verb|a*bc+|'' is equivalent to \verb|bc|'' because the Kleene Star (Plus) operator matches zero (one) or more instances of a specific character. % The \emph{toUTF8} pass converts the characters in the input \RegularExpression{} into the equivalent expression(s) that represent the sequences of 8-bit code units necessary to identify the presence of a particular character. % Since some characters have multiple logically equivalent representations, such as \textcolor{red}{\textbf{????}}, this may produce nested sequences or alternations. % This is described in more detail in \S\ref{sec:Unicode:toUTF8}. % To alleviate this, the final \emph{Simplification} pass flattens nested sequences and alternations into their simplest legal form. % For example, \verba(b((c|d)|e))'' would become \verbab(c|d|e)'' and \verb([0-9]{3,5}){3,5}'', \verb[0-9]{9,25}''. % %% DISCUSS ANALYSIS MODULE? The \RegularExpression{} layer has two compilers: the \CodeUnit{} and \RegularExpressionCompiler{}, both of which produce \Pablo{} IR. % Recall that the \Pablo{} layer assumes a transposed view of the input data. % The \emph{\CodeUnitCompiler{}} transforms the input code unit classes, either extracted from the \RegularExpression{} or produced by the \emph{toUTF8} transformation, into a series of bit stream equations. % The \emph{\RegularExpressionCompiler{}} assumes that these have been calculated and transforms the \RegularExpression{} AST into a sequence of instructions. % For instance, it would convert any alternations into a sequence of calculations that are merged with \verb|OR|s. % The results of these passes are combined and transformed through a series of typical optimization passes, including dead code elimination (DCE), common subexpression elimination (CSE), and constant folding. % These are necessary at this stage because the \RegularExpression{} AST may include common subsequences that are costly to recognize in that form. % Similarly, to keep the \CodeUnitCompiler{} a linear time function, it may introduce redundant IR instructions as it applies traditional Boolean algebra transformations, such as de Morgan's law, to the computed streams. % An intended side-effect of these passes is that they eliminate the need to analyze the data-dependencies inherent in the carry-bit logic, which is necessary for some \Pablo{} instructions but problematic for optimizers to reason about non-conservatively. % The \PabloCompiler{} then converts the \Pablo{} IR into LLVM IR. % This is a relatively straightforward conversion: % the only complexities it introduces is the generation of Phi nodes, linking of statically-compiled functions, and assignment of carry variables. % It produces the dynamically-generated match function used by the icGrep. \subsection{Dynamic Grep Engine} \input{fig-executor} As shown in Figure \ref{fig:execution}, icGrep takes the input data and transposed it into 8 parallel bit streams through S2P module. The required streams, e.g. line break stream, can then be generated using the 8 basis bits streams. The JIT function retrieves the 8 basis bits and the required streams from their memory addresses and starts the matching process. Named Property Library that includes all the predefined Unicode categories is installed into JIT function and can be called during the matching process. JIT function returns one bitstream that marks all the matching positions. A match scanner will scan through this bitstream and calculate the total counts or write the context of each match position. We can also apply a pipeline parallelism strategy to further speed up the process of icGrep. S2P and Required Streams Generator can be process in a separate thread and start even before the dynamic compilation starts. The output of S2P and Required Streams Generator, that is the 8 basis bits streams and the required streams, needs to be stored in a shared memory space so that the JIT function can read from it. To be more efficient of memory space usage, we only allocate limit amount of space for the shared data. When each chunk of the shared space is filled up with the bitstream data, the thread will start writing to the first chunk if it is released by JIT function. Otherwise, it will wait for JIT function until it finishes processing that chunk. Therefore, the performance is depended on the slowest thread. In the case that the cost of transposition and required stream generation is more than the matching process, we can further divide up the work and assign two threads for S2P and Required Streams Generator.
 r4461 \def\PabloCompiler{\Pablo{} Compiler} \begin{figure}[h] \label{fig:compiler} \begin{figure}[tbh] \begin{center} % Define block styles %\tikzstyle{decision} = [diamond, shape aspect=2, rotate=30, draw, text width=4.5em, text badly centered, inner sep=0pt, thick] \tikzstyle{block} = [rectangle, draw, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em] \tikzstyle{block} = [rectangle, draw, text width=12em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em] \tikzstyle{line} = [draw, ->, line width=1.4pt] \tikzstyle{seperator} = [draw, line width=0.125em, dashed] \tikzstyle{separator} = [draw, line width=0.125em, dashed] \tikzset{block/.append style={execute at begin node=\footnotesize}} \begin{tikzpicture}[node distance=3cm, auto, >=stealth] % Place nodes \node [draw=none] (RE) {\RegularExpression{}}; \node [block, right of=RE, node distance=12em] (PropertyExtraction) {Property Extraction}; \node [block, below of=RE] (REParser) {\REParser{}}; \node [block, below of=REParser] (RETransform) {\RegularExpression{} Transformations}; % Draw edges \path [line] (RE) -- (REParser); \path [line] (RE) -- (PropertyExtraction); %\path [line] (RE) -- (PropertyExtraction); \path [line] (REParser) -- (RETransform); \path [line] (RETransform) -| (CUCompiler); \path [line] (LLVMCompiler) -- (Matcher); % Draw seperators \coordinate[right of=REParser, node distance=20em] (SR); \coordinate[left of=REParser, node distance=20em] (SL); \path [seperator] (SL) -- (REParser); \path [seperator] (REParser) -- (SR); % Draw separators \coordinate[right of=REParser, node distance=15em] (SR); \coordinate[left of=REParser, node distance=15em] (SL); \path [separator] (SL) -- (REParser); \path [separator] (REParser) -- (SR); \coordinate[left of=Point, node distance=20em] (PL); \coordinate[right of=Point, node distance=20em] (PR); \path [seperator] (PL) -- (CUCompiler); \path [seperator] (CUCompiler) -- (RECompiler); \path [seperator] (RECompiler) -- (PR); \coordinate[left of=Point, node distance=15em] (PL); \coordinate[right of=Point, node distance=15em] (PR); %\path [separator] (PL) -- (CUCompiler); \path [separator] (CUCompiler) -- (RECompiler); %\path [separator] (RECompiler) -- (PR); \coordinate[right of=PabloCompiler, node distance=20em] (LR); \coordinate[left of=PabloCompiler, node distance=20em] (LL); \path [seperator] (LL) -- (PabloCompiler); \path [seperator] (PabloCompiler) -- (LR); \coordinate[right of=PabloCompiler, node distance=15em] (LR); \coordinate[left of=PabloCompiler, node distance=15em] (LL); \path [separator] (LL) -- (PabloCompiler); \path [separator] (PabloCompiler) -- (LR); \coordinate[right of=LLVMCompiler, node distance=20em] (OR); \coordinate[left of=LLVMCompiler, node distance=20em] (OL); \path [seperator] (OL) -- (LLVMCompiler); \path [seperator] (LLVMCompiler) -- (OR); \coordinate[right of=LLVMCompiler, node distance=15em] (OR); \coordinate[left of=LLVMCompiler, node distance=15em] (OL); \path [separator] (OL) -- (LLVMCompiler); \path [separator] (LLVMCompiler) -- (OR); % Seperator text \end{center} \caption{icGrep Architectural Diagram.} \caption{icGrep Architectural Diagram}\label{fig:compiler} \end{figure} As show in Figure \ref{fig:compiler}, icGrep is composed of three logical layers: \RegularExpression{}, \Pablo{} and the LLVM layer, each with their own intermediate representation (IR), transformation and compilation modules. % As we traverse the layers, the IR becomes significantly more complex as it begins to mirror the final machine code. % The \REParser{} validates and transforms the input \RegularExpression{} into an abstract syntax tree (AST). % The AST is a minimalistic representation that, unlike traditional \RegularExpression{}, is not converted into a NFA or DFA for further processing. % Instead, icGrep passes the AST into the transformation module, which includes a set of \RegularExpression{} specific optimization passes. % The initial \emph{Nullable} pass, determines whether the \RegularExpression{} contains any prefixes or suffixes that may be removed or modified whilst still providing the same number of matches as the original expression. % For example, \verb|a*bc+|'' is equivalent to \verb|bc|'' because the Kleene Star (Plus) operator matches zero (one) or more instances of a specific character. % The \emph{toUTF8} pass converts the characters in the input \RegularExpression{} into the equivalent expression(s) that represent the sequences of 8-bit code units necessary to identify the presence of a particular character. % Since some characters have multiple logically equivalent representations, such as \textcolor{red}{\textbf{????}}, this may produce nested sequences or alternations. % This is described in more detail in \S\ref{sec:Unicode:toUTF8}. % To alleviate this, the final \emph{Simplification} pass flattens nested sequences and alternations into their simplest legal form. % For example, \verba(b((c|d)|e))'' would become \verbab(c|d|e)'' and \verb([0-9]{3,5}){3,5}'', \verb[0-9]{9,25}''. % %% DISCUSS ANALYSIS MODULE? The \RegularExpression{} layer has two compilers: the \CodeUnit{} and \RegularExpressionCompiler{}, both of which produce \Pablo{} IR. % Recall that the \Pablo{} layer assumes a transposed view of the input data. % The \emph{\CodeUnitCompiler{}} transforms the input code unit classes, either extracted from the \RegularExpression{} or produced by the \emph{toUTF8} transformation, into a series of bit stream equations. % The \emph{\RegularExpressionCompiler{}} assumes that these have been calculated and transforms the \RegularExpression{} AST into a sequence of instructions. % For instance, it would convert any alternations into a sequence of calculations that are merged with \verb|OR|s. % The results of these passes are combined and transformed through a series of typical optimization passes, including dead code elimination (DCE), common subexpression elimination (CSE), and constant folding. % These are necessary at this stage because the \RegularExpression{} AST may include common subsequences that are costly to recognize in that form. % Similarly, to keep the \CodeUnitCompiler{} a linear time function, it may introduce redundant IR instructions as it applies traditional Boolean algebra transformations, such as de Morgan's law, to the computed streams. % An intended side-effect of these passes is that they eliminate the need to analyze the data-dependencies inherent in the carry-bit logic, which is necessary for some \Pablo{} instructions but problematic for optimizers to reason about non-conservatively. % The \PabloCompiler{} then converts the \Pablo{} IR into LLVM IR. % This is a relatively straightforward conversion: % the only complexities it introduces is the generation of Phi nodes, linking of statically-compiled functions, and assignment of carry variables. % It produces the dynamically-generated match function used by the icGrep. \begin{figure}[h] \label{fig:execution} \begin{center} \tikzstyle{block} = [rectangle, draw, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em] \tikzstyle{line} = [draw, ->, line width=1.4pt] \tikzstyle{seperator} = [draw, line width=0.125em, dashed] \tikzset{block/.append style={execute at begin node=\footnotesize}} \begin{tikzpicture}[node distance=3cm, auto, >=stealth] % Place nodes \node [draw=none] (InputData) {Input Data}; \node [block, below of=InputData] (S2P) {S2P}; \node [block, below of=S2P] (RequiredStreamsGenerator) {Required Streams Generator}; \node [block, below of=RequiredStreamsGenerator] (JITFunction) {JIT Function}; \node [block, right of=JITFunction, node distance=20em] (NamedPropertyLibaray) {Named Property Library}; \node [block, below of=JITFunction] (MatchScanner) {Match Scanner}; \node [draw=none, below of=MatchScanner, node distance=3.5em] (OutputResult) {Output Result}; % Draw edges \path [line] (InputData) -- (S2P); \path [line] (S2P) -- (RequiredStreamsGenerator); \path [line] (RequiredStreamsGenerator) -- (JITFunction); \path [line] (NamedPropertyLibaray) -- (JITFunction); \path [line] (JITFunction) -- (MatchScanner); \path [line] (MatchScanner) -- (OutputResult); \end{tikzpicture} \end{center} \caption{icGrep Execution Diagram.} \end{figure} As shown in Figure \ref{fig:execution}, icGrep takes the input data and transposed it into 8 parallel bit streams through S2P module. The required streams, e.g. line break stream, can then be generated using the 8 basis bits streams. The JIT function retrieves the 8 basis bits and the required streams from their memory addresses and starts the matching process. Named Property Library that includes all the predefined Unicode categories is installed into JIT function and can be called during the matching process. JIT function returns one bitstream that marks all the matching positions. A match scanner will scan through this bitstream and calculate the total counts or write the context of each match position. We can also apply a pipeline parallelism strategy to further speed up the process of icGrep. S2P and Required Streams Generator can be process in a separate thread and start even before the dynamic compilation starts. The output of S2P and Required Streams Generator, that is the 8 basis bits streams and the required streams, needs to be stored in a shared memory space so that the JIT function can read from it. To be more efficient of memory space usage, we only allocate limit amount of space for the shared data. When each chunk of the shared space is filled up with the bitstream data, the thread will start writing to the first chunk if it is released by JIT function. Otherwise, it will wait for JIT function until it finishes processing that chunk. Therefore, the performance is depended on the slowest thread. In the case that the cost of transposition and required stream generation is more than the matching process, we can further divide up the work and assign two threads for S2P and Required Streams Generator.