Changeset 4504 for docs/Working/icGrep


Ignore:
Timestamp:
Feb 11, 2015, 7:34:40 PM (4 years ago)
Author:
nmedfort
Message:

Small fix for unicode matching example along with formatting changes

Location:
docs/Working/icGrep
Files:
6 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icGrep/background.tex

    r4494 r4504  
    5252position of ``d'' characters and {\tt [a-z]} for the stream of lower case
    5353ASCII alphabetics are first computed in a fully data-parallel manner.
    54 Then the matching process proper begins taking advance of bitwise logic
     54Then the matching process proper begins taking advantage of bitwise logic
    5555and shifting operations as well as an operation for finding all matches to a
    5656character class repetition known as MatchStar.  At each step of the
     
    6060
    6161\begin{figure}
     62\vspace{-0.5em}
    6263\begin{center}
    6364\begin{tabular}{cr}\\
     
    6970\end{tabular}
    7071\end{center}
     72\vspace{-1em}
    7173\caption{Matching {\tt d[a-z]*ed} Using Bitwise Data Parallelism}\label{fig:bitwisematch}
     74\vspace{-0.5em}
    7275\end{figure}
    7376
     
    8386{\tt [a-z]*} ($M_2$).   This produces pending matches at many positions.  However, there
    8487is no need to consider these matches one at a time using lazy or greedy matching strategies.
    85 Rather, the full marker stream $M_3$ of remaining possibilites after matching {\tt [e]} is easily
     88Rather, the full marker stream $M_3$ of remaining possibilities after matching {\tt [e]} is easily
    8689computed using bitwise logic and shift.
    8790The final step produces marker stream $M_4$ indicating the single position
  • docs/Working/icGrep/fig-compiler.tex

    r4500 r4504  
    11
    22\begin{figure}[tbh]
     3\vspace{-2em}
    34\begin{center}
    45
     
    8182
    8283\end{center}
     84\vspace{-1em}
    8385\caption{icGrep compilation architecture}\label{fig:compiler}
     86\vspace{-1em}
    8487\end{figure}
  • docs/Working/icGrep/fig-executor.tex

    r4499 r4504  
    3131
    3232    \begin{pgfonlayer}{threads}
    33         \path (S2P.north west)+(-.1,.3) node (a) {};
     33        \path (S2P.north west)+(-.1,.5) node (a) {};
    3434        \path (S2P.south east)+(+1.7,-0.1) node (b) {};
    3535        \path[fill=green!20,rounded corners, draw=green, solid] (a) rectangle (b);
    36         \node [draw=none,above=-0.1cm of S2P.north east] (t1) {Transposition Thread};
     36        \node [draw=none,above=-0.04cm of S2P.north east] (t1) {Transposition Thread};
    3737
    38         \path (RequiredStreamsGenerator.north west)+(-.1,.3) node (a) {};
     38        \path (RequiredStreamsGenerator.north west)+(-.1,.38) node (a) {};
    3939        \path (RequiredStreamsGenerator.south east)+(+2,-0.1) node (b) {};
    4040        \path[fill=blue!20,rounded corners, draw=blue, solid] (a) rectangle (b);
    41         \node [draw=none,above=-0.1cm of RequiredStreamsGenerator.north east] (t1) {Stream Generator Thread};
     41        \node [draw=none,above=-0.04cm of RequiredStreamsGenerator.north east] (t1) {Stream Generator Thread};
    4242
    43         \path (JITFunction.north west)+(-.1,.3) node (a) {};
    44         \path (NamedPropertyLibrary.south east)+(+.1,-0.1) node (b) {};
     43        \path (JITFunction.north west)+(-.1,.38) node (a) {};
     44        \path (NamedPropertyLibrary.east |- MatchScanner.south)+(+.1,-0.1) node (b) {};
    4545        \path[fill=red!20,rounded corners, draw=red, solid] (a) rectangle (b);
    46         \node [draw=none,above=-0.1cm of JITFunction.north east] (t1) {Matcher Thread};
     46        \node [draw=none,above=-0.04cm of JITFunction.north east] (t1) {Matcher Thread};
    4747
    4848    \end{pgfonlayer}{threads}
  • docs/Working/icGrep/introduction.tex

    r4490 r4504  
    2727Using parallel methods to accelerate matching of a single pattern on a
    2828single input stream is more difficult.  Indeed, of the 13 dwarves identified in the Berkeley overview of parallel computing research,
    29 finite state machines (FSMs) are considered the hardest to parallelize (embarassingly sequential) \cite{asanovic2006landscape}.
     29finite state machines (FSMs) are considered the hardest to parallelize (embarrassingly sequential) \cite{asanovic2006landscape}.
    3030However, some success has been reported recently along two independent lines of
    3131research.   Mytkowicz et al \cite{mytkowicz2014data} use SIMD shuffle operations
  • docs/Working/icGrep/unicode-re.tex

    r4499 r4504  
    3030character to the next. 
    3131
    32 In order to address the requirements of Unicode advance, we use
    33 the $\mathit{ScanThru}$ \cite{cameron2011parallel} operation to move a set of markers
     32To address the requirements of Unicode advance, we use
     33the ScanThru~\cite{cameron2011parallel} operation to move a set of markers
    3434each through the nonfinal bytes of UTF-8 sequences to the final
    35 byte position.   Figure \ref{fig:multibytesequence} shows this
     35byte position.
     36Figure~\ref{fig:multibytesequence} shows this
    3637technique in operation in the case of advancing through byte
    37 sequences (each 3 bytes in length) corresponding to Chinese characters.   
    38 To better demonstrate the process, we use \emph{ni3}, \emph{hao} and \emph{men} to represent these characters.
    39 $CC_{\mbox{ni3}}$ is the bitstream that marks character \emph{ni3} and $CC_{\mbox{hao}}$ is the bitstream that marks character stream \emph{hao}.
    40 To match a two UTF-8 character sequence \emph{ni3hao}, we first create an \emph{Initial} steam that marks the first byte of all the valid characters.
    41 We also produce a \emph{NonFinal} stream that marks every byte of all the multibyte characters except for the last byte.
    42 Using \emph{Initial} to ScanThru \emph{NonFinal}, we get the bitstream $M_2$, which marks the positions of the last byte of every character.
    43 An overlap between $M_2$ and $CC_{\mbox{ni3}}$ gives the start position for matching the next character.
    44 As illustrated by \emph{Adv}, we find two matches for \emph{ni3} and from these two positions we can start the matching process for the next character \emph{hao}.
    45 The final result stream shows 1 match for the multibyte sequence ni3hao.
     38sequences (each 3 bytes in length) corresponding to Chinese characters.
     39To better demonstrate the process, we use \texttt{ni3}, \texttt{hao}
     40and \texttt{men} to represent these characters.
     41$\text{CC}_{\texttt{ni3}}$ is the bitstream that marks character
     42\texttt{ni3} and $\text{CC}_{\texttt{hao}}$ is the bitstream that
     43marks character \texttt{hao}.
     44To match a two UTF-8 character sequence \texttt{ni3hao}, we first
     45create an Initial steam that marks the first byte of all the valid characters.
     46We also produce a NonFinal stream that marks every byte of all
     47multibyte characters \emph{except for} the last byte.
     48Using Initial to ScanThru NonFinal, we construct bitstream $M_1$, which
     49marks the positions of the last byte of every character.
     50An overlap between $M_1$ and $\text{CC}_{\texttt{ni3}}$ gives the start
     51position for matching the next character.
     52As illustrated by $M_2$, we find two matches for \texttt{ni3},
     53and from these two positions we can start the matching process for
     54the next character \texttt{hao}.
     55The final result stream $M_4$ shows one match for the multibyte sequence
     56\texttt{ni3hao}.
    4657
    4758\begin{figure}[tbh]
     59\vspace{-0.5em}
    4860\begin{center}
    4961%\begin{tabular}{cr}\\
    50 
    51 \begin{tabular}{r@{\hspace{1em}}r}\\
    52 input data  & \verb`ni3hao(Hello),ni3men(You),`\\
    53 $CC_{\mbox{ni3}}$ & \verb`..1.............1.........`\\
    54 $CC_{\mbox{hao}}$ & \verb`.....1....................`\\
    55 \emph{Initial} & \verb`1..1..111111111..1..111111`\\
    56 \emph{NonFinal} & \verb`11.11.........11.11.......`\\
    57 $M_1 = \mathit{ScanThru}(\mathit{Initial}, \mathit{NonFinal})$ & \verb`..1..111111111..1..1111111`\\
    58 $\mathit{Adv} = \mathit{Advance}(M_1 \land CC_{\mbox{ni3}})$ & \verb`...1.............1........`\\
    59 $M_2 = \mathit{ScanThru}(\mathit{Initial} \land \mathit{Adv}, \mathit{NonFinal})$ & \verb`.....1.............1......`\\
    60 $\mathit{match} = M_2 \land CC_{\mbox{hao}}$ & \verb`.....1....................`
     62\begin{tabular}{c@{\hspace{1em}}r}\\
     63input data                                                         & \verb`ni3hao(Hello),ni3men(You),`\\
     64$\text{CC}_{\text{ni3}}$                                           & \verb`..1.............1.........`\\
     65$\text{CC}_{\text{hao}}$                                           & \verb`.....1....................`\\
     66Initial                                                            & \verb`1..1..111111111..1..111111`\\
     67NonFinal                                                           & \verb`11.11.........11.11.......`\\
     68$M_1 = \text{ScanThru}(\text{Initial}, \text{NonFinal})$           & \verb`..1..111111111..1..1111111`\\
     69$M_2 = \text{Advance}(M_1 \land \text{CC}_{\text{ni3}})$           & \verb`...1.............1........`\\
     70$M_3 = \text{ScanThru}(\text{Initial} \land M_2, \text{NonFinal})$ & \verb`.....1.............1......`\\
     71$M_4 = M_3 \land CC_{\text{hao}}$                                  & \verb`.....1....................`
    6172\end{tabular}
    6273\end{center}
     74\vspace{-1em}
    6375\caption{Processing of a Multibyte Sequence ni3hao}
    6476\label{fig:multibytesequence}
     77\vspace{-0.5em}
    6578\end{figure}
    6679
    6780\paragraph{Unicode MatchStar.}
    68 
    6981The $\mathit{MatchStar}(M, C)$ operation directly implements
    7082the operation of finding all positions reachable from a
     
    7486{\em final} positions.   Thus the one bits of a Unicode character
    7587class stream are not necessarily contiguous.  This in turn
    76 means that the carry propagation within the MatchStar
     88means that carry propagation within the MatchStar
    7789operation may terminate prematurely.
    7890
Note: See TracChangeset for help on using the changeset viewer.