1 | \section{Bitwise Data Parallel Paradigm and Methods}\label{sec:seqpar} |
---|
2 | |
---|
3 | The introduction of the method of bitwise data parallelism adds a |
---|
4 | fundamentally new paradigm for regular expression matching to |
---|
5 | complement the traditional approaches using DFAs, NFAs or backtracking. |
---|
6 | Whereas the traditional approaches are all sequential in nature, |
---|
7 | performing some form of state transition processing on one input |
---|
8 | code unit at a time, the bitwise data parallel approach takes a |
---|
9 | conceptually parallel view of the input stream. Rather than |
---|
10 | parallelizing existing sequential approaches, the method introduces |
---|
11 | parallelism from the ground up. |
---|
12 | |
---|
13 | As a new paradigm, there is much research to do in building upon |
---|
14 | the basic framework, characterizing performance depending on |
---|
15 | input patterns and texts, developing methods for special cases and |
---|
16 | so on. Here we make some small contributions to the general |
---|
17 | framework and methods before moving on to discuss Unicode issues. |
---|
18 | |
---|
19 | One important aspect of the bitwise data parallel approach |
---|
20 | is transposition of input data. In previous work, the Parabix transform |
---|
21 | has been reported as imposing an amoritized cost of 1 CPU cycle/input byte, |
---|
22 | when working with SSE2 \cite{lin2012parabix}. This is consistent with \icGrep{} |
---|
23 | results. However, the cost of transposition can |
---|
24 | be hidden through multithreading and pipeline parallelism, having one |
---|
25 | core perform work ahead performing transposition, while another |
---|
26 | comes behind to perform matching. We discuss this further in |
---|
27 | Section \ref{sec:architecture}. |
---|
28 | |
---|
29 | The treatment of character and character class recognition is |
---|
30 | another area of fundamental difference between the traditional |
---|
31 | sequential methods and the bitwise approach. It is here that |
---|
32 | the clearest separation of the sequential and parallel approaches |
---|
33 | occurs. In the sequential approaches, characters are processed |
---|
34 | sequentially with table lookups or jump tables used for each |
---|
35 | transition. In the bitwise data parallel approach, all calculations |
---|
36 | of character class bit streams are done completely in parallel using |
---|
37 | bitwise logic. |
---|
38 | |
---|
39 | In the bitwise paradigm, the MatchStar operation elegantly finds |
---|
40 | all possible matches for Kleene-* repetitions of characters or |
---|
41 | character classes using a single long-stream addition operation. |
---|
42 | Interestingly, the MatchStar operation also has application to |
---|
43 | parallelized long-stream addition\cite{cameron2014bitwise}, as well as use in Myers |
---|
44 | bit-parallel edit distance algorithm\cite{myers1999fast}. In the next section, |
---|
45 | we show how MatchStar can be extended for UTF-8 sequences. |
---|
46 | |
---|
47 | We have incorporated an elegant technique for bounded repetitions |
---|
48 | in \icGrep{}. This technique allows the matches to $C{m,n}$ for some |
---|
49 | character class $C$, lower bound $m$ and upper bound $n$ to be determined |
---|
50 | in $\lceil{\log_2 m}\rceil + \lceil{\log_2{n-m}}\rceil$ steps. |
---|
51 | Let $C_k$ be the bit stream identifying positions at which the $k$ |
---|
52 | prior input bytes are all in $C$. Then the observation that |
---|
53 | $C_{2k} = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower |
---|
54 | bound to be determined. An upper bound $k$ similarly involves excluding |
---|
55 | those positions not within $k$ of the pending markers |
---|
56 | (from the previous match step). |
---|
57 | |
---|
58 | A final general technique worth mentioning is that related to |
---|
59 | input skipping. For sequential matching, |
---|
60 | the Boyer-Moore method is well known for the |
---|
61 | possible skipping through input positions of up to the length of the |
---|
62 | search string for fixed-string search. NR-grep \cite{navarro2001nr} |
---|
63 | extends this skipping to regular expression search using the BNDM |
---|
64 | (backward non-determinstic dawg matching) method. Is there an |
---|
65 | input-skipping method for the bitwise parallel paradigm? The answer |
---|
66 | is yes: whenever the bit vector of match positions in play for the current |
---|
67 | input block reduce to all zero, the remainder of the pattern can be |
---|
68 | skipped for processing the block. This method has been implemented in \icGrep{}. |
---|
69 | |
---|
70 | |
---|
71 | |
---|
72 | |
---|