source: docs/ICA3PP2015/paradigm.tex @ 5789

Last change on this file since 5789 was 4474, checked in by nmedfort, 5 years ago

Minor typo + figure

File size: 3.8 KB
1\section{Bitwise Data Parallel Paradigm and Methods}\label{sec:seqpar}
3The introduction of the method of bitwise data parallelism adds a
4fundamentally new paradigm for regular expression matching to
5complement the traditional approaches using DFAs, NFAs or backtracking.
6Whereas the traditional approaches are all sequential in nature,
7performing some form of state transition processing on one input
8code unit at a time, the bitwise data parallel approach takes a
9conceptually parallel view of the input stream.   Rather than
10parallelizing existing sequential approaches, the method introduces
11parallelism from the ground up.   
13As a new paradigm, there is much research to do in building upon
14the basic framework, characterizing performance depending on
15input patterns and texts, developing methods for special cases and
16so on.   Here we make some small contributions to the general
17framework and methods before moving on to discuss Unicode issues.
19One important aspect of the bitwise data parallel approach
20is transposition of input data.   In previous work, the Parabix transform
21has been reported as imposing an amoritized cost of 1 CPU cycle/input byte,
22when working with SSE2 \cite{lin2012parabix}.  This is consistent with \icGrep{}
23results.  However, the cost of transposition can
24be hidden through multithreading and pipeline parallelism, having one
25core perform work ahead performing transposition, while another
26comes behind to perform matching.  We discuss this further in
27Section \ref{sec:architecture}.
29The treatment of character and character class recognition is
30another area of fundamental difference between the traditional
31sequential methods and the bitwise approach.   It is here that
32the clearest separation of the sequential and parallel approaches
33occurs.   In the sequential approaches, characters are processed
34sequentially with table lookups or jump tables used for each
35transition.   In the bitwise data parallel approach, all calculations
36of character class bit streams are done completely in parallel using
37bitwise logic. 
39In the bitwise paradigm, the MatchStar operation elegantly finds
40all possible matches for Kleene-* repetitions of characters or
41character classes using a single long-stream addition operation.
42Interestingly, the MatchStar operation also has application to
43parallelized long-stream addition\cite{cameron2014bitwise}, as well as use in Myers
44bit-parallel edit distance algorithm\cite{myers1999fast}.  In the next section,
45we show how MatchStar can be extended for UTF-8 sequences.
47We have incorporated an elegant technique for bounded repetitions
48in \icGrep{}.   This technique allows the matches to $C{m,n}$ for some
49character class $C$, lower bound $m$ and upper bound $n$ to be determined
50in $\lceil{\log_2 m}\rceil + \lceil{\log_2{n-m}}\rceil$ steps. 
51Let $C_k$ be the bit stream identifying positions at which the $k$
52prior input bytes are all in $C$.   Then the observation that
53$C_{2k} = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower
54bound to be determined.   An upper bound $k$ similarly involves excluding
55those positions not within $k$ of the pending markers
56(from the previous match step).
58A final general technique worth mentioning is that related to
59input skipping.  For sequential matching,
60the Boyer-Moore method is well known for the
61possible skipping through input positions of up to the length of the
62search string for fixed-string search.   NR-grep \cite{navarro2001nr} 
63extends this skipping to regular expression search using the BNDM
64(backward non-determinstic dawg matching) method.  Is there an
65input-skipping method for the bitwise parallel paradigm?  The answer
66is yes: whenever the bit vector of match positions in play for the current
67input block reduce to all zero, the remainder of the pattern can be
68skipped for processing the block.   This method has been implemented in \icGrep{}.
Note: See TracBrowser for help on using the repository browser.