\section{Bitwise Data Parallel Paradigm and Methods}\label{sec:seqpar}
The introduction of the method of bitwise data parallelism adds a
fundamentally new paradigm for regular expression matching to
complement the traditional approaches using DFAs, NFAs or backtracking.
Whereas the traditional approaches are all sequential in nature,
performing some form of state transition processing on one input
code unit at a time, the bitwise data parallel approach takes a
conceptually parallel view of the input stream. Rather than
parallelizing existing sequential approaches, the method introduces
parallelism from the ground up.
As a new paradigm, there is much research to do in building upon
the basic framework, characterizing performance depending on
input patterns and texts, developing methods for special cases and
so on. Here we make some small contributions to the general
framework and methods before moving on to discuss Unicode issues.
One important aspect of the bitwise data parallel approach
is transposition of input data. In previous work, the Parabix transform
has been reported as imposing an amoritized cost of 1 CPU cycle/input byte,
when working with SSE2 \cite{lin2012parabix}. This is consistent with {\tt icGrep}
results. However, the cost of the cost of this transposition can
be hidden through multithreading and pipeline parallelism, having one
core perform work ahead performing transposition, while another
comes behind to perform matching. We discuss this further in
Section \ref{sec:architecture}.
The treatment of character and character class recognition is
another area of fundamental difference between the traditional
sequential methods and the bitwise approach. It is here that
the clearest separation of the sequential and parallel approaches
occurs. In the sequential approaches, characters are processed
sequentially with table lookups or jump tables used for each
transition. In the bitwise data parallel approach, all calculations
of character class bit streams are done completely in parallel using
bitwise logic.
In the bitwise paradigm, the MatchStar operation elegantly finds
all possible matches for Kleene-* repetitions of characters or
character classes using a single long-stream addition operation.
Interestingly, the MatchStar operation also has application to
parallelized long-stream addition\cite{cameron2014bitwise}, as well as use in Myers
bit-parallel edit distance algorithm\cite{myers1999fast}. In the next section,
we show how MatchStar can be extended for UTF-8 sequences.
We have incorporated an elegant technique for bounded repetitions
in icGrep. This technique allows the matches to $C{m,n}$ for some
character class $C$, lower bound $m$ and upper bound $n$ to be determined
in $\lceil{\log_2 m}\rceil + \lceil{\log_2{n-m}}\rceil$ steps.
Let $C_k$ be the bit stream identifying positions at which the $k$
prior input bytes are all in $C$. Then the observation that
$C_{2k} = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower
bound to be determined. An upper bound $k$ similarly involves excluding
those positions not within $k$ of the pending markers
(from the previous match step).
A final general technique worth mentioning is that related to
input skipping. For sequential matching,
the Boyer-Moore method is well known for the
possible skipping through input positions of up to the length of the
search string for fixed-string search. NR-grep \cite{navarro2001nr}
extends this skipping to regular expression search using the BNDM
(backward non-determinstic dawg matching) method. Is there an
input-skipping method for the bitwise parallel paradigm? The answer
is yes: whenever the bit vector of match positions in play for the current
input block reduce to all zero, the remainder of the pattern can be
skipped for processing the block. This method has been implemented in icGrep.