Index: /docs/Working/re/ppoppre.tex
===================================================================
 /docs/Working/re/ppoppre.tex (revision 3521)
+++ /docs/Working/re/ppoppre.tex (revision 3522)
@@ 91,31 +91,43 @@
Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions
to nondeterministic finite automata (NFA).
Following Thompson's approach, a regular expression of length $m$ is first converted
to an NFA with $O(m)$ nodes. It is then possible to search a text of length $n$ using the
+Following Thompson's approach, a regular expression of length $m$ is converted
+to an NFA with $O(m)$ states. It is then possible to search a text of length $n$ using the
NFA in worst case $O(mn)$ time. Often, a more efficient choice
is to convert the NFA into a DFA. A DFA has only a single active state at any time
in the matching process and
hence it is possible to search a text at of length $n$ in $O(n)$ time.
However, it is well known that the conversion of an NFA to an equivalent DFA may result
in state explosion. That is, the number of resultant DFA states may increase exponentially.
In \cite{baeza1992new} text searching was
proposed based on bitparallelism.
The technique takes advantage of the intrinsic parallelism of bitwise operations
within a computer word. Given a $w$bit word, the ShiftOr algorithm \cite{Baezayates_anew} algorithm uses the
bitparallel approach to
simulate an NFA in $O(nm/w)$ worstcase time.

A disadvantage of the bitparallel ShiftOr pattern matching approach
in comparison to simple string matching algorithms is an inability to skip input characters.
For example, the BoyerMoore family of algorithms \cite{boyer1977fast} skip input characters
to achieve sublinear times in the average case. Backward Dawg Matching
(BDM) string matching algorithms \cite{crochemore1994text} based on suffix automata are able to skip characters.
+is to convert an NFA into a DFA. A DFA has a single active state at any time
+throughout the matching process and
+hence it is possible to search a text at of length $n$ in $O(n)$ time
+\footnote{It is well known that the conversion of an NFA to an equivalent DFA may result
+in state explosion. That is, the number of resultant DFA states may increase exponentially.}.
+
+A significant portion of the research in fast regular expression matching can be
+regarded as the ``quest for efficient automata'' \cite{navarro98fastand}.
+In \cite{baeza1992new}, BaezaYates and Gonnet
+presented a new approach to string search based on bitparallelism.
+This technique takes advantage of the intrinsic parallelism of bitwise operations
+within a computer word.
+Given a $w$bit word, the number of operations that a string search algorithms
+performs can be reduced by a factor $w$.
+Using this fact, the ShiftOr algorithm simulates an NFA using
+bitwise operations and achieves $O(nm/w)$ worstcase time \cite{navarro2000}.
+A disadvantage of the bitparallel ShiftOr pattern matching approach
+is an inability to skip input characters.
+Simple string matching algorithms,
+such as the BoyerMoore family of algorithms \cite{boyer1977fast, horspool1980practical} skip input characters
+to achieve sublinear times in the average case.
+Backward Dawg Matching (BDM) string matching algorithms \cite{crochemore1994text}
+based on suffix automata are able to skip characters.
The Backward Nondeterministic Dawg Matching (BNDM) pattern matching algorithm \cite{wu1992fast}
combines the bitparallel advantages of ShiftOr and with the character skipping advantages of the BDM algorithm.
The nrgrep pattern matching tool is built over the BNDM algorithm,
and hence the name nrgrep \cite{navarro2000}.

There has been considerable interest in using parallelization techniques
to improve the performance of regular expression matching on parallel hardware
+combines the bitparallel advantages of the ShiftOr approach
+with the character skipping property of BDM algorithms. The nrgrep pattern matching tool,
+is built over the BNDM algorithm. Prior to the dataparallel approach to simultaneous
+processing of data stream elements as discussed herein, nrgrep
+was by far the fastest grep tool
+for matching complex patterns and achieved similar performance
+to that of the fastest existing string
+matching tools for simple patterns \cite{navarro2000}.
+
+There has been considerable
+interest in accelerating regular expression matching
+on parallel hardware
such as multicore processors (CPUs), graphics processing units (GPUs),
fieldprogrammable gate arrays (FPGAs), and specialized architectures such as
@@ 128,23 +140,23 @@
performs smallruleset regular expression matching at a rate of
2.88 Gbps per chip on Intel Xeon E5472 hardware \cite{pasetto2010}.
Naghmouchi et al. demonstrated that the AhoCorasick (AC)
+Naghmouchi et al. \cite{scarpazza2011top, naghmouchi2010} demonstrated that the AhoCorasick (AC)
string matching algorithm \cite{aho1975} is well suited for parallel
implementation on multicore CPUs, GPUs and the Cell BE \cite{scarpazza2011top, naghmouchi2010}.
On each hardware, both threadlevel parallelism (additional cores) and datalevel parallelism
(wide SIMD units) are leveraged for performance.
Salapura et. al., advocated the use of vectorstyle processing for regular expressions
+implementation on multicore CPUs, GPUs and the Cell BE.
+On each hardware, both threadlevel parallelism (cores) and datalevel parallelism
+(SIMD units) were leveraged for performance.
+Salapura et. al. \cite{salapura2012accelerating} advocated the use of vectorstyle processing for regular expressions
in business analytics applications and leveraged the SIMD hardware available
on multicore processors to acheive a speedup of better than 1.8 over a
range of data sizes of interest \cite{salapura2012accelerating}.
+on multicore processors to acheive a speedup of greater than 1.8 over a
+range of data sizes of interest.
%Cell
In \cite{scarpazza2008}, Scarpazza and Russell presented a SIMD tokenizer
that delivered 1.00â1.78 Gbps on a single
+that delivered 1.001.78 Gbps on a single
Cell BE chip and extended this approach for emulation on the Intel Larrabee
instruction set \cite{scarpazza2009larrabee}.
On the Cell BE, Scarpazza \cite{scarpazza2009cell} described a pattern matching
implementation that delivered a throughput of 40
Gbps for a small dictionary of approximately 100 patterns, and a throughput of 3.33.4
+Gbps for a small dictionary of approximately 100 patterns and a throughput of 3.33.4
Gbps for a larger dictionary of thousands of patterns. Iorio and van Lunteren \cite{iorio2008}
presented a string matching implementation for automata that achieves
+presented a string matching implementation for automata that achieved
4 Gbps on the Cell BE.
% GPU
Index: /docs/Working/re/reference.bib
===================================================================
 /docs/Working/re/reference.bib (revision 3521)
+++ /docs/Working/re/reference.bib (revision 3522)
@@ 323,2 +323,22 @@
address = {Berlin, Heidelberg},
}
+
+@article{navarro98fastand,
+ author = {Gonzalo Navarro and Mathieu Raffinot},
+ title = {Fast and Flexible String Matching by Combining Bitparallelism and Suffix Automata},
+ journal = {ACM JOURNAL OF EXPERIMENTAL ALGORITHMICS (JEA)},
+ year = {1998},
+ volume = {5},
+ pages = {2000}
+}
+
+@article{horspool1980practical,
+ title={Practical fast searching in strings},
+ author={Horspool, R Nigel},
+ journal={Software: Practice and Experience},
+ volume={10},
+ number={6},
+ pages={501506},
+ year={1980},
+ publisher={Wiley Online Library}
+}