Index: docs/Working/re/abstract.tex
===================================================================
 docs/Working/re/abstract.tex (revision 3468)
+++ docs/Working/re/abstract.tex (revision 3469)
@@ 1,4 +1,4 @@
New parallel algorithms for the classical grep (get regular expression
pattern) problem are introduced together with implementations using
+New parallel algorithms for the classical grep (global regular expression
+print) problem are introduced together with implementations using
commodity SIMD and GPU technologies. Building on the bitwise
data parallelism underlying previous work in Unicode transcoding and
Index: docs/Working/re/ppoppre.tex
===================================================================
 docs/Working/re/ppoppre.tex (revision 3468)
+++ docs/Working/re/ppoppre.tex (revision 3469)
@@ 184,7 +184,10 @@
The remainder of this paper is organized as follows.
Section \ref{sec:unbounded} presents our basic algorithm using
a model of unbounded bitwise data parallelism.
Section \ref{sec:regexp}
+Section \ref{sec:bitwise} presents our basic algorithm and MatchStar
+using a model of arbitrarylength bitparallel data streams.
+Section \ref{sec:blockwise} discusses the blockbyblock
+implementation of our techniques including the long stream
+addition techniques for 256bit addition with AVX2 and
+4096bit additions with GPGPU SIMT.
Section \ref{sec:analysis}
Section \ref{sec:SSE2}
@@ 194,20 +197,31 @@
\section{An Unbounded Model for Bitwise Regular Expression Matching}\label{sec:unbounded}

Whereas all traditional regular expression matching engines are
fundamentally based on an elementatatime processing model,
the approach we introduce in this paper is based on quite a
different conceptual model: processing all elements of an input data
stream simultaneously. Depending on the available parallel
processing resources, an actual implementation may divide an
input stream into segments and process the segments
sequentially. Within each segment, all elements of the input
stream are processed in parallel. The approach is inherently
scalable and can be adapted to a variety of parallel architectures.
The fundamental primitives required are bitwise logic and
longstream addition.

A key concept in this approach is the derivation of bit streams
+\section{Matching with BitParallel Data Streams}\label{sec:bitwise}
+
+Whereas the traditional approaches to regular expression matching
+using NFAs, DFAs or backtracking all rely on a byteatatime
+processing model, the approach we introduce in this paper is based
+on quite a different concept: a dataparallel approach to simultaneous
+processing of data stream elements. Indeed, our most abstract model
+is that of unbounded data parallelism: processing all elements of
+the input data stream simultaneously. In essence, we view
+data streams as (very large) integers. The fundamental operations
+we apply are based on bitwise logic and longstream addition.
+
+Depending on the available parallel processing resources, an actual
+implementation may divide an input stream into blocks and process
+the blocks sequentially. Within each block all elements of the
+input stream are processed together, relying the availability of
+bitwise logic and addition scaled to the block size. On commodity
+Intel and AMD processors with 128bit SIMD capabilities (SSE2),
+we typically process input streams 128 bytes at a time. In this
+case, we rely on the Parabix tool chain \cite{lin2012parabix}
+to handle the details of compilation to blockbyblock processing.
+As weh show later, however, we have also adapted Parabix technology to processing
+blocks of 4K bytes at time in our GPGPU implementation, relying
+on our implementation of 4096bit addition implemented using
+SIMT processing of 64 threads.
+
+A key concept in this streaming approach is the derivation of bit streams
that are parallel to the input data stream, i.e., in onetoone
correspondence with the data element positions of the input
@@ 225,21 +239,9 @@
or not. {\bf perhaps a figure here.}
Using the SIMD capabilities of commodity processors,
the basis bit streams and character class bit streams may be
efficiently computed using the Parabix framework that
has been developed and used extensively for highperformance
XML parsing \cite{}. For example, working with the 128bit
SIMD registers on commodity Intel and AMD processors,
the Parabix framework can be employed to process input
streams in blocks of 128 bytes at a time. However, the
general model is not dependent on any particular data block size
and can be scaled for arbitrary sized blocks depending on
available resources.

\section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp}


\subsection{Bitwise Data Parallelism vs. BitParallel State Transition}
The bitap algorithms use bitwise operations in an orthogonal way.
+
+\section{BlockataTime Processing}\label{sec:blockwise}
+
+
+
\section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis}