Sep 13, 2013, 6:56:06 AM (6 years ago)


1 edited


  • docs/Working/re/ppopp-re.tex

    r3468 r3469  
    185185The remainder of this paper is organized as follows.
    186 Section \ref{sec:unbounded} presents our basic algorithm using
    187 a model of unbounded bitwise data parallelism. 
    188 Section \ref{sec:regexp}
     186Section \ref{sec:bitwise} presents our basic algorithm and MatchStar
     187using a model of arbitrary-length bit-parallel data streams.
     188Section \ref{sec:blockwise} discusses the block-by-block
     189implementation of our techniques including the long stream
     190addition techniques for 256-bit addition with AVX2 and
     1914096-bit additions with GPGPU SIMT.
    189192Section \ref{sec:analysis}
    190193Section \ref{sec:SSE2}
    196 \section{An Unbounded Model for Bitwise Regular Expression Matching}\label{sec:unbounded}
    198 Whereas all traditional regular expression matching engines are
    199 fundamentally based on an element-at-a-time processing model,
    200 the approach we introduce in this paper is based on quite a
    201 different conceptual model: processing all elements of an input data
    202 stream simultaneously.    Depending on the available parallel
    203 processing resources, an actual implementation may divide an
    204 input stream into segments and process the segments
    205 sequentially.   Within each segment, all elements of the input
    206 stream are processed in parallel.   The approach is inherently
    207 scalable and can be adapted to a variety of parallel architectures.
    208 The fundamental primitives required are bitwise logic and
    209 long-stream addition. 
    211 A key concept in this approach is the derivation of bit streams
     199\section{Matching with Bit-Parallel Data Streams}\label{sec:bitwise}
     201Whereas the traditional approaches to regular expression matching
     202using NFAs, DFAs or backtracking all rely on a byte-at-a-time
     203processing model, the approach  we introduce in this paper is based
     204on quite a different concept:  a data-parallel approach to simultaneous
     205processing of data stream elements.  Indeed, our most abstract model
     206is that of unbounded data parallelism: processing all elements of
     207the input data stream simultaneously.   In essence, we view
     208data streams as (very large) integers.   The fundamental operations
     209we apply are based on bitwise logic and long-stream addition.
     211Depending on the available parallel processing resources, an actual
     212implementation may divide an input stream into blocks  and process
     213the blocks sequentially.   Within each block  all elements of the
     214input stream are processed together, relying the availability of
     215bitwise logic and addition scaled to the block size.   On commodity
     216Intel and AMD processors with 128-bit SIMD capabilities (SSE2),
     217we typically process input streams 128 bytes at a time.   In this
     218case, we rely on the Parabix tool chain \cite{lin2012parabix}
     219to handle the details of compilation to block-by-block processing.
     220As weh show later, however, we have also adapted Parabix technology to processing
     221blocks of 4K bytes at time in our GPGPU implementation, relying
     222on our implementation of 4096-bit addition implemented using
     223SIMT processing of 64 threads.
     225A key concept in this streaming approach is the derivation of bit streams
    212226that are parallel to the input data stream, i.e., in one-to-one
    213227correspondence with the data element positions of the input
    225239or not.   {\bf perhaps a figure here.}
    227 Using the SIMD capabilities of commodity processors,
    228 the basis bit streams and character class bit streams may be
    229 efficiently computed using the Parabix framework that
    230 has been developed and used extensively for high-performance
    231 XML parsing \cite{}.  For example, working with the 128-bit
    232 SIMD registers on commodity Intel and AMD processors,
    233 the Parabix framework can be employed to process input
    234 streams in blocks of 128 bytes at a time.   However, the
    235 general model is not dependent on any particular data block size
    236 and can be scaled for arbitrary sized blocks depending on
    237 available resources.   
    239 \section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp}
    242 \subsection{Bitwise Data Parallelism vs. Bit-Parallel State Transition}
    243 The bitap algorithms use bitwise operations in an orthogonal way.
     242\section{Block-at-a-Time Processing}\label{sec:blockwise}
    245247\section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis}
Note: See TracChangeset for help on using the changeset viewer.