Changeset 3463

Sep 11, 2013, 9:15:51 PM (5 years ago)

Unbounded parallelism paras

1 edited


  • docs/Working/re/ppopp-re.tex

    r3459 r3463  
    112112\section{Unbounded Bitwise Data Parallelism}\label{sec:bitwise}
     114Whereas all traditional regular expression matching engines are
     115fundamentally based on an element-at-a-time processing model,
     116the approach we introduce in this paper is based on quite a
     117different conceptual model: processing all elements of an input data
     118stream simultaneously.    Depending on the available parallel
     119processing resources, an actual implementation may divide an
     120input stream into segments and process the segments
     121sequentially.   Within each segment, all elements of the input
     122stream are processed in parallel.   The approach is inherently
     123scalable and can be adapted to a variety of parallel architectures.
     124The fundamental primitives required are bitwise logic and
     125long-stream addition. 
     127A key concept in this approach is the derivation of bit streams
     128that are parallel to the input data stream, i.e., in one-to-one
     129correspondence with the data element positions of the input
     130streams.   Typically, the input stream is a byte stream comprising
     131the 8-bit character code units of a particular encoding such
     132as extended ASCII, ISO-8859-1 or UTF-8.   However, the method may also
     133easily be used with wider code units such as the 16-bit code units of
     134UTF-16.   In the case of a byte stream, the first step is to transpose
     135the byte stream into eight parallel bit streams, such that bit stream
     136$i$ comprises the $i$\th bit of each byte.   These streams form
     137a set of basis bit streams from which many other parallel bit
     138streams can be calculated, such as character class bit
     139streams such that each bit $j$ of the stream specifies
     140whether character $j$ of the input stream is in the class
     141or not.   {\bf perhaps a figure here.}
     143Using the SIMD capabilities of commodity processors,
     144the basis bit streams and character class bit streams may be
     145efficiently computed using the Parabix framework that
     146has been developed and used extensively for high-performance
     147XML parsing \cite{}.  For example, working with the 128-bit
     148SIMD registers on commodity Intel and AMD processors,
     149the Parabix framework can be employed to process input
     150streams in blocks of 128 bytes at a time.   However, the
     151general model is not dependent on any particular data block size
     152and can be scaled for arbitrary sized blocks depending on
     153available resources.   
    114155\section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp}
Note: See TracChangeset for help on using the changeset viewer.