Changeset 3469
 Timestamp:
 Sep 13, 2013, 6:56:06 AM (5 years ago)
 Location:
 docs/Working/re
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/re/abstract.tex
r3465 r3469 1 New parallel algorithms for the classical grep (g etregular expression2 p attern) problem are introduced together with implementations using1 New parallel algorithms for the classical grep (global regular expression 2 print) problem are introduced together with implementations using 3 3 commodity SIMD and GPU technologies. Building on the bitwise 4 4 data parallelism underlying previous work in Unicode transcoding and 
docs/Working/re/ppoppre.tex
r3468 r3469 184 184 185 185 The remainder of this paper is organized as follows. 186 Section \ref{sec:unbounded} presents our basic algorithm using 187 a model of unbounded bitwise data parallelism. 188 Section \ref{sec:regexp} 186 Section \ref{sec:bitwise} presents our basic algorithm and MatchStar 187 using a model of arbitrarylength bitparallel data streams. 188 Section \ref{sec:blockwise} discusses the blockbyblock 189 implementation of our techniques including the long stream 190 addition techniques for 256bit addition with AVX2 and 191 4096bit additions with GPGPU SIMT. 189 192 Section \ref{sec:analysis} 190 193 Section \ref{sec:SSE2} … … 194 197 195 198 196 \section{An Unbounded Model for Bitwise Regular Expression Matching}\label{sec:unbounded} 197 198 Whereas all traditional regular expression matching engines are 199 fundamentally based on an elementatatime processing model, 200 the approach we introduce in this paper is based on quite a 201 different conceptual model: processing all elements of an input data 202 stream simultaneously. Depending on the available parallel 203 processing resources, an actual implementation may divide an 204 input stream into segments and process the segments 205 sequentially. Within each segment, all elements of the input 206 stream are processed in parallel. The approach is inherently 207 scalable and can be adapted to a variety of parallel architectures. 208 The fundamental primitives required are bitwise logic and 209 longstream addition. 210 211 A key concept in this approach is the derivation of bit streams 199 \section{Matching with BitParallel Data Streams}\label{sec:bitwise} 200 201 Whereas the traditional approaches to regular expression matching 202 using NFAs, DFAs or backtracking all rely on a byteatatime 203 processing model, the approach we introduce in this paper is based 204 on quite a different concept: a dataparallel approach to simultaneous 205 processing of data stream elements. Indeed, our most abstract model 206 is that of unbounded data parallelism: processing all elements of 207 the input data stream simultaneously. In essence, we view 208 data streams as (very large) integers. The fundamental operations 209 we apply are based on bitwise logic and longstream addition. 210 211 Depending on the available parallel processing resources, an actual 212 implementation may divide an input stream into blocks and process 213 the blocks sequentially. Within each block all elements of the 214 input stream are processed together, relying the availability of 215 bitwise logic and addition scaled to the block size. On commodity 216 Intel and AMD processors with 128bit SIMD capabilities (SSE2), 217 we typically process input streams 128 bytes at a time. In this 218 case, we rely on the Parabix tool chain \cite{lin2012parabix} 219 to handle the details of compilation to blockbyblock processing. 220 As weh show later, however, we have also adapted Parabix technology to processing 221 blocks of 4K bytes at time in our GPGPU implementation, relying 222 on our implementation of 4096bit addition implemented using 223 SIMT processing of 64 threads. 224 225 A key concept in this streaming approach is the derivation of bit streams 212 226 that are parallel to the input data stream, i.e., in onetoone 213 227 correspondence with the data element positions of the input … … 225 239 or not. {\bf perhaps a figure here.} 226 240 227 Using the SIMD capabilities of commodity processors, 228 the basis bit streams and character class bit streams may be 229 efficiently computed using the Parabix framework that 230 has been developed and used extensively for highperformance 231 XML parsing \cite{}. For example, working with the 128bit 232 SIMD registers on commodity Intel and AMD processors, 233 the Parabix framework can be employed to process input 234 streams in blocks of 128 bytes at a time. However, the 235 general model is not dependent on any particular data block size 236 and can be scaled for arbitrary sized blocks depending on 237 available resources. 238 239 \section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp} 240 241 242 \subsection{Bitwise Data Parallelism vs. BitParallel State Transition} 243 The bitap algorithms use bitwise operations in an orthogonal way. 241 242 \section{BlockataTime Processing}\label{sec:blockwise} 243 244 245 244 246 245 247 \section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis}
Note: See TracChangeset
for help on using the changeset viewer.