Changeset 3623 for docs


Ignore:
Timestamp:
Feb 15, 2014, 8:54:07 PM (5 years ago)
Author:
cameron
Message:

Explain while loop termination; tone down long-addition claims

Location:
docs/Working/re
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/abstract.tex

    r3513 r3623  
    1 New parallel algorithms for the classical grep (global regular expression
    2 print) problem are introduced together with implementations using
    3 commodity SIMD and GPU technologies.   Building on the bitwise
    4 data parallelism underlying previous work in Unicode transcoding and
    5 XML parsing, the new algorithms add the important element of
    6 nondeterminism for tackling the full generality of regular
    7 expression processing.               On widely-deployed commodity hardware using
     1A new parallel algorithm for regular expression matching is developed
     2and applied to the classical grep (global regular expression print)
     3problem.  Building on the bitwise data parallelism previously
     4applied to the manual implementation of token scanning in the
     5Parabix XML parser, the new algorithm represents a general
     6solution to the problem of regular expression matching using
     7parallel bit streams.  On widely-deployed commodity hardware using
    88128-bit SSE2 SIMD technology, our algorithm implementations can substantially
    99outperform traditional grep implementations based on NFAs, DFAs or
     
    1414of Intel AVX2 or future 512-bit extensions.   Our AVX2 implementation
    1515showed dramatic reduction in instruction count and significant
    16 improvement in speed.   Our GPU implementations show
    17 further acceleration limited primarily by data transfer speed.
    18 
    19 
     16improvement in speed.   Our GPU implementations show substantial
     17further acceleration. 
  • docs/Working/re/compilation.tex

    r3509 r3623  
    5757character class of the form {\tt R*}, then a Pablo while loop
    5858is created conditioned on a control marker stream still
    59 having bits marking match positions.   The body of the
     59having bits marking match positions to  be considered.   The body of the
    6060while consists of the compiled form of the expression {\tt R},
    61 taking as input the marker at the beginning of the iteration
    62 and producing output that becomes the input for the next iteration, if any.
     61taking as input the marker stream at the beginning of the iteration
     62and producing as output all positions that can be reached from
     63the input positions in a single step.   These output positions
     64are candidates for further iteration, but positions that
     65have already been considered are removed.  This guarantees
     66termination of the loop; iteration continues only if a new
     67marker position is reached that has not been previously
     68considered as an output.
    6369The final output is the bitwise-or of matches determined in each loop
    6470iteration.
  • docs/Working/re/re-main.tex

    r3617 r3623  
    313313bitwise logic and addition scaled to the block size.   On commodity
    314314Intel and AMD processors with 128-bit SIMD capabilities (SSE2),
    315 we typically process input streams 128 bytes at a time.   In this
     315we typically process input streams 128 bytes at a time.   
     316In this
    316317case, we rely on the Parabix tool chain \cite{lin2012parabix}
    317318to handle the details of compilation to block-by-block processing.
    318 For our GPGPU implementation, we have developed a long-stream
    319 addition technique that allows us to perform 4096-bit additions
    320 using 64 threads working in lock-step SIMT fashion.  Using scripts
    321 to modify the output of the Parabix tools, we effectively divide
    322 the input into blocks of 4K bytes processed in a fully data-parallel
    323 manner.
    324 
     319On the
     320latest processors supporting the 256-bit AVX2 SIMD operations,
     321we also use the Parabix tool chain, but substitute a parallelized
     322long-stream addition technique to avoid the sequential chaining
     323of 4 64-bit additions.
     324Our GPGPU implementation uses scripts to modify the output
     325of the Parabix tools, effectively dividing the input into blocks
     326of 4K bytes.   
     327We also have adapted our long-stream addition technique
     328to perform 4096-bit additions using 64 threads working in lock-step
     329SIMT fashion.  A similar technique is known to the GPU programming
     330community\cite{}.   
     331 
    325332\begin{figure}[tbh]
    326333\begin{center}
     
    464471bitstream calculations.   
    465472
    466 In the present work, our principal contribution to the block-at-a-time
    467 model is the technique of long-stream addition described below.
     473In the present work, our principal contribution to the Parabix tool
     474chain is to incorporate the technique of long-stream addition described below.
    468475Otherwise, we were able to use Pablo directly in compiling our
    469476SSE2 and AVX2 implementations.   Our GPU implementation required
     
    479486is far from ideal.
    480487
    481 We have developed a technique using SIMD or SIMT methods for constant-time
     488We have developed a technique using SIMD methods for constant-time
    482489long-stream addition up to 4096 bits.   
    483490We assume the availability of the following SIMD/SIMT operations
     
    582589set extensions to support long-stream addition could be added for
    583590future SIMD and GPU processors.   Given the fundamental nature
    584 of addition as a primitive and its novel application to regular
     591of addition as a primitive and its particular application to regular
    585592expression matching as shown herein, it seems reasonable to expect
    586593such instructions to become available.
Note: See TracChangeset for help on using the changeset viewer.