Timestamp:
Feb 15, 2014, 8:54:07 PM
Author:
cameron
Message:

Explain while loop termination; tone down long-addition claims

File:
1 edited

  • docs/Working/re/re-main.tex

    r3617 r3623  
    313 313   bitwise logic and addition scaled to the block size.   On commodity
    314 314   Intel and AMD processors with 128-bit SIMD capabilities (SSE2),
    315     - we typically process input streams 128 bytes at a time.   In this
        315 + we typically process input streams 128 bytes at a time.
        316 + In this
    316 317   case, we rely on the Parabix tool chain \cite{lin2012parabix}
    317 318   to handle the details of compilation to block-by-block processing.
    318     - For our GPGPU implementation, we have developed a long-stream
    319     - addition technique that allows us to perform 4096-bit additions
    320     - using 64 threads working in lock-step SIMT fashion.  Using scripts
    321     - to modify the output of the Parabix tools, we effectively divide
    322     - the input into blocks of 4K bytes processed in a fully data-parallel
    323     - manner.
    324     -
        319 + On the
        320 + latest processors supporting the 256-bit AVX2 SIMD operations,
        321 + we also use the Parabix tool chain, but substitute a parallelized
        322 + long-stream addition technique to avoid the sequential chaining
        323 + of 4 64-bit additions.
        324 + Our GPGPU implementation uses scripts to modify the output
        325 + of the Parabix tools, effectively dividing the input into blocks
        326 + of 4K bytes.
        327 + We also have adapted our long-stream addition technique
        328 + to perform 4096-bit additions using 64 threads working in lock-step
        329 + SIMT fashion.  A similar technique is known to the GPU programming
        330 + community\cite{}.
        331 +
    325 332   \begin{figure}[tbh]
    326 333   \begin{center}
     
    464 471   bitstream calculations.
    465 472
    466     - In the present work, our principal contribution to the block-at-a-time
    467     - model is the technique of long-stream addition described below.
        473 + In the present work, our principal contribution to the Parabix tool
        474 + chain is to incorporate the technique of long-stream addition described below.
    468 475   Otherwise, we were able to use Pablo directly in compiling our
    469 476   SSE2 and AVX2 implementations.   Our GPU implementation required
     
    479 486   is far from ideal.
    480 487
    481     - We have developed a technique using SIMD or SIMT methods for constant-time
        488 + We have developed a technique using SIMD methods for constant-time
    482 489   long-stream addition up to 4096 bits.
    483 490   We assume the availability of the following SIMD/SIMT operations
     
    582 589   set extensions to support long-stream addition could be added for
    583 590   future SIMD and GPU processors.   Given the fundamental nature
    584     - of addition as a primitive and its novel application to regular
        591 + of addition as a primitive and its particular application to regular
    585 592   expression matching as shown herein, it seems reasonable to expect
    586 593   such instructions to become available.