Changeset 3648 for docs


Timestamp:
Feb 24, 2014, 2:29:36 PM
Author:
cameron
Message:

Tone down long-stream addition; discuss scalability

File:
1 edited

  • docs/Working/re/conclusion.tex

    r3513  r3648
       11     11  good performance in contrast to available alternatives.
       12     12  For moderately complex expressions, 10X or better
       13         performance advantages over GNU grep and 5X performance
              13  performance advantages over DFA-based gre2p and 5X performance
       14     14  advantage over nrgrep were frequently seen.
       15     15  While lacking some

       19     19  in all cases.
       20     20
       21         A parallelized algorithm for long-stream addition has also
       22         been introduced in the paper making a key contribution
       23         to the scalability of the bit-parallel matching technique
       24         overall and that of MatchStar in particular.   This
       25         algorithm has enabled straightforward extension of the
       26         matching algorithm to the implementation using 256-bit
       27         AVX2 technology as well as 4096-bit SIMT implementation
       28         on an AMD GPU.
              21  A model for parallelized long-stream addition has also been presented
              22  in the paper, allowing our techniques to scale beyond
              23  the blocks of 128 bytes we use with the SSE2 implementation.
              24  This model allowed straightforward extension to the 256-byte
              25  block size used in our AVX2 implementation and should
              26  continue to scale well up for SIMD vectors up to 4096 bytes
              27  in length based on 64-bit additions.    The model also
              28  supports GPGPU implementation with some additional
              29  effort, and suggests that direct GPGPU support for
              30  the \verb#hsimd<64>::mask(X)# and
              31  \verb#simd<64>::spread(x)#  operations could
              32  be valuable.
              33
              34  The principal overhead in this method is the transposition of
              35  input data to parallel bit stream form.   However, this overhead reduces
              36  as SIMD register widths increase.   It is also possible to introduce
              37  new SIMD instructions that could dramatically reduce the
              38  cost of transposition\cite{cameron2009architectural}.
       29     39
       30     40  \paragraph*{Future Work}
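
For readers of this changeset, the long-stream addition model mentioned in the added paragraph can be illustrated with a small scalar simulation. This is only a sketch under stated assumptions, not the code from the paper or the repository: it assumes that hsimd<64>::mask(X) gathers the high bit of each 64-bit field into a compact mask, that simd<64>::spread(x) places bit k of x in the low bit of field k, and that MatchStar(M, C) is computed as (((M & C) + C) ^ C) | M. The Python names (long_add, hsimd_mask, simd_spread, match_star) are hypothetical stand-ins for the SIMD operations.

```python
FIELD_BITS = 64
FIELD_MASK = (1 << FIELD_BITS) - 1

def split_fields(value, n):
    """Split an integer into n 64-bit fields, least significant first."""
    return [(value >> (FIELD_BITS * k)) & FIELD_MASK for k in range(n)]

def join_fields(fields):
    """Inverse of split_fields."""
    return sum(f << (FIELD_BITS * k) for k, f in enumerate(fields))

def hsimd_mask(fields):
    """Assumed semantics of hsimd<64>::mask: high bit of field k -> bit k."""
    return sum(((f >> (FIELD_BITS - 1)) & 1) << k for k, f in enumerate(fields))

def simd_spread(mask, n):
    """Assumed semantics of simd<64>::spread: bit k -> low bit of field k."""
    return [(mask >> k) & 1 for k in range(n)]

def match_star(m, c):
    """Assumed MatchStar formula: extend markers m through runs of c."""
    return (((m & c) + c) ^ c) | m

def long_add(x, y, n, carry_in=0):
    """Add two (64*n)-bit values using only fieldwise 64-bit additions.

    Returns (sum modulo 2**(64*n), carry_out).
    """
    X, Y = split_fields(x, n), split_fields(y, n)
    # Step 1: independent 64-bit additions, ignoring carries between fields.
    R = [(a + b) & FIELD_MASK for a, b in zip(X, Y)]
    # Step 2: one carry-out bit per field, derived from the high bits.
    xm, ym, rm = hsimd_mask(X), hsimd_mask(Y), hsimd_mask(R)
    c = (xm & ym) | ((xm | ym) & ~rm)
    # Step 3: "bubble" fields are all ones, so an incoming carry passes through.
    b = hsimd_mask([FIELD_MASK if r == FIELD_MASK else 0 for r in R])
    # Step 4: shift carries to the next field and bubble them with MatchStar.
    i = match_star((c << 1) | carry_in, b)
    # Step 5: spread the increment bits back to fields and add them in.
    S = simd_spread(i, n)
    Z = [(r + s) & FIELD_MASK for r, s in zip(R, S)]
    return join_fields(Z), (i >> n) & 1

# A carry out of field 0 bubbles through the all-ones field 1 into field 2.
assert long_add((1 << 128) - 1, 1, 4) == (1 << 128, 0)
```

With n = 4 this corresponds to 256-bit vectors, and n = 64 gives 4096 bits, the upper limit the paragraph mentions for 64-bit additions, since the carry and bubble masks must themselves fit in one 64-bit field for the MatchStar step.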