Jun 14, 2014, 4:46:37 AM

Data updates, more analysis

1 edited


  • docs/Working/re/avx2.tex

    r3862 r3868  
    5959ylabel=AVX2 Instruction Reduction,
    60 xticklabels={@,Date,Email,URIorEmail,HexBytes},
    6161tick label style={font=\tiny},
    6262enlarge x limits=0.15,
    8686count achieved for each of the applications.   Working at a block
    8787size of 256 bytes at a time rather than 128 bytes at a time,
    88 the bitstreams implementation scaled dramatically well with reductions in
    89 instruction count over a factor of two in each case.   Although a factor
     88the bitstreams implementation scaled very well with reductions in
     89instruction count over a factor of two in every case except for StarHeight.   
     90Although a factor
    9091of two would seem an outside limit, we attribute the change to
    9192greater instruction efficiency. 
    110111ylabel=AVX2 Speedup,
    111 xticklabels={@,Date,Email,URIorEmail,HexBytes},
    112113tick label style={font=\tiny},
    113114enlarge x limits=0.15,
    135136As shown in Figure \ref{fig:AVXSpeedup} the reduction in
    136137instruction count was reflected in a significant speed-up
    137 in the bitstreams implementation.  However, the speed-up was
     138in the bitstreams implementation in all cases except
     139StarHeight.  However, the speed-up was
    138140considerably less than expected. 
    139141The bitstreams code  on AVX2 has suffered from a considerable
    140142reduction in instructions per cycle compared to the SSE2
    141 implementation, possibly indicating
     143implementation, likely indicating
    142144that our grep implementation has become memory-bound.
     145However, the performance of StarHeight deserves particular
     146comment, with an actual slowdown observed.   When moving
     147to 256 positions at a time, the controlling while loops may
     148require more iterations than working 128 positions at a time,
     149because the iteration must continue as long as there are any
     150pending markers in the block.   
    143151Nevertheless, the overall results on our AVX2 machine were quite encouraging,
    144152demonstrating very good scalability of the bitwise data-parallel approach.
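
The StarHeight explanation added in this changeset (the loop must continue as long as any marker in the block is pending) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's implementation: `lifetimes` and `loop_iterations` are hypothetical names, and each position is simply assigned the number of loop iterations its marker stays pending. Under that assumption a block needs as many iterations as its longest-lived marker, so a 256-position block needs at least as many iterations as either of its 128-position halves.

```python
def loop_iterations(lifetimes, block_size):
    """Total while-loop iterations over all blocks (toy model).

    lifetimes[i] is the assumed number of iterations for which the marker
    at position i stays pending (0 means no marker). A block keeps looping
    until none of its markers is pending, so it costs max(block) iterations.
    """
    total = 0
    for start in range(0, len(lifetimes), block_size):
        block = lifetimes[start:start + block_size]
        total += max(block, default=0)
    return total


# Hypothetical skewed input: long-lived markers only in the first half
# of each 256-position block.
skewed = [0] * 512
skewed[10] = 8
skewed[300] = 8
skewed_128 = loop_iterations(skewed, 128)   # 8 + 0 + 8 + 0
skewed_256 = loop_iterations(skewed, 256)   # 8 + 8

# Hypothetical uniform input: every position needs the same 4 iterations.
uniform = [4] * 512
uniform_128 = loop_iterations(uniform, 128)  # 4 blocks * 4
uniform_256 = loop_iterations(uniform, 256)  # 2 blocks * 4
```

In the uniform case the wider block halves the iteration count, while in the skewed case it saves nothing: the 256-position loop runs as long as its slower half requires. If input like StarHeight behaves more like the skewed case, the expected factor-of-two reduction in instruction count would not materialize; this model does not attempt to explain the additional slowdown itself.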