Changeset 3868 for docs


Timestamp:
Jun 14, 2014, 4:46:37 AM
Author:
cameron
Message:

Data updates, more analysis

Location:
docs/Working/re
Files:
29 edited

  • docs/Working/re/analysis.tex

    r3867 r3868  
    @@ -38,5 +38,4 @@
     This assumption allows us to equate the number of bits in the encoding of a
     character (a parameter for the bitstream method) with $\log \sigma$.
    -
     
     The bitstream method compiles a regular expression of size $m$ into
  • docs/Working/re/avx2.tex

    r3862 r3868  
    @@ -58,5 +58,5 @@
     xtick=data,
     ylabel=AVX2 Instruction Reduction,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
    @@ -86,6 +86,7 @@
     count achieved for each of the applications.   Working at a block
     size of 256 bytes at a time rather than 128 bytes at a time,
    -the bitstreams implementation scaled dramatically well with reductions in
    -instruction count over a factor of two in each case.   Although a factor
    +the bitstreams implementation scaled very well with reductions in
    +instruction count over a factor of two in every case except for StarHeight.
    +Although a factor
     of two would seem an outside limit, we attribute the change to
     greater instruction efficiency.
    @@ -109,5 +110,5 @@
     xtick=data,
     ylabel=AVX2 Speedup,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
    @@ -135,10 +136,17 @@
     As shown in Figure \ref{fig:AVXSpeedup} the reduction in
     instruction count was reflected in a significant speed-up
    -in the bitstreams implementation.  However, the speed-up was
    +in the bitstreams implementation in all cases except
    +StarHeight.  However, the speed-up was
     considerably less than expected.
     The bitstreams code  on AVX2 has suffered from a considerable
     reduction in instructions per cycle compared to the SSE2
    -implementation, possibly indicating
    +implementation, likely indicating
     that our grep implementation has become memory-bound.
    +However, the performance of StarHeight deserves particular
    +comment, with an actual slowdown observed.   When moving
    +to 256 positions at a time, the controlling while loops may
    +require more iterations than working 128 positions at a time,
    +because the iteration must continue as long as there are any
    +pending markers in the block.
     Nevertheless, the overall results on our AVX2 machine were quite encouraging,
     demonstrating very good scalability of the bitwise data-parallel approach.
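The StarHeight explanation added to avx2.tex can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Pablo-generated code: it assumes each 128-position chunk needs some number of while-loop passes before its pending markers are exhausted (the `passes_needed` values are hypothetical), and that a block keeps iterating until its slowest chunk is done.

```python
# Toy model (not the Pablo-generated code): passes_needed[i] is a hypothetical
# count of while-loop passes that chunk i of 128 positions needs before it has
# no pending markers left.

def total_loop_iterations(passes_needed, chunks_per_block):
    """Sum of while-loop iterations when chunks are grouped into blocks.

    A block keeps iterating while any position in it has a pending marker,
    so it runs for the maximum pass count among its chunks.
    """
    total = 0
    for start in range(0, len(passes_needed), chunks_per_block):
        block = passes_needed[start:start + chunks_per_block]
        total += max(block)
    return total


passes_needed = [1, 6, 1, 1]  # hypothetical: one chunk has a long marker run

iters_128 = total_loop_iterations(passes_needed, 1)  # one chunk per 128-bit block
iters_256 = total_loop_iterations(passes_needed, 2)  # two chunks per 256-bit block

print(iters_128, iters_256, iters_128 / iters_256)
```

In this made-up input the wider block only cuts the iteration count from 9 to 7, well short of the factor of two that a doubled width might suggest, because the chunk that needs a single pass is carried through all six passes of its neighbour.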
  • docs/Working/re/data/avxtime.dat

    r3653 r3868  
    @@ -1,5 +1,6 @@
    -0 0.205018196989
    -1 0.295246601196
    -2 0.302237158564
    -3 0.468256435493
    -4 0.288130051020
    +0       0.1956379337
    +1       0.2828034926
    +2       0.2889983903
    +3       0.4765108042
    +4       0.3396900899
    +5       1.2091171236
  • docs/Working/re/data/branch-misses-bitstreams-avx2.dat

    r3658 r3868  
    @@ -1,5 +1,7 @@
    -0 0.001519363
    -1 0.0008206642
    -2 0.0010116166
    -3 0.000738838
    -4 0.0008082418
    +0       0.0016774825
    +1       0.0008357743
    +2       0.001037251
    +3       0.0007558463
    +4       0.002390721
    +5       0.0026046142
    +
  • docs/Working/re/data/branch-misses-bitstreams.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 0.0008936692
    -1 0.0006330034
    -2 0.0006860091
    -3 0.0005290345
    -4 0.0004363124
    +0       0.0008842031
    +1       0.000527084
    +2       0.0006941816
    +3       0.0005300792
    +4       0.001923317
    +5       0.002272245
  • docs/Working/re/data/branch-misses-gre2p.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 0.0015052715
    -1 0.001427571
    -2 0.0011283994
    -3 0.0013149726
    -4 0.0002026342
    +0       0.0013837774
    +1       0.0014431598
    +2       0.0010984108
    +3       0.0014717503
    +4       0.0013722774
    +5       0.0015046596
  • docs/Working/re/data/branch-misses-nrgrep112.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 0.0001772238
    -1 0.0034354883
    -2 0.0111118402
    -3 0.0003465179
    -4 0.0353144202
    +0       0.0001779815
    +1       0.0034672889
    +2       0.0113438762
    +3       0.0002536244
    +4       0.0006715218
    +5       0.0002039675
  • docs/Working/re/data/cycles-bitstreams-avx2.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 0.6355542658
    -1 0.9019257089
    -2 0.9204814913
    -3 1.4404315802
    -4 0.917466457
    +0       0.6260413878
    +1       0.9049711763
    +2       0.9247948489
    +3       1.5248345734
    +4       1.0870082877
    +5       3.8691747954
  • docs/Working/re/data/cycles-bitstreams.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 0.9531801511
    -1 1.2232182173
    -2 1.1755542988
    -3 1.9287017271
    -4 1.4906017068
    +0       0.9501341335
    +1       1.2300784685
    +2       1.1884731335
    +3       2.2140142113
    +4       1.5846469019
    +5       3.6150169368
  • docs/Working/re/data/cycles-gre2p.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 22.0020578302
    -1 12.0522989323
    -2 26.0779249611
    -3 40.4431307258
    -4 11.7247355513
    +0       21.4808229914
    +1       11.7626860483
    +2       25.9036592798
    +3       41.9343698898
    +4       114.3925588425
    +5       29.329930415
  • docs/Working/re/data/cycles-nrgrep112.dat

    r3658 r3868  
    @@ -1,5 +1,7 @@
    -0 2.1591544896
    -1 0.6861738865
    -2 8.4588007414
    -3 7.5076760868
    -4 3.9757488343
    +0       2.1635344166
    +1       0.6812105966
    +2       8.7812098736
    +3       9.9903646622
    +4       8.7637658636
    +5       7.412737803
    +
  • docs/Working/re/data/gputime.dat

    r3650 r3868  
    @@ -4,2 +4,3 @@
     3 0.42
     4 0.18
    +5 0
  • docs/Working/re/data/instructions-bitstreams-avx2.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 1.0068203106
    -1 1.3957890123
    -2 1.4819688345
    -3 2.5283462159
    -4 1.7364563359
    +0       1.0088770471
    +1       1.397859293
    +2       1.4840285727
    +3       2.8758951543
    +4       1.7512372356
    +5       5.3132736393
  • docs/Working/re/data/instructions-bitstreams.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 2.8837376644
    -1 3.8441432034
    -2 3.6697338968
    -3 5.8888106051
    -4 4.6447878671
    +0       2.8857037983
    +1       3.8460961522
    +2       3.6717107684
    +3       6.755662302
    +4       4.4599754627
    +5       8.4513797591
  • docs/Working/re/data/instructions-gre2p.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 48.3575171087
    -1 20.373182051
    -2 60.9252858517
    -3 100.2840179133
    -4 21.4078933887
    +0       48.9373570373
    +1       20.6017041692
    +2       62.0332338488
    +3       106.0809488109
    +4       320.3433684947
    +5       69.8179680888
  • docs/Working/re/data/instructions-nrgrep112.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 8.3244413762
    -1 1.8579135488
    -2 15.4674421115
    -3 10.5666965019
    -4 3.2337932487
    +0       8.3303064783
    +1       1.8521000503
    +2       15.3481859353
    +3       14.6365366359
    +4       15.0976676339
    +5       10.1374062489
  • docs/Working/re/data/ipc-bitstreams-avx2.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 1.5841610461
    -1 1.5475653909
    -2 1.6099930835
    -3 1.755269914
    -4 1.8926646557
    +0       1.6115181309
    +1       1.5446450998
    +2       1.6047111145
    +3       1.8860374788
    +4       1.6110615304
    +5       1.3732317407
  • docs/Working/re/data/ipc-bitstreams.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 3.0253857691
    -1 3.1426471165
    -2 3.1217051399
    -3 3.0532510664
    -4 3.116048939
    +0       3.0371541203
    +1       3.1267079709
    +2       3.0894352298
    +3       3.0513184006
    +4       2.814491643
    +5       2.3378534338
  • docs/Working/re/data/ipc-gre2p.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 2.1978633763
    -1 1.6903980034
    -2 2.3362781334
    -3 2.479630437
    -4 1.8258743061
    +0       2.2781881801
    +1       1.7514455529
    +2       2.3947672095
    +3       2.5296898246
    +4       2.8003864214
    +5       2.3804341538
  • docs/Working/re/data/ipc-nrgrep112.dat

    r3658 r3868  
    @@ -1,5 +1,6 @@
    -0 3.8554172091
    -1 2.7076424583
    -2 1.8285620603
    -3 1.40745237
    -4 0.8133796634
    +0       3.8503230705
    +1       2.7188362301
    +2       1.7478441076
    +3       1.4650653035
    +4       1.7227374474
    +5       1.3675657387
  • docs/Working/re/data/sse2-avx2-instr-red-bitstreams.dat

    r3659 r3868  
    @@ -1,5 +1,6 @@
    -0 2.864202911
    -1 2.7541004905
    -2 2.4762557831
    -3 2.3291155966
    -4 2.6748659158
    +0       2.8603126681
    +1       2.7514186668
    +2       2.4741509941
    +3       2.3490641833
    +4       2.5467568711
    +5       1.5906163192
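The relationship between instruction reduction, instructions per cycle, and the resulting cycle speed-up can be checked with a little arithmetic. The sketch below uses r3868 values copied from the bitstreams data files in this changeset; mapping index 0 to "@" and index 5 to "StarHeight" is an assumption based on the order of the `xticklabels` entries, and the function name `decompose` is made up for illustration.

```python
# r3868 values for two expressions, copied from the data files in this changeset.
# The index-to-name mapping (0 -> "@", 5 -> "StarHeight") is assumed from xticklabels.
instr_sse2 = {"@": 2.8857037983, "StarHeight": 8.4513797591}   # instructions-bitstreams.dat
instr_avx2 = {"@": 1.0088770471, "StarHeight": 5.3132736393}   # instructions-bitstreams-avx2.dat
cycles_sse2 = {"@": 0.9501341335, "StarHeight": 3.6150169368}  # cycles-bitstreams.dat
cycles_avx2 = {"@": 0.6260413878, "StarHeight": 3.8691747954}  # cycles-bitstreams-avx2.dat


def decompose(name):
    """Return (instruction reduction, SSE2 IPC, AVX2 IPC, cycle speed-up)."""
    reduction = instr_sse2[name] / instr_avx2[name]
    ipc_sse2 = instr_sse2[name] / cycles_sse2[name]
    ipc_avx2 = instr_avx2[name] / cycles_avx2[name]
    # Fewer instructions help only to the extent that IPC holds up.
    cycle_speedup = reduction * (ipc_avx2 / ipc_sse2)
    return reduction, ipc_sse2, ipc_avx2, cycle_speedup


for name in ("@", "StarHeight"):
    reduction, ipc_sse2, ipc_avx2, speedup = decompose(name)
    print(f"{name}: reduction {reduction:.2f}, IPC {ipc_sse2:.2f} -> {ipc_avx2:.2f}, "
          f"cycle speed-up {speedup:.2f}")
```

For "@" the instruction count drops by a factor of about 2.86, but IPC falls from about 3.04 to about 1.61, leaving a cycle speed-up of about 1.52. For StarHeight the reduction is only about 1.59, and the IPC drop turns it into a slowdown (about 0.93).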
  • docs/Working/re/data/sse2-avx2-instr-red-gre2p.dat

    r3659 r3868  
    @@ -4,2 +4,3 @@
     3 1.0091594364
     4 1.001113479
    +5 1.0
  • docs/Working/re/data/sse2-avx2-instr-red-nrgrep112.dat

    r3659 r3868  
    @@ -4,2 +4,3 @@
     3 0.9980379239
     4 0.9992262994
    +5 1.0
  • docs/Working/re/data/sse2-avx2-speedup-bitstreams.dat

    r3659 r3868  
    @@ -1,5 +1,6 @@
    -0 1.4997620225
    -1 1.3562294601
    -2 1.2771080244
    -3 1.3389748973
    -4 1.6246934104
    +0       1.5176858145
    +1       1.3592460188
    +2       1.2851208405
    +3       1.4519701022
    +4       1.4578057222
    +5       0.934312128
  • docs/Working/re/data/sse2-avx2-speedup-gre2p.dat

    r3659 r3868  
    @@ -4,2 +4,3 @@
     3 1.0309592264
     4 1.0009049952
    +5 1.0
  • docs/Working/re/data/sse2-avx2-speedup-nrgrep112.dat

    r3659 r3868  
    @@ -4,2 +4,3 @@
     3 0.9959185424
     4 1.0045503071
    +5 1.0
  • docs/Working/re/data/ssetime.dat

    r3653 r3868  
    @@ -1,5 +1,6 @@
    -0 0.293453733705
    -1 0.368515295522
    -2 0.363363795916
    -3 0.585075632852
    -4 0.470108648434
    +0       0.2969169167
    +1       0.3843995214
    +2       0.3713978542
    +3       0.691879441
    +4       0.4952021569
    +5       1.1296927928
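As a consistency check, the r3868 values in sse2-avx2-speedup-bitstreams.dat appear to be the ratio of the ssetime.dat values to the avxtime.dat values. The sketch below recomputes that ratio for all six expressions; the label order is assumed from the `xticklabels` entries, and the variable names are illustrative only.

```python
# r3868 columns copied from ssetime.dat and avxtime.dat in this changeset.
# Label order is assumed from the xticklabels entries in the .tex files.
labels = ["@", "Date", "Email", "URIorEmail", "HexBytes", "StarHeight"]
sse_time = [0.2969169167, 0.3843995214, 0.3713978542,
            0.691879441, 0.4952021569, 1.1296927928]
avx_time = [0.1956379337, 0.2828034926, 0.2889983903,
            0.4765108042, 0.3396900899, 1.2091171236]

# Speed-up of AVX2 over SSE2 is the SSE2 time divided by the AVX2 time.
speedups = [sse / avx for sse, avx in zip(sse_time, avx_time)]

for label, speedup in zip(labels, speedups):
    note = "slowdown" if speedup < 1.0 else "speed-up"
    print(f"{label:12s} {speedup:.4f}  ({note})")
```

Only StarHeight comes out below 1.0, which matches the slowdown noted in the avx2.tex text.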
  • docs/Working/re/re-main.tex

    r3862 r3868  
    @@ -637,5 +637,5 @@
     xtick=data,
     ylabel=Running Time (ms per megabyte),
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
  • docs/Working/re/sse2.tex

    r3862 r3868  
    @@ -11,6 +11,7 @@
     Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline
     Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
    -URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline
    -HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
    +URIOrEmail      & \verb`'(([a-zA-Z][a-zA-Z0-9]*)://|mailto:)([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)'`               \\ \hline
    +HexBytes                & \verb`[ ](0x)?([a-fA-F0-9][a-fA-F0-9])+[.:,?! ]'`             \\ \hline
    +StarHeight              & \verb`'[A-Z]((([a-zA-Z]*a[a-zA-Z]*[ ])*[a-zA-Z]*e[a-zA-Z]*[ ])*[a-zA-Z]*s[a-zA-Z]*[ ])*[.?!]'`                \\ \hline
     \end{tabular}
     }
    @@ -29,5 +30,5 @@
     in a regular expression, the Pablo bitstream equation compiler which
     converts equations to block-at-a-time C++ code for 128-bit SIMD, and
    -gcc 4.6.3 to generate the binaries.   The Pablo output is combined
    +gcc 4.8.2 to generate the binaries.   The Pablo output is combined
     with a {\tt grep\_template.cpp} file that arranges to read input
     files, break them into segments, and print or count matches
    @@ -85,7 +86,7 @@
     a while loop.
     All tests were run on a version
    -of a \textit{Linux howto}
    -file concatenated to a length of
    -39,422,105 bytes.
    +of a \textit{Linux 3Dfx howto}
    +file of
    +39,421,555 bytes.
     
     
    @@ -97,5 +98,5 @@
     xtick=data,
     ylabel=Cycles per Byte,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
    @@ -150,5 +151,4 @@
     a significant benefit from the character skipping optimization.
     
    -
     The results for the Email expression illustrate the relative
     advantage of the Parabix method when the expression to be matched
    @@ -158,13 +158,21 @@
     lines matching the Email regex.
     
    -URI...
    +The URIorEmail expression illustrates the performance of the
    +grep programs with additional regular expression complexity.
    +As expressions get larger, the number of steps required by
    +the Parabix implementation increases, so the performance
    +advantage drops to about 4.5X over nrgrep and 19X over gre2p.
    +32557 lines are matched by the URIorEmail regex.
     
     The results for HexBytes expression illustrate Parabix performance
     involving a Kleene-+ operator compiled to a while loop.
    -Our implementation uses just 1.49 cycles per byte to find the
    -130,243 matching lines.    Performance is again dramatically better
    -than that of nrgrep or gre2p.
    -
    -StarHeight...
    +Our implementation uses just 1.6 cycles per byte to find the
    +130,243 matching lines.    The gre2p program performs
    +quite poorly here, slower than the Parabix implementation
    +by about 70X, while nrgrep is about 5.5X slower.
    +
    +StarHeight is an artificial expression created to stress the Parabix
    +implementation with a triply-nested while loop.   The performance
    +does drop off, maintaining only a 2X advantage over nrgrep.
     
     
    @@ -177,5 +185,5 @@
     xtick=data,
     ylabel=Instructions per Byte,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
    @@ -218,5 +226,5 @@
     xtick=data,
     ylabel=Instructions per Cycle,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
    @@ -248,5 +256,5 @@
     xtick=data,
     ylabel=Branch Misses per Instruction,
    -xticklabels={@,Date,Email,URIorEmail,HexBytes},
    +xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
     tick label style={font=\tiny},
     enlarge x limits=0.15,
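The relative-performance factors quoted in the new sse2.tex text (about 4.5X and 19X for URIorEmail, about 5.5X and 70X for HexBytes, about 2X for StarHeight) can be reproduced from the r3868 cycles-per-byte data. This sketch assumes those factors are ratios of cycles per byte against the SSE2 bitstreams build, and that data indices follow the `xticklabels` order; the helper name `advantage` is hypothetical.

```python
# r3868 cycles-per-byte columns copied from the data files in this changeset.
# Label order is assumed from the xticklabels entries in sse2.tex.
labels = ["@", "Date", "Email", "URIorEmail", "HexBytes", "StarHeight"]
cycles_per_byte = {
    "bitstreams": [0.9501341335, 1.2300784685, 1.1884731335,
                   2.2140142113, 1.5846469019, 3.6150169368],   # cycles-bitstreams.dat
    "nrgrep": [2.1635344166, 0.6812105966, 8.7812098736,
               9.9903646622, 8.7637658636, 7.412737803],        # cycles-nrgrep112.dat
    "gre2p": [21.4808229914, 11.7626860483, 25.9036592798,
              41.9343698898, 114.3925588425, 29.329930415],     # cycles-gre2p.dat
}


def advantage(tool, expression):
    """Cycles per byte of `tool` divided by that of the bitstreams build."""
    i = labels.index(expression)
    return cycles_per_byte[tool][i] / cycles_per_byte["bitstreams"][i]


for expression in ("URIorEmail", "HexBytes", "StarHeight"):
    print(f"{expression:12s} nrgrep {advantage('nrgrep', expression):5.2f}X"
          f"  gre2p {advantage('gre2p', expression):6.2f}X")
```

Under these assumptions the ratios come out close to the figures in the text: roughly 4.5 and 18.9 for URIorEmail, 5.5 and 72 for HexBytes, and just over 2 against nrgrep for StarHeight.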