- Timestamp:
- Jun 14, 2014, 4:46:37 AM (5 years ago)
- Location:
- docs/Working/re
- Files:
-
- 29 edited
Legend:
- Unmodified
- Added
- Removed
-
docs/Working/re/analysis.tex
r3867 r3868 38 38 This assumption allows us to equate the number of bits in the encoding of a 39 39 character (a parameter for the bitstream method) with $\log \sigma$. 40 41 40 42 41 The bitstream method compiles a regular expression of size $m$ into -
docs/Working/re/avx2.tex
r3862 r3868 58 58 xtick=data, 59 59 ylabel=AVX2 Instruction Reduction, 60 xticklabels={@,Date,Email,URIorEmail,HexBytes },60 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 61 61 tick label style={font=\tiny}, 62 62 enlarge x limits=0.15, … … 86 86 count achieved for each of the applications. Working at a block 87 87 size of 256 bytes at a time rather than 128 bytes at a time, 88 the bitstreams implementation scaled dramatically well with reductions in 89 instruction count over a factor of two in each case. Although a factor 88 the bitstreams implementation scaled very well with reductions in 89 instruction count over a factor of two in every case except for StarHeight. 90 Although a factor 90 91 of two would seem an outside limit, we attribute the change to 91 92 greater instruction efficiency. … … 109 110 xtick=data, 110 111 ylabel=AVX2 Speedup, 111 xticklabels={@,Date,Email,URIorEmail,HexBytes },112 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 112 113 tick label style={font=\tiny}, 113 114 enlarge x limits=0.15, … … 135 136 As shown in Figure \ref{fig:AVXSpeedup} the reduction in 136 137 instruction count was reflected in a significant speed-up 137 in the bitstreams implementation. However, the speed-up was 138 in the bitstreams implementation in all cases except 139 StarHeight. However, the speed-up was 138 140 considerably less than expected. 139 141 The bitstreams code on AVX2 has suffered from a considerable 140 142 reduction in instructions per cycle compared to the SSE2 141 implementation, possibly indicating143 implementation, likely indicating 142 144 that our grep implementation has become memory-bound. 145 However, the performance of StarHeight deserves particular 146 comment, with an actual slowdown observed. When moving 147 to 256 positions at a time, the controlling while loops may 148 require more iterations than working 128 positions at a time, 149 because the iteration must continue as long as there are any 150 pending markers in the block. 143 151 Nevertheless, the overall results on our AVX2 machine were quite encouraging, 144 152 demonstrating very good scalability of the bitwise data-parallel approach. -
docs/Working/re/data/avxtime.dat
r3653 r3868 1 0 0.205018196989 2 1 0.295246601196 3 2 0.302237158564 4 3 0.468256435493 5 4 0.288130051020 1 0 0.1956379337 2 1 0.2828034926 3 2 0.2889983903 4 3 0.4765108042 5 4 0.3396900899 6 5 1.2091171236 -
docs/Working/re/data/branch-misses-bitstreams-avx2.dat
r3658 r3868 1 0 0.001519363 2 1 0.0008206642 3 2 0.0010116166 4 3 0.000738838 5 4 0.0008082418 1 0 0.0016774825 2 1 0.0008357743 3 2 0.001037251 4 3 0.0007558463 5 4 0.002390721 6 5 0.0026046142 7 -
docs/Working/re/data/branch-misses-bitstreams.dat
r3658 r3868 1 0 0.0008936692 2 1 0.0006330034 3 2 0.0006860091 4 3 0.0005290345 5 4 0.0004363124 1 0 0.0008842031 2 1 0.000527084 3 2 0.0006941816 4 3 0.0005300792 5 4 0.001923317 6 5 0.002272245 -
docs/Working/re/data/branch-misses-gre2p.dat
r3658 r3868 1 0 0.0015052715 2 1 0.001427571 3 2 0.0011283994 4 3 0.0013149726 5 4 0.0002026342 1 0 0.0013837774 2 1 0.0014431598 3 2 0.0010984108 4 3 0.0014717503 5 4 0.0013722774 6 5 0.0015046596 -
docs/Working/re/data/branch-misses-nrgrep112.dat
r3658 r3868 1 0 0.0001772238 2 1 0.0034354883 3 2 0.0111118402 4 3 0.0003465179 5 4 0.0353144202 1 0 0.0001779815 2 1 0.0034672889 3 2 0.0113438762 4 3 0.0002536244 5 4 0.0006715218 6 5 0.0002039675 -
docs/Working/re/data/cycles-bitstreams-avx2.dat
r3658 r3868 1 0 0.6355542658 2 1 0.9019257089 3 2 0.9204814913 4 3 1.4404315802 5 4 0.917466457 1 0 0.6260413878 2 1 0.9049711763 3 2 0.9247948489 4 3 1.5248345734 5 4 1.0870082877 6 5 3.8691747954 -
docs/Working/re/data/cycles-bitstreams.dat
r3658 r3868 1 0 0.9531801511 2 1 1.2232182173 3 2 1.1755542988 4 3 1.9287017271 5 4 1.4906017068 1 0 0.9501341335 2 1 1.2300784685 3 2 1.1884731335 4 3 2.2140142113 5 4 1.5846469019 6 5 3.6150169368 -
docs/Working/re/data/cycles-gre2p.dat
r3658 r3868 1 0 22.0020578302 2 1 12.0522989323 3 2 26.0779249611 4 3 40.4431307258 5 4 11.7247355513 1 0 21.4808229914 2 1 11.7626860483 3 2 25.9036592798 4 3 41.9343698898 5 4 114.3925588425 6 5 29.329930415 -
docs/Working/re/data/cycles-nrgrep112.dat
r3658 r3868 1 0 2.1591544896 2 1 0.6861738865 3 2 8.4588007414 4 3 7.5076760868 5 4 3.9757488343 1 0 2.1635344166 2 1 0.6812105966 3 2 8.7812098736 4 3 9.9903646622 5 4 8.7637658636 6 5 7.412737803 7 -
docs/Working/re/data/gputime.dat
r3650 r3868 4 4 3 0.42 5 5 4 0.18 6 5 0 -
docs/Working/re/data/instructions-bitstreams-avx2.dat
r3658 r3868 1 0 1.0068203106 2 1 1.3957890123 3 2 1.4819688345 4 3 2.5283462159 5 4 1.7364563359 1 0 1.0088770471 2 1 1.397859293 3 2 1.4840285727 4 3 2.8758951543 5 4 1.7512372356 6 5 5.3132736393 -
docs/Working/re/data/instructions-bitstreams.dat
r3658 r3868 1 0 2.8837376644 2 1 3.8441432034 3 2 3.6697338968 4 3 5.8888106051 5 4 4.6447878671 1 0 2.8857037983 2 1 3.8460961522 3 2 3.6717107684 4 3 6.755662302 5 4 4.4599754627 6 5 8.4513797591 -
docs/Working/re/data/instructions-gre2p.dat
r3658 r3868 1 0 48.3575171087 2 1 20.373182051 3 2 60.9252858517 4 3 100.2840179133 5 4 21.4078933887 1 0 48.9373570373 2 1 20.6017041692 3 2 62.0332338488 4 3 106.0809488109 5 4 320.3433684947 6 5 69.8179680888 -
docs/Working/re/data/instructions-nrgrep112.dat
r3658 r3868 1 0 8.3244413762 2 1 1.8579135488 3 2 15.4674421115 4 3 10.5666965019 5 4 3.2337932487 1 0 8.3303064783 2 1 1.8521000503 3 2 15.3481859353 4 3 14.6365366359 5 4 15.0976676339 6 5 10.1374062489 -
docs/Working/re/data/ipc-bitstreams-avx2.dat
r3658 r3868 1 0 1.5841610461 2 1 1.5475653909 3 2 1.6099930835 4 3 1.755269914 5 4 1.8926646557 1 0 1.6115181309 2 1 1.5446450998 3 2 1.6047111145 4 3 1.8860374788 5 4 1.6110615304 6 5 1.3732317407 -
docs/Working/re/data/ipc-bitstreams.dat
r3658 r3868 1 0 3.0253857691 2 1 3.1426471165 3 2 3.1217051399 4 3 3.0532510664 5 4 3.116048939 1 0 3.0371541203 2 1 3.1267079709 3 2 3.0894352298 4 3 3.0513184006 5 4 2.814491643 6 5 2.3378534338 -
docs/Working/re/data/ipc-gre2p.dat
r3658 r3868 1 0 2.1978633763 2 1 1.6903980034 3 2 2.3362781334 4 3 2.479630437 5 4 1.8258743061 1 0 2.2781881801 2 1 1.7514455529 3 2 2.3947672095 4 3 2.5296898246 5 4 2.8003864214 6 5 2.3804341538 -
docs/Working/re/data/ipc-nrgrep112.dat
r3658 r3868 1 0 3.8554172091 2 1 2.7076424583 3 2 1.8285620603 4 3 1.40745237 5 4 0.8133796634 1 0 3.8503230705 2 1 2.7188362301 3 2 1.7478441076 4 3 1.4650653035 5 4 1.7227374474 6 5 1.3675657387 -
docs/Working/re/data/sse2-avx2-instr-red-bitstreams.dat
r3659 r3868 1 0 2.864202911 2 1 2.7541004905 3 2 2.4762557831 4 3 2.3291155966 5 4 2.6748659158 1 0 2.8603126681 2 1 2.7514186668 3 2 2.4741509941 4 3 2.3490641833 5 4 2.5467568711 6 5 1.5906163192 -
docs/Working/re/data/sse2-avx2-instr-red-gre2p.dat
r3659 r3868 4 4 3 1.0091594364 5 5 4 1.001113479 6 5 1.0 -
docs/Working/re/data/sse2-avx2-instr-red-nrgrep112.dat
r3659 r3868 4 4 3 0.9980379239 5 5 4 0.9992262994 6 5 1.0 -
docs/Working/re/data/sse2-avx2-speedup-bitstreams.dat
r3659 r3868 1 0 1.4997620225 2 1 1.3562294601 3 2 1.2771080244 4 3 1.3389748973 5 4 1.6246934104 1 0 1.5176858145 2 1 1.3592460188 3 2 1.2851208405 4 3 1.4519701022 5 4 1.4578057222 6 5 0.934312128 -
docs/Working/re/data/sse2-avx2-speedup-gre2p.dat
r3659 r3868 4 4 3 1.0309592264 5 5 4 1.0009049952 6 5 1.0 -
docs/Working/re/data/sse2-avx2-speedup-nrgrep112.dat
r3659 r3868 4 4 3 0.9959185424 5 5 4 1.0045503071 6 5 1.0 -
docs/Working/re/data/ssetime.dat
r3653 r3868 1 0 0.293453733705 2 1 0.368515295522 3 2 0.363363795916 4 3 0.585075632852 5 4 0.470108648434 1 0 0.2969169167 2 1 0.3843995214 3 2 0.3713978542 4 3 0.691879441 5 4 0.4952021569 6 5 1.1296927928 -
docs/Working/re/re-main.tex
r3862 r3868 637 637 xtick=data, 638 638 ylabel=Running Time (ms per megabyte), 639 xticklabels={@,Date,Email,URIorEmail,HexBytes },639 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 640 640 tick label style={font=\tiny}, 641 641 enlarge x limits=0.15, -
docs/Working/re/sse2.tex
r3862 r3868 11 11 Date & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)` \\ \hline 12 12 Email & \verb`([^ @]+)@([^ @]+)` \\ \hline 13 URIOrEmail & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)` \\ \hline 14 HexBytes & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])` \\ \hline 13 URIOrEmail & \verb`'(([a-zA-Z][a-zA-Z0-9]*)://|mailto:)([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)'` \\ \hline 14 HexBytes & \verb`[ ](0x)?([a-fA-F0-9][a-fA-F0-9])+[.:,?! ]'` \\ \hline 15 StarHeight & \verb`'[A-Z]((([a-zA-Z]*a[a-zA-Z]*[ ])*[a-zA-Z]*e[a-zA-Z]*[ ])*[a-zA-Z]*s[a-zA-Z]*[ ])*[.?!]'` \\ \hline 15 16 \end{tabular} 16 17 } … … 29 30 in a regular expression, the Pablo bitstream equation compiler which 30 31 converts equations to block-at-a-time C++ code for 128-bit SIMD, and 31 gcc 4. 6.3to generate the binaries. The Pablo output is combined32 gcc 4.8.2 to generate the binaries. The Pablo output is combined 32 33 with a {\tt grep\_template.cpp} file that arranges to read input 33 34 files, break them into segments, and print or count matches … … 85 86 a while loop. 86 87 All tests were run on a version 87 of a \textit{Linux howto}88 file concatenated to a lengthof89 39,42 2,105 bytes.88 of a \textit{Linux 3Dfx howto} 89 file of 90 39,421,555 bytes. 90 91 91 92 … … 97 98 xtick=data, 98 99 ylabel=Cycles per Byte, 99 xticklabels={@,Date,Email,URIorEmail,HexBytes },100 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 100 101 tick label style={font=\tiny}, 101 102 enlarge x limits=0.15, … … 150 151 a significant benefit from the character skipping optimization. 151 152 152 153 153 The results for the Email expression illustrate the relative 154 154 advantage of the Parabix method when the expression to be matched … … 158 158 lines matching the Email regex. 159 159 160 URI... 160 The URIorEmail expression illustrates the performance of the 161 grep programs with additional regular expression complexity. 162 As expressions get larger, the number of steps required by 163 the Parabix implementation increases, so the performance 164 advantage drops to about 4.5X over nrgrep and 19X over gre2p. 165 32557 lines are matched by the URIorEmail regex. 161 166 162 167 The results for HexBytes expression illustrate Parabix performance 163 168 involving a Kleene-+ operator compiled to a while loop. 164 Our implementation uses just 1.49 cycles per byte to find the 165 130,243 matching lines. Performance is again dramatically better 166 than that of nrgrep or gre2p. 167 168 StarHeight... 169 Our implementation uses just 1.6 cycles per byte to find the 170 130,243 matching lines. The gre2p program performs 171 quite poorly here, slower than the Parabix implementation 172 by about 70X, while nrgrep is about 5.5X slower. 173 174 StarHeight is an artificial expression created to stress the Parabix 175 implementation with a triply-nested while loop. The performance 176 does drop off, maintaining only a 2X advantage over nrgrep. 169 177 170 178 … … 177 185 xtick=data, 178 186 ylabel=Instructions per Byte, 179 xticklabels={@,Date,Email,URIorEmail,HexBytes },187 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 180 188 tick label style={font=\tiny}, 181 189 enlarge x limits=0.15, … … 218 226 xtick=data, 219 227 ylabel=Instructions per Cycle, 220 xticklabels={@,Date,Email,URIorEmail,HexBytes },228 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 221 229 tick label style={font=\tiny}, 222 230 enlarge x limits=0.15, … … 248 256 xtick=data, 249 257 ylabel=Branch Misses per Instruction, 250 xticklabels={@,Date,Email,URIorEmail,HexBytes },258 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight}, 251 259 tick label style={font=\tiny}, 252 260 enlarge x limits=0.15,
Note: See TracChangeset
for help on using the changeset viewer.