Changeset 4560


Timestamp: May 15, 2015, 8:16:44 AM
Author: nmedfort
Message: Edits
Location: docs/Working/icGrep
Files: 2 edited

  • docs/Working/icGrep/evaluation.tex

    r4559 r4560  
    277277%
    278278Table \ref{table:relperf} shows the speedups obtained with \icGrep{}
    279 icGrep{} on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
     279on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
    280280and both single-threaded and multi-threaded versions.
    281 All speedups are relative to the
    282 single-threaded performance on the i7-2600 machine = 1.0.
     281All speedups are relative to the base single-threaded SSE2 performance on the i7-2600 machine.
     282%
    283283The SSE2 results are again using the generic binaries compiled for compatibility
    284 with all 64-bit processors.   The AVX1 results are for Intel AVX instructions
     284with all 64-bit processors.   
     285%
     286The AVX1 results are for Intel AVX instructions
    285287in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form,
    286288which helps reduce register pressure.   The AVX2 results are for \icGrep{}
     
    288290
    289291
     292
     293% \begin{table}[h]\centering % requires booktabs,siunitx
     294% \small
     295% \vspace{-2em}
     296% \begin{tabular}{@{}p{3cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
     297% &\multicolumn{6}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
     298% \cmidrule[1pt](lr){2-7}
     299% \cmidrule[1pt](lr){8-13}
     300% \textbf{Expression}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
     301% \toprule
     302% Alphanumeric \#1&1.28&(.06)&1.35&(.05)&1.64&(.16)&1.41&(.06)&1.44&(.06)&1.96&(.18)\\
     303% Alphanumeric \#2&1.27&(.06)&1.32&(.05)&1.77&(.19)&1.39&(.07)&1.39&(.04)&2.18&(.22)\\
     304% Arabic&1.21&(.07)&1.28&(.08)&1.43&(.16)&1.30&(.05)&1.30&(.05)&1.63&(.13)\\
     305% Currency&1.01&(.05)&1.03&(.06)&1.06&(.12)&1.05&(.05)&1.06&(.05)&1.21&(.08)\\
     306% Cyrillic&1.18&(.06)&1.25&(.05)&1.13&(.10)&1.26&(.04)&1.33&(.04)&1.22&(.10)\\
     307% Email&1.32&(.04)&1.38&(.05)&1.86&(.21)&1.42&(.04)&1.46&(.05)&2.17&(.26)\\
     308% \midrule
     309% \textit{Geomean}&1.21&&1.26&&1.45&&1.30&&1.32&&1.68&\\
     310% \bottomrule
     311% \end{tabular}
     312% \caption{Speedups of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
     313% \vspace{-2em}
     314% \end{table}
    290315
    291316\begin{table}[h]\centering % requires booktabs,siunitx
     
    293318\vspace{-2em}
    294319\begin{tabular}{@{}p{3cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
    295 &\multicolumn{6}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
    296 \cmidrule[1pt](lr){2-7}
     320&\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
     321\cmidrule[1pt](lr){2-3}
     322\cmidrule[1pt](lr){4-7}
    297323\cmidrule[1pt](lr){8-13}
    298 \textbf{Expression}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
     324\textbf{Expression}&\multicolumn{2}{c}{\textbf{s/GB}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
    299325\toprule
    300 Alphanumeric \#1&1.28&(.06)&1.35&(.05)&1.64&(.16)&1.41&(.06)&1.44&(.06)&1.96&(.18)\\
    301 Alphanumeric \#2&1.27&(.06)&1.32&(.05)&1.77&(.19)&1.39&(.07)&1.39&(.04)&2.18&(.22)\\
    302 Arabic&1.21&(.07)&1.28&(.08)&1.43&(.16)&1.30&(.05)&1.30&(.05)&1.63&(.13)\\
    303 Currency&1.01&(.05)&1.03&(.06)&1.06&(.12)&1.05&(.05)&1.06&(.05)&1.21&(.08)\\
    304 Cyrillic&1.18&(.06)&1.25&(.05)&1.13&(.10)&1.26&(.04)&1.33&(.04)&1.22&(.10)\\
    305 Email&1.32&(.04)&1.38&(.05)&1.86&(.21)&1.42&(.04)&1.46&(.05)&2.17&(.26)\\
    306 \midrule
    307 \textit{Geomean}&1.21&&1.26&&1.45&&1.30&&1.32&&1.68&\\
     326Alphanumeric \#1&2.76&(.65)&1.05&(.03)&1.25&(.08)&1.18&(.02)&1.19&(.03)&1.59&(.10)\\
     327Alphanumeric \#2&2.69&(.66)&1.05&(.02)&1.36&(.09)&1.20&(.03)&1.19&(.04)&1.80&(.11)\\
     328Arabic&1.82&(.39)&1.05&(.03)&1.15&(.08)&1.37&(.03)&1.37&(.04)&1.66&(.10)\\
     329Currency&1.04&(.30)&1.03&(.02)&1.04&(.06)&1.59&(.15)&1.61&(.14)&1.78&(.21)\\
     330Cyrillic&2.10&(.44)&1.06&(.02)&0.96&(.06)&1.27&(.02)&1.33&(.04)&1.23&(.09)\\
     331Email&3.57&(.87)&1.05&(.03)&1.37&(.14)&1.13&(.03)&1.16&(.04)&1.67&(.18)\\
     332\midrule
     333\textit{Geomean}&\multicolumn{2}{c}{--}&1.04&&1.18&&1.28&&1.30&&1.61&\\
    308334\bottomrule
    309335\end{tabular}
     
    313339
    314340
    315 Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware and compiler
    316 improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
    317 %
    318 By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
    319 higher variation between datasets.
    320 %
    321 This appears to be a consequence of complex Kleene-* repetitions (i.e., those that cannot utilize the MatchStar operation)
    322 both resulting in increased register pressure and worse branch misprediction because of the characteristics in the datasets
    323 themselves.
    324 %
    325 
     341% Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware and compiler
     342% improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
     343% %
     344% By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
     345% higher variation between datasets.
     346% %
     347% This appears to be a consequence of complex Kleene-* repetitions (i.e., those that cannot utilize the MatchStar operation)
     348% both resulting in increased register pressure and worse branch misprediction because of the characteristics in the datasets
     349% themselves.
     350% %
     351%
    326352
    327353