Changeset 4557
- Timestamp:
- May 15, 2015, 2:36:34 AM (4 years ago)
- Location:
- docs/Working/icGrep
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
docs/Working/icGrep/evaluation.tex
r4554 r4557 1 1 \section{Evaluation}\label{sec:evaluation} 2 2 3 In this section, we report on the evaluation of \icGrep{} performance, looking 4 at three aspects. First, we examine some performance aspects of 5 \icGrep{} internal methods, looking at the impact of optimizations discussed previously.6 Then we move on to a systematic performance study of \icGrep{} 7 with named Unicode property searches in comparison to two 8 contemporary competitors, namely, pcre2grep released in January 2015 9 and ugrep of the ICU 54.1 software distribution. Finally, we examine 10 both more complex expressions and also the impact 11 of multithreading \icGrep{}.3 In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects. 4 % 5 First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously. 6 % 7 Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors, 8 namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution. 9 % 10 Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an 11 Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor. 12 12 13 13 \subsection{Optimizations of Bitwise Methods} … … 80 80 and {\tt icgrep} provide systematic support for all property expressions 81 81 at Unicode Level 1 as well as set union, intersection and difference. 82 On the other hand, {\tt pcre2grep} does not support the set intersection and difference operators directly.83 However, these operators can insteadbe expressed using a regular expression82 Unfortunetly, {\tt pcre2grep} does not support the set intersection and difference operators directly. 83 However, these operators can be expressed using a regular expression 84 84 feature known as a lookbehind assertion. Set intersection involves a 85 85 regular expression formed with a one of the property expressions and a … … 134 134 For each program under test, we performed searches for each regular expression against each XML document. 135 135 % 136 Performance is reported in CPU cycles per byte on an Intel Core i7machine.136 Performance is reported in CPU cycles per byte on an Intel i7-2600 machine. 137 137 % 138 138 The results are presented in Figure~\ref{fig:property_test}. … … 158 158 159 159 \begin{table}\centering % requires booktabs 160 \small 160 \small\vspace{-2em} 161 161 \begin{tabular}{@{}p{2.7cm}p{10.8cm}@{}} 162 162 \textbf{Name}&\textbf{Regular Expression}\\ … … 176 176 \end{tabular} 177 177 \caption{Regular Expressions}\label{table:regularexpr} 178 \vspace{- 1em}178 \vspace{-2em} 179 179 \end{table} 180 180 181 182 We also examine the comparative performance of the matching engines on a 183 series of more complex expressions as shown in Table \ref{table:regularexpr}. 181 This study evaluates the comparative performance of the matching engines on a 182 series of more complex expressions, shown in Table \ref{table:regularexpr}. 183 % 184 184 The first two are alphanumeric expressions, differing only in that the first 185 185 one is anchored to match the entire line. 186 % 186 187 The third searches for lines consisting of text in Arabic script. 188 % 187 189 The fourth expression is a published currency expression taken from 188 190 Stewart and Uckelman~\cite{stewart2013unicode}. 191 % 189 192 An expression matching runs of six or more Cyrillic script characters enclosed 190 193 in initial/opening and final/ending punctuation is fifth in the list. 191 The final expression is an email expression that allows internationalized 192 names. 193 194 % 195 The final expression is an email expression that allows internationalized names. 196 % 197 In Table \ref{table:complexexpr}, we show the performance results obtained 198 from an Intel i7-2600 using precompiled binaries for each engine 199 that are suitable for any 64-bit architecture. 200 % 201 In each case, we report seconds taken per GB of input averaged over 10 202 runs each on our Wikimedia document collection. 203 % 204 Table \ref{table:relperf} further studies \icGrep{} on a newer Intel 205 i7-4700MQ architecture and evaluates the improvement gained by the 206 newer processor and improved SIMD instruction set architecture (ISA). 207 % 208 Both SSE2 and AVX1 use 128-bit registers. 209 % 210 The main advantage of AVX1 over SSE2 is its support for 3-operand form, 211 which helps reduce register pressure. 212 % 213 AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers. 214 % 215 However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32). 194 216 195 217 % \begin{table} … … 204 226 % \end{table} 205 227 206 207 \begin{table}\centering % requires booktabs 228 % \begin{table*}[htbp] 229 % \begin{center} 230 % \footnotesize 231 % \begin{tabular}{|l||l|l|} 232 % \hline 233 % Processor & i7-2600 (3.4GHz) & i7-4700MQ (2.4GHz) \\ \hline 234 % L1 Cache & 256KB & 256KB \\ \hline 235 % L2 Cache & 1MB & 1MB \\ \hline 236 % L3 Cache & 8MB & 8MB \\ \hline 237 % Bus & 1333Mhz & 1600Mhz \\ \hline 238 % Memory & 8GB & 8GB \\ \hline 239 % \end{tabular} 240 % \caption{Platform Hardware Specs} 241 % \label{hwinfo} 242 % \end{center} 243 % \vspace{-20pt} 244 % \end{table*} 245 246 \begin{table}[ht]\centering % requires booktabs 208 247 \newcolumntype{T}{c} 209 \small 248 \small\vspace{-2em} 210 249 \begin{tabular}{@{}p{3cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}} 211 &\multicolumn{6}{c}{\textbf{\icGrep{} }}\\250 &\multicolumn{6}{c}{\textbf{\icGrep{} (SSE2)}}\\ 212 251 \cmidrule[1pt](lr){2-7} 213 252 \cmidrule[1pt](lr){8-10} … … 224 263 \end{tabular} 225 264 \caption{Matching Times for Complex Expressions (Seconds Per GB)}\label{table:complexexpr} 265 \vspace{-2em} 226 266 \end{table} 227 267 228 The performance results are shown in Table \ref{table:complexexpr}. 229 In each case, we report seconds taken per GB of input averaged over 10 230 runs each on our Wikimedia document collection. 231 The most striking aspect of the results is that both ugrep and pcregrep 268 269 The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep 232 270 show dramatic slowdowns with ambiguities in regular expressions. 271 % 233 272 This is most clearly illustrated in the different performance figures 234 273 for the two Alphanumeric test expressions but is also evident in the 235 Arabic, Currency and Email expressions. By way of contrast, \icGrep{} 236 maintains consistently fast performance in all test scenarios. 237 238 The multithreaded \icGrep{} shows speedup in every case, but balancing 239 of the workload across multiple cores is clearly an area for further work. 240 Nevertheless, our three thread system shows a speedup over 241 the single threaded version by up to 40\%. 242 243 244 245 274 Arabic, Currency and Email expressions. 275 % 276 Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. 277 % 278 The multithreaded \icGrep{} shows speedup in every case but balancing 279 of the workload across multiple cores is clearly an area for further work. 280 % 281 Nevertheless, our three thread system shows up to a 40\% speedup. % over the single threaded version 282 283 284 \begin{table}[h]\centering % requires booktabs,siunitx 285 \small 286 \vspace{-2em} 287 \begin{tabular}{@{}p{3cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}} 288 &\multicolumn{6}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\ 289 \cmidrule[1pt](lr){2-7} 290 \cmidrule[1pt](lr){8-13} 291 \textbf{Expression}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\ 292 \toprule 293 Alphanumeric \#1&1.28&(.06)&1.35&(.05)&1.64&(.16)&1.41&(.06)&1.44&(.06)&1.96&(.18)\\ 294 Alphanumeric \#2&1.27&(.06)&1.32&(.05)&1.77&(.19)&1.39&(.07)&1.39&(.04)&2.18&(.22)\\ 295 Arabic&1.21&(.07)&1.28&(.08)&1.43&(.16)&1.30&(.05)&1.30&(.05)&1.63&(.13)\\ 296 Currency&1.01&(.05)&1.03&(.06)&1.06&(.12)&1.05&(.05)&1.06&(.05)&1.21&(.08)\\ 297 Cyrillic&1.18&(.06)&1.25&(.05)&1.13&(.10)&1.26&(.04)&1.33&(.04)&1.22&(.10)\\ 298 Email&1.32&(.04)&1.38&(.05)&1.86&(.21)&1.42&(.04)&1.46&(.05)&2.17&(.26)\\ 299 \midrule 300 \textit{Geomean}&1.21&&1.26&&1.45&&1.30&&1.32&&1.68&\\ 301 \bottomrule 302 \end{tabular} 303 \caption{Performance of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf} 304 \vspace{-2em} 305 \end{table} 306 307 308 Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware 309 improves performance by $\sim21$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}. 310 % 311 By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits 312 higher variation between datasets. 313 % 314 This appears to be a consequence of complex Kleene-* repetitions (i.e., those that cannot utilize the MatchStar operation) 315 both resulting in increased register pressure and worse branch misprediction because of the characteristics in the datasets 316 themselves. 317 % 246 318 247 319
Note: See TracChangeset
for help on using the changeset viewer.