# source:docs/Working/icGrep/evaluation.tex@4782

Last change on this file since 4782 was 4782, checked in by cameron, 4 years ago

Figure placement according to Springer

File size: 12.3 KB
Line
1\section{Evaluation}\label{sec:evaluation}
2
3\def\AN{A.N.}
4
5In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects.
6First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously.
7Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors,
8namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.
9Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an
10Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor.
11
12\subsection{Optimizations of Bitwise Methods}
13
14In order to support evaluation of bitwise methods, as well as to support
15the teaching of those methods and ongoing research, \icGrep{} has an array
16of command-line options.   This makes it straightforward
17to report on certain performance aspects of \icGrep{}, while others require
18special builds.
19
20For example, the command-line switch \texttt{-disable-matchstar} can be used
21to eliminate the use of the MatchStar operation for handling
22Kleene-* repetition of character classes.   In this case, \icGrep{} substitutes
23a while loop that iteratively extends match results.
24Surprisingly, this
25does not change performance much in many practical cases.
26In each block,
27the maximum iteration count is the maximum length run encountered; the
28overall performance is based on the average of these maxima throughout the
29file.   But when search for XML tags using the regular expression
30\verb:<[^!?][^>]*>:, a slowdown of more than 2$\times$ may be found in files
31with many long tags.
32
33In order to short-circuit processing when no remaining matches
34are possible in a block, our regular expression compiler periodically inserts
35if-statements to check whether there are any marker bits still in play.
36To control this feature in dynamically
37generated code, the number of pattern elements between each if-test %non-nullable
38can be selected with the {\tt -if-insertion-gap=} option.
39The default value in \icGrep{} is 3; setting the gap to 100 effectively
40turns off if-insertion.
41Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredictions.
42For patterns with long strings, however, there can be a substantial slowdown.
43
44
45The precompiled calculations of the various Unicode properties
46are each placed in if-hierarchies as described previously.   To assess the
47impact of this strategy, we built a version of icGrep without such
48if-hierarchies.  In this case, when a Unicode property class is defined,
49bitwise logic equations are applied for all members of the class independent
50of the Unicode blocks represented in the input document.   For the classes
51covering the largest numbers of codepoints, we observed slowdowns of up to 5$\times$.
52
53\subsection{Simple Property Expressions}
54
55A key feature of Unicode level 1 support in regular expression engines
56the support that they provide for property expressions and combinations of property expressions
57using set union, intersection and difference operators.   Both {\tt ugrep}
58and {\tt icgrep} provide systematic support for all property expressions
59at Unicode Level 1 as well as set union, intersection and difference.
60Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
61However, these operators can be expressed using a regular expression
62feature known as a lookbehind assertion.   Set intersection involves a
63regular expression formed with a one of the property expressions and a
64positive lookbehind assertion on the other, while set difference uses
65a negative lookbehind assertion.
66
67We generated a set of regular expressions involving all Unicode values of
68the Unicode general category property ({\tt gc}) and all values of the Unicode
69script property ({\tt sc}).
70We then generated
71expressions involving random pairs of {\tt gc} and {\tt sc}
72values combined with a random set operator chosen from union, intersection and difference.
73All property values are represented at least once.
74A small number of
75expressions were removed because they involved properties not supported by pcre2grep.
76In the end 246 test expressions were constructed in this process.
77
78We selected a set of Wikimedia XML files in several major languages representing
79most of the world's major language families as a test corpus.
80For each program under test, we performed searches for each regular expression against each XML document.
81Performance is reported in CPU cycles per byte on an Intel i7-2600 machine.
82The results are presented in Fig.~\ref{fig:property_test}.
83They were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments.
84When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
85The performance of both pcre2grep and ugrep improves (has a reduction in CPU cycles per byte) as the percentage of matching lines increases.
86This occurs because each match allows them to bypass processing the rest of the line.
87On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.
88This is primarily due to property classes that include large numbers of codepoints.
89These classes require more bitstream equations for calculation and also have a greater probability of matching.
90Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
91
92\begin{figure}
93\vspace{-0.5em}
94\begin{center}
98
99\begin{tikzpicture}
100\begin{axis}[
101grid=both,
102x tick label style={ /pgf/number format/1000 sep=},
103ylabel={CPU Cycles Per Byte},
104xlabel={Percentage of Matching Lines},
105minor y tick num={1},
106xmax=100,
107height=0.5\textwidth,
108legend style={at={(1.05,0.5)},
109anchor=west,legend columns=1,
110align=left,draw=none,column sep=2ex}
111]
115\legend{icGrep,ugrep541,pcre2grep}
116\end{axis}
117\end{tikzpicture}
118\end{center}
119\vspace{-1em}
120\caption{Matching Performance for Simple Property Expressions}\label{fig:property_test}
121\vspace{-0.5em}
122\end{figure}
123
124\subsection{Complex Expressions}
125
126This study evaluates the comparative performance of the matching engines on a
127series of more complex expressions, shown in Table \ref{table:regularexpr}.
128The first two are alphanumeric (\AN{}) expressions, differing only in that the first
129one is anchored to match the entire line.
130The third searches for lines consisting of text in Arabic script.
131The fourth expression is a published currency expression taken from
132Stewart and Uckelman~\cite{stewart2013unicode}.
133An expression matching runs of six or more Cyrillic script characters enclosed
134in initial/opening and final/ending punctuation is fifth in the list.
135The final expression is an email expression that allows internationalized names.
136
137\begin{table}\centering % requires booktabs
138\small\vspace{-2em}
139\begin{tabular}{@{}p{2cm}p{9.8cm}@{}}
140\textbf{Name}&\textbf{Regular Expression}\\
141\toprule
142\AN{} \#1&\lstinline^[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*$\\ 143\midrule 144\AN{} \#2&\lstinline[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*\\ 145\midrule 146Arabic&\lstinline^[\p{Arabic}\p{Common}]*\p{Arabic}[\p{Arabic}\p{Common}]*$\\
147\midrule
148Currency&\lstinline(\p{Sc}\s*(\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?)|\\
149&\lstinline((\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?\s*\p{Sc})\\
150\midrule
151Cyrillic&\lstinline[\p{Pi}\p{Po}]\p{Cyrillic}{6,}[\p{Pf}\p{Pe}]\\
152\midrule
153Email &\lstinline([^\p{Z}<]+@[\p{L}\p{M}\p{N}.-]+\.(\p{L}\p{M}*){2,6})\\
154&\lstinline(>|\p{Z}|$)\\ 155\bottomrule 156\end{tabular} 157\caption{Regular expressions}\label{table:regularexpr} 158\vspace{-2em} 159\end{table} 160 161In Table \ref{table:complexexpr}, we show the performance results obtained 162from an Intel i7-2600 using generic 64-bit binaries for each engine. 163We limit the SIMD ISA within \icGrep{} to SSE2 which is available 164on all Intel/AMD 64-bit machines. 165In each case, we report seconds taken per GB of input averaged over 10 166runs each on our Wikimedia document collection. 167 168\begin{table}[ht]\centering % requires booktabs 169\newcolumntype{T}{c} 170\small\vspace{-2em} 171\begin{tabular}{@{}p{2cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}} 172&\multicolumn{6}{c}{\textbf{\icGrep{}}}\\ 173\cmidrule[1pt](lr){2-7} 174\cmidrule[1pt](lr){8-10} 175\cmidrule[1pt](lr){11-13} 176\textbf{Expression}&\multicolumn{3}{T}{\textbf{SEQ}}&\multicolumn{3}{T}{\textbf{MT}}&\multicolumn{3}{T}{\textbf{pcre2grep}}&\multicolumn{3}{T}{\textbf{ugrep541}}\\ 177\toprule 178\AN{} \#1&2.4&5.0&&2.1&4.4&&8.2&11.3&&8.8&11.3&\\ 179\AN{} \#2&2.3&4.9&&2.0&4.1&&209.9&563.5&&182.3&457.9&\\ 180Arabic&1.5&3.4&&1.2&2.6&&7.5&270.8&&8.9&327.8&\\ 181Currency&0.7&2.1&&0.4&1.4&&188.4&352.3&&52.8&152.8&\\ 182Cyrillic&1.6&3.9&&1.3&2.8&&30.5&49.7&&11.2&20.1&\\ 183Email&3.0&6.9&&2.7&6.4&&67.2&1442.0&&108.8&1022.3&\\ 184\bottomrule 185\end{tabular} 186\caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr} 187\vspace{-2em} 188\end{table} 189 190The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep 191show dramatic slowdowns with ambiguities in regular expressions. 192This is most clearly illustrated in the different performance figures 193for the two Alphanumeric test expressions but is also evident in the 194Arabic, Currency and Email expressions. 195Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. 196The multithreaded \icGrep{} shows speedup in every case but balancing 197of the workload across multiple cores is clearly an area for further work. 198Nevertheless, our three-thread system shows up to a 40\% speedup. % over the single threaded version 199 200Table \ref{table:relperf} shows the speedups obtained with \icGrep{} 201on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives 202and both single-threaded and multi-threaded versions. 203All speedups are relative to the base single-threaded SSE2 performance on this machine, 204which is given in seconds per GB in the first column. 205The SSE2 results are again using the generic binaries compiled for compatibility 206with all 64-bit processors. 207The AVX1 results are for Intel AVX instructions 208in 128-bit mode. The main advantage of AVX1 over SSE2 is its support for 3-operand form, 209which helps reduce register pressure. The AVX2 results are for \icGrep{} 210compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time. 211 212\begin{table}[h]\centering % requires booktabs,siunitx 213\small\vspace{-2em} 214\begin{tabular}{@{}p{2cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}} 215&\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\ 216\cmidrule[1pt](lr){2-3} 217\cmidrule[1pt](lr){4-7} 218\cmidrule[1pt](lr){8-13} 219\textbf{Expression}&\multicolumn{2}{c}{\textbf{s/GB}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\ 220\toprule 221\AN{} \#1&2.76&(.65)&1.05&(.03)&1.25&(.08)&1.18&(.02)&1.19&(.03)&1.59&(.10)\\ 222\AN{} \#2&2.69&(.66)&1.05&(.02)&1.36&(.09)&1.20&(.03)&1.19&(.04)&1.80&(.11)\\ 223Arabic&1.82&(.39)&1.05&(.03)&1.15&(.08)&1.37&(.03)&1.37&(.04)&1.66&(.10)\\ 224Currency&1.04&(.30)&1.03&(.02)&1.04&(.06)&1.59&(.15)&1.61&(.14)&1.78&(.21)\\ 225Cyrillic&2.10&(.44)&1.06&(.02)&0.96&(.06)&1.27&(.02)&1.33&(.04)&1.23&(.09)\\ 226Email&3.57&(.87)&1.05&(.03)&1.37&(.14)&1.13&(.03)&1.16&(.04)&1.67&(.18)\\ 227\midrule 228\textit{Geomean}&\multicolumn{2}{c}{--}&1.04&&1.18&&1.28&&1.30&&1.61&\\ 229\bottomrule 230\end{tabular} 231\caption{Speedup of complex expressions on i7-4700MQ$(\sigma)\$}\label{table:relperf}
232\vspace{-2em}
233\end{table}
234
235In each case, the use of three-operand form with AVX1 confers a slight
236speedup.  The change to use 256 bits with AVX2 gives a further overall improvement,
237but some mixed results due to the limitations of 256 bit addition.   Combining
238the AVX2 ISA with multithreading gives and average overall 61\% speedup compared to base.
Note: See TracBrowser for help on using the repository browser.