source: docs/ICA3PP2015/evaluation.tex @ 5789

Last change on this file since 5789 was 4786, checked in by cameron, 4 years ago

More formatting for LNCS requirements

File size: 12.2 KB
Line 
1\section{Evaluation}\label{sec:evaluation}
2
3\def\AN{A.N.}
4
5In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects.   
6First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously.
7Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors,
8namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.
9Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an
10Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor.
11
12\subsection{Optimizations of Bitwise Methods}
13
14In order to support evaluation of bitwise methods, as well as to support
15the teaching of those methods and ongoing research, \icGrep{} has an array
16of command-line options.   This makes it straightforward
17to report on certain performance aspects of \icGrep{}, while others require
18special builds. 
19
20For example, the command-line switch \texttt{-disable-matchstar} can be used
21to eliminate the use of the MatchStar operation for handling
22Kleene-* repetition of character classes.   In this case, \icGrep{} substitutes
23a while loop that iteratively extends match results.   
24Surprisingly, this
25does not change performance much in many practical cases.   
26In each block,
27the maximum iteration count is the maximum length run encountered; the
28overall performance is based on the average of these maxima throughout the
29file.   But when search for XML tags using the regular expression
30\verb:<[^!?][^>]*>:, a slowdown of more than 2$\times$ may be found in files
31with many long tags. 
32
33In order to short-circuit processing when no remaining matches
34are possible in a block, our regular expression compiler periodically inserts
35if-statements to check whether there are any marker bits still in play.
36To control this feature in dynamically
37generated code, the number of pattern elements between each if-test %non-nullable
38can be selected with the {\tt -if-insertion-gap=} option.   
39The default value in \icGrep{} is 3; setting the gap to 100 effectively
40turns off if-insertion. 
41Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredictions.
42For patterns with long strings, however, there can be a substantial slowdown.
43
44
45The precompiled calculations of the various Unicode properties
46are each placed in if-hierarchies as described previously.   To assess the
47impact of this strategy, we built a version of icGrep without such
48if-hierarchies.  In this case, when a Unicode property class is defined,
49bitwise logic equations are applied for all members of the class independent
50of the Unicode blocks represented in the input document.   For the classes
51covering the largest numbers of codepoints, we observed slowdowns of up to 5$\times$.
52
53\subsection{Simple Property Expressions}
54
55A key feature of Unicode level 1 support in regular expression engines
56the support that they provide for property expressions and combinations of property expressions
57using set union, intersection and difference operators.   Both {\tt ugrep}
58and {\tt icgrep} provide systematic support for all property expressions
59at Unicode Level 1 as well as set union, intersection and difference.
60However, in order to implement these operators with {\tt pcre2grep}, we
61translated them into an equivalent form using lookbehind assertions.
62%Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
63%However, these operators can be expressed using a regular expression
64%feature known as a lookbehind assertion.   Set intersection involves a
65%regular expression formed with a one of the property expressions and a
66%positive lookbehind assertion on the other, while set difference uses
67%a negative lookbehind assertion. 
68
69We generated a set of regular expressions involving all Unicode values of
70the Unicode general category property ({\tt gc}) and all values of the Unicode
71script property ({\tt sc}). 
72We then generated
73expressions involving random pairs of {\tt gc} and {\tt sc}
74values combined with a random set operator chosen from union, intersection and difference.
75All property values are represented at least once.   
76A small number of
77expressions were removed because they involved properties not supported by pcre2grep.
78In the end 246 test expressions were constructed in this process.
79
80We selected a set of Wikimedia XML files in several major languages representing
81most of the world's major language families as a test corpus.
82For each program under test, we performed searches for each regular expression against each XML document.
83Test cases were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. 
84Performance is reported in CPU cycles per byte on an Intel i7-2600 machine.   
85The results are presented in Fig.~\ref{fig:property_test}.
86\begin{figure}
87\vspace{-0.5em}
88\begin{center}
89\pgfplotstableread[col sep = comma]{data/icgrep-scatter.csv}\icgrep
90\pgfplotstableread[col sep = comma]{data/ugrep541-scatter.csv}\ugrep
91\pgfplotstableread[col sep = comma]{data/pcre2grep-scatter.csv}\pcre
92
93\begin{tikzpicture}
94\begin{axis}[
95grid=both,
96x tick label style={ /pgf/number format/1000 sep=},
97ylabel={CPU Cycles Per Byte},
98xlabel={Percentage of Matching Lines},
99minor y tick num={1},
100xmax=100,
101height=0.5\textwidth,
102legend style={at={(1.05,0.5)},
103anchor=west,legend columns=1,
104align=left,draw=none,column sep=2ex}
105]
106\addplot+[no markers,line width=2pt,color=blue!60,solid] table {\icgrep};
107\addplot+[no markers,line width=2pt,color=red!60,solid] table {\ugrep};
108\addplot+[no markers,line width=2pt,color=brown!60,solid] table {\pcre};
109\legend{icGrep,ugrep541,pcre2grep}
110\end{axis}
111\end{tikzpicture}
112\end{center}
113\vspace{-1em}
114\caption{Matching Performance for Simple Property Expressions}\label{fig:property_test}
115\vspace{-0.5em}
116\end{figure}
117
118When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
119The performance of both pcre2grep and ugrep improves (CPU cycles per byte decreases) as the percentage of matching lines increases.
120This occurs because each match allows them to bypass processing the rest of the line.
121On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.   
122This is primarily due to property classes that include large numbers of codepoints.   
123These classes require more bitstream equations for calculation and also have a greater probability of matching.   
124Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
125
126
127\subsection{Complex Expressions}
128
129This study evaluates the comparative performance of the matching engines on a set of
130more complex expressions, shown in Table \ref{table:regularexpr}.
131The first two are alphanumeric (\AN{}) expressions, differing only in that the first
132one is anchored to match the entire line.
133The third searches for lines consisting of text in Arabic script.
134The fourth expression is a published currency expression taken from
135Stewart and Uckelman\cite{stewart2013unicode}.
136An expression matching runs of six or more Cyrillic script characters enclosed
137in initial/opening and final/ending punctuation is fifth in the list.
138The last expression matches internationalized email names.
139
140\begin{table}\centering % requires booktabs
141\caption{Regular expressions}\label{table:regularexpr}
142\small%\vspace{-2em}
143\begin{tabular}{@{}p{2cm}p{9.8cm}@{}}
144\textbf{Name}&\textbf{Regular Expression}\\
145\toprule
146\AN{} \#1&\lstinline`^[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*$`\\
147\midrule
148\AN{} \#2&\lstinline`[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*`\\
149\midrule
150Arabic&\lstinline`^[\p{Arabic}\p{Common}]*\p{Arabic}[\p{Arabic}\p{Common}]*$`\\
151\midrule
152Currency&\lstinline`(\p{Sc}\s*(\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?)|`\\
153&\lstinline`((\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?\s*\p{Sc})`\\
154\midrule
155Cyrillic&\lstinline`[\p{Pi}\p{Po}]\p{Cyrillic}{6,}[\p{Pf}\p{Pe}]`\\
156\midrule
157Email &\lstinline`([^\p{Z}<]+@[\p{L}\p{M}\p{N}.-]+\.(\p{L}\p{M}*){2,6})`\\
158&\lstinline`(>|\p{Z}|$)`\\
159\bottomrule
160\end{tabular}
161\end{table}
162
163Table \ref{table:complexexpr} shows the performance results
164on our Intel i7-2600 test machine, reporting seconds taken per GB of input averaged over 10 
165runs each on our Wikimedia document collection.
166
167\begin{table}[ht]\centering % requires booktabs
168\caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr}
169\newcolumntype{T}{c}
170\small%\vspace{-2em}
171\begin{tabular}{@{}p{2cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}}
172&\multicolumn{6}{c}{\textbf{\icGrep{}}}\\
173\cmidrule[1pt](lr){2-7}
174\cmidrule[1pt](lr){8-10}
175\cmidrule[1pt](lr){11-13}
176\textbf{Expression}&\multicolumn{3}{T}{\textbf{SEQ}}&\multicolumn{3}{T}{\textbf{MT}}&\multicolumn{3}{T}{\textbf{pcre2grep}}&\multicolumn{3}{T}{\textbf{ugrep541}}\\
177\toprule
178\AN{} \#1&2.4&5.0&&2.1&4.4&&8.2&11.3&&8.8&11.3&\\
179\AN{} \#2&2.3&4.9&&2.0&4.1&&209.9&563.5&&182.3&457.9&\\
180Arabic&1.5&3.4&&1.2&2.6&&7.5&270.8&&8.9&327.8&\\
181Currency&0.7&2.1&&0.4&1.4&&188.4&352.3&&52.8&152.8&\\
182Cyrillic&1.6&3.9&&1.3&2.8&&30.5&49.7&&11.2&20.1&\\
183Email&3.0&6.9&&2.7&6.4&&67.2&1442.0&&108.8&1022.3&\\
184\bottomrule
185\end{tabular}
186\end{table}
187
188The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep
189show dramatic slowdowns with ambiguities in regular expressions.
190This is most clearly illustrated in the different performance figures
191for the two Alphanumeric test expressions but is also evident in the
192Arabic, Currency and Email expressions.   
193Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. 
194The multithreaded \icGrep{} shows speedup in every case but balancing
195of the workload across multiple cores is clearly an area for further work.
196Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version
197
198Table \ref{table:relperf} shows the speedups obtained with \icGrep{}
199on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
200and both single-threaded and multi-threaded versions.
201All speedups are relative to the base single-threaded SSE2 performance on this machine,
202which is given in seconds per GB in the first column.
203The SSE2 results are again using the generic binaries compiled for compatibility
204with all 64-bit processors.   
205The AVX1 results are for Intel AVX instructions
206in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form,
207which helps reduce register pressure.   The AVX2 results are for \icGrep{}
208compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time.
209
210\begin{table}[h]\centering % requires booktabs,siunitx
211\caption{Speedup of complex expressions on i7-4700MQ $(\sigma)$}\label{table:relperf}
212\small%\vspace{-2em}
213\begin{tabular}{@{}p{2cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
214&\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
215\cmidrule[1pt](lr){2-3}
216\cmidrule[1pt](lr){4-7}
217\cmidrule[1pt](lr){8-13}
218\textbf{Expression}&\multicolumn{2}{c}{\textbf{s/GB}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
219\toprule
220\AN{} \#1&2.76&(.65)&1.05&(.03)&1.25&(.08)&1.18&(.02)&1.19&(.03)&1.59&(.10)\\
221\AN{} \#2&2.69&(.66)&1.05&(.02)&1.36&(.09)&1.20&(.03)&1.19&(.04)&1.80&(.11)\\
222Arabic&1.82&(.39)&1.05&(.03)&1.15&(.08)&1.37&(.03)&1.37&(.04)&1.66&(.10)\\
223Currency&1.04&(.30)&1.03&(.02)&1.04&(.06)&1.59&(.15)&1.61&(.14)&1.78&(.21)\\
224Cyrillic&2.10&(.44)&1.06&(.02)&0.96&(.06)&1.27&(.02)&1.33&(.04)&1.23&(.09)\\
225Email&3.57&(.87)&1.05&(.03)&1.37&(.14)&1.13&(.03)&1.16&(.04)&1.67&(.18)\\
226\midrule
227\textit{Geomean}&\multicolumn{2}{c}{--}&1.04&&1.18&&1.28&&1.30&&1.61&\\
228\bottomrule
229\end{tabular}
230\end{table}
231
232In each case, the use of three-operand form with AVX1 confers a slight
233speedup.  The change to use 256 bits with AVX2 gives a further overall improvement,
234but some mixed results due to the limitations of 256 bit addition.   Combining
235the AVX2 ISA with multithreading gives and average overall 61\% speedup compared to base.
Note: See TracBrowser for help on using the repository browser.