source: docs/Working/icGrep/evaluation.tex @ 4780

Last change on this file since 4780 was 4780, checked in by nmedfort, 3 years ago

Figure and table alignment

File size: 14.7 KB
Line 
1\section{Evaluation}\label{sec:evaluation}
2
3\def\AN{A.N.}
4
5In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects.   
6%
7First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously.
8%
9Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors,
10namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.
11%
12Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an
13Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor.
14
15\subsection{Optimizations of Bitwise Methods}
16
17In order to support evaluation of bitwise methods, as well as to support
18the teaching of those methods and ongoing research, \icGrep{} has an array
19of command-line options.   This makes it straightforward
20to report on certain performance aspects of \icGrep{}, while others require
21special builds. 
22
23For example, the command-line switch \texttt{-disable-matchstar} can be used
24to eliminate the use of the MatchStar operation for handling
25Kleene-* repetition of character classes.   In this case, \icGrep{} substitutes
26a while loop that iteratively extends match results.   
27Surprisingly, this
28does not change performance much in many practical cases.   
29In each block,
30the maximum iteration count is the maximum length run encountered; the
31overall performance is based on the average of these maxima throughout the
32file.   But when search for XML tags using the regular expression
33\verb:<[^!?][^>]*>:, a slowdown of more than 2$\times$ may be found in files
34with many long tags. 
35
36%The {\tt -disable-log2-bounded-repetition} flag allows these
37%effectiveness of the special techniques for bounded repetition of
38%byte classes to be assessed.   A slowdown of 30\% was observed
39%with the searches using the regular expression
40%\verb:(^|[ ])[a-zA-Z]{11,33}([.!? ]|$):, for example.
41
42In order to short-circuit processing when no remaining matches
43are possible in a block, our regular expression compiler periodically inserts
44if-statements to check whether there are any marker bits still in play.
45To control this feature in dynamically
46generated code, the number of pattern elements between each if-test %non-nullable
47can be selected with the {\tt -if-insertion-gap=} option.   
48%
49The default value in \icGrep{} is 3; setting the gap to 100 effectively
50turns off if-insertion. 
51%
52Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredictions.
53%
54For patterns with long strings, however, there can be a substantial slowdown.
55
56%; searching for a pattern of length 40 slows down by more
57%than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim.
58
59% \comment{
60% Additionally, \icGrep{} provides options that allow
61% various internal representations to be printed out.   
62% %
63% These can aid in understanding and/or debugging performance issues.
64% For example, the option
65% {\tt -print-REs} shows the parsed regular expression as it goes
66% through various transformations.   The internal \Pablo{} code generated
67% may be displayed with {\tt -print-pablo}.  This can be quite useful in
68% helping understand the match process.   It also possible to print out the
69% generated LLVM IR code ({\tt -dump-generated-IR}), but this may be
70% less useful as it includes many
71% details of low-level carry-handling that obscures the core logic.
72% }
73
74The precompiled calculations of the various Unicode properties
75are each placed in if-hierarchies as described previously.   To assess the
76impact of this strategy, we built a version of icGrep without such
77if-hierarchies.  In this case, when a Unicode property class is defined,
78bitwise logic equations are applied for all members of the class independent
79of the Unicode blocks represented in the input document.   For the classes
80covering the largest numbers of codepoints, we observed slowdowns of up to 5$\times$.
81
82\subsection{Simple Property Expressions}
83
84A key feature of Unicode level 1 support in regular expression engines
85the support that they provide for property expressions and combinations of property expressions
86using set union, intersection and difference operators.   Both {\tt ugrep}
87and {\tt icgrep} provide systematic support for all property expressions
88at Unicode Level 1 as well as set union, intersection and difference.
89Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
90However, these operators can be expressed using a regular expression
91feature known as a lookbehind assertion.   Set intersection involves a
92regular expression formed with a one of the property expressions and a
93positive lookbehind assertion on the other, while set difference uses
94a negative lookbehind assertion. 
95
96We generated a set of regular expressions involving all Unicode values of
97the Unicode general category property ({\tt gc}) and all values of the Unicode
98script property ({\tt sc}). 
99We then generated
100expressions involving random pairs of {\tt gc} and {\tt sc}
101values combined with a random set operator chosen from union, intersection and difference.
102All property values are represented at least once.   
103A small number of
104expressions were removed because they involved properties not supported by pcre2grep.
105In the end 246 test expressions were constructed in this process.
106
107\begin{figure}
108\vspace{-0.5em}
109\begin{center}
110\pgfplotstableread[col sep = comma]{data/icgrep-scatter.csv}\icgrep
111\pgfplotstableread[col sep = comma]{data/ugrep541-scatter.csv}\ugrep
112\pgfplotstableread[col sep = comma]{data/pcre2grep-scatter.csv}\pcre
113
114\begin{tikzpicture}
115\begin{axis}[
116grid=both,
117x tick label style={ /pgf/number format/1000 sep=},
118ylabel={CPU Cycles Per Byte},
119xlabel={Percentage of Matching Lines},
120minor y tick num={1},
121xmax=100,
122height=0.5\textwidth,
123legend style={at={(1.05,0.5)},
124anchor=west,legend columns=1,
125align=left,draw=none,column sep=2ex}
126]
127\addplot+[no markers,line width=2pt,color=blue!60,solid] table {\icgrep};
128\addplot+[no markers,line width=2pt,color=red!60,solid] table {\ugrep};
129\addplot+[no markers,line width=2pt,color=brown!60,solid] table {\pcre};
130\legend{icGrep,ugrep541,pcre2grep}
131\end{axis}
132\end{tikzpicture}
133\end{center}
134\vspace{-1em}
135\caption{Matching Performance for Simple Property Expressions}\label{fig:property_test}
136\vspace{-0.5em}
137\end{figure}
138
139We selected a set of Wikimedia XML files in several major languages representing
140most of the world's major language families as a test corpus.
141%
142For each program under test, we performed searches for each regular expression against each XML document.
143%
144Performance is reported in CPU cycles per byte on an Intel i7-2600 machine.   
145%
146The results are presented in Figure~\ref{fig:property_test}.
147%
148They were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. 
149%
150When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
151%
152The performance of both pcre2grep and ugrep improves (has a reduction in CPU cycles per byte) as the percentage of matching lines increases.
153%
154This occurs because each match allows them to bypass processing the rest of the line.
155%
156On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.   
157%
158This is primarily due to property classes that include large numbers of codepoints.   
159%
160These classes require more bitstream equations for calculation and also have a greater probability of matching.   
161%
162Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
163
164
165\subsection{Complex Expressions}
166
167\begin{table}\centering % requires booktabs
168\small\vspace{-2em}
169\begin{tabular}{@{}p{2cm}p{9.8cm}@{}}
170\textbf{Name}&\textbf{Regular Expression}\\
171\toprule
172\AN{} \#1&\lstinline`^[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*$`\\
173\midrule
174\AN{} \#2&\lstinline`[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*`\\
175\midrule
176Arabic&\lstinline`^[\p{Arabic}\p{Common}]*\p{Arabic}[\p{Arabic}\p{Common}]*$`\\
177\midrule
178Currency&\lstinline`(\p{Sc}\s*(\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?)|`\\
179&\lstinline`((\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?\s*\p{Sc})`\\
180\midrule
181Cyrillic&\lstinline`[\p{Pi}\p{Po}]\p{Cyrillic}{6,}[\p{Pf}\p{Pe}]`\\
182\midrule
183Email &\lstinline`([^\p{Z}<]+@[\p{L}\p{M}\p{N}.-]+\.(\p{L}\p{M}*){2,6})`\\
184&\lstinline`(>|\p{Z}|$)`\\
185\bottomrule
186\end{tabular}
187\caption{Regular Expressions}\label{table:regularexpr}
188\vspace{-2em}
189\end{table}
190
191This study evaluates the comparative performance of the matching engines on a
192series of more complex expressions, shown in Table \ref{table:regularexpr}.
193%
194The first two are alphanumeric (\AN{}) expressions, differing only in that the first
195one is anchored to match the entire line.
196%
197The third searches for lines consisting of text in Arabic script.
198%
199The fourth expression is a published currency expression taken from
200Stewart and Uckelman~\cite{stewart2013unicode}.
201%
202An expression matching runs of six or more Cyrillic script characters enclosed
203in initial/opening and final/ending punctuation is fifth in the list.
204%
205The final expression is an email expression that allows internationalized names.
206%
207In Table \ref{table:complexexpr}, we show the performance results obtained
208from an Intel i7-2600 using generic 64-bit binaries for each engine.
209We limit the SIMD ISA within \icGrep{} to SSE2 which is available
210on all Intel/AMD 64-bit machines.
211%
212In each case, we report seconds taken per GB of input averaged over 10 
213runs each on our Wikimedia document collection.
214
215\begin{table}[ht]\centering % requires booktabs
216\newcolumntype{T}{c}
217\small\vspace{-2em}
218\begin{tabular}{@{}p{2cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}}
219&\multicolumn{6}{c}{\textbf{\icGrep{}}}\\
220\cmidrule[1pt](lr){2-7}
221\cmidrule[1pt](lr){8-10}
222\cmidrule[1pt](lr){11-13}
223\textbf{Expression}&\multicolumn{3}{T}{\textbf{SEQ}}&\multicolumn{3}{T}{\textbf{MT}}&\multicolumn{3}{T}{\textbf{pcre2grep}}&\multicolumn{3}{T}{\textbf{ugrep541}}\\
224\toprule
225\AN{} \#1&2.4&5.0&&2.1&4.4&&8.2&11.3&&8.8&11.3&\\
226\AN{} \#2&2.3&4.9&&2.0&4.1&&209.9&563.5&&182.3&457.9&\\
227Arabic&1.5&3.4&&1.2&2.6&&7.5&270.8&&8.9&327.8&\\
228Currency&0.7&2.1&&0.4&1.4&&188.4&352.3&&52.8&152.8&\\
229Cyrillic&1.6&3.9&&1.3&2.8&&30.5&49.7&&11.2&20.1&\\
230Email&3.0&6.9&&2.7&6.4&&67.2&1442.0&&108.8&1022.3&\\
231\bottomrule
232\end{tabular}
233\caption{Matching Times for Complex Expressions (s/GB)}\label{table:complexexpr}
234\vspace{-2em}
235\end{table}
236
237
238The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep
239show dramatic slowdowns with ambiguities in regular expressions.
240%
241This is most clearly illustrated in the different performance figures
242for the two Alphanumeric test expressions but is also evident in the
243Arabic, Currency and Email expressions.   
244%
245Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. 
246%
247The multithreaded \icGrep{} shows speedup in every case but balancing
248of the workload across multiple cores is clearly an area for further work.
249%
250Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version
251
252
253
254%
255Table \ref{table:relperf} shows the speedups obtained with \icGrep{}
256on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
257and both single-threaded and multi-threaded versions.
258All speedups are relative to the base single-threaded SSE2 performance on this machine,
259which is given in seconds per GB in the first column.
260%
261The SSE2 results are again using the generic binaries compiled for compatibility
262with all 64-bit processors.   
263%
264The AVX1 results are for Intel AVX instructions
265in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form,
266which helps reduce register pressure.   The AVX2 results are for \icGrep{}
267compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time.
268
269\begin{table}[h]\centering % requires booktabs,siunitx
270\small\vspace{-2em}
271\begin{tabular}{@{}p{2cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
272&\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
273\cmidrule[1pt](lr){2-3}
274\cmidrule[1pt](lr){4-7}
275\cmidrule[1pt](lr){8-13}
276\textbf{Expression}&\multicolumn{2}{c}{\textbf{s/GB}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
277\toprule
278\AN{} \#1&2.76&(.65)&1.05&(.03)&1.25&(.08)&1.18&(.02)&1.19&(.03)&1.59&(.10)\\
279\AN{} \#2&2.69&(.66)&1.05&(.02)&1.36&(.09)&1.20&(.03)&1.19&(.04)&1.80&(.11)\\
280Arabic&1.82&(.39)&1.05&(.03)&1.15&(.08)&1.37&(.03)&1.37&(.04)&1.66&(.10)\\
281Currency&1.04&(.30)&1.03&(.02)&1.04&(.06)&1.59&(.15)&1.61&(.14)&1.78&(.21)\\
282Cyrillic&2.10&(.44)&1.06&(.02)&0.96&(.06)&1.27&(.02)&1.33&(.04)&1.23&(.09)\\
283Email&3.57&(.87)&1.05&(.03)&1.37&(.14)&1.13&(.03)&1.16&(.04)&1.67&(.18)\\
284\midrule
285\textit{Geomean}&\multicolumn{2}{c}{--}&1.04&&1.18&&1.28&&1.30&&1.61&\\
286\bottomrule
287\end{tabular}
288\caption{Speedup of Complex Expressions on i7-4700MQ $(\sigma)$}\label{table:relperf}
289\vspace{-2em}
290\end{table}
291
292In each case, the use of three-operand form with AVX1 confers a slight
293speedup.  The change to use 256 bits with AVX2 gives a further overall improvement,
294but some mixed results due to the limitations of 256 bit addition.   Combining
295the AVX2 ISA with multithreading gives and average overall 61\% speedup compared to base.
296
297
298
299
300
301
302
303
304
305% \subsection{Single vs. Multithreaded Performance}
306%
307%
308% \begin{figure}
309% \begin{center}
310% \pgfplotstableread[col sep = comma]{data/icgrep-scatter-mt.csv}\base
311% \pgfplotstableread[col sep = comma]{data/icgrep-mt-scatter-mt.csv}\mt
312% \pgfplotstableread[col sep = comma]{data/icgrep-mt3-scatter-mt.csv}\mtt
313% \pgfplotstableread[col sep = comma]{data/icgrep-flat-scatter-mt.csv}\flat
314% \begin{tikzpicture}
315% \begin{axis}[
316% grid=both,
317% x tick label style={ /pgf/number format/1000 sep=},
318% ylabel={Seconds Per GB ($1000^3$)},
319% xlabel={Percentage of Matching Lines},
320% minor y tick num={1},
321% ymin=0,ymax=3,
322% xmax=100,
323% height=0.5\textwidth,
324% legend style={at={(1.05,0.5)},
325% anchor=west,legend columns=1,
326% align=left,draw=none,column sep=2ex}
327% ]
328% \addplot+[sharp plot, no markers,line width=2pt,color=blue!60,solid] table {\base};
329% \addplot+[sharp plot, no markers,line width=2pt,color=red!60,solid] table {\mt};
330% \addplot+[sharp plot, no markers,line width=2pt,color=brown!60,solid] table {\mtt};
331% %\addplot+[no markers,line width=2pt,color=green!60,solid] table {\flat};
332% \legend{icGrep (Base),icGrep (MT2),icGrep (MT3), icGrep (Flat)}
333% \end{axis}
334%
335%
336% \end{tikzpicture}
337% \end{center}
338% \caption{Multithreading Performance}\label{fig:performance_test}
339% \end{figure}
Note: See TracBrowser for help on using the repository browser.