source: docs/Working/re/sse2.tex @ 3662

Last change on this file since 3662 was 3662, checked in by lindanl, 5 years ago

Update machine info

File size: 7.8 KB
Line 
1\section{SSE2 Implementation and Evaluation}\label{sec:SSE2}
2
3\begin{table*}[htbp]
4\begin{center}
5{
6\footnotesize
7\begin{tabular}{|l|l|}
8\hline
9Name            & Expression    \\ \hline   
10@               & \verb`@`              \\ \hline     
11Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
12Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
13URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline     
14HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
15\end{tabular}
16}
17\end{center}
18\caption{Regular Expressions}
19\label{RegularExpressions}
20\end{table*}
21
22
23\paragraph{Implementation Notes.}
24Our regular expression compiler directly uses the Parabix tool chain
25to compile regular expression into SSE2-based implementations.
26Our compiler essentially scripts three other compilers to perform
27this work: the Parabix character class compiler to determine basic
28bit stream equations for each of the character classes encountered
29in a regular expression, the Pablo bitstream equation compiler which
30converts equations to block-at-a-time C++ code for 128-bit SIMD, and
31gcc 4.6.3 to generate the binaries.   The Pablo output is combined
32with a {\tt grep\_template.cpp} file that arranges to read input
33files, break them into segments, and print out or count matches
34as they are encountered.
35
36\paragraph{Comparative Implementations.}
37We evaluate our bitwise data parallel implementation versus several
38alternatives.   
39We report data for two of these:
40gre2p
41%GNU grep version 2.10
42and nrgrep version 1.12.
43The gre2p program is a grep version implemented using the recently developed
44RE2 regular expression library, using a systematic DFA-based approach
45(as well as some NFA fallback techniques) \cite{cox2010RE2}.
46The NFA class is represented by nrgrep, one of the
47strongest competitors in regular expression matching performance.
48We also considered GNU grep 2.10, agrep 3.41 as and alternative NFA-based implementation
49and pcregrep 8.12 as a backtracking implementation, but do not
50report data for them. 
51GNU grep is a popular open-source implementation that is claimed to be
52primarily DFA-based with heuristics for important special cases.   
53The agrep implementation does not support
54some of the common regular expression syntax feature and is limited to
55patterns of at most 32 characters.   As a backtracking implementation,
56pcregrep supports more regular expression features, but is not
57competitive in performance in any example we tested.
58
59We performed our SSE2 performance study using an Intel Core i5-4570 (Haswell) processor
60(3.2 GHz, 4 physical cores, 32+32 kB (per core) L1 cache, 256 kB (per core) L2 cache, 6 MB L3 cache)
61running the 64-bit version of Ubuntu 12.04 (Linux).
62
63
64\paragraph{Test expressions.}
65Each grep implementation is tested with the five regular expressions
66in Table \ref{RegularExpressions}.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
67HexBytes matches delimited. They are taken from Benchmark of Regex
68Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml].
69HexBytes matches delimited byte strings in hexadecimal notation,
70enforcing the constraint that the number of hex digits is even.   This
71expression shows performance of a repetition operator implemented with
72a while loop.
73All tests are run on a concatenated version of the linux howto which is 39,422,105 bytes in length. 
74
75
76\paragraph{Results.}
77\begin{figure}
78\begin{center}
79\begin{tikzpicture}
80\begin{axis}[
81xtick=data,
82ylabel=Cycles per Byte,
83xticklabels={@,Date,Email,URIorEmail,HexBytes},
84tick label style={font=\tiny},
85enlarge x limits=0.15,
86%enlarge y limits={0.15, upper},
87ymin=0,
88legend style={at={(0.5,-0.15)},
89anchor=north,legend columns=-1},
90ymax=42,
91ybar,
92bar width=7pt,
93%visualization depends on=y \as \rawy,
94]
95\addplot
96file {data/cycles-bitstreams.dat};
97\addplot
98file {data/cycles-nrgrep112.dat};
99\addplot
100file {data/cycles-gre2p.dat};
101 
102\legend{bitstreams,nrgrep,gre2p,Annot}
103\end{axis}
104\end{tikzpicture}
105\end{center}
106\caption{Cycles per Byte}\label{fig:SSECyclesPerByte}
107\end{figure}
108
109Figure \ref{fig:SSECyclesPerByte} shows CPU cycles per input byte for
110each expression on each implementation.
111Both the bitstreams and the nrgrep implementation performed
112much better than the DFA-based gre2p, with the bitstreams
113implementation
114consistently an order of magnitude faster.
115
116For the Date expression, nrgrep outperforms our bitstreams
117implementation here.  In this case, nrgrep takes advantage of skipping:
118every time it encounters a character that can't appear in a date, it
119can skip past six characters. 
120For the more compicated expressions, it loses this advantage.
121
122The bitstreams implementation has fairly consistent performance.  As
123the regular expression complexity increases, the cycle count increases
124slowly.  The largest difference in cycles per byte for the bitstreams
125implementation is a ratio of 2 to 1.  The same cannot be said of gre2p
126or nrgrep, with gre2p exhibiting a 4 to 1 range over these expressions
127and
128nrgrep exhibiting a 10 to 1 range.
129
130\begin{figure}
131\begin{center}
132\begin{tikzpicture}
133\begin{axis}[
134xtick=data,
135ylabel=Instructions per Byte,
136xticklabels={@,Date,Email,URIorEmail,HexBytes},
137tick label style={font=\tiny},
138enlarge x limits=0.15,
139%enlarge y limits={0.15, upper},
140ymin=0,
141legend style={at={(0.5,-0.15)},
142anchor=north,legend columns=-1},
143ymax=110,
144ybar,
145bar width=7pt,
146]
147\addplot
148file {data/instructions-bitstreams.dat};
149\addplot
150file {data/instructions-nrgrep112.dat};
151\addplot
152file {data/instructions-gre2p.dat};
153 
154\legend{bitstreams,nrgrep,gre2p,Annot}
155\end{axis}
156\end{tikzpicture}
157\end{center}
158\caption{Instructions per Byte}\label{fig:SSEInstructionsPerByte}
159\end{figure}
160
161Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.
162The relative values mirror cycles per byte.  The bitstreams
163implementation continues to show consistent performance.  This is
164especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle},
165which shows instructions per cycle.  Both the bitstreams and
166gre2p implementations show relatively stable IPC characteristics,
167while nrgrep exhibits considerably more variation.
168 
169
170\begin{figure}
171\begin{center}
172\begin{tikzpicture}
173\begin{axis}[
174xtick=data,
175ylabel=Instructions per Cycle,
176xticklabels={@,Date,Email,URIorEmail,HexBytes},
177tick label style={font=\tiny},
178enlarge x limits=0.15,
179%enlarge y limits={0.15, upper},
180ymin=0,
181legend style={at={(0.5,-0.15)},
182anchor=north,legend columns=-1},
183ybar,
184bar width=7pt,
185]
186\addplot
187file {data/ipc-bitstreams.dat};
188\addplot
189file {data/ipc-nrgrep112.dat};
190\addplot
191file {data/ipc-gre2p.dat};
192
193\legend{bitstreams,nrgrep,gre2p,Annot}
194\end{axis}
195\end{tikzpicture}
196\end{center}
197\caption{Instructions per Cycle}\label{fig:SSEInstructionsPerCycle}
198\end{figure}
199
200\begin{figure}
201\begin{center}
202\begin{tikzpicture}
203\begin{axis}[
204xtick=data,
205ylabel=Branch Misses per Instruction,
206xticklabels={@,Date,Email,URIorEmail,HexBytes},
207tick label style={font=\tiny},
208enlarge x limits=0.15,
209%enlarge y limits={0.15, upper},
210ymin=0,
211legend style={at={(0.5,-0.15)},
212anchor=north,legend columns=-1},
213ybar,
214bar width=7pt,
215]
216\addplot
217file {data/branch-misses-bitstreams.dat};
218\addplot
219file {data/branch-misses-nrgrep112.dat};
220\addplot
221file {data/branch-misses-gre2p.dat};
222
223\legend{bitstreams,nrgrep,gre2p,Annot}
224\end{axis}
225\end{tikzpicture}
226\end{center}
227\caption{Branch Misses per Instruction}\label{fig:SSEBranchMisses}
228\end{figure}
229Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.
230The bitstreams and gre2p implementations remains consistent here.
231Significant variation is seen with nrgrep.
232
233Overall, our performance is considerably better than both nrgrep and gre2p for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
234
235
Note: See TracBrowser for help on using the repository browser.