source: docs/Working/re/sse2.tex @ 3642

Last change on this file since 3642 was 3642, checked in by cameron, 5 years ago

Substitute gre2p for grep; cite GPU long-stream add; remove excess figures

File size: 8.1 KB
Line 
1\section{SSE2 Implementation and Evaluation}\label{sec:SSE2}
2
3\begin{table*}[htbp]
4\begin{center}
5{
6\footnotesize
7\begin{tabular}{|l|l|}
8\hline
9Name            & Expression    \\ \hline   
10@               & \verb`@`              \\ \hline     
11Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
12Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
13URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline     
14HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
15\end{tabular}
16}
17\end{center}
18\caption{Regular Expressions}
19\label{RegularExpressions}
20\end{table*}
21
22
23\paragraph{Implementation Notes.}
24Our regular expression compiler directly uses the Parabix tool chain
25to compile regular expression into SSE2-based implementations.
26Our compiler essentially scripts three other compilers to perform
27this work: the Parabix character class compiler to determine basic
28bit stream equations for each of the character classes encountered
29in a regular expression, the Pablo bitstream equation compiler which
30converts equations to block-at-a-time C++ code for 128-bit SIMD, and
31gcc 4.6.3 to generate the binaries.   The Pablo output is combined
32with a {\tt grep\_template.cpp} file that arranges to read input
33files, break them into segments, and print out or count matches
34as they are encountered.
35
36\paragraph{Comparative Implementations.}
37We evaluate our bitwise data parallel implementation versus several
38alternatives.   
39We report data for two of these:
40gre2p
41%GNU grep version 2.10
42and nrgrep version 1.12.
43The gre2p program is a grep version implemented using the recently developed
44RE2 regular expression library, using a systematic DFA-based approach
45(as well as some NFA fallback techniques) \cite{cox2010RE2}.
46The NFA class is represented by nrgrep, one of the
47strongest competitors in regular expression matching performance.
48We also considered GNU grep 2.10, agrep 3.41 as and alternative NFA-based implementation
49and pcregrep 8.12 as a backtracking implementation, but do not
50report data for them. 
51GNU grep is a popular open-source implementation that is claimed to be
52primarily DFA-based with
53heuristics for important special cases.   
54The agrep implementation does not support
55some of the common regular expression syntax feature and is limited to
56patterns of at most 32 characters.   As a backtracking implementation,
57pcregrep supports more regular expression features, but is not
58competitive in performance in any example we tested.
59
60We performed our SSE2 performance study using an Intel Core i7 quad-core (Sandy Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core), 32+32 kB (per core) L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
61
62
63\paragraph{Test expressions.}
64Each grep implementation is tested with the five regular expressions
65in Table \ref{RegularExpressions}.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
66HexBytes matches delimited. They are taken from Benchmark of Regex
67Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml].
68HexBytes matches delimited byte strings in hexadecimal notation,
69enforcing the constraint that the number of hex digits is even.   This
70expression shows performance of a repetition operator implemented with
71a while loop.
72All tests are run on a concatenated version of the linux howto which is 39,422,105 bytes in length. 
73
74
75\paragraph{Results.}
76\begin{figure}
77\begin{center}
78\begin{tikzpicture}
79\begin{axis}[
80xtick=data,
81ylabel=Cycles per Byte,
82xticklabels={@,Date,Email,URIorEmail,HexBytes},
83tick label style={font=\tiny},
84enlarge x limits=0.15,
85%enlarge y limits={0.15, upper},
86ymin=0,
87legend style={at={(0.5,-0.15)},
88anchor=north,legend columns=-1},
89ymax=12,
90ybar,
91bar width=7pt,
92%visualization depends on=y \as \rawy,
93]
94\addplot
95file {data/cycles1.dat};
96\addplot
97file {data/cycles2.dat};
98\addplot
99file {data/cycles3.dat};
100 
101\legend{Bitstreams,NRGrep,Gre2p,Annot}
102\end{axis}
103\end{tikzpicture}
104\end{center}
105\caption{Cycles per Byte}\label{fig:SSECyclesPerByte}
106\end{figure}
107
108Figure \ref{fig:SSECyclesPerByte} shows CPU cycles per input byte for each expression on each implementation.
109For the three most complicated expressions, the bitstreams implementation had the lowest cycle count, while grep
110was an order of magnitude slower.
111
112For the @ expression, GNU grep had very slightly better performance than the bitstreams implementation.  The bitstreams implementation has a fixed overhead
113for transposition that hurts relative performance for very simple expressions.
114
115For the Date expression, nrgrep is able to skip large portions of the input file since every time it encounters a character that can't appear in a date, it can skip past six characters.  For the more compicated expressions, it loses this advantage.
116
117The bitstreams implementation has fairly consistent performance.  As the regular expression complexity increases, the cycle count increases slowly.  The largest difference in cycles per byte for the bitstreams implementation is a ratio of 2 to 1.  The same cannot be said of grep or nrgrep.  The
118latter uses more than 10 times the cycles per byte for Xquote than for Date.  The number of cycles per byte that gGrep uses for URIOrEmail is almost 900 times as many as it uses for @.
119
120\begin{figure}
121\begin{center}
122\begin{tikzpicture}
123\begin{axis}[
124xtick=data,
125ylabel=Instructions per Byte,
126xticklabels={@,Date,Email,URIorEmail,HexBytes},
127tick label style={font=\tiny},
128enlarge x limits=0.15,
129%enlarge y limits={0.15, upper},
130ymin=0,
131legend style={at={(0.5,-0.15)},
132anchor=north,legend columns=-1},
133ymax=25,
134ybar,
135bar width=7pt,
136]
137\addplot
138file {data/instructions1.dat};
139\addplot
140file {data/instructions2.dat};
141\addplot
142file {data/instructions3.dat};
143 
144\legend{Bitstreams,NRGrep,Gre2p,Annot}
145\end{axis}
146\end{tikzpicture}
147\end{center}
148\caption{Instructions per Byte}\label{fig:SSEInstructionsPerByte}
149\end{figure}
150
151Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.
152The relative values mirror cycles per byte.  The bitstreams
153implementation continues to show consistent performance.  This is
154especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle},
155which shows instructions per cycle.  The bitstreams implementation has
156almost no variation in the instructions per cycle.  Both grep and
157nrGrep have considerably more variation based on the input regular
158expression.   
159 
160
161\begin{figure}
162\begin{center}
163\begin{tikzpicture}
164\begin{axis}[
165xtick=data,
166ylabel=Instructions per Cycle,
167xticklabels={@,Date,Email,URIorEmail,HexBytes},
168tick label style={font=\tiny},
169enlarge x limits=0.15,
170%enlarge y limits={0.15, upper},
171ymin=0,
172legend style={at={(0.5,-0.15)},
173anchor=north,legend columns=-1},
174ybar,
175bar width=7pt,
176]
177\addplot
178file {data/ipc1.dat};
179\addplot
180file {data/ipc2.dat};
181\addplot
182file {data/ipc3.dat};
183
184\legend{Bitstreams,NRGrep,Gre2p,Annot}
185\end{axis}
186\end{tikzpicture}
187\end{center}
188\caption{Instructions per Cycle}\label{fig:SSEInstructionsPerCycle}
189\end{figure}
190
191\begin{figure}
192\begin{center}
193\begin{tikzpicture}
194\begin{axis}[
195xtick=data,
196ylabel=Branch Misses per Instruction,
197xticklabels={@,Date,Email,URIorEmail,HexBytes},
198tick label style={font=\tiny},
199enlarge x limits=0.15,
200%enlarge y limits={0.15, upper},
201ymin=0,
202legend style={at={(0.5,-0.15)},
203anchor=north,legend columns=-1},
204ybar,
205bar width=7pt,
206]
207\addplot
208file {data/branch-misses1.dat};
209\addplot
210file {data/branch-misses2.dat};
211\addplot
212file {data/branch-misses3.dat};
213
214\legend{Bitstreams,NRGrep,Gre2p,Annot}
215\end{axis}
216\end{tikzpicture}
217\end{center}
218\caption{Branch Misses per Instruction}\label{fig:SSEBranchMisses}
219\end{figure}
220Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.  The bitstreams implementation remains consistent here.  Each of nrgrep and grep have branch miss rates that vary significantly with different regular expressions. 
221
222Overall, our performance is considerably better than both nrgrep and grep for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
223
224
Note: See TracBrowser for help on using the repository browser.