source: docs/Working/re/sse2.tex @ 3637

Last change on this file since 3637 was 3637, checked in by cameron, 5 years ago

Replace xquote with HexBytes?; various cleanups

File size: 7.9 KB
Line 
1\section{SSE2 Implementation and Evaluation}\label{sec:SSE2}
2
3\paragraph{Implementation Notes.}
4Our regular expression compiler directly uses the Parabix tool chain
5to compile regular expression into SSE2-based implementations.
6Our compiler essentially scripts three other compilers to perform
7this work: the Parabix character class compiler to determine basic
8bit stream equations for each of the character classes encountered
9in a regular expression, the Pablo bitstream equation compiler which
10converts equations to block-at-a-time C++ code for 128-bit SIMD, and
11gcc 4.6.3 to generate the binaries.   The Pablo output is combined
12with a {\tt grep\_template.cpp} file that arranges to read input
13files, break them into segments, and print out or count matches
14as they are encountered.
15
16\paragraph{Comparative Implementations.}
17We evaluate our bitwise data parallel implementation versus several alternatives.   We report data for two of these: GNU grep version 2.10
18and nrgrep version 1.12. GNU grep is a popular open-source grep implementation that uses DFAs,
19as well as heuristics for important special cases.
20The NFA class is represented by nrgrep, one of the
21strongest competitors in regular expression matching performance.
22We also considered agrep 3.41 as and alternative NFA-based implementation
23and pcregrep 8.12 as a backtracking implementation, but do not
24report data for them.  The agrep implementation does not support
25some of the common regular expression syntax feature and is limited to
26patterns of at most 32 characters.   As a backtracking implementation,
27pcregrep supports more regular expression features, but is not
28competitive in performance in any example we tested.
29
30We performed our SSE2 performance study using an Intel Core i7 quad-core (Sandy Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core), 32+32 kB (per core) L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
31
32\begin{table*}[htbp]
33\begin{center}
34{
35\footnotesize
36\begin{tabular}{|l|l|}
37\hline
38Name            & Expression    \\ \hline   
39@               & \verb`@`              \\ \hline     
40Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
41Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
42URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline     
43HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
44\end{tabular}
45}
46\end{center}
47\caption{Regular Expressions}
48\label{RegularExpressions}
49\end{table*}
50
51
52\paragraph{Test expressions.}
53Each grep implementation is tested with the five regular expressions
54in Table \ref{RegularExpressions}.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
55HexBytes matches delimited. They are taken from Benchmark of Regex
56Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml].
57HexBytes matches delimited byte strings in hexadecimal notation,
58enforcing the constraint that the number of hex digits is even.   This
59expression shows performance of a repetition operator implemented with
60a while loop.
61All tests are run on a concatenated version of the linux howto which is 39,422,105 bytes in length. 
62
63
64\paragraph{Results.}
65\begin{figure}
66\begin{center}
67\begin{tikzpicture}
68\begin{axis}[
69xtick=data,
70ylabel=Cycles per Byte,
71xticklabels={@,Date,Email,URIorEmail,HexBytes},
72tick label style={font=\tiny},
73enlarge x limits=0.15,
74%enlarge y limits={0.15, upper},
75ymin=0,
76legend style={at={(0.5,-0.15)},
77anchor=north,legend columns=-1},
78ymax=12,
79ybar,
80bar width=7pt,
81%visualization depends on=y \as \rawy,
82]
83\addplot
84file {data/cycles1.dat};
85\addplot
86file {data/cycles2.dat};
87\addplot
88file {data/cycles3.dat};
89 
90\legend{Bitstreams,NRGrep,Grep,Annot}
91\end{axis}
92\end{tikzpicture}
93\end{center}
94\caption{Cycles per Byte}\label{fig:SSECyclesPerByte}
95\end{figure}
96
97Figure \ref{fig:SSECyclesPerByte} shows CPU cycles per input byte for each expression on each implementation.
98For the three most complicated expressions, the bitstreams implementation had the lowest cycle count, while grep
99was an order of magnitude slower.
100
101For the @ expression, GNU grep had very slightly better performance than the bitstreams implementation.  The bitstreams implementation has a fixed overhead
102for transposition that hurts relative performance for very simple expressions.
103
104For the Date expression, nrgrep is able to skip large portions of the input file since every time it encounters a character that can't appear in a date, it can skip past six characters.  For the more compicated expressions, it loses this advantage.
105
106The bitstreams implementation has fairly consistent performance.  As the regular expression complexity increases, the cycle count increases slowly.  The largest difference in cycles per byte for the bitstreams implementation is a ratio of 2 to 1.  The same cannot be said of grep or nrgrep.  The
107latter uses more than 10 times the cycles per byte for Xquote than for Date.  The number of cycles per byte that gGrep uses for URIOrEmail is almost 900 times as many as it uses for @.
108
109\begin{figure}
110\begin{center}
111\begin{tikzpicture}
112\begin{axis}[
113xtick=data,
114ylabel=Instructions per Byte,
115xticklabels={@,Date,Email,URIorEmail,HexBytes},
116tick label style={font=\tiny},
117enlarge x limits=0.15,
118%enlarge y limits={0.15, upper},
119ymin=0,
120legend style={at={(0.5,-0.15)},
121anchor=north,legend columns=-1},
122ymax=25,
123ybar,
124bar width=7pt,
125]
126\addplot
127file {data/instructions1.dat};
128\addplot
129file {data/instructions2.dat};
130\addplot
131file {data/instructions3.dat};
132 
133\legend{Bitstreams,NRGrep,Grep,Annot}
134\end{axis}
135\end{tikzpicture}
136\end{center}
137\caption{Instructions per Byte}\label{fig:SSEInstructionsPerByte}
138\end{figure}
139
140Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.
141The relative values mirror cycles per byte.  The bitstreams
142implementation continues to show consistent performance.  This is
143especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle},
144which shows instructions per cycle.  The bitstreams implementation has
145almost no variation in the instructions per cycle.  Both grep and
146nrGrep have considerably more variation based on the input regular
147expression.   
148 
149
150\begin{figure}
151\begin{center}
152\begin{tikzpicture}
153\begin{axis}[
154xtick=data,
155ylabel=Instructions per Cycle,
156xticklabels={@,Date,Email,URIorEmail,HexBytes},
157tick label style={font=\tiny},
158enlarge x limits=0.15,
159%enlarge y limits={0.15, upper},
160ymin=0,
161legend style={at={(0.5,-0.15)},
162anchor=north,legend columns=-1},
163ybar,
164bar width=7pt,
165]
166\addplot
167file {data/ipc1.dat};
168\addplot
169file {data/ipc2.dat};
170\addplot
171file {data/ipc3.dat};
172
173\legend{Bitstreams,NRGrep,Grep,Annot}
174\end{axis}
175\end{tikzpicture}
176\end{center}
177\caption{Instructions per Cycle}\label{fig:SSEInstructionsPerCycle}
178\end{figure}
179
180\begin{figure}
181\begin{center}
182\begin{tikzpicture}
183\begin{axis}[
184xtick=data,
185ylabel=Branch Misses per Instruction,
186xticklabels={@,Date,Email,URIorEmail,HexBytes},
187tick label style={font=\tiny},
188enlarge x limits=0.15,
189%enlarge y limits={0.15, upper},
190ymin=0,
191legend style={at={(0.5,-0.15)},
192anchor=north,legend columns=-1},
193ybar,
194bar width=7pt,
195]
196\addplot
197file {data/branch-misses1.dat};
198\addplot
199file {data/branch-misses2.dat};
200\addplot
201file {data/branch-misses3.dat};
202
203\legend{Bitstreams,NRGrep,Grep,Annot}
204\end{axis}
205\end{tikzpicture}
206\end{center}
207\caption{Branch Misses per Instruction}\label{fig:SSEBranchMisses}
208\end{figure}
209Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.  The bitstreams implementation remains consistent here.  Each of nrgrep and grep have branch miss rates that vary significantly with different regular expressions. 
210
211Overall, our performance is considerably better than both nrgrep and grep for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
212
213
Note: See TracBrowser for help on using the repository browser.