1 | %----------------------------------------------------------------------------- |
---|
2 | % |
---|
3 | % Template for sigplanconf LaTeX Class |
---|
4 | % |
---|
5 | % Name: sigplanconf-template.tex |
---|
6 | % |
---|
7 | % Purpose: A template for sigplanconf.cls, which is a LaTeX 2e class |
---|
8 | % file for SIGPLAN conference proceedings. |
---|
9 | % |
---|
10 | % Guide: Refer to "Author's Guide to the ACM SIGPLAN Class," |
---|
11 | % sigplanconf-guide.pdf |
---|
12 | % |
---|
13 | % Author: Paul C. Anagnostopoulos |
---|
14 | % Windfall Software |
---|
15 | % 978 371-2316 |
---|
16 | % paul@windfall.com |
---|
17 | % |
---|
18 | % Created: 15 February 2005 |
---|
19 | % |
---|
20 | %----------------------------------------------------------------------------- |
---|
21 | |
---|
22 | |
---|
23 | \documentclass{sigplanconf} |
---|
24 | |
---|
25 | % The following \documentclass options may be useful: |
---|
26 | |
---|
27 | % preprint Remove this option only once the paper is in final form. |
---|
28 | % 10pt To set in 10-point type instead of 9-point. |
---|
29 | % 11pt To set in 11-point type instead of 9-point. |
---|
30 | % authoryear To obtain author/year citation style instead of numeric. |
---|
31 | |
---|
32 | \usepackage{amsmath} |
---|
33 | |
---|
34 | |
---|
35 | \begin{document} |
---|
36 | |
---|
37 | \special{papersize=8.5in,11in} |
---|
38 | \setlength{\pdfpageheight}{\paperheight} |
---|
39 | \setlength{\pdfpagewidth}{\paperwidth} |
---|
40 | |
---|
41 | \conferenceinfo{PPoPP 2014}{February 15-19, 2013, Orland, Florida, United States} |
---|
42 | \copyrightyear{2013} |
---|
43 | \copyrightdata{978-1-nnnn-nnnn-n/yy/mm} |
---|
44 | \doi{nnnnnnn.nnnnnnn} |
---|
45 | |
---|
46 | % Uncomment one of the following two, if you are not going for the |
---|
47 | % traditional copyright transfer agreement. |
---|
48 | |
---|
49 | %\exclusivelicense % ACM gets exclusive license to publish, |
---|
50 | % you retain copyright |
---|
51 | |
---|
52 | %\permissiontopublish % ACM gets nonexclusive license to publish |
---|
53 | % (paid open-access papers, |
---|
54 | % short abstracts) |
---|
55 | |
---|
56 | \titlebanner{banner above paper title} % These are ignored unless |
---|
57 | \preprintfooter{short description of paper} % 'preprint' option specified. |
---|
58 | |
---|
59 | \title{Bitwise Data Parallelism in Regular Expression Matching} |
---|
60 | \subtitle{Subtitle Text, if any} |
---|
61 | |
---|
62 | \authorinfo{Robert D. Cameron \and Kenneth S. Herdy \and Dan Lin \and Meng Lin \and Ben Hull \and Thomas S. Shermer \and Arrvindh Shriraman} |
---|
63 | {Simon Fraser University} |
---|
64 | {\{cameron,ksherdy,lindanl,linmengl,bhull,shermer,ashriram\}@cs.sfu.ca} |
---|
65 | |
---|
66 | \maketitle |
---|
67 | |
---|
68 | \begin{abstract} |
---|
69 | \input{abstract} |
---|
70 | \end{abstract} |
---|
71 | \category{Theory of computation}{Formal languages and automata theory}{Regular languages} |
---|
72 | \category{Computer systems organization}{Parallel architectures}{Single instruction, multiple data} |
---|
73 | |
---|
74 | % general terms are not compulsory anymore, |
---|
75 | % you may leave them out |
---|
76 | %\terms |
---|
77 | %term1, term2 |
---|
78 | |
---|
79 | \keywords |
---|
80 | regular expression matching, grep, parallel bit stream technology |
---|
81 | |
---|
82 | \section{Introduction} |
---|
83 | |
---|
84 | The use of regular expressions to search texts for occurrences |
---|
85 | of string patterns has a long history and |
---|
86 | remains a pervasive technique throughout computing applications today. |
---|
87 | {\em a brief history} |
---|
88 | |
---|
89 | There has been considerable interest in using parallelization techniques |
---|
90 | to improve the performance of regular expression matching. |
---|
91 | {\em a brief review} \cite{scarpazza2011top} |
---|
92 | GPUs\cite{lin2013accelerating} |
---|
93 | FPGAs\cite{lee2007high} |
---|
94 | SIMD\cite{salapura2012accelerating} |
---|
95 | Cell BE\cite{scarpazza2011top} |
---|
96 | |
---|
97 | |
---|
98 | |
---|
99 | Whereas the existing approaches to parallelization have been |
---|
100 | based on adapting traditional sequential algorithms to emergent |
---|
101 | parallel architectures, we introduce both a new algorithmic |
---|
102 | approach and its implementation on SIMD and GPU architectures. |
---|
103 | This approach relies on a bitwise data parallel view of text |
---|
104 | streams as well as a surprising use of addition to match |
---|
105 | runs of characters in a single step. The closest previous |
---|
106 | work is that underlying bit-parallel XML parsing using 128-bit SSE2 SIMD |
---|
107 | technology together with a parallel scanning primitive also |
---|
108 | based on addition \cite{cameron2011parallel}. |
---|
109 | However, in contrast to the deterministic, longest-match |
---|
110 | scanning associated with the ScanThru primitive of that |
---|
111 | work, we introduce here a new primitive MatchStar |
---|
112 | that can be used in full generality for nondeterministic |
---|
113 | regular expression matching. We also introduce a long-stream |
---|
114 | addition technique involving a further application of MatchStar |
---|
115 | that enables us to scale the technique to $n$-bit addition |
---|
116 | in $log_{64} n$ steps. We ultimately apply this technique, |
---|
117 | for example, to perform |
---|
118 | synchronized 4096-bit addition on GPU warps of 64 threads. |
---|
119 | |
---|
120 | There is also a strong keyword match between the bit-parallel |
---|
121 | data streams used in our approach and the bit-parallelism |
---|
122 | used for NFA state transitions in the classical algorithms of |
---|
123 | Wu and Manber \cite{wu1992agrep}, Baez-Yates and Gonnet \cite{baeza1992new} |
---|
124 | and Navarro and Raffinot \cite{navarro1998bit}. |
---|
125 | However those algorithms use bit-parallelism in a fundamentally |
---|
126 | different way: representing all possible current NFA states |
---|
127 | as a bit vector and performing parallel transitions to a new |
---|
128 | set of states using table lookups and bitwise logic. Whereas |
---|
129 | our approach can match multiple characters per step, bit-parallel |
---|
130 | NFA algorithms proceed through the input one byte at a time. |
---|
131 | Nevertheless, the agrep \cite{wu1992agrep} and |
---|
132 | nrgrep \cite{navarro2000} programs implemented using these techniques remain |
---|
133 | among the strongest competitors in regular expression matching |
---|
134 | performance, so we include them in our comparative evaluation. |
---|
135 | |
---|
136 | |
---|
137 | The remainder of this paper is organized as follows. |
---|
138 | Section \ref{sec:unbounded} presents our basic algorithm using |
---|
139 | a model of unbounded bitwise data parallelism. |
---|
140 | Section \ref{sec:regexp} |
---|
141 | Section \ref{sec:analysis} |
---|
142 | Section \ref{sec:SSE2} |
---|
143 | Section \ref{sec:AVX2} |
---|
144 | Section \ref{sec:GPU} |
---|
145 | Section \ref{sec:Concl} concludes the paper with a discussion of areas for future work. |
---|
146 | |
---|
147 | |
---|
148 | \section{An Unbounded Model for Bitwise Regular Expression Matching}\label{sec:unbounded} |
---|
149 | |
---|
150 | Whereas all traditional regular expression matching engines are |
---|
151 | fundamentally based on an element-at-a-time processing model, |
---|
152 | the approach we introduce in this paper is based on quite a |
---|
153 | different conceptual model: processing all elements of an input data |
---|
154 | stream simultaneously. Depending on the available parallel |
---|
155 | processing resources, an actual implementation may divide an |
---|
156 | input stream into segments and process the segments |
---|
157 | sequentially. Within each segment, all elements of the input |
---|
158 | stream are processed in parallel. The approach is inherently |
---|
159 | scalable and can be adapted to a variety of parallel architectures. |
---|
160 | The fundamental primitives required are bitwise logic and |
---|
161 | long-stream addition. |
---|
162 | |
---|
163 | A key concept in this approach is the derivation of bit streams |
---|
164 | that are parallel to the input data stream, i.e., in one-to-one |
---|
165 | correspondence with the data element positions of the input |
---|
166 | streams. Typically, the input stream is a byte stream comprising |
---|
167 | the 8-bit character code units of a particular encoding such |
---|
168 | as extended ASCII, ISO-8859-1 or UTF-8. However, the method may also |
---|
169 | easily be used with wider code units such as the 16-bit code units of |
---|
170 | UTF-16. In the case of a byte stream, the first step is to transpose |
---|
171 | the byte stream into eight parallel bit streams, such that bit stream |
---|
172 | $i$ comprises the $i^\text{th}$ bit of each byte. These streams form |
---|
173 | a set of basis bit streams from which many other parallel bit |
---|
174 | streams can be calculated, such as character class bit |
---|
175 | streams such that each bit $j$ of the stream specifies |
---|
176 | whether character $j$ of the input stream is in the class |
---|
177 | or not. {\bf perhaps a figure here.} |
---|
178 | |
---|
179 | Using the SIMD capabilities of commodity processors, |
---|
180 | the basis bit streams and character class bit streams may be |
---|
181 | efficiently computed using the Parabix framework that |
---|
182 | has been developed and used extensively for high-performance |
---|
183 | XML parsing \cite{}. For example, working with the 128-bit |
---|
184 | SIMD registers on commodity Intel and AMD processors, |
---|
185 | the Parabix framework can be employed to process input |
---|
186 | streams in blocks of 128 bytes at a time. However, the |
---|
187 | general model is not dependent on any particular data block size |
---|
188 | and can be scaled for arbitrary sized blocks depending on |
---|
189 | available resources. |
---|
190 | |
---|
191 | \section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp} |
---|
192 | |
---|
193 | |
---|
194 | \subsection{Bitwise Data Parallelism vs. Bit-Parallel State Transition} |
---|
195 | The bitap algorithms use bitwise operations in an orthogonal way. |
---|
196 | |
---|
197 | \section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis} |
---|
198 | |
---|
199 | \begin{enumerate} |
---|
200 | \item Operations |
---|
201 | \item Memory behaviour per input byte: note tables of DFA/NFA. |
---|
202 | |
---|
203 | Bille and Throup \em{Faster regular expression matching}\cite{bille2009faster} |
---|
204 | |
---|
205 | \end{enumerate} |
---|
206 | |
---|
207 | |
---|
208 | |
---|
209 | \section{Commodity SIMD Implementation and Experimental Evaluation}\label{sec:SSE2} |
---|
210 | \subsection{Implementation Notes} |
---|
211 | \subsection{Evaluation Methodology} |
---|
212 | \subsection{Comparison} |
---|
213 | |
---|
214 | |
---|
215 | |
---|
216 | \section{SIMD Scalability}\label{sec:AVX2} |
---|
217 | \subsection{AVX Stream Addition} |
---|
218 | We use MatchStar for carry propagation. |
---|
219 | |
---|
220 | \section{GPU Implementation}\label{sec:GPU} |
---|
221 | |
---|
222 | \section{Miscellaneous} |
---|
223 | \subsection{Skipping} |
---|
224 | \subsection{Unicode} |
---|
225 | |
---|
226 | \section{Conclusion}\label{sec:Concl} |
---|
227 | \subsection{Contributions} |
---|
228 | \begin{enumerate} |
---|
229 | \item New Algorithm Class for Regular Expression Matching |
---|
230 | \item MatchStar for Character Class Repetition |
---|
231 | \item Long Stream Addition |
---|
232 | \item Implementations showing performance and scalability |
---|
233 | \end{enumerate} |
---|
234 | \subsection{Future Work} |
---|
235 | \begin{enumerate} |
---|
236 | \item Substring capture |
---|
237 | \item Unicode character classes |
---|
238 | \item Nonregular regexp features: zero-width assertions, backreferences |
---|
239 | \item Multicore for ruleset parallelism |
---|
240 | \end{enumerate} |
---|
241 | |
---|
242 | |
---|
243 | |
---|
244 | %\appendix |
---|
245 | %\section{Appendix Title} |
---|
246 | |
---|
247 | %This is the text of the appendix, if you need one. |
---|
248 | |
---|
249 | \acks |
---|
250 | |
---|
251 | This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and |
---|
252 | MITACS, Inc. |
---|
253 | |
---|
254 | % We recommend abbrvnat bibliography style. |
---|
255 | |
---|
256 | \bibliographystyle{abbrvnat} |
---|
257 | |
---|
258 | % The bibliography should be embedded for final submission. |
---|
259 | |
---|
260 | \bibliography{reference} |
---|
261 | |
---|
262 | %\begin{thebibliography}{} |
---|
263 | %\softraggedright |
---|
264 | |
---|
265 | %\bibitem[Smith et~al.(2009)Smith, Jones]{smith02} |
---|
266 | %P. Q. Smith, and X. Y. Jones. ...reference text... |
---|
267 | % |
---|
268 | %\end{thebibliography} |
---|
269 | |
---|
270 | |
---|
271 | \end{document} |
---|
272 | |
---|
273 | |
---|