source: docs/Working/re/ppopp-re.tex @ 3466

Last change on this file since 3466 was 3466, checked in by cameron, 6 years ago

Updates

File size: 9.7 KB
Line 
1%-----------------------------------------------------------------------------
2%
3%               Template for sigplanconf LaTeX Class
4%
5% Name:         sigplanconf-template.tex
6%
7% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
8%               file for SIGPLAN conference proceedings.
9%
10% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
11%               sigplanconf-guide.pdf
12%
13% Author:       Paul C. Anagnostopoulos
14%               Windfall Software
15%               978 371-2316
16%               paul@windfall.com
17%
18% Created:      15 February 2005
19%
20%-----------------------------------------------------------------------------
21
22
23\documentclass{sigplanconf}
24
25% The following \documentclass options may be useful:
26
27% preprint      Remove this option only once the paper is in final form.
28% 10pt          To set in 10-point type instead of 9-point.
29% 11pt          To set in 11-point type instead of 9-point.
30% authoryear    To obtain author/year citation style instead of numeric.
31
32\usepackage{amsmath}
33
34
35\begin{document}
36
37\special{papersize=8.5in,11in}
38\setlength{\pdfpageheight}{\paperheight}
39\setlength{\pdfpagewidth}{\paperwidth}
40
41\conferenceinfo{PPoPP 2014}{February 15-19, 2013, Orland, Florida, United States} 
42\copyrightyear{2013} 
43\copyrightdata{978-1-nnnn-nnnn-n/yy/mm} 
44\doi{nnnnnnn.nnnnnnn}
45
46% Uncomment one of the following two, if you are not going for the
47% traditional copyright transfer agreement.
48
49%\exclusivelicense                % ACM gets exclusive license to publish,
50                                  % you retain copyright
51
52%\permissiontopublish             % ACM gets nonexclusive license to publish
53                                  % (paid open-access papers,
54                                  % short abstracts)
55
56\titlebanner{banner above paper title}        % These are ignored unless
57\preprintfooter{short description of paper}   % 'preprint' option specified.
58
59\title{Bitwise Data Parallelism in Regular Expression Matching}
60\subtitle{Subtitle Text, if any}
61
62\authorinfo{Robert D. Cameron \and Kenneth S. Herdy \and Dan Lin \and Meng Lin \and Ben Hull \and Thomas S. Shermer \and Arrvindh Shriraman}
63           {Simon Fraser University}
64           {\{cameron,ksherdy,lindanl,linmengl,bhull,shermer,ashriram\}@cs.sfu.ca}
65
66\maketitle
67
68\begin{abstract}
69\input{abstract}
70\end{abstract}
71\category{Theory of computation}{Formal languages and automata theory}{Regular languages}
72\category{Computer systems organization}{Parallel architectures}{Single instruction, multiple data}
73
74% general terms are not compulsory anymore,
75% you may leave them out
76%\terms
77%term1, term2
78
79\keywords
80regular expression matching, grep, parallel bit stream technology
81
82\section{Introduction}
83
84The use of regular expressions to search texts for occurrences
85of string patterns has a long history and
86remains a pervasive technique throughout computing applications today.
87{\em a brief history}
88
89There has been considerable interest in using parallelization techniques
90to improve the performance of regular expression matching.
91{\em a brief review}
92
93Whereas the existing approaches to parallelization have been
94based on adapting traditional sequential algorithms to emergent
95parallel architectures, we introduce both a new algorithmic
96approach and its implementation on SIMD and GPU architectures.
97This approach relies on a bitwise data parallel view of text
98streams as well as a surprising use of addition to match
99runs of characters in a single step.  The closest previous
100work is that underlying bit-parallel XML parsing using 128-bit SSE2 SIMD
101technology together with a parallel scanning primitive also
102based on addition \cite{cameron2011parallel}.   
103However, in contrast to the deterministic, longest-match
104scanning associated with the ScanThru primitive of that
105work, we introduce here a new primitive MatchStar
106that can be used in full generality for nondeterministic
107regular expression matching.   We also introduce a long-stream
108addition technique involving a further application of MatchStar
109that enables us to scale the technique to $n$-bit addition
110in $log_{64} n$ steps.   We ultimately apply this technique,
111for example, to perform
112synchronized 4096-bit addition on GPU warps of 64 threads.
113
114There is also a strong keyword match between the bit-parallel
115data streams used in our approach and the bit-parallelism
116used for NFA state transitions in the classical algorithms of
117Wu and Manber \cite{wu1992agrep}, Baez-Yates and Gonnet \cite{baeza1992new}
118and Navarro and Raffinot \cite{navarro1998bit}.
119However those algorithms use bit-parallelism in a fundamentally
120different way: representing all possible current NFA states
121as a bit vector and performing parallel transitions to a new
122set of states using table lookups and bitwise logic.    Whereas
123our approach can match multiple characters per step, bit-parallel
124NFA algorithms proceed through the input one byte at a time.
125Nevertheless, the agrep \cite{wu1992agrep} and
126nrgrep \cite{navarro2000} programs implemented using these techniques remain
127among the strongest competitors in regular expression matching
128performance, so we include them in our comparative evaluation.
129
130
131The remainder of this paper is organized as follows.
132Section \ref{sec:unbounded} presents our basic algorithm using
133a model of unbounded bitwise data parallelism. 
134Section \ref{sec:regexp} 
135Section \ref{sec:analysis} 
136Section \ref{sec:SSE2} 
137Section \ref{sec:AVX2} 
138Section \ref{sec:GPU} 
139Section \ref{sec:Concl} concludes the paper with a discussion of areas for future work.
140
141
142\section{An Unbounded Model for Bitwise Regular Expression Matching}\label{sec:unbounded}
143
144Whereas all traditional regular expression matching engines are
145fundamentally based on an element-at-a-time processing model,
146the approach we introduce in this paper is based on quite a
147different conceptual model: processing all elements of an input data
148stream simultaneously.    Depending on the available parallel
149processing resources, an actual implementation may divide an
150input stream into segments and process the segments
151sequentially.   Within each segment, all elements of the input
152stream are processed in parallel.   The approach is inherently
153scalable and can be adapted to a variety of parallel architectures.
154The fundamental primitives required are bitwise logic and
155long-stream addition. 
156
157A key concept in this approach is the derivation of bit streams
158that are parallel to the input data stream, i.e., in one-to-one
159correspondence with the data element positions of the input
160streams.   Typically, the input stream is a byte stream comprising
161the 8-bit character code units of a particular encoding such
162as extended ASCII, ISO-8859-1 or UTF-8.   However, the method may also
163easily be used with wider code units such as the 16-bit code units of
164UTF-16.   In the case of a byte stream, the first step is to transpose
165the byte stream into eight parallel bit streams, such that bit stream
166$i$ comprises the $i^\text{th}$ bit of each byte.   These streams form
167a set of basis bit streams from which many other parallel bit
168streams can be calculated, such as character class bit
169streams such that each bit $j$ of the stream specifies
170whether character $j$ of the input stream is in the class
171or not.   {\bf perhaps a figure here.}
172
173Using the SIMD capabilities of commodity processors,
174the basis bit streams and character class bit streams may be
175efficiently computed using the Parabix framework that
176has been developed and used extensively for high-performance
177XML parsing \cite{}.  For example, working with the 128-bit
178SIMD registers on commodity Intel and AMD processors,
179the Parabix framework can be employed to process input
180streams in blocks of 128 bytes at a time.   However, the
181general model is not dependent on any particular data block size
182and can be scaled for arbitrary sized blocks depending on
183available resources.   
184
185\section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp}
186
187
188\subsection{Bitwise Data Parallelism vs. Bit-Parallel State Transition}
189The bitap algorithms use bitwise operations in an orthogonal way.
190
191\section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis}
192
193\begin{enumerate}
194\item Operations
195\item Memory behaviour per input byte: note tables of DFA/NFA.
196
197Bille and Throup \em{Faster regular expression matching}\cite{bille2009faster}
198
199\end{enumerate}
200
201
202
203\section{Commodity SIMD Implementation and Experimental Evaluation}\label{sec:SSE2}
204\subsection{Implementation Notes}
205\subsection{Evaluation Methodology}
206\subsection{Comparison}
207
208
209
210\section{SIMD Scalability}\label{sec:AVX2}
211\subsection{AVX Stream Addition}
212  We use MatchStar for carry propagation.
213
214\section{GPU Implementation}\label{sec:GPU}
215
216\section{Miscellaneous}
217\subsection{Skipping}
218\subsection{Unicode}
219
220\section{Conclusion}\label{sec:Concl}
221\subsection{Contributions}
222\begin{enumerate}
223\item New Algorithm Class for Regular Expression Matching
224\item MatchStar for Character Class Repetition
225\item Long Stream Addition
226\item Implementations showing performance and scalability
227\end{enumerate}
228\subsection{Future Work}
229\begin{enumerate}
230\item Substring capture
231\item Unicode character classes
232\item Nonregular regexp features: zero-width assertions, backreferences
233\item Multicore for ruleset parallelism
234\end{enumerate}
235
236
237
238%\appendix
239%\section{Appendix Title}
240
241%This is the text of the appendix, if you need one.
242
243\acks
244
245This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and
246MITACS, Inc.
247
248% We recommend abbrvnat bibliography style.
249
250\bibliographystyle{abbrvnat}
251
252% The bibliography should be embedded for final submission.
253 
254\bibliography{reference}
255
256%\begin{thebibliography}{}
257%\softraggedright
258
259%\bibitem[Smith et~al.(2009)Smith, Jones]{smith02}
260%P. Q. Smith, and X. Y. Jones. ...reference text...
261%
262%\end{thebibliography}
263
264
265\end{document}
266
267
Note: See TracBrowser for help on using the repository browser.