source: docs/Working/re/ppopp-re.tex @ 3464

Last change on this file since 3464 was 3464, checked in by cameron, 6 years ago

Intro outline

File size: 9.1 KB
Line 
1%-----------------------------------------------------------------------------
2%
3%               Template for sigplanconf LaTeX Class
4%
5% Name:         sigplanconf-template.tex
6%
7% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
8%               file for SIGPLAN conference proceedings.
9%
10% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
11%               sigplanconf-guide.pdf
12%
13% Author:       Paul C. Anagnostopoulos
14%               Windfall Software
15%               978 371-2316
16%               paul@windfall.com
17%
18% Created:      15 February 2005
19%
20%-----------------------------------------------------------------------------
21
22
23\documentclass{sigplanconf}
24
25% The following \documentclass options may be useful:
26
27% preprint      Remove this option only once the paper is in final form.
28% 10pt          To set in 10-point type instead of 9-point.
29% 11pt          To set in 11-point type instead of 9-point.
30% authoryear    To obtain author/year citation style instead of numeric.
31
32\usepackage{amsmath}
33
34
35\begin{document}
36
37\special{papersize=8.5in,11in}
38\setlength{\pdfpageheight}{\paperheight}
39\setlength{\pdfpagewidth}{\paperwidth}
40
41\conferenceinfo{PPoPP 2014}{February 15-19, 2013, Orland, Florida, United States} 
42\copyrightyear{2013} 
43\copyrightdata{978-1-nnnn-nnnn-n/yy/mm} 
44\doi{nnnnnnn.nnnnnnn}
45
46% Uncomment one of the following two, if you are not going for the
47% traditional copyright transfer agreement.
48
49%\exclusivelicense                % ACM gets exclusive license to publish,
50                                  % you retain copyright
51
52%\permissiontopublish             % ACM gets nonexclusive license to publish
53                                  % (paid open-access papers,
54                                  % short abstracts)
55
56\titlebanner{banner above paper title}        % These are ignored unless
57\preprintfooter{short description of paper}   % 'preprint' option specified.
58
59\title{Bitwise Data Parallelism in Regular Expression Matching}
60\subtitle{Subtitle Text, if any}
61
62\authorinfo{Robert D. Cameron \and Kenneth S. Herdy \and Dan Lin \and Meng Lin \and Ben Hull \and Thomas S. Shermer \and Arrvindh Shriraman}
63           {Simon Fraser University}
64           {\{cameron,ksherdy,lindanl,linmengl,bhull,shermer,ashriram\}@cs.sfu.ca}
65
66\maketitle
67
68\begin{abstract}
69\input{abstract}
70\end{abstract}
71\category{Theory of computation}{Formal languages and automata theory}{Regular languages}
72\category{Computer systems organization}{Parallel architectures}{Single instruction, multiple data}
73
74% general terms are not compulsory anymore,
75% you may leave them out
76%\terms
77%term1, term2
78
79\keywords
80regular expression matching, grep, parallel bit stream technology
81
82\section{Introduction}
83
84The use of regular expressions to search texts for occurrences
85of string patterns has a long history and
86remains a pervasive technique throughout computing applications today.
87{\em a brief history}
88
89There has been considerable interest in using parallelization techniques
90to improve the performance of regular expression matching.
91{\em a brief review}
92
93Whereas the existing approaches to parallelization have been
94based on adapting traditional sequential algorithms to emergent
95parallel architectures, we introduce both a new algorithmic
96approach and its implementation on SIMD and GPU architectures.
97This approach relies on a bitwise data parallel view of text
98streams as well as a surprising use of addition to implement
99matching operations.   The closest previous work is that
100underlying bit-parallel XML parsing using 128-bit SSE2 SIMD
101technology together with a parallel scanning primitive also
102based on addition \cite{cameron2011parallel}.   
103However, in contrast to the deterministic, longest-match
104scanning associated with the ScanThru primitive of that
105work, we introduce here a new primitive MatchStar
106that can be used in full generality for nondeterministic
107regular expression matching.   We also introduce a long-stream
108addition technique involving a further application of MatchStar
109that enables us to scale the technique to $n$-bit addition
110in $log_{64} n$ steps.
111
112There is also a strong keyword match between the bit-parallel
113data streams used in our approach and the bit-parallelism of
114regular expression using the bitap and Wu-Manber NFA
115(nondeterministic finite automata) algorithms \cite{}.
116However those algorithms use bit-parallelism in a fundamentally
117different way: to represent all possible current NFA states
118as a bit vector and to perform byte-at-a-time transitions
119using the bitwise operations.   Nevertheless, the agrep and
120nrgrep programs implemented using these techniques remain
121among the strongest competitors in overall performance
122to our implementations.
123
124
125
126The remainder of this paper is organized as follows.
127Section \ref{sec:bitwise} 
128Section \ref{sec:regexp} 
129Section \ref{sec:analysis} 
130Section \ref{sec:SSE2} 
131Section \ref{sec:AVX2} 
132Section \ref{sec:GPU} 
133Section \ref{sec:Concl} concludes the paper with a discussion of areas for future work.
134
135
136\section{Unbounded Bitwise Data Parallelism}\label{sec:bitwise}
137
138Whereas all traditional regular expression matching engines are
139fundamentally based on an element-at-a-time processing model,
140the approach we introduce in this paper is based on quite a
141different conceptual model: processing all elements of an input data
142stream simultaneously.    Depending on the available parallel
143processing resources, an actual implementation may divide an
144input stream into segments and process the segments
145sequentially.   Within each segment, all elements of the input
146stream are processed in parallel.   The approach is inherently
147scalable and can be adapted to a variety of parallel architectures.
148The fundamental primitives required are bitwise logic and
149long-stream addition. 
150
151A key concept in this approach is the derivation of bit streams
152that are parallel to the input data stream, i.e., in one-to-one
153correspondence with the data element positions of the input
154streams.   Typically, the input stream is a byte stream comprising
155the 8-bit character code units of a particular encoding such
156as extended ASCII, ISO-8859-1 or UTF-8.   However, the method may also
157easily be used with wider code units such as the 16-bit code units of
158UTF-16.   In the case of a byte stream, the first step is to transpose
159the byte stream into eight parallel bit streams, such that bit stream
160$i$ comprises the $i$\th bit of each byte.   These streams form
161a set of basis bit streams from which many other parallel bit
162streams can be calculated, such as character class bit
163streams such that each bit $j$ of the stream specifies
164whether character $j$ of the input stream is in the class
165or not.   {\bf perhaps a figure here.}
166
167Using the SIMD capabilities of commodity processors,
168the basis bit streams and character class bit streams may be
169efficiently computed using the Parabix framework that
170has been developed and used extensively for high-performance
171XML parsing \cite{}.  For example, working with the 128-bit
172SIMD registers on commodity Intel and AMD processors,
173the Parabix framework can be employed to process input
174streams in blocks of 128 bytes at a time.   However, the
175general model is not dependent on any particular data block size
176and can be scaled for arbitrary sized blocks depending on
177available resources.   
178
179\section{Bitwise Regular Expression Matching with MatchStar}\label{sec:regexp}
180
181
182\subsection{Bitwise Data Parallelism vs. Bit-Parallel State Transition}
183The bitap algorithms use bitwise operations in an orthogonal way.
184
185\section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis}
186
187\begin{enumerate}
188\item Operations
189\item Memory behaviour per input byte: note tables of DFA/NFA.
190
191Bille and Throup \em{Faster regular expression matching}\cite{bille2009faster}
192
193\end{enumerate}
194
195
196
197\section{Commodity SIMD Implementation and Experimental Evaluation}\label{sec:SSE2}
198\subsection{Implementation Notes}
199\subsection{Evaluation Methodology}
200\subsection{Comparison}
201
202
203
204\section{SIMD Scalability}\label{sec:AVX2}
205\subsection{AVX Stream Addition}
206  We use MatchStar for carry propagation.
207
208\section{GPU Implementation}\label{sec:GPU}
209
210\section{Miscellaneous}
211\subsection{Skipping}
212\subsection{Unicode}
213
214\section{Conclusion}\label{sec:Concl}
215\subsection{Contributions}
216\begin{enumerate}
217\item New Algorithm Class for Regular Expression Matching
218\item MatchStar for Character Class Repetition
219\item Long Stream Addition
220\item Implementations showing performance and scalability
221\end{enumerate}
222\subsection{Future Work}
223\begin{enumerate}
224\item Substring capture
225\item Unicode character classes
226\item Nonregular regexp features: zero-width assertions, backreferences
227\item Multicore for ruleset parallelism
228\end{enumerate}
229
230
231
232%\appendix
233%\section{Appendix Title}
234
235%This is the text of the appendix, if you need one.
236
237\acks
238
239This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and
240MITACS, Inc.
241
242% We recommend abbrvnat bibliography style.
243
244\bibliographystyle{abbrvnat}
245
246% The bibliography should be embedded for final submission.
247 
248\bibliography{reference}
249
250%\begin{thebibliography}{}
251%\softraggedright
252
253%\bibitem[Smith et~al.(2009)Smith, Jones]{smith02}
254%P. Q. Smith, and X. Y. Jones. ...reference text...
255%
256%\end{thebibliography}
257
258
259\end{document}
260
261
Note: See TracBrowser for help on using the repository browser.