# Changeset 3881

Ignore:
Timestamp:
Jun 17, 2014, 10:27:39 PM (5 years ago)
Message:

Editted introduction.

Location:
docs/Working/re
Files:
2 edited

### Legend:

Unmodified
 r3880 \section{Introduction} The use of regular expressions to search texts for occurrences of string patterns has a long history and remains a pervasive technique throughout computing applications today. The use of regular expressions to search texts for patterns has a long history and remains an important technique. % {\em a brief history} The origins of regular expression matching date back to automata theory developed by Kleene in the 1950s \cite{kleene1951}. Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions %The origins of regular expression matching date back to automata theory %studied by Kleene in the 1950s \cite{kleene1951}. Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions to nondeterministic finite automata (NFA). Following Thompson's approach, a regular expression of length $m$ is converted to an NFA with $O(m)$ states. It is then possible to search a text of length $n$ using the NFA in worst case $O(mn)$ time. Often, a more efficient choice is to convert an NFA into a DFA. A DFA has a single active state at any time Following Thompson's approach, a regular expression of length $m$ is converted to an NFA with $O(m)$ states. Using the NFA it is possible to search a text of length $n$ in $O(mn)$ time. Frequently, a more efficient choice is to convert an NFA into a deterministic finite automata (DFA). A DFA maintains a single active state throughout the matching process and hence it is possible to search a text of length $n$ in $O(n)$ time\footnote{It is well hence, using a DFA it is possible to search a text of length $n$ in $O(n)$ time\footnote{It is well known that the conversion of an NFA to an equivalent DFA may result in state explosion. That is, the number of resultant DFA states may increase exponentially.}. A significant proportion of the research in fast regular expression matching can be in \textit{state explosion}, i.e., the number of resultant DFA states may increase exponentially.}. A significant proportion of the research in fast regular expression matching can be regarded as the quest for efficient automata'' \cite{navarro98fastand}. In \cite{baeza1992new}, Baeza-Yates and Gonnet performs can be reduced by a factor $w$. Building on this observation, the Shift-Or algorithm simulates an NFA using bitwise operations and achieves $O(\frac{nm}{w})$ worst-case time \cite{navarro2000}. bitwise operations and achieves $O(\frac{nm}{w})$ time in the worst-case \cite{navarro2000}. A disadvantage of the Shift-Or approach is an inability to skip input characters. Simple string matching algorithms, such as the Boyer-Moore family of algorithms \cite{boyer1977fast,horspool1980practical}, skip input characters such as the Boyer-Moore family of algorithms \cite{boyer1977fast,horspool1980practical}, skip input characters to achieve sublinear times in the average case. % Backward Dawg Matching (BDM) string matching algorithms \cite{crochemore1994text} matching tools for simple patterns \cite{navarro2000}. There has been considerable recent interest in accelerating regular expression matching on parallel hardware such as multicore processors (CPUs), Recently, there has been considerable interest in the use of parallel hardware such as multicore processors (CPUs), general purpose graphics processing units (GPGPUs), field-programmable gate arrays (FPGAs), and specialized architectures such as the Cell Broadband Engine (Cell BE). % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory) %CPU or specialized architectures such as the Cell Broadband Engine (Cell BE) to accelerate regular expression matching. % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory) Generally, speedups are achieved by using parallel hardware features to improve the throughput of multiple instances of a matching problem at a time, i.e., by matching against sets of patterns or multiple input streams. In contrast, our approach uses parallelism to accelerate the throughput of a single problem instance, i.e., a single regular expression matched against a single input stream. %\paragraph{Multicore} In related work targeting multicore hardware, Scarpazza and Braudaway \cite{scarpazza2008fast} demonstrated that text processing algorithms that exhibit irregular memory access patterns can be efficiently executed on multicore hardware. In related work, Pasetto et al presented a flexible tool that can be efficiently executed. Pasetto et al \cite{pasetto2010} presented a flexible tool that performs small-ruleset regular expression matching at a rate of 2.88 Gbps per chip on Intel Xeon E5472 hardware \cite{pasetto2010}. 2.88 Gbps per chip on Intel Xeon E5472 hardware. Naghmouchi et al \cite{scarpazza2011top,naghmouchi2010} demonstrated that the Aho-Corasick (AC) string matching algorithm \cite{aho1975} is well-suited for parallel implementation on multicore CPUs, GPGPUs and the Cell BE. On each system, both thread-level parallelism (cores) and data-level parallelism (SIMD units) were leveraged for performance. Salapura et al \cite{salapura2012accelerating} advocated the use of vector-style processing for regular expressions % On each system, both thread-level parallelism (cores) and data-level parallelism % (SIMD units) were leveraged for performance. Salapura et al \cite{salapura2012accelerating} advocated the use of vector-style processing for regular expressions in business analytics applications and leveraged the SIMD hardware available on multicore processors to achieve a speedup of greater than 1.8 over a range of data sizes of interest. %Cell In \cite{scarpazza2008}, Scarpazza and Russell presented a SIMD tokenizer range of data sizes. %\paragraph{Cell Broadband Engine} On the Cell Broadband Engine, Scarpazza and Russell presented a SIMD tokenizer that delivered 1.00--1.78 Gbps on a single Cell BE chip and extended this approach for emulation on the Intel Larrabee chip \cite{scarpazza2008} and extended this approach for emulation on the Intel Larrabee instruction set \cite{scarpazza2009larrabee}. On the Cell BE, Scarpazza \cite{scarpazza2009cell} described a pattern matching Furthermore, in \cite{scarpazza2009cell} Scarpazza, described a pattern matching implementation that delivered a throughput of 40 Gbps for a small dictionary of approximately 100 patterns and a throughput of 3.3-3.4 presented a string matching implementation for automata that achieved 4 Gbps on the Cell BE. % GPU %\paragraph{GPGPUs} On GPGPUs, Tumeo et al \cite{tumeo2010efficient} presented a chunk-based implementation of the AC algorithm for although system throughput was limited to 15 Gbps \cite{lin2013accelerating}. Most recently, Mytkowicz et al have developed a method for combining SIMD parallellism and data parallelism Most recently, Mytkowicz et al \cite{DBLP:conf/asplos/MytkowiczMS14} have developed a method for combining SIMD parallelism and data parallelism on multicore hardware. Of each of these related works, this approach stands out since it also focuses on the acceleration of matching against a single input stream. Whereas the existing approaches to parallelization have been