Changeset 3881
 Timestamp:
 Jun 17, 2014, 10:27:39 PM (5 years ago)
 Location:
 docs/Working/re
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/re/pact051cameron.tex
r3880 r3881 64 64 \section{Introduction} 65 65 66 The use of regular expressions to search texts for occurrences67 of stringpatterns has a long history and68 remains a pervasive technique throughout computing applications today.66 The use of regular expressions to search texts 67 for patterns has a long history and 68 remains an important technique. 69 69 % {\em a brief history} 70 The origins of regular expression matching date back to automata theory 71 developed by Kleene in the 1950s \cite{kleene1951}. 72 Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions 70 %The origins of regular expression matching date back to automata theory 71 %studied by Kleene in the 1950s \cite{kleene1951}. 72 Thompson \cite{thompson1968} is credited with the 73 first construction to convert regular expressions 73 74 to nondeterministic finite automata (NFA). 74 Following Thompson's approach, a regular expression of length $m$ is converted 75 to an NFA with $O(m)$ states. It is then possible to search a text of length $n$ using the 76 NFA in worst case $O(mn)$ time. Often, a more efficient choice 77 is to convert an NFA into a DFA. A DFA has a single active state at any time 75 Following Thompson's approach, 76 a regular expression of length $m$ is converted 77 to an NFA with $O(m)$ states. Using the 78 NFA it is possible to search a text 79 of length $n$ in $O(mn)$ time. 80 Frequently, a more efficient choice 81 is to convert an NFA into a deterministic finite automata (DFA). 82 A DFA maintains a single active state 78 83 throughout the matching process and 79 hence it is possible to search a text of length $n$ in $O(n)$ time\footnote{It is well 84 hence, using a DFA it is possible to 85 search a text of length $n$ in $O(n)$ time\footnote{It is well 80 86 known that the conversion of an NFA to an equivalent DFA may result 81 in state explosion. That is, the number of resultant DFA states may increase exponentially.}. 82 83 A significant proportion of the research in fast regular expression matching can be 87 in \textit{state explosion}, i.e., the number of resultant DFA states may increase exponentially.}. 88 89 A significant proportion of the research 90 in fast regular expression matching can be 84 91 regarded as the ``quest for efficient automata'' \cite{navarro98fastand}. 85 92 In \cite{baeza1992new}, BaezaYates and Gonnet … … 90 97 performs can be reduced by a factor $w$. 91 98 Building on this observation, the ShiftOr algorithm simulates an NFA using 92 bitwise operations and achieves $O(\frac{nm}{w})$ worstcase time \cite{navarro2000}.99 bitwise operations and achieves $O(\frac{nm}{w})$ time in the worstcase \cite{navarro2000}. 93 100 A disadvantage of the ShiftOr approach 94 101 is an inability to skip input characters. 95 102 Simple string matching algorithms, 96 such as the BoyerMoore family of algorithms \cite{boyer1977fast,horspool1980practical}, skip input characters 103 such as the BoyerMoore family of algorithms \cite{boyer1977fast,horspool1980practical}, 104 skip input characters 97 105 to achieve sublinear times in the average case. 98 106 % Backward Dawg Matching (BDM) string matching algorithms \cite{crochemore1994text} … … 108 116 matching tools for simple patterns \cite{navarro2000}. 109 117 110 There has been considerable recent 111 interest in accelerating regular expression matching 112 on parallel hardware 113 such as multicore processors (CPUs), 118 Recently, there has been considerable 119 interest in the use of 120 parallel hardware such as multicore processors (CPUs), 114 121 general purpose graphics processing units (GPGPUs), 115 122 fieldprogrammable gate arrays (FPGAs), 116 and specialized architectures such as 117 the Cell Broadband Engine (Cell BE). % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory) 118 %CPU 123 or specialized architectures such as 124 the Cell Broadband Engine (Cell BE) to 125 accelerate regular expression matching. % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory) 126 Generally, speedups are achieved by 127 using parallel hardware features to improve the throughput of 128 multiple instances of a matching problem at a time, 129 i.e., by matching against sets of patterns or multiple input streams. 130 In contrast, our approach uses parallelism to accelerate the throughput of 131 a single problem instance, i.e., a single regular expression 132 matched against a single input stream. 133 134 %\paragraph{Multicore} 135 In related work targeting multicore hardware, 119 136 Scarpazza and Braudaway \cite{scarpazza2008fast} demonstrated that 120 137 text processing algorithms that exhibit irregular memory access patterns 121 can be efficiently executed on multicore hardware.122 In related work, Pasetto et alpresented a flexible tool that138 can be efficiently executed. 139 Pasetto et al \cite{pasetto2010} presented a flexible tool that 123 140 performs smallruleset regular expression matching at a rate of 124 2.88 Gbps per chip on Intel Xeon E5472 hardware \cite{pasetto2010}.141 2.88 Gbps per chip on Intel Xeon E5472 hardware. 125 142 Naghmouchi et al \cite{scarpazza2011top,naghmouchi2010} demonstrated that the AhoCorasick (AC) 126 143 string matching algorithm \cite{aho1975} is wellsuited for parallel 127 144 implementation on multicore CPUs, GPGPUs and the Cell BE. 128 On each system, both threadlevel parallelism (cores) and datalevel parallelism 129 (SIMD units) were leveraged for performance. 130 Salapura et al \cite{salapura2012accelerating} advocated the use of vectorstyle processing for regular expressions 145 % On each system, both threadlevel parallelism (cores) and datalevel parallelism 146 % (SIMD units) were leveraged for performance. 147 Salapura et al \cite{salapura2012accelerating} advocated 148 the use of vectorstyle processing for regular expressions 131 149 in business analytics applications and leveraged the SIMD hardware available 132 150 on multicore processors to achieve a speedup of greater than 1.8 over a 133 range of data sizes of interest.134 % Cell135 In \cite{scarpazza2008}, Scarpazza and Russell presented a SIMD tokenizer151 range of data sizes. 152 %\paragraph{Cell Broadband Engine} 153 On the Cell Broadband Engine, Scarpazza and Russell presented a SIMD tokenizer 136 154 that delivered 1.001.78 Gbps on a single 137 Cell BE chipand extended this approach for emulation on the Intel Larrabee155 chip \cite{scarpazza2008} and extended this approach for emulation on the Intel Larrabee 138 156 instruction set \cite{scarpazza2009larrabee}. 139 On the Cell BE, Scarpazza \cite{scarpazza2009cell}described a pattern matching157 Furthermore, in \cite{scarpazza2009cell} Scarpazza, described a pattern matching 140 158 implementation that delivered a throughput of 40 141 159 Gbps for a small dictionary of approximately 100 patterns and a throughput of 3.33.4 … … 143 161 presented a string matching implementation for automata that achieved 144 162 4 Gbps on the Cell BE. 145 % GPU163 %\paragraph{GPGPUs} 146 164 On GPGPUs, Tumeo et al \cite{tumeo2010efficient} presented a chunkbased 147 165 implementation of the AC algorithm for … … 152 170 although system throughput was limited to 15 Gbps 153 171 \cite{lin2013accelerating}. 154 Most recently, Mytkowicz et al have developed a method for combining 155 SIMD parallellism and data parallelism 172 Most recently, Mytkowicz et al \cite{DBLP:conf/asplos/MytkowiczMS14} 173 have developed a method for combining 174 SIMD parallelism and data parallelism on multicore hardware. 175 Of each of these related works, this approach stands 176 out since it also focuses on the acceleration 177 of matching against a single input stream. 156 178 157 179 Whereas the existing approaches to parallelization have been
Note: See TracChangeset
for help on using the changeset viewer.