Changeset 3468
 Timestamp:
 Sep 13, 2013, 1:16:01 AM (6 years ago)
 Location:
 docs/Working/re
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/re/ppoppre.tex
r3467 r3468 85 85 of string patterns has a long history and 86 86 remains a pervasive technique throughout computing applications today. 87 {\em a brief history} 87 % {\em a brief history} 88 The origins of regular expression matching date back to automata theory 89 developed by Kleene in the 1950s \cite{kleene1951}. 90 Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions 91 to nondeterministic finite automata (NFA). 92 Following Thompson's approach, a regular expression of length m is first converted 93 to an NFA with O(m) nodes. It is then possible to search a text of length n using the 94 NFA in worst case O(mn) time. Often, a more efficient choice 95 is to convert the NFA into a DFA. A DFA has only a single active state at any time 96 in the matching process and 97 hence it is possible to search a text at of length n in worstcase O(n) optimal. 98 However, it is well known that the conversion of an NFA to an equivalent DFA may result 99 in state explosion. That is, the number of resultant DFA states may increase exponentially. 100 In \cite{Baezayates_anew} a new approach to text searching was proposed based on bitparallelism \cite{baeza1992new}. 101 This technique takes advantage of the intrinsic parallelism of bitwise operations 102 within a computer word. Given a wbit word, the ShiftOr algorithm \cite{Baezayates_anew} algorithm uses the 103 bitparallel approach to 104 simulate an NFA in O($nm/w$) worstcase time. 105 106 A disadvantage of the bitparallel ShiftOr pattern matching approach 107 in comparison to simple string matching algorithms is an inability to skip input characters. 108 For example, the BoyerMoore family of algorithms \cite{boyer1977fast} skip input characters 109 to achieve sublinear times in the average case. Backward Dawg Matching 110 (BDM) string matching algorithms \cite{crochemore1994text} based on suffix automata are able to skip characters. 111 The Backward Nondeterministic Dawg Matching (BNDM) pattern matching algorithm \cite{wu1992fast} 112 combines the bitparallel advantages of ShiftOr and with the character skipping advantages of the BDM algorithm. 113 The nrgrep pattern matching tool is built over the BNDM algorithm, 114 and hence the name nrgrep \cite{navarro2000}. 88 115 89 116 There has been considerable interest in using parallelization techniques 90 117 to improve the performance of regular expression matching. 91 {\em a brief review} \cite{scarpazza2011top} 92 GPUs\cite{lin2013accelerating} 93 FPGAs\cite{lee2007high} 118 {\em a brief review} 119 120 Scarpazza and Braudaway demonstrate that irregular, 121 controlflow dominated text processing algorithms can be efficiently 122 executed on multicore hardware \cite{scarpazza2008fast}. 123 In related work, Pasetto et al. present a flexible tool that 124 performs smallruleset regular expression matching at a rate of 125 2.88 Gbps per chip on Intel Xeon E5472 hardware \cite{pasetto2010}. 126 Naghmouchi et al. demonstrate that the AhoCorasick 127 string matching algorithm is well suited for implementation on commodity multicore, the 128 Cell Broadband Engine and GPGPU hardware \cite{scarpazza2011top, naghmouchi2010}. In each case, both thread 129 level (additional cores) and datalevel (SIMD units) hardware features are leveraged for performance. 130 On the Cell Broadband Engine, Scarpazzaâs implementation delivers a throughput of 40 131 Gbps for a small dictionary size of approximately 100 patterns, and a throughput of 3.33.4 132 Gbps for a larger dictionary of tens of thousands of patterns. 133 Scarpazza and Russell present a SIMD tokenizer that delivers 1.00â1.78 Gbps on a single 134 Cell Broadband Engine chip \cite{scarpazza2008} and extend this approach for emulation on the Intel Larrabee 135 instruction set \cite{scarpazza2009larrabee}. Iorio and van Lunteren describe a string matching implementation for 136 automata that achieves 4 Gbps on the Cell Broadband Engine \cite{iorio2008}. 137 In more recent work, 138 139 TODO 140 94 141 SIMD\cite{salapura2012accelerating} 95 Cell BE\cite{scarpazza2011top} 96 97 142 143 and 144 % GPUs\cite{lin2013accelerating} 145 TODO 98 146 99 147 Whereas the existing approaches to parallelization have been 
docs/Working/re/reference.bib
r3467 r3468 204 204 publisher={ACM} 205 205 } 206 207 @book{crochemore1994text, 208 title={Text algorithms}, 209 author={Crochemore, Maxime and Rytter, Wojciech and Crochemore, Maxime}, 210 volume={698}, 211 year={1994}, 212 publisher={World Scientific} 213 } 214 215 @article{pasetto2010, 216 author={D. Pasetto and F. Petrini and V. Agarwal}, 217 year={2010}, 218 title={Tools for very fast regular expression matching}, 219 journal={Computer}, 220 volume={43}, 221 number={3}, 222 pages={5058} 223 } 224 225 @inproceedings{naghmouchi2010, 226 author={Jamin Naghmouchi and Daniele Paolo Scarpazza and Mladen Berekovic}, 227 year={2010}, 228 title={Smallruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization}, 229 booktitle={Proceedings of the 24th ACM International Conference on Supercomputing}, 230 series={ICS '10}, 231 publisher={ACM}, 232 address={New York, NY, USA}, 233 location={Tsukuba, Ibaraki, Japan}, 234 pages={337348}, 235 isbn={9781450300186}, 236 url={http://doi.acm.org/10.1145/1810085.1810130} 237 } 238 239 @inproceedings{iorio2008, 240 author={F. Iorio and J. Van Lunteren}, 241 year={2008}, 242 title={Fast pattern matching on the Cell Broadband Engine}, 243 booktitle={2008 Workshop on Cell Systems and Applications (WCSA), affiliated with the} 244 } 245 246 @article{scarpazza2009larrabee, 247 author={D. P. Scarpazza}, 248 year={2009}, 249 title={Is Larrabee For the Rest of Us?}, 250 journal={Dr.DobbÃ¢â¬â¢s J} 251 } 252 253 @inproceedings{scarpazza2008, 254 author={D. P. Scarpazza and G. F. Russell}, 255 year={2009}, 256 title={Highperformance regular expression scanning on the Cell/BE processor}, 257 booktitle={Proceedings of the 23rd international conference on Supercomputing}, 258 publisher={ACM}, 259 pages={1425} 260 } 261 262 @misc{scarpazza2008fast, 263 author={Daniele Paolo Scarpazza and Oreste Villa and Fabrizio Petrinni}, 264 year={2008}, 265 title={Fast String Searches \& Multicore Processors Mapping fundamental algorithms on parallel hardware}, 266 journal={Dr.Dobb's Journal}, 267 number={407}, 268 pages={20} 269 }
Note: See TracChangeset
for help on using the changeset viewer.