source: docs/Working/re/conclusion.tex @ 3513

Last change on this file since 3513 was 3513, checked in by cameron, 6 years ago

Final edits

File size: 2.0 KB
3A new class of regular expression matching algorithm has been
4introduced based on the concept of bit-parallel data streams
5together with the MatchStar operation.  The algorithm is
6fully general for nondeterministic regular expression matching;
7however it does not address the nonregular extensions
8found in Perl-compatible backtracking implementations.
9Taking advantage of the SIMD features available on commodity
10processors, its implementation in a grep offers consistently
11good performance in contrast to available alternatives. 
12For moderately complex expressions, 10X or better
13performance advantages over GNU grep and 5X performance
14advantage over nrgrep were frequently seen.
15While lacking some
16special optimizations found in other engines to deal with
17repeated substrings or to perform skipping actions based
18on fixed substrings, it nevertheless performs competitively
19in all cases. 
21A parallelized algorithm for long-stream addition has also
22been introduced in the paper making a key contribution
23to the scalability of the bit-parallel matching technique
24overall and that of MatchStar in particular.   This
25algorithm has enabled straightforward extension of the
26matching algorithm to the implementation using 256-bit
27AVX2 technology as well as 4096-bit SIMT implementation
28on an AMD GPU.
30\paragraph*{Future Work}
32An important area of future work is to develop and assess
33multicore versions of the algorithm to handle regular expression
34matching problems involving larger rulesets than are
35typically encountered in the grep problem.  Such implementations
36could have useful application in tokenization and network
37intrusion detection for example.
39Another area of interest is to extend the capabilities of
40the underlying method with addition features for substring
41capture, zero-width assertions and possibly
42backreference matching.  Extending Unicode support beyond
43basic Unicode character handling to include full Unicode
44character class support and normalization forms is also
45worth investigating.
Note: See TracBrowser for help on using the repository browser.