source: docs/Working/re/conclusion.tex @ 3497

Last change on this file since 3497 was 3486, checked in by cameron, 6 years ago


File size: 2.1 KB
3A new class of regular expression matching algorithm has been
4introduced based on the concept of bit-parallel data streams
5together with the MatchStar operation.  The algorithm is
6fully general for nondeterministic regular expression matching;
7however it does not address the nonregular extensions
8found in Perl-compatible backtracking implementations.
9Taking advantage of the SIMD features available on commodity
10processors, its implementation in a grep too offers consistently good performance in
11contrast to available alternatives.   While lacking some
12special optimizations found in other engines to deal with
13repeated substrings or to perform skipping actions based
14on fixed substrings, it nevertheless performs competitively
15in all cases.  The algorithm tends to scale very well with regular
16expression complexity, often with order-of-magnitude
17performance advantage over even the best of its competitors.
19A parallelized algorithm for long-stream addition has also
20been introduced in the paper making a key contribution
21to the scalability of the bit-parallel matching technique
22overall and that of MatchStar in particular.   This
23algorithm has enabled straightforward extension of the
24matching algorithm to the implementation using 256-bit
25AVX2 technology as well as 4096-bit SIMT implementation
26on an AMD GPU.
28\paragraph*{Future Work}
30An important area of future work is to develop and assess
31multicore versions of the algorithm to handle regular expression
32matching problems involving larger rulesets than are
33typically encountered in the grep problem.  Such implementations
34could have useful application in tokenization and network
35intrusion detection for example.
37Another area of interest is to extend the capabilities of
38the underlying method with addition features for substring
39capture, zero-width assertions and possibly
40backreference matching.  Extending Unicode support beyond
41basic Unicode character handling to include full Unicode
42character class support and normalization forms is also
43worth investigating.
Note: See TracBrowser for help on using the repository browser.