source: docs/Working/re/conclusion.tex @ 3878

Last change on this file since 3878 was 3870, checked in by cameron, 5 years ago

Formatting, PACT publication template

File size: 5.7 KB
3A new class of regular expression matching algorithm has been
4introduced based on the concept of bit-parallel data streams
5together with the MatchStar operation.  The algorithm is
6fully general for nondeterministic regular expression matching;
7however it does not address the nonregular extensions
8found in Perl-compatible backtracking implementations.
9Taking advantage of the SIMD features available on commodity
10processors, its implementation in grep offers consistently
11good performance in contrast to available alternatives. 
12For moderately complex expressions, 10X or better
13performance advantages over DFA-based gre2p and 5X performance
14advantage over nrgrep were frequently seen.
15While lacking some
16special optimizations found in other engines to deal with
17repeated substrings or to perform skipping actions based
18on fixed substrings, it nevertheless performs competitively
19in all cases. 
21A model for parallelized long-stream addition has also been presented
22in the paper, allowing our techniques to scale beyond
23the blocks of 128 bytes we use with the SSE2 implementation.
24This model allowed straightforward extension to the 256-byte
25block size used in our AVX2 implementation and should
26continue to scale well up for SIMD vectors up to 4096 bytes
27in length based on 64-bit additions.    The model also
28supports GPGPU implementation with some additional
31\paragraph{Related Work}
32Much of the previous work in parallelizing of regular processing
33has dealt with the problem of using parallel resources to handle
34multiple instances of a matching problem in parallel.   It is thus
35complementary to our approach which focuses on parallelization
36to accelerate the matching of a single instance.    From this
37perspective, the recent work of Mytkowicz et al \cite{DBLP:conf/asplos/MytkowiczMS14} stands
38out as an important comparator in that it also focusses
39on acceleration of matching for a single input stream.
40Mytkowicz use the SIMD byte-shuffle capabilities found, for example,
41in the SSE3 instruction sets to perform small-table parallel lookups for
42multiple potentially active states of a FSM.    Data parallelism is
43achieved by initially considering all possible states at the
44beginning of each data segment, but then relying on convergence
45and range-coalescing optimizations to quickly reduce the number of active states in
46play.    Examining a large collection of regular expressions used in
47practice, these techniques were found to be effective, allowing
48matching to proceed with just one or two shuffles per input
51However, the Mytkowicz approach is still fundamentally considering
52input elements one byte at a time and would be hard pressed to
53compete with our reported results of 1-3 CPU cycles per input byte.
54It is also dependent on the availability of the SIMD byte-shuffle
55operation, which is unavailable in SIMD instructions sets such as
56SSE2 and ARM Neon, for example.
57Our SIMD implementation relies only on the availability of SIMD pack
58operations to efficiently implement the Parabix transform;
59SIMD pack is widely
60available in current SIMD instruction sets.   It is also a special
61case of the more general shuffle operations and hence available
62on any processor that supports byte shuffle.   The Parabix
63approach also has the further advantage that performance scales
64with increasing SIMD instruction width, as illustrated
65by our AVX2 performance results in comparison to SSE2.
67It is perhaps surprising that the classic nrgrep application is still
68competitive in performance for expressions that allow the BNDM
69algorithm to perform significant character skipping.     
70Although the length of possible skipping reduces with the
71complexity of the input expression considered, many applications
72of grep searching tend to use simple expressions in practice.
73Nevertheless, the Parabix approach offers consistently high
74performance often faster than nrgrep by a factor
75of 5X or more.
77\paragraph*{Ongoing and Future Work}
79Based on the techniques presented here a fully integrated
80grep version with a dynamic code generator implemented with LLVM
81is being developed by another team working with the Parabix
82technology (Dale Denis, Nigel Medforth, Nick Sumner and Rob Cameron).
83An initial version is available at
86Further work includes the extending the algorithms
87to use MatchStar in Kleene-* repetitions beyond those
88of single characters (bytes).   Each such extension would
89replace while-loop iteration with addition and bitwise logic.
90The UTF-8 example shows this is possible for repetitions of
91variable-length byte sequences having particular synchronizing
92properties, for example.
94Future work also includes the dvelopment of multicore versions of the
95underlying algorithms to further accelerate performance and to
96handle regular expression matching problems involving larger rule sets than are
97typically encountered in the grep problem.  Such implementations
98could have useful application in tokenization and network
99intrusion detection for example.   Additional GPGPU implementation
100work could take advantage of specialized instructions available
101on particular platforms but not generally avaiable through OpenCL.
102For both multicore and GPGPU implementations,
103data-parallel division of input streams could benefit from
104techniques such as the principled speculation of Zhao et al
106for example.
108Other area of interest include extending the capabilities of
109the underlying method with addition features for substring
110capture, zero-width assertions and possibly
111backreference matching.  Adding Unicode support beyond
112basic Unicode character handling to include full Unicode
113character class support and normalization forms is also
114worth investigating.
Note: See TracBrowser for help on using the repository browser.