source: docs/PACT14/conclusion.tex @ 4645

Last change on this file since 4645 was 3898, checked in by cameron, 5 years ago

Clean out old style files; check in .bbl file with vfill/eject

File size: 6.1 KB
2\paragraph*{Contributions} A new class of regular expression matching algorithm has been
3introduced based on the concept of bit-parallel data streams
4together with the MatchStar operation.  The algorithm is
5fully general for nondeterministic regular expression matching;
6however it does not address the nonregular extensions
7found in Perl-compatible backtracking implementations.
8Taking advantage of the SIMD features available on commodity
9processors, its implementation in grep offers consistently
10good performance in contrast to available alternatives. 
11For moderately complex expressions, 10X or better
12performance advantages over DFA-based gre2p and 5X performance
13advantage over nrgrep were frequently seen.
14While lacking some
15special optimizations found in other engines to deal with
16repeated substrings or to perform skipping actions based
17on fixed substrings, it nevertheless performs competitively
18in all cases. 
20A model for parallelized long-stream addition has also been presented
21in the paper, allowing our techniques to scale beyond
22the blocks of 128 bytes we use with the SSE2 implementation.
23This model allowed straightforward extension to the 256-byte
24block size used in our AVX2 implementation and should
25continue to scale well up for SIMD vectors up to 4096 bytes
26in length based on 64-bit additions.    The model also
27supports GPU implementation with some additional
30\paragraph*{Related Work} Much of the previous work in parallelizing of regular processing
31has dealt with the problem of using parallel resources to handle
32multiple instances of a matching problem in parallel.   It is thus
33complementary to our approach which focuses on parallelization
34to accelerate the matching of a single instance.    From this
35perspective, the recent work of Mytkowicz et al \cite{DBLP:conf/asplos/MytkowiczMS14} stands
36out as an important comparator in that it also focusses
37on acceleration of matching for a single input stream.
38Mytkowicz use the SIMD byte-shuffle capabilities found, for example,
39in the SSSE3 instruction sets to perform small-table parallel lookups for
40multiple potentially active states of a FSM.    Data parallelism is
41achieved by initially considering all possible states at the
42beginning of each data segment, but then relying on convergence
43and range-coalescing optimizations to quickly reduce the number of active states in
44play.    Examining a large collection of regular expressions used in
45practice, these techniques were found to be effective, allowing
46matching to proceed with just one or two shuffles per input
49However, the Mytkowicz approach is still fundamentally considering
50input elements one byte at a time and would be hard pressed to
51compete with our reported results of 1-4 CPU cycles per input byte.
52It is also dependent on the availability of the SIMD byte-shuffle
53operation, which is unavailable in SIMD instructions sets such as
54SSE2 and ARM Neon, for example.
55Our SIMD implementation relies only on the availability of SIMD pack
56operations to efficiently implement the Parabix transform;
57SIMD pack is widely
58available in current SIMD instruction sets.   It is also a special
59case of the more general shuffle operations and hence available
60on any processor that supports byte shuffle.   The Parabix
61approach also has the further advantage that performance scales
62with increasing SIMD instruction width, as illustrated
63by our AVX2 performance results in comparison to SSE2.
65It is perhaps surprising that the classic nrgrep application is still
66competitive in performance for expressions that allow the BNDM
67algorithm to perform significant character skipping.     
68Although the length of possible skipping reduces with the
69complexity of the input expression considered, many applications
70of grep searching tend to use simple expressions in practice.
71Nevertheless, the Parabix approach offers consistently high
72performance often faster than nrgrep by a factor
73of 5X or more.
75\paragraph*{Ongoing and Future Work} Based on the techniques presented
76here a fully integrated
77grep version with a dynamic code generator implemented with LLVM
78is being developed by another team working with the Parabix
79technology (Dale Denis, Nick Sumner and Rob Cameron).
80An initial version is available at {\tt
82With icgrep-0.8, total compile-time overhead to translate our
83test expressions into executable x86 code ranges from 0.002
84seconds to 0.008 seconds for our test cases.   
85Although this represents a tolerable overhead
86of 0.64 cycles/byte for our 40 MB test file,
87we expect that a substantial reduction of this
88overhead is feasible.
90Further work on the compilation algorithms includes the extending the algorithms
91to use MatchStar in Kleene-* repetitions beyond those
92of single characters (bytes).   Each such extension would
93replace while-loop iteration with addition and bitwise logic.
94For example, 
95the repetitions of
96variable-length UTF-8 byte sequences
97can be compiled without while loops by taking advantage
98of synchronizing properties of UTF-8.
100Future work also includes the development of multicore versions of the
101underlying algorithms to further accelerate performance and to
102handle regular expression matching problems involving larger rule sets than are
103typically encountered in the grep problem.  Such implementations
104could have useful application in tokenization and network
105intrusion detection for example.   Additional GPU implementation
106work could take advantage of specialized instructions available
107on particular platforms but not generally avaiable through OpenCL.
108For both multicore and GPU implementations,
109data-parallel division of input streams could benefit from
110techniques such as the principled speculation of Zhao et al
112for example.
114Other area of interest include extending the capabilities of
115the underlying method with addition features for substring
116capture, zero-width assertions and possibly
117backreference matching.  Adding Unicode support beyond
118basic Unicode character handling to include full Unicode
119character class support and normalization forms is also
120worth investigating.
Note: See TracBrowser for help on using the repository browser.