Ignore:
Timestamp:
Jun 10, 2014, 7:29:28 PM (5 years ago)
Author:
cameron
Message:

Update conclusion with related work discussion, etc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/conclusion.tex

    r3665 r3865  
    1 \section{Conclusion}\label{sec:Concl}
     1\section{Discussion}\label{sec:Concl}
    22\paragraph*{Contributions}
    33A new class of regular expression matching algorithm has been
     
    2727in length based on 64-bit additions.    The model also
    2828supports GPGPU implementation with some additional
    29 effort, and suggests that direct GPGPU support for
    30 the \verb#hsimd<64>::mask(X)# and
    31 \verb#simd<64>::spread(x)#  operations could
    32 be valuable.
     29effort.
    3330
    34 The principal overhead in this method is the transposition of
    35 input data to parallel bit stream form.   However, this overhead reduces
    36 as SIMD register widths increase.   It is also possible to introduce
    37 new SIMD instructions that could dramatically reduce the
    38 cost of transposition\cite{cameron2009architectural}.
     31\paragraph{Related Work}
     32Much of the previous work in parallelizing of regular processing
     33has dealt with the problem of using parallel resources to handle
     34multiple instances of a matching problem in parallel.   It is thus
     35complementary to our approach which focuses on parallelization
     36to accelerate the matching of a single instance.    From this
     37perspective, the recent work of Mytkowicz et al \cite{DBLP:conf/asplos/MytkowiczMS14} stands
     38out as an important comparator in that it also focusses
     39on acceleration of matching for a single input stream.
     40Mytkowicz use the SIMD byte-shuffle capabilities found, for example,
     41in the SSE3 instruction sets to perform small-table parallel lookups for
     42multiple potentially active states of a FSM.    Data parallelism is
     43achieved by initially considering all possible states at the
     44beginning of each data segment, but then relying on convergence
     45and range-coalescing optimizations to quickly reduce the number of active states in
     46play.    Examining a large collection of regular expressions used in
     47practice, these techniques were found to be effective, allowing
     48matching to proceed with just one or two shuffles per input
     49symbol.
    3950
    40 \paragraph*{Future Work}
     51However, the Mytkowicz approach is still fundamentally considering
     52input elements one byte at a time and would be hard pressed to
     53compete with our reported results of 1-3 CPU cycles per input byte.
     54It is also dependent on the availability of the SIMD byte-shuffle
     55operation, which is unavailable in SIMD instructions sets such as
     56SSE2 and ARM Neon, for example.
     57Our SIMD implementation relies only on the availability of SIMD pack
     58operations to efficiently implement the Parabix transform;
     59SIMD pack is widely
     60available in current SIMD instruction sets.   It is also a special
     61case of the more general shuffle operations and hence available
     62on any processor that supports byte shuffle.   The Parabix
     63approach also has the further advantage that performance scales
     64with increasing SIMD instruction width, as illustrated
     65by our AVX2 performance results in comparison to SSE2.
    4166
    42 An important area of future work is to develop and assess
    43 multicore versions of the algorithm to handle regular expression
    44 matching problems involving larger rule sets than are
     67It is perhaps surprising that the classic nrgrep application is still
     68competitive in performance for expressions that allow the BNDM
     69algorithm to perform significant character skipping.     
     70Although the length of possible skipping reduces with the
     71complexity of the input expression considered, many applications
     72of grep searching tend to use simple expressions in practice.
     73Nevertheless, the Parabix approach offers consistently high
     74performance often faster than nrgrep by a factor
     75of 5X or more.
     76
     77\paragraph*{Ongoing and Future Work}
     78
     79Based on the techniques presented here a fully integrated
     80grep version with a dynamic code generator based on LLVM
     81is being developed by another team working with the Parabix
     82technology (Dale Denis, Nigel Medforth, Nick Sumner and Rob Cameron).
     83An initial version is available at
     84{\tt http://parabix.costar.sfu.ca/icGREP}.
     85
     86Future work includes the dvelopment of multicore versions of the
     87underlying algorithms to further accelerate performance and to
     88handle regular expression matching problems involving larger rule sets than are
    4589typically encountered in the grep problem.  Such implementations
    4690could have useful application in tokenization and network
    47 intrusion detection for example.
     91intrusion detection for example.   Further GPGPU implementation
     92work could take advantage of specialized instructions available
     93on particular platforms but not generally avaiable through CUDA.
     94For both multicore and GPGPU implementations, improved
     95support for data-parallel division of input streams using
     96techniques such as the principled speculation of Zhao et al
     97\cite{DBLP:conf/asplos/ZhaoWS14},
     98for example.
    4899
    49 Another area of interest is to extend the capabilities of
     100Other area of interest include extending the capabilities of
    50101the underlying method with addition features for substring
    51102capture, zero-width assertions and possibly
    52 backreference matching.  Extending Unicode support beyond
     103backreference matching.  Adding Unicode support beyond
    53104basic Unicode character handling to include full Unicode
    54105character class support and normalization forms is also
Note: See TracChangeset for help on using the changeset viewer.