Changeset 3879 for docs


Timestamp: Jun 17, 2014, 5:50:31 PM
Author: shermer
Message: Corrected detail about the cost of transposition. Retitled section more accurately. Note that we analyze nonskipping NFA methods and how to extend the analysis to skipping. More rough approximation at end of section.

File: 1 edited

  • docs/Working/re/analysis.tex

--- r3868
+++ r3879
-\section{Running-time Comparison with DFA and NFA
+\section{Running-time Comparison with Base NFA
 Implementations}\label{sec:analysis}

 Our experimental results indicate that regular expression matching using
-bitstreams can outperform current implementations of NFA- and DFA-based
+bitstreams can outperform current implementations of NFA- (and DFA-) based
 matching.
 It is worth exploring why this is so, and under what conditions one might
     
 Also inside this loop is the transposition step that converts character-encoded
 files into their bitstream representation;
-this transposition takes $O(\log w)$ work per loop iteration.
-
-In total, this is $O(m \log \sigma + \log w)$ work per iteration.
+this transposition takes $O(\log \sigma \log \log \sigma)$ work per loop
+iteration.
+
+In total, this is $O(m \log \sigma + \log w + \log \sigma \log \log \sigma)$
+work per iteration.
 In current practice, we have $\log w$ around 8 (for 256-bit architectures),
 and $\log \sigma$ at least 7.
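As a quick sanity check on the revised per-iteration bound, the three terms can be compared numerically. This is a sketch, not part of the changeset; it uses the concrete values quoted above ($\log w = 8$, $\log \sigma = 7$), and the helper name is ours:

```python
import math

# Sketch: the three per-iteration work terms of
# O(m log sigma + log w + log sigma log log sigma), with the values
# quoted in the text: log w = 8 (256-bit words) and log sigma = 7.
LOG_W = 8
LOG_SIGMA = 7

def per_iteration_terms(m):
    """Return (m log sigma, log w, log sigma log log sigma) for pattern size m."""
    return (m * LOG_SIGMA, LOG_W, LOG_SIGMA * math.log2(LOG_SIGMA))

# Even for small patterns, the m log sigma term dominates the fixed
# log w term, which is why the text later absorbs log w into the bound.
for m in (2, 8, 32):
    match_term, word_term, transpose_term = per_iteration_terms(m)
    assert match_term > word_term
```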
     
 with current and foreseeable technology--we do not expect to see $\log w$
 skyrocket.
-So we can absorb the $\log w$ term and state the work as $O(m \log \sigma)$
-per iteration.
+So we can absorb the $\log w$ term and state the work as
+$O(m \log \sigma + \log \sigma \log \log \sigma)$ per iteration.
 We multiply this by $O(\frac{n}{w})$ iterations to give
-$O(\frac{n m \log \sigma}{w})$ work.
+$O(\frac{n (m + \log \log \sigma) \log \sigma}{w})$ work.

 We further note that all of the work in the loop is
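The factoring in the revised total can be checked mechanically: multiplying the per-iteration work by $\frac{n}{w}$ iterations does give $\frac{n (m + \log \log \sigma) \log \sigma}{w}$. A numeric sketch (values illustrative, function names ours):

```python
import math

# Sketch: (m log s + log s log log s) * (n/w) iterations factors into
# the revised total n (m + log log s) log s / w.
def total_work(n, w, m, log_sigma):
    loglog = math.log2(log_sigma)
    per_iteration = m * log_sigma + log_sigma * loglog
    return per_iteration * (n / w)

def factored_total(n, w, m, log_sigma):
    loglog = math.log2(log_sigma)
    return n * (m + loglog) * log_sigma / w

# The two forms agree for arbitrary illustrative values.
for m in (1, 10, 100):
    assert math.isclose(total_work(10**6, 128, m, 7),
                        factored_total(10**6, 128, m, 7))
```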
     
 we expect few cache misses with bitstream method.

-Compare this with NFA methods.
+We compare this with base NFA methods; by ``base'' here we mean NFA methods
+that do not skip input characters.
+The performance of input-skipping methods can be approximated by first
+analyzing the performance of the base method and then multiplying this by the
+expected fraction of examined (non-skipped) input.
+
 In the base NFA method, a state set of approximately $m$ states is kept
 as a bit
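The approximation for input-skipping methods described in the added text is a simple multiplicative model; a sketch (values illustrative):

```python
# Sketch of the approximation stated above: an input-skipping method's
# cost is roughly the base method's cycles per character times the
# expected fraction of input actually examined (not skipped).
def skipping_cycles(base_cycles_per_char, examined_fraction):
    return base_cycles_per_char * examined_fraction

# e.g. a base method at 20 cycles/char that examines 25% of the input:
assert skipping_cycles(20, 0.25) == 5.0
```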
     
 $4*\frac{m}{4}$, or simply $m$, cycles.

-Our method, on the other hand, takes time $O(\frac{m \log \sigma}{w})$ per
-input character, where the constant inside the big-Oh is approximately 2.
+Our method, on the other hand, takes time
+$O(\frac{m \log \sigma + \log \sigma \log \log \sigma}{w})$ per input
+character, where the constant inside the big-Oh is approximately 2 for the
+first term of the numerator and 6 for the second.
 Furthermore, we expect no cache misses due to the regular stride of our memory
 accesses. 
-For ASCII, this time becomes at most $2 \frac{7m}{w} = \frac{14m}{w}$ cycles
-per character.
-This is faster than the small-$k$ NFA methods by a factor of $w/14$.
-For processors with a 128-bit word, this is about one order of magnitude.
+For UTF-8 (or ASCII), this time becomes at most
+$2 \frac{8m}{w} + 6 \frac{24}{w} = \frac{16m + 144}{w}$ cycles per character.
+
+For processors with a 128-bit word, this is
+$\frac{16m + 144}{128} = \frac{m}{8} + \frac{9}{8}$ cycles per character.
+Comparing this with the at least $m$ cycles per character of the base NFA
+methods, we expect these NFA methods to be competitive with our method only
+when the size of the regular expression is $1$.
+As the size of the regular expression increases, we expect our method to
+approach a factor-of-8 improvement over the base NFA methods.
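The size-1 crossover claimed in the added text can be replayed directly from the cycle counts above; a sketch using $w = 128$ and the constants derived in the revision:

```python
# Sketch: cycles per character for the two approaches, using the
# constants from the revised text (UTF-8/ASCII, log sigma = 8,
# transposition contributing 6 * 24 cycles spread over w characters).
def bitstream_cycles(m, w=128):
    return (16 * m + 144) / w        # = m/8 + 9/8 at w = 128

def base_nfa_cycles(m):
    return m                         # base NFA methods: at least m cycles

# Base NFA wins only at m = 1 (1 vs 1.25 cycles per character); for
# m >= 2 the bitstream method is faster.
assert base_nfa_cycles(1) < bitstream_cycles(1)
assert all(bitstream_cycles(m) < base_nfa_cycles(m) for m in range(2, 100))
```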
+
+In theory, our improvement factor should scale closely with the word size, so
+that for processors with a 256-bit word we expect a 16x improvement, and for
+processors with a 512-bit word a 32x improvement.
+In practice, these improvement factors are somewhat reduced because the
+instruction sets of larger-width processors are not yet as versatile as those
+of 128-bit processors.  (For example, full-width addition is not yet
+supported.)
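The 8x/16x/32x figures in the added paragraph follow from the same cycle formula: the improvement factor tends to $w/16$ as the pattern grows. A sketch that ignores the instruction-set caveat:

```python
# Sketch: improvement factor m / ((16m + 144)/w) approaches w/16 as the
# regular expression grows, matching the 8x/16x/32x figures in the text.
def improvement_factor(m, w):
    return m / ((16 * m + 144) / w)

for w, expected in ((128, 8), (256, 16), (512, 32)):
    assert abs(improvement_factor(10_000, w) - expected) < 0.1
```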