Changeset 3879 for docs/Working

Ignore:
Timestamp:
Jun 17, 2014, 5:50:31 PM (5 years ago)
Message:

Corrected detail about the cost of transposition. Retitled section more accurately. Note we analyze nonskipping NFA methods & how to extend to skipping. More rough approximation at end of section.

File:
1 edited

Legend:

Unmodified
 r3868 \section{Running-time Comparison with DFA and NFA \section{Running-time Comparison with Base NFA Implementations}\label{sec:analysis} Our experimental results indicate that regular expression matching using bitstreams can outperform current implementations of NFA- and DFA-based bitstreams can outperform current implementations of NFA- (and DFA-) based matching. It is worth exploring why this is so, and under what conditions one might Also inside this loop is the transposition step that converts character-encoded files into their bitstream representation; this transposition takes $O(\log w)$ work per loop iteration. In total, this is $O(m \log \sigma + \log w)$ work per iteration. this transposition takes $O(\log \sigma \log \log \sigma)$ work per loop iteration. In total, this is $O(m \log \sigma + \log w + \log \sigma \log \log \sigma)$ work per iteration. In current practice, we have $\log w$ around 8 (for 256-bit architectures), and $\log \sigma$ at least 7. with current and foreseeable technology--we do not expect to see $\log w$ skyrocket. So we can absorb the $\log w$ term and state the work as $O(m \log \sigma)$ per iteration. So we can absorb the $\log w$ term and state the work as $O(m \log \sigma + \log \sigma \log \log \sigma)$ per iteration. We multiply this by $O(\frac{n}{w})$ iterations to give $O(\frac{n m \log \sigma}{w})$ work. $O(\frac{n (m + \log \log \sigma) \log \sigma}{w})$ work. We further note that all of the work in the loop is we expect few cache misses with bitstream method. Compare this with NFA methods. We compare this with base NFA methods; by base'' here we mean NFA methods that do not skip input characters. The performance of input-skipping methods can be approximated by first analyzing the performance of the base method and then multiplying this by the expected fraction of examined (non-skipped) input. In the base NFA method, a state set of approximately $m$ states is kept as a bit $4*\frac{m}{4}$, or simply $m$, cycles. Our method, on the other hand, takes time $O(\frac{m \log \sigma}{w})$ per input character, where the constant inside the big-Oh is approximately 2. Our method, on the other hand, takes time $O(\frac{m \log \sigma + \log \log \sigma \log \sigma}{w})$ per input character, where the constant inside the big-Oh is approximately 2 for the first part of the numerator and 6 for the second part. Furthermore, we expect no cache misses due to the regular stride of our memory accesses. For ASCII, this time becomes at most $2 \frac{7m}{w} = \frac{14m}{w}$ cycles per character. This is faster than the small-$k$ NFA methods by a factor of $w/14$. For processors with a 128-bit word, this is about one order of magnitude. For UTF-8 (or ASCII), this time becomes at most $2 \frac{8m}{w} + 6 \frac{24}{w} = \frac{16m + 144}{w}$ cycles per character. For processors with a 128-bit word, this is $\frac{16m + 144}{128} = \frac{m}{8} + \frac{9}{8}$ cycles per character. Comparing this with the at least $m$ cycles per character of the base NFA methods, we expect these NFA methods to be competitive with our method only when the size of the regular expression is $1$. As the size of the regular expression increases, we expect our method to approach a factor-of-8 improvement over the base NFA methods. In theory, our improvement factor should scale closely with the word size; so that for processors with a 256-bit word, we expect an 16x improvement, and for processors with a 512-bit word, we expect a 32x improvement. In practice, there is some reduction in these improvement factors due to the instruction sets of larger-width processors not yet being as versatile as those of 128-bit processors.  (For example, full-width addition is not yet supported.)