 Timestamp:
 Jun 17, 2014, 5:50:31 PM (4 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/re/analysis.tex
r3868 r3879 1 \section{Runningtime Comparison with DFA andNFA1 \section{Runningtime Comparison with Base NFA 2 2 Implementations}\label{sec:analysis} 3 3 4 4 Our experimental results indicate that regular expression matching using 5 bitstreams can outperform current implementations of NFA and DFAbased5 bitstreams can outperform current implementations of NFA (and DFA) based 6 6 matching. 7 7 It is worth exploring why this is so, and under what conditions one might … … 51 51 Also inside this loop is the transposition step that converts characterencoded 52 52 files into their bitstream representation; 53 this transposition takes $O(\log w)$ work per loop iteration. 54 55 In total, this is $O(m \log \sigma + \log w)$ work per iteration. 53 this transposition takes $O(\log \sigma \log \log \sigma)$ work per loop 54 iteration. 55 56 In total, this is $O(m \log \sigma + \log w + \log \sigma \log \log \sigma)$ 57 work per iteration. 56 58 In current practice, we have $\log w$ around 8 (for 256bit architectures), 57 59 and $\log \sigma$ at least 7. … … 59 61 with current and foreseeable technologywe do not expect to see $\log w$ 60 62 skyrocket. 61 So we can absorb the $\log w$ term and state the work as $O(m \log \sigma )$ per62 iteration.63 So we can absorb the $\log w$ term and state the work as $O(m \log \sigma + \log 64 \sigma \log \log \sigma)$ per iteration. 63 65 We multiply this by $O(\frac{n}{w})$ iterations to give 64 $O(\frac{n m\log \sigma}{w})$ work.66 $O(\frac{n (m + \log \log \sigma) \log \sigma}{w})$ work. 65 67 66 68 We further note that all of the work in the loop is … … 85 87 we expect few cache misses with bitstream method. 86 88 87 Compare this with NFA methods. 89 We compare this with base NFA methods; by ``base'' here we mean NFA methods 90 that do not skip input characters. 91 The performance of inputskipping methods can be approximated by first analyzing 92 the performance of the base method and then multiplying this by the 93 expected fraction of examined (nonskipped) input. 94 88 95 In the base NFA method, a state set of approximately $m$ states is kept 89 96 as a bit … … 192 199 $4*\frac{m}{4}$, or simply $m$, cycles. 193 200 194 Our method, on the other hand, takes time $O(\frac{m \log \sigma}{w})$ per 195 input character, where the constant inside the bigOh is approximately 2. 201 Our method, on the other hand, takes time $O(\frac{m \log \sigma + \log \log 202 \sigma \log \sigma}{w})$ per input character, where the constant inside the 203 bigOh is approximately 2 for the first part of the numerator and 6 for the 204 second part. 196 205 Furthermore, we expect no cache misses due to the regular stride of our memory 197 206 accesses. 198 For ASCII, this time becomes at most $2 \frac{7m}{w} = \frac{14m}{w}$ cycles 199 per character. 200 This is faster than the small$k$ NFA methods by a factor of $w/14$. 201 For processors with a 128bit word, this is about one order of magnitude. 202 203 204 205 206 207 For UTF8 (or ASCII), this time becomes at most $2 \frac{8m}{w} + 6 \frac{24}{w} 208 = \frac{16m + 144}{w}$ cycles per character. 209 210 For processors with a 128bit word, this is $\frac{16m + 144}{128} = \frac{m}{8} 211 + \frac{9}{8}$ cycles per character. 212 Comparing this with the at least $m$ cycles per character of the base NFA 213 methods, we expect these NFA methods to be competitive with our method only when 214 the size of the regular expression is $1$. 215 As the size of the regular expression increases, we expect our method 216 to approach a factorof8 improvement over the base NFA methods. 217 218 In theory, our improvement factor should scale closely with the word size; so 219 that for processors with a 256bit word, we expect an 16x improvement, and for 220 processors with a 512bit word, we expect a 32x improvement. 221 In practice, there is some reduction in these improvement factors due to 222 the instruction sets of largerwidth processors not yet being as versatile as 223 those of 128bit processors. (For example, fullwidth addition is not yet 224 supported.) 225 226 227 228 229
Note: See TracChangeset
for help on using the changeset viewer.