Changeset 3666 for docs/Working

Mar 4, 2014, 4:37:20 PM (5 years ago)

Section re-written, corrected values, tense, grammar.

1 edited


  • docs/Working/re/sse2.tex

    r3662 r3666  
    3131gcc 4.6.3 to generate the binaries.   The Pablo output is combined
    3232with a {\tt grep\_template.cpp} file that arranges to read input
    33 files, break them into segments, and print out or count matches
     33files, break them into segments, and print or count matches
    3434as they are encountered.
    64 \paragraph{Test expressions.}
    65 Each grep implementation is tested with the five regular expressions
    66 in Table \ref{RegularExpressions}.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
    67 HexBytes matches delimited. They are taken from Benchmark of Regex
    68 Libraries [].
    69 HexBytes matches delimited byte strings in hexadecimal notation,
    70 enforcing the constraint that the number of hex digits is even.   This
    71 expression shows performance of a repetition operator implemented with
     64\paragraph{Test Expressions.}
     65Each grep implementation was evaluated
     66against the five regular expressions shown
     67in Table \ref{RegularExpressions}. 
     68@ matches the ampersand character.
     69This expression demonstrates the overhead involved
     70in matching the simplest possible regular expression, a single character.
     71Date, Email, and URIOrEmail provide examples of commonly used regular expression.
     72This set of expressions were drawn from the \textit{Benchmark of Regex
     73Libraries} (
     74HexBytes matches delimited byte strings in hexadecimal notation, and
     75enforces the constraint that the number of hex digits is even. This
     76expression illustrates the performance
     77of a repetition operator implemented using
    7278a while loop.
    73 All tests are run on a concatenated version of the linux howto which is 39,422,105 bytes in length. 
     79All tests were run on a version
     80of a \textit{Linux howto}
     81file concatenated to a length of
     8239,422,105 bytes. 
    109 Figure \ref{fig:SSECyclesPerByte} shows CPU cycles per input byte for
    110 each expression on each implementation.
    111 Both the bitstreams and the nrgrep implementation performed
    112 much better than the DFA-based gre2p, with the bitstreams
    113 implementation
    114 consistently an order of magnitude faster.
    116 For the Date expression, nrgrep outperforms our bitstreams
    117 implementation here.  In this case, nrgrep takes advantage of skipping:
    118 every time it encounters a character that can't appear in a date, it
    119 can skip past six characters. 
    120 For the more compicated expressions, it loses this advantage.
    122 The bitstreams implementation has fairly consistent performance.  As
    123 the regular expression complexity increases, the cycle count increases
    124 slowly.  The largest difference in cycles per byte for the bitstreams
    125 implementation is a ratio of 2 to 1.  The same cannot be said of gre2p
    126 or nrgrep, with gre2p exhibiting a 4 to 1 range over these expressions
    127 and
    128 nrgrep exhibiting a 10 to 1 range.
     118Figure~\ref{fig:SSECyclesPerByte} compares
     119each of the grep implementations,
     120with relative performance reported in CPU cycles per byte.
     121Both the bitstreams and the nrgrep implementation
     122outperformed the DFA-based gre2p, with the bitstreams
     123implementation consistently an order of magnitude faster.
     124For the Date expression, nrgrep outperformed the bitstreams grep
     125implementation.  In this case, nrgrep
     126takes advantage of character skipping. That is,
     127each time nrgrep encounters a character
     128that cannot appear in a date
     129it jumps six character positions rather
     130than searching every character in the input text.   
     131For the more complicated expressions nrgrep loses
     132this performance advantage.
     134The bitstreams implementation demonstrated stable performance
     135across each of the test expressions, with increases in
     136regular expression complexity producing
     137relatively small increases in CPU cycle counts.
     138The largest ratio in performance for the bitstreams
     139implementation was 2 to 1. In comparison,
     140gre2p exhibited a ratio of 4 to 1,
     141whereas nrgrep exhibited a ratio of 10 to 1.
    161 Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.
    162 The relative values mirror cycles per byte.  The bitstreams
    163 implementation continues to show consistent performance.  This is
    164 especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle},
    165 which shows instructions per cycle.  Both the bitstreams and
    166 gre2p implementations show relatively stable IPC characteristics,
    167 while nrgrep exhibits considerably more variation.
     174Figure~\ref{fig:SSEInstructionsPerByte} illustrates
     175relative performance reported in instructions per byte.
     177mirrors Figure~\ref{fig:SSECyclesPerByte}. The bitstreams
     178implementation demonstrates stable performance
     179characteristics across each of the test expressions. The
     180gre2p implementation also shows
     181relatively stable characteristics, whereas,
     182nrgrep exhibits greater variability.
    227241\caption{Branch Misses per Instruction}\label{fig:SSEBranchMisses}
    229 Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.
    230 The bitstreams and gre2p implementations remains consistent here.
     243Figure \ref{fig:SSEBranchMisses} reports branch misses per kilobyte.
     244The bitstreams and gre2p implementations remain consistent.
    231245Significant variation is seen with nrgrep.
    233 Overall, our performance is considerably better than both nrgrep and gre2p for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
     247Overall, the bitstreams implementation significantly outperformed
     248both nrgrep and gre2p. In addition, the performance of bitstreams
     249scaled smoothly with regular expression complexity. As Section~\ref{sec:analysis}
     250describes, we anticipate that the performance of bitstreams
     251will to continue to scale smoothly with
     252regular expression complexity.
Note: See TracChangeset for help on using the changeset viewer.