Jun 24, 2014, 4:00:15 AM (5 years ago)

Various cleanups, drop instructions/byte, branch miss figures

1 edited


  • docs/Working/re/sse2.tex

    r3883 r3889  
    1111Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
    1212Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
    13 URIOrEmail      & \verb`(([a-zA-Z][a-zA-Z0-9]*)://|mailto:)([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`         \\ \hline     
    14 HexBytes                & \verb`[ ](0x)?([a-fA-F0-9][a-fA-F0-9])+[.:,?! ]`              \\ \hline
     13URI     & \verb`(([a-zA-Z][a-zA-Z0-9]*)://|mailto:)([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`         \\ \hline     
     14Hex             & \verb`[ ](0x)?([a-fA-F0-9][a-fA-F0-9])+[.:,?! ]`              \\ \hline
    1515StarHeight              & \verb`[A-Z]((([a-zA-Z]*a[a-zA-Z]*[ ])*[a-zA-Z]*e[a-zA-Z]*[ ])*[a-zA-Z]*s[a-zA-Z]*[ ])*[.?!]`          \\ \hline
    7474This expression demonstrates the overhead involved
    7575in matching the simplest possible regular expression, a single character.
    76 Date, Email, and URIOrEmail provide examples of commonly used regular expression.
     76Date, Email, and URI provide examples of commonly used regular expression.
    7777This set of expressions were modified from the \textit{Benchmark of Regex
    78 Libraries} (http://lh3lh3.users.sourceforge.net/reb.shtml).
    79 HexBytes matches delimited byte strings in hexadecimal notation, and
     79Hex matches delimited byte strings in hexadecimal notation, and
    8080enforces the constraint that the number of hex digits is even. This
    8181expression illustrates the performance
    8282of a repetition operator implemented using
    83 a while loop.  StarHeight is an expression designed to further
    84 stress our while loop implementation with 4 levels of Kleene closure.
     83a while loop in our system.  StarHeight is an artificial expression designed to further
     84stress while loop implementation with 4 levels of Kleene closure.
    8585All tests were run on a version
    8686of a \textit{Linux 3Dfx howto}
    9797ylabel=Cycles per Byte,
    98 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
    9999tick label style={font=\tiny},
    100100enlarge x limits=0.15,
    156156lines matching the Email regex.
    158 The URIorEmail expression illustrates the performance of the
     158The URI expression illustrates the performance of the
    159159grep programs with additional regular expression complexity.
    160160As expressions get larger, the number of steps required by
    161161the Parabix implementation increases, so the performance
    162162advantage drops to about 4.5X over nrgrep and 19X over gre2p.
    163 32557 lines are matched by the URIorEmail regex.
    165 The results for HexBytes expression illustrate Parabix performance
    166 involving a Kleene-+ operator compiled to a while loop. 
    167 Our implementation uses just 1.6 cycles per byte to find the
     16332557 lines are matched by the URI regex.
     165The results for Hex expression illustrate the bitstreams performance
     166in the case of a Kleene-+ operator compiled to a while loop.
     167Performance is nevertheless quite good;
     168our implementation uses just 1.6 cycles per byte to find the
    168169130,243 matching lines.    The gre2p program performs
    169170quite poorly here, slower than the Parabix implementation
    170171by about 70X, while nrgrep is about 5.5X slower.
    172 StarHeight is an artificial expression created to stress the Parabix
    173 implementation with a triply-nested while loop.   The performance
    174 does drop off, maintaining only a 2X advantage over nrgrep.
    179 \begin{figure}
    180 \begin{center}
    181 \begin{tikzpicture}
    182 \begin{axis}[
    183 xtick=data,
    184 ylabel=Instructions per Byte,
    185 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
    186 tick label style={font=\tiny},
    187 enlarge x limits=0.15,
    188 %enlarge y limits={0.15, upper},
    189 ymin=0,
    190 legend style={at={(0.5,-0.15)},
    191 anchor=north,legend columns=-1},
    192 ymax=110,
    193 ybar,
    194 bar width=7pt,
    195 ]
    196 \addplot
    197 file {data/instructions-bitstreams.dat};
    198 \addplot
    199 file {data/instructions-nrgrep112.dat};
    200 \addplot
    201 file {data/instructions-gre2p.dat};
    203 \legend{bitstreams,nrgrep,gre2p,Annot}
    204 \end{axis}
    205 \end{tikzpicture}
    206 \end{center}
    207 \caption{Instructions per Byte}\label{fig:SSEInstructionsPerByte}
    208 \end{figure}
    210 Figure~\ref{fig:SSEInstructionsPerByte} illustrates
    211 relative performance reported in instructions per byte.
    212 Figure~\ref{fig:SSEInstructionsPerByte}
    213 mirrors Figure~\ref{fig:SSECyclesPerByte}. The bitstreams
    214 implementation demonstrates stable performance
    215 characteristics across each of the test expressions. The
    216 gre2p implementation also shows
    217 relatively stable characteristics, whereas,
    218 nrgrep exhibits greater variability.
     173A more complex triply-nested repetition structure is required by
     174the bitstreams implementation of the StarHeight expression. 
     175In this case, there is a noticeable drop off in the
     176performance advantage of the bitstreams implementation over
     177the nrgrep and gre2p.   Nevertheless a 2X
     178advantage over nrgrep is maintained.
    225185ylabel=Instructions per Cycle,
    226 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
    227187tick label style={font=\tiny},
    228188enlarge x limits=0.15,
    250 \begin{figure}
    251 \begin{center}
    252 \begin{tikzpicture}
    253 \begin{axis}[
    254 xtick=data,
    255 ylabel=Branch Misses per Instruction,
    256 xticklabels={@,Date,Email,URIorEmail,HexBytes,StarHeight},
    257 tick label style={font=\tiny},
    258 enlarge x limits=0.15,
    259 %enlarge y limits={0.15, upper},
    260 ymin=0,
    261 legend style={at={(0.5,-0.15)},
    262 anchor=north,legend columns=-1},
    263 ybar,
    264 bar width=7pt,
    265 ]
    266 \addplot
    267 file {data/branch-misses-bitstreams.dat};
    268 \addplot
    269 file {data/branch-misses-nrgrep112.dat};
    270 \addplot
    271 file {data/branch-misses-gre2p.dat};
    273 \legend{bitstreams,nrgrep,gre2p,Annot}
    274 \end{axis}
    275 \end{tikzpicture}
    276 \end{center}
    277 \caption{Branch Misses per Instruction}\label{fig:SSEBranchMisses}
    278 \end{figure}
    279 Figure \ref{fig:SSEBranchMisses} reports branch misses per kilobyte.
    280 The bitstreams and gre2p implementations remain consistent.
    281 Significant variation is seen with nrgrep.
     210Figure \ref{fig:SSEInstructionsPerCycle} shows the efficiency of
     211processor utilization
     212achieved by the three programs on each of the test expression in
     213terms of instructions per cycle (IPC).
     214For the first four expressions, in particular,
     215the bitstreams implementation uses the processor resources quite efficiently,
     216avoiding penalties due to cache misses and branch mispredictions.   
     217However, with the while loop
     218structures in processing the Hex and StarHeight expressions, branch
     219mispredictions increase considerably and there is a noticeable
     220drop-off in IPC.  Both the gre2p and nrgrep suffer from significant
     221penalties due to 
     222mispredictions in the character-skipping logic and cache misses in
     223table lookups.  The performance of nrgrep, in particular drops off
     224with the growth in regulare expression complexity.
    283226Overall, the bitstreams implementation significantly outperformed
    284227both nrgrep and gre2p. In addition, the performance of bitstreams
    285 scaled smoothly with regular expression complexity. As Section~\ref{sec:analysis}
    286 describes, we anticipate that the performance of bitstreams
    287 will to continue to scale smoothly with
    288 regular expression complexity.
     228scales well with regular expression complexity.
Note: See TracChangeset for help on using the changeset viewer.