Changeset 3509 for docs/Working


Ignore:
Timestamp:
Sep 15, 2013, 7:08:21 PM (6 years ago)
Author:
cameron
Message:

updates for SSE2 study

Location:
docs/Working/re
Files:
1 added
3 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/avx2.tex

    r3504 r3509  
    2121\begin{center}
    2222\begin{verbatim}
    23 void add_ci_co(bitblock256_t x, bitblock256_t y, carry_t carry_in, carry_t & carry_out, bitblock256_t & sum) {
    24   bitblock256_t all_ones = simd256<1>::constant<1>();
    25   bitblock256_t gen = simd_and(x, y);
    26   bitblock256_t prop = simd_xor(x, y);
    27   bitblock256_t partial_sum = simd256<64>::add(x, y);
    28   bitblock256_t carry = simd_or(gen, simd_andc(prop, partial_sum));
    29   bitblock256_t bubble = simd256<64>::eq(partial_sum, all_ones);
     23void add_ci_co(bitblock_t x, bitblock_t y, carry_t carry_in, carry_t & carry_out, bitblock_t & sum) {
     24  bitblock_t all_ones = simd256<1>::constant<1>();
     25  bitblock_t gen = simd_and(x, y);
     26  bitblock_t prop = simd_xor(x, y);
     27  bitblock_t partial_sum = simd256<64>::add(x, y);
     28  bitblock_t carry = simd_or(gen, simd_andc(prop, partial_sum));
     29  bitblock_t bubble = simd256<64>::eq(partial_sum, all_ones);
    3030  uint64_t carry_mask = hsimd256<64>::signmask(carry) * 2 + convert(carry_in);
    3131  uint64_t bubble_mask = hsimd256<64>::signmask(bubble);
  • docs/Working/re/sse2.tex

    r3508 r3509  
    1 \section{Commodity SIMD Implementation and Experimental Evaluation}\label{sec:SSE2}
    2 
    3 
    4 \subsection{Implementation Notes}
    5 \subsection{Evaluation Methodology}
    6 
    7 
    8 
    9 \paragraph{GREP Implementations:}
    10 We evaluate our Bitwise Data Parallel implementation against GNU grep version 2.10 and nrgrep version 1.12. GNU grep is a popular open-source grep implementation that uses DFAs. NR-grep is one of the strongest competitors in regular expression matching performance. It uses an NFA-based approach.
    11 
    12 \paragraph{Expressions:}
    13 Each GREP implementation is tested with the five regular expressions in Table \ref{RegularExpressions}.  Xquote matches any of the representations of a quote in xml.  It is run on roads-2.gml, a 11,861,751 byte gml data file. The other four expressions are taken from Benchmark of Regex Libraries[?] and are all run on a concatenated version of the linux howto[?] which is 39,422,105 bytes in length.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
    14 
    15 
    16 
    17 
    18        
     1\section{SSE2 Implementation and Evaluation}\label{sec:SSE2}
     2
     3\paragraph{Implementation Notes.}
     4Our regular expression compiler directly uses the Parabix tool chain
     5to compile regular expression into SSE2-based implementations.
     6Our compiler essentially scripts three other compilers to perform
     7this work: the Parabix character class compiler to determine basic
     8bit stream equations for each of the character classes encountered
     9in a regular expression, the Pablo bitstream equation compiler which
     10converts equations to block-at-a-time C++ code for 128-bit SIMD, and
     11gcc 4.6.3 to generate the binaries.   The Pablo output is combined
     12with a {\tt grep_template.cpp} file that arranges to read input
     13files, break them into segments, and print out or count matches
     14as they are encountered.
     15
     16\paragraph{Comparative Implementations.}
     17We evaluate our bitwise data parallel implementation versus several alternatives.   We report data for two of these: GNU grep version 2.10
     18and nrgrep version 1.12. GNU grep is a popular open-source grep implementation that uses DFAs,
     19as well as heuristics for important special cases.
     20The NFA class is represented by nrgrep, one of the
     21strongest competitors in regular expression matching performance.
     22We also considered agrep 3.41 as and alternative NFA-based implementation
     23and pcregrep 8.12 as a backtracking implementation, but do not
     24report data for them.  The agrep implementation does not support
     25some of the common regular expression syntax feature and is limited to
     26patterns of at most 32 characters.   As a backtracking implementation,
     27pcregrep supports more regular expression features, but is not
     28competitive in performance in any example we tested.
     29
     30We performed our SSE2 performance study using an Intel Core i7 quad-core (Sandy Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core), 32+32 kB (per core) L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
     31
    1932\begin{table*}[htbp]
    2033\begin{center}
     
    3750
    3851
    39 
    40 
    41 
    42 
    43 
    44 \subsection{Comparison}
    45 
    46 Figure \ref{fig:SSECyclesPerByte} shows cycles per byte.  For the three most complicated expressions, the bitstreams implementation had the lowest cycle count.
    47 
    48 For the @ expression, Grep had very slightly better performance than the bitstreams implementation.  The bitstreams implementation has a fair amount of overhead for transposition that hurts relative performance for very simple expressions.
    49 
    50 For the Date expression, NRGrep is able to skip large portions of the input file since every time it encounters a character that can't appear in a date, it can skip past six characters.  For the more compicated expressions, it loses this advantage.
    51 
    52 The bitstreams implementation has fairly consistent performance.  As the regular expression complexity increases, the cycle count increases slowly.  The largest difference in cycles per byte for the bitstreams implementation is a ratio of 2 to 1.  The same cannot be said of Grep or NRGrep.  NRGrep uses more than 10 times the cycles per byte for Xquote than for Date.  The number of cycles per byte that Grep uses for URIOrEmail is almost 900 times as many as it uses for @.
    53 
     52\paragraph{Test expressions.}
     53Each grep implementation is tested with the five regular expressions in Table \ref{RegularExpressions}.  Xquote matches any of the representations of a
     54single or double quote character occuring in XML content.  It is run on roads-2.gml, a 11,861,751 byte gml data file. The other four expressions are taken from Benchmark of Regex Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml] and are all run on a concatenated version of the linux howto which is 39,422,105 bytes in length.  @ simply matches the "@" character.  It demonstrates the overhead involved in matching the simplest regular expression.  Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
     55
     56
     57\paragraph{Results.}
    5458\begin{figure}
    5559\begin{center}
     
    8488\end{figure}
    8589
    86 Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.  The relative values mirror cycles per byte.  The bitstreams implementation continues to show consistent performance.  This is especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle}, which shows instructions per cycle.  The bitstreams implementation has almost no variation in the instructions per cycle.  Both Grep and NRGrep have considerably more variation based on the input regular expression.
    87  
     90Figure \ref{fig:SSECyclesPerByte} shows CPU cycles per input byte for each expression on each implementation.
     91For the three most complicated expressions, the bitstreams implementation had the lowest cycle count, while grep
     92was an order of magnitude slower.
     93
     94For the @ expression, GNU grep had very slightly better performance than the bitstreams implementation.  The bitstreams implementation has a fixed overhead
     95for transposition that hurts relative performance for very simple expressions.
     96
     97For the Date expression, nrgrep is able to skip large portions of the input file since every time it encounters a character that can't appear in a date, it can skip past six characters.  For the more compicated expressions, it loses this advantage.
     98
     99The bitstreams implementation has fairly consistent performance.  As the regular expression complexity increases, the cycle count increases slowly.  The largest difference in cycles per byte for the bitstreams implementation is a ratio of 2 to 1.  The same cannot be said of grep or nrgrep.  The
     100latter uses more than 10 times the cycles per byte for Xquote than for Date.  The number of cycles per byte that gGrep uses for URIOrEmail is almost 900 times as many as it uses for @.
     101
    88102\begin{figure}
    89103\begin{center}
     
    117131\end{figure}
    118132
     133Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.  The relative values mirror cycles per byte.  The bitstreams implementation continues to show consistent performance.  This is especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle}, which shows instructions per cycle.  The bitstreams implementation has almost no variation in the instructions per cycle.  Both grep and nrGrep have considerably more variation based on the input regular expression.
     134 
    119135
    120136\begin{figure}
     
    148164\end{figure}
    149165
    150 Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.  The bitstreams implementation remains consistent here.  NRGrep and Grep have branch miss rates that vary significantly with different regular expressions. 
    151 
    152 Overall, our performance is considerably better than both NRGrep and Grep for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
    153 
    154166\begin{figure}
    155167\begin{center}
     
    182194\caption{Branch Misses per Kilobyte}\label{fig:SSEBranchMisses}
    183195\end{figure}
    184 
     196Figure \ref{fig:SSEBranchMisses} shows the branch misses per kilobyte.  The bitstreams implementation remains consistent here.  Each of nrgrep and grep have branch miss rates that vary significantly with different regular expressions. 
     197
     198Overall, our performance is considerably better than both nrgrep and grep for the more complicated expressions that were tested.  Also, our performance scales smoothly with regular expression complexity so it can be expected to remain better for complicated expressions in general.
     199
     200
Note: See TracChangeset for help on using the changeset viewer.