Changeset 3642 for docs/Working


Ignore:
Timestamp:
Feb 24, 2014, 2:24:02 AM (5 years ago)
Author:
cameron
Message:

Substitute gre2p for grep; cite GPU long-stream add; remove excess figures

Location:
docs/Working/re
Files:
5 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/avx2.tex

    r3637 r3642  
    7575file {data/avxinstructions3.dat};
    7676
    77 \legend{Bitstreams,NRGrep,Grep,Annot}
     77\legend{Bitstreams,NRGrep,Gre2p,Annot}
    7878\end{axis}
    7979\end{tikzpicture}
     
    126126file {data/avxcycles3.dat};
    127127
    128 \legend{Bitstreams,NRGrep,Grep,Annot}
     128\legend{Bitstreams,NRGrep,Gre2p,Annot}
    129129\end{axis}
    130130\end{tikzpicture}
     
    136136instruction count was reflected in a significant speed-up
    137137in the bitstreams implementation.  However, the speed-up was
    138 considerably less than expected.  As shown in \ref{fig:AVXIPC}
    139 the AVX2 version has lost some of the superscalar efficiency
    140 of the SSE2 code.   This is a performance debugging issue
    141 that we have yet to resolve.
    142 
    143 
    144 \begin{figure}
    145 \begin{center}
    146 \begin{tikzpicture}
    147 \begin{axis}[
    148 xtick=data,
    149 ylabel=Change in Instructions per Cycle,
    150 xticklabels={@,Date,Email,URIorEmail,HexBytes},
    151 tick label style={font=\tiny},
    152 enlarge x limits=0.15,
    153 %enlarge y limits={0.15, upper},
    154 ymin=0,
    155 legend style={at={(0.5,-0.15)},
    156 anchor=north,legend columns=-1},
    157 ybar,
    158 bar width=7pt,
    159 ]
    160 \addplot
    161 file {data/avxipc1.dat};
    162 \addplot
    163 file {data/avxipc2.dat};
    164 \addplot
    165 file {data/avxipc3.dat};
    166 
    167 
    168 
    169 \legend{Bitstreams,NRGrep,Grep,Annot}
    170 \end{axis}
    171 \end{tikzpicture}
    172 \end{center}
    173 \caption{Change in Instructions Per Cycle With AVX2}\label{fig:AVXIPC}
    174 \end{figure}
    175 
    176 Overall, the results on our AVX2 machine were quite good,
     138considerably less than expected. 
     139The bitstreams code  on AVX2 has suffered from a considerable
     140reduction in instructions per cycle compared to the SSE2
     141implementation, possibly indicating
     142that our grep implementation has become memory-bound.
     143Nevertheless, the overall results on our AVX2 machine were quite encouraging,
    177144demonstrating very good scalability of the bitwise data-parallel approach.
    178145
  • docs/Working/re/re-main.tex

    r3637 r3642  
    442442is far from ideal.
    443443
    444 We have developed a general model using SIMD methods for constant-time
    445 long-stream addition up to 4096 bits.   
     444We use the following general model using SIMD methods for constant-time
     445long-stream addition up to 4096 bits.   Related solutions have been
     446independently developed on GPUs
     447(\verb`http://stackoverflow.com/questions/12957116/` verb`-integer-addition-with-cuda`),
     448however our model is intended to be a more broady applicable abstraction.
    446449We assume the availability of the following SIMD/SIMT operations
    447450operating on vectors of $f$ 64-bit fields.
     
    586589the 64 work groups.  Each work group carries out the regular
    587590expression matching operations 4096 bytes at a time using SIMT
    588 processing.   We were able to adapt our long-stream addition
    589 model to the GPU as shown in Figure \ref{fig:GPUadd}.  The GPU
    590 does not directly support the mask and spread operations,
    591 but we are able to simulate them using shared memory.
     591processing.   Although the GPU
     592does not directly support the mask and spread operations required
     593by our long-stream addition model,
     594we are able to simulate them using shared memory.
    592595Each thread maintains
    593596its own carry and bubble values in shared memory and performs
    594597synchronized updates with the other threads using a six-step
    595598parallel-prefix style process.  Others have implemented
    596 long-stream addition on the GPU using similar techniques.
    597 
     599long-stream addition on the GPU using similar techniques,
     600as noted previously.
    598601
    599602We performed our test on an AMD Radeon HD A10-6800K APU machine.
     
    624627of a given input.
    625628
    626 
    627 \begin{figure*}[tbh]
    628 \begin{center}\small
    629 \begin{verbatim}
    630 inline BitBlock adc(int idx, BitBlock a, BitBlock b, __local BitBlock *carry, _
    631                     _local BitBlock *bubble, BitBlock *group_carry, const int carryno){
    632         BitBlock carry_mask;
    633         BitBlock bubble_mask;
    634 
    635         BitBlock partial_sum = a+b;
    636         BitBlock gen = a&b;
    637         BitBlock prop = a^b;
    638         carry[idx] = ((gen | (prop & ~partial_sum))&CARRY_BIT_MASK)>>(WORK_GROUP_SIZE-1-idx);
    639         bubble[idx] = (partial_sum + 1)? 0:(((BitBlock)1)<<idx);
    640        
    641         barrier(CLK_LOCAL_MEM_FENCE);
    642         for(int offset=WORK_GROUP_SIZE/2; offset>0; offset=offset>>1){
    643                 carry[idx] = carry[idx]|carry[idx^offset];
    644                 bubble[idx] = bubble[idx]|bubble[idx^offset];
    645                 barrier(CLK_LOCAL_MEM_FENCE);
    646         }
    647        
    648         carry_mask = (carry[0]<<1)|group_carry[carryno];
    649         bubble_mask = bubble[0];
    650        
    651         BitBlock s = (carry_mask + bubble_mask) & ~bubble_mask;
    652         BitBlock inc = s | (s-carry_mask);
    653         BitBlock rslt = partial_sum + ((inc>>idx)&0x1);
    654         group_carry[carryno] = (carry[0]|(bubble_mask & inc))>>63;
    655         return rslt;
    656 }
    657 \end{verbatim}
    658 
    659 \end{center}
    660 \caption{OpenCL 4096-bit Addition}
    661 \label{fig:GPUadd}
    662 \end{figure*}
    663 
    664 
    665629\begin{figure}
    666630\begin{center}
  • docs/Working/re/reference.bib

    r3625 r3642  
    343343  publisher={Wiley Online Library}
    344344}
     345
     346@misc{cox2010RE2,
     347  title={Regular expression matching in the wild},
     348  author={Cox, Russ},
     349  url={http://swtch.com/\~{}rsc/regexp/regexp3.html},
     350  year={2010}
     351}
  • docs/Working/re/sse2.tex

    r3637 r3642  
    11\section{SSE2 Implementation and Evaluation}\label{sec:SSE2}
     2
     3\begin{table*}[htbp]
     4\begin{center}
     5{
     6\footnotesize
     7\begin{tabular}{|l|l|}
     8\hline
     9Name            & Expression    \\ \hline   
     10@               & \verb`@`              \\ \hline     
     11Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
     12Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
     13URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline     
     14HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
     15\end{tabular}
     16}
     17\end{center}
     18\caption{Regular Expressions}
     19\label{RegularExpressions}
     20\end{table*}
     21
    222
    323\paragraph{Implementation Notes.}
     
    1535
    1636\paragraph{Comparative Implementations.}
    17 We evaluate our bitwise data parallel implementation versus several alternatives.   We report data for two of these: GNU grep version 2.10
    18 and nrgrep version 1.12. GNU grep is a popular open-source grep implementation that uses DFAs,
    19 as well as heuristics for important special cases.
     37We evaluate our bitwise data parallel implementation versus several
     38alternatives.   
     39We report data for two of these:
     40gre2p
     41%GNU grep version 2.10
     42and nrgrep version 1.12.
     43The gre2p program is a grep version implemented using the recently developed
     44RE2 regular expression library, using a systematic DFA-based approach
     45(as well as some NFA fallback techniques) \cite{cox2010RE2}.
    2046The NFA class is represented by nrgrep, one of the
    2147strongest competitors in regular expression matching performance.
    22 We also considered agrep 3.41 as and alternative NFA-based implementation
     48We also considered GNU grep 2.10, agrep 3.41 as and alternative NFA-based implementation
    2349and pcregrep 8.12 as a backtracking implementation, but do not
    24 report data for them.  The agrep implementation does not support
     50report data for them. 
     51GNU grep is a popular open-source implementation that is claimed to be
     52primarily DFA-based with
     53heuristics for important special cases.   
     54The agrep implementation does not support
    2555some of the common regular expression syntax feature and is limited to
    2656patterns of at most 32 characters.   As a backtracking implementation,
     
    2959
    3060We performed our SSE2 performance study using an Intel Core i7 quad-core (Sandy Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core), 32+32 kB (per core) L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
    31 
    32 \begin{table*}[htbp]
    33 \begin{center}
    34 {
    35 \footnotesize
    36 \begin{tabular}{|l|l|}
    37 \hline
    38 Name            & Expression    \\ \hline   
    39 @               & \verb`@`              \\ \hline     
    40 Date            & \verb`([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)`          \\ \hline     
    41 Email           & \verb`([^ @]+)@([^ @]+)`              \\ \hline
    42 URIOrEmail      & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)`           \\ \hline     
    43 HexBytes                & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])`              \\ \hline
    44 \end{tabular}
    45 }
    46 \end{center}
    47 \caption{Regular Expressions}
    48 \label{RegularExpressions}
    49 \end{table*}
    5061
    5162
     
    8899file {data/cycles3.dat};
    89100 
    90 \legend{Bitstreams,NRGrep,Grep,Annot}
     101\legend{Bitstreams,NRGrep,Gre2p,Annot}
    91102\end{axis}
    92103\end{tikzpicture}
     
    131142file {data/instructions3.dat};
    132143 
    133 \legend{Bitstreams,NRGrep,Grep,Annot}
     144\legend{Bitstreams,NRGrep,Gre2p,Annot}
    134145\end{axis}
    135146\end{tikzpicture}
     
    171182file {data/ipc3.dat};
    172183
    173 \legend{Bitstreams,NRGrep,Grep,Annot}
     184\legend{Bitstreams,NRGrep,Gre2p,Annot}
    174185\end{axis}
    175186\end{tikzpicture}
     
    201212file {data/branch-misses3.dat};
    202213
    203 \legend{Bitstreams,NRGrep,Grep,Annot}
     214\legend{Bitstreams,NRGrep,Gre2p,Annot}
    204215\end{axis}
    205216\end{tikzpicture}
Note: See TracChangeset for help on using the changeset viewer.