# Changeset 3475

Ignore:
Timestamp:
Sep 14, 2013, 7:50:36 AM (6 years ago)
Message:

Location:
docs/Working/re
Files:
 r3473 This approach relies on a bitwise data parallel view of text streams as well as a surprising use of addition to match runs of characters in a single step.  The closest previous runs of characters in a sin`gle step.  The closest previous work is that underlying bit-parallel XML parsing using 128-bit SSE2 SIMD technology together with a parallel scanning primitive also \section{Block-at-a-Time Processing}\label{sec:blockwise} The unbounded stream model of the previous section must of course be translated an implementation that proceeds block-at-a-time for realistic application.  In this, we primarily rely on the Pablo compiler of the Parabix toolchain \cite{lin2012parabix}.  Given input statements expressed as arbitrary-length bitstream equations, Pablo produces block-at-a-time C++ code that initializes and maintains all the necessary carry bits for each of the additions and shifts involved in the bitstream calculations. In the present work, our principal contribution to the block-at-a-time model is the technique of long-stream addition described below. Otherwise, we were able to use Pablo directly in compiling our SSE2 and AVX2 implementations.   Our GPU implementation required some scripting to modify the output of the Pablo compiler for our purpose. \paragraph*{Long-Stream Addition.}  The maximum word size for \item  \verb#simd<64>::eq(X, -1)# :  comparison of the 64-bit fields of \verb:x: each with the constant value -1 (all bits 1), producing an $f$-bit mask value an $f$-bit mask value, \item  \verb#hsimd<64>::mask(X)# : gathering the high bit of each 64-bit field into a single compressed $f$-bit mask value, and \item Compute an $f$-bit mask of all fields of {\tt R} that will overflow with an incoming carry bit. an incoming carry bit.  This is the {\em bubble mask}. $\text{\tt b} = \text{\tt simd<64>::eq(R, -1)}$ \item Determine an $f$-bit mask identifying the fields of {\tt R} that need to be incremented to produce the final sum.  This is just MatchStar. incremented to produce the final sum.  Here we find a new application of MatchStar! $\text{\tt i} = \text{\tt MatchStar(c*2, b)}$ \item Computed the final result {\tt Z}. This is the key step.  The mask {\tt c} of outgoing carries must be shifted one position ({\tt c*2}) so that each outgoing carry bit becomes associated with the next digit.  At the incoming position, the carry will increment the 64-bit digit.   However, if this digit is all ones (as signalled by the corresponding bit of bubble mask {\tt b}, then the addition will generate another carry.  In fact, if there is a sequence of digits that are all ones, then the carry must bubble through each of them.   This is just MatchStar! \item Compute the final result {\tt Z}. $\text{\tt Z} = \text{\tt simd<64>::add(R, simd<64>::spread(i))}$ \end{enumerate} Step 5 is the key step to determine which field-values of {\tt R} must be incremented due to incoming carries.  Note that {\tt c} is the bit mask of outgoing carries, so the multiplication by 2 is necessary to move each carry to its incoming position. At the incoming position, the carry will require an increment to the corresponding field in {\tt R}, as well as all further fields that are reachable by carry propagation through a consecutive run of ones. Figure \ref{fig:longadd} shows the operation of long-stream addition through an example. \begin{figure} \begin{center} \begin{tabular}{c||r|r|r|r|r|r|r|r||}\cline{2-9} {\tt X} & {\tt 19} & {\tt 31} & {\tt BA} & {\tt 4C} & {\tt 3D} & {\tt 45} & {\tt 21} & {\tt F1} \\ \cline{2-9} {\tt Y} & {\tt 22} & {\tt 12} & {\tt 45} & {\tt B3} & {\tt E2} & {\tt 16} & {\tt 17} & {\tt 36} \\ \cline{2-9} {\tt R} & {\tt 3B} & {\tt 43} & {\tt FF} & {\tt FF} & {\tt 1F} & {\tt 5B} & {\tt 38} & {\tt 27} \\ \cline{2-9} {\tt x} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 1} \\ \cline{2-9} {\tt y} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 0} \\ \cline{2-9} {\tt r} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 0} \\ \cline{2-9} {\tt c} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 1} \\ \cline{2-9} {\tt c*2} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 0} \\ \cline{2-9} {\tt b} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 0} & {\tt 0} \\ \cline{2-9} {\tt i} & {\tt 0} & {\tt 1} & {\tt 1} & {\tt 1} & {\tt 0} & {\tt 0} & {\tt 1} & {\tt 0} \\ \cline{2-9} {\tt Z} & {\tt 3B} & {\tt 44} & {\tt 0} & {\tt 0} & {\tt 1F} & {\tt 5B} & {\tt 39} & {\tt 27} \\ \cline{2-9} \end{tabular} \end{center} \caption{Long Stream Addition}\label{fig:longadd} \end{figure} Figure \ref{fig:longadd} illustrates the process.  In the figure, we illustrate the process with 8-bit fields rather than 64-bit fields and show all field values in hexadecimal notation.  Note that two of the individual 8-bit additions produce carries, while two others produce {\tt FF} values that generate bubble bits.  The net result is that four of the original 8-bit sums must be incremented to produce the long stream result. A slight extension to the process produces a long-stream full adder that can be used in chained addition.  In this case, the adder must take an additional carry-in bit {\tt p} and produce a carry-out bit {\tt q}. This may be accomplished by incorporating {\tt p} in calculating the increment mask in the low bit position, and then extracting the carry-out {\tt q} from the high bit position. $\text{\tt i} = \text{\tt MatchStar(c*2+p, b)}$ $\text{\tt q} = \text{\tt i >> f}$ \raggedbottom \section{Analytical Comparison with DFA and NFA Implementations}\label{sec:analysis} \begin{figure*}[tbh] \begin{center} \begin{code} static IDISA_ALWAYS_INLINE void add_ci_co(bitblock256_t x, bitblock256_t y, carry_t carry_in, carry_t & carry_out, bitblock256_t & sum) { \begin{verbatim} void add_ci_co(bitblock256_t x, bitblock256_t y, carry_t carry_in, carry_t & carry_out, bitblock256_t & sum) { bitblock256_t all_ones = simd256<1>::constant<1>(); bitblock256_t gen = simd_and(x, y); } \end{code} \end{verbatim} \end{center}