Changeset 3892 for docs/Working


Timestamp: Jun 24, 2014, 4:19:50 PM
Author: cameron
Message: Trim to 12 pages: eliminate Unicode section, avx2 addition figure.
Location: docs/Working/re
Files: 3 edited

  • docs/Working/re/avx2.tex

--- docs/Working/re/avx2.tex (r3889)
+++ docs/Working/re/avx2.tex (r3892)
@@ -17,33 +17,13 @@
 
 
- \begin{figure}[tbh]
-
-\begin{center} \small
-\begin{verbatim}
-bitblock_t spread(uint64_t bits) {
-  uint64_t s = 0x0000200040008001 * bits;
-  uint64_t t = s & 0x0001000100010001;
-  return _mm256_cvtepu16_epi64(t);
-}
-\end{verbatim}
-\end{center}
-\caption{AVX2 256-bit Spread}
-\label{fig:AVX2spread}
-
-\end{figure}
-
-\paragraph*{AVX2 256-Bit Addition} Bitstream addition at the 256-bit block size was implemented using the
+\paragraph*{AVX2 256-Bit Addition} Bitstream addition
+at the 256-bit block size was implemented using the
 long-stream addition technique.   The AVX2 instruction set directly
 supports the \verb#hsimd<64>::mask(X)# operation using
 the \verb#_mm256_movemask_pd#  intrinsic, extracting
 the required 4-bit mask directly from the 256-bit vector.
-The \verb#hsimd<64>::spread(X)# is slightly more
-problematic, requiring a short sequence of instructions
+The \verb#hsimd<64>::spread(X)# is slightly more complex, requiring a short sequence of instructions
 to convert the computed 4-bit increment mask back
-into a vector of 4 64-bit values.   One method is to
-use the AVX2 broadcast instruction to make 4 copies
-of the mask to be spread, followed by appropriate
-bit manipulation.   Another uses multiplication to
-first spread to 16-bit fields as shown in Figure \ref{fig:AVX2spread}.
+into a vector of 4 64-bit values.
 
 We also compiled new versions of the {\tt egrep} and {\tt nrgrep} programs
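
Note on the removed figure: the multiplication-based spread it showed can be modelled in portable scalar C, which may help when reading the shortened paragraph. The sketch below is not the authors' code. The names `mask4_model`, `spread16_model` and `spread_model` are hypothetical, the `& 0xF` guard is an added assumption that only the low 4 bits carry the mask, and the final loop stands in for the zero-extension that the figure performs with `_mm256_cvtepu16_epi64`.

```c
#include <stdint.h>

/* Scalar model of hsimd<64>::mask: gather the high bit of each of the
   four 64-bit fields into a 4-bit mask (bit i comes from field i). */
unsigned mask4_model(const uint64_t v[4]) {
    unsigned m = 0;
    for (int i = 0; i < 4; i++)
        m |= (unsigned)(v[i] >> 63) << i;
    return m;
}

/* Multiplication step from the removed figure: the multiplier places
   copies of the 4-bit mask at bit offsets 0, 15, 30 and 45, so mask
   bit i lands at bit 16*i. The AND keeps only those four bits. */
uint64_t spread16_model(uint64_t bits) {
    uint64_t s = 0x0000200040008001ULL * (bits & 0xF); /* guard is an assumption */
    return s & 0x0001000100010001ULL;
}

/* Scalar stand-in for widening four 16-bit fields to four 64-bit fields. */
void spread_model(uint64_t bits, uint64_t out[4]) {
    uint64_t t = spread16_model(bits);
    for (int i = 0; i < 4; i++)
        out[i] = (t >> (16 * i)) & 0xFFFF;
}
```

Under these assumptions, `spread_model(0x5, out)` leaves `out` as `{1, 0, 1, 0}`: each bit of the 4-bit increment mask becomes the low bit of its own 64-bit field.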
  • docs/Working/re/pact051-cameron.tex

--- docs/Working/re/pact051-cameron.tex (r3891)
+++ docs/Working/re/pact051-cameron.tex (r3892)
@@ -479,5 +479,4 @@
 \input{compilation}
 
-\input{re-Unicode}
 
 \section{Block-at-a-Time Processing}\label{sec:blockwise}