Timestamp:
Feb 24, 2014, 2:24:02 AM (5 years ago)
Author:
cameron
Message:

Substitute gre2p for grep; cite GPU long-stream add; remove excess figures

File:
1 edited

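Legend:

Unmodified: old and new line numbers both shown
Added: only the new (r3642) line number shown
Removed: only the old (r3637) line number shown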
  • docs/Working/re/re-main.tex

    r3637 r3642
    442 442  is far from ideal.
    443 443
    444      We have developed a general model using SIMD methods for constant-time
    445      long-stream addition up to 4096 bits.
        444  We use the following general model using SIMD methods for constant-time
        445  long-stream addition up to 4096 bits.   Related solutions have been
        446  independently developed on GPUs
        447  (\verb`http://stackoverflow.com/questions/12957116/` \verb`-integer-addition-with-cuda`),
        448  however, our model is intended to be a more broadly applicable abstraction.
    446 449  We assume the availability of the following SIMD/SIMT operations
    447 450  operating on vectors of $f$ 64-bit fields.
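To make the model concrete, here is a minimal sequential sketch in C. It assumes f = 64 fields (4096-bit operands) and processes the fields in ordinary loops that stand in for SIMD lanes or SIMT work items. The function name `long_add` and its array interface are hypothetical; the carry and bubble mask arithmetic mirrors the OpenCL `adc` listing removed in this changeset (old lines 627 to 662), not necessarily the exact SIMD operations the paper goes on to define.

```c
#include <assert.h>
#include <stdint.h>

#define NFIELDS 64 /* 64 fields x 64 bits = 4096-bit operands */

/* Hypothetical helper: adds two 4096-bit values a and b (little-endian
 * arrays of 64-bit fields) plus carry_in, writes the sum to r, and
 * returns the carry out of the top field. */
unsigned long_add(const uint64_t a[NFIELDS], const uint64_t b[NFIELDS],
                  unsigned carry_in, uint64_t r[NFIELDS]) {
    uint64_t partial[NFIELDS];
    uint64_t carries = 0; /* bit i: field i overflowed */
    uint64_t bubbles = 0; /* bit i: partial sum of field i is all ones */

    /* Independent per-field additions (one lane per field). */
    for (int i = 0; i < NFIELDS; i++) {
        partial[i] = a[i] + b[i];
        if (partial[i] < a[i]) carries |= (uint64_t)1 << i;
        if (partial[i] == UINT64_MAX) bubbles |= (uint64_t)1 << i;
    }
    /* An overflowing field can never also be all ones. */
    assert((carries & bubbles) == 0);

    /* Bit i of carry_mask: field i receives a carry from field i-1
     * (or from carry_in, for field 0). */
    uint64_t carry_mask = (carries << 1) | (uint64_t)(carry_in & 1);
    /* A carry entering a run of bubbles ripples to the end of the run;
     * inc marks every field that must be incremented by one. */
    uint64_t s = (carry_mask + bubbles) & ~bubbles;
    uint64_t inc = s | (s - carry_mask);

    for (int i = 0; i < NFIELDS; i++)
        r[i] = partial[i] + ((inc >> i) & 1);

    /* Carry out: field 63 overflowed, or it is a bubble that was incremented. */
    return (unsigned)(((carries | (bubbles & inc)) >> 63) & 1);
}
```

The per-field loops are the parts that would run in parallel; only the few operations on `carry_mask`, `bubbles`, `s`, and `inc` cross field boundaries, so the cost does not grow with the length of a carry chain. For example, adding 1 to a 4096-bit value of all ones yields zero in every field with a carry out of 1.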
     
    586 589  the 64 work groups.  Each work group carries out the regular
    587 590  expression matching operations 4096 bytes at a time using SIMT
    588      processing.   We were able to adapt our long-stream addition
    589      model to the GPU as shown in Figure \ref{fig:GPUadd}.  The GPU
    590      does not directly support the mask and spread operations,
    591      but we are able to simulate them using shared memory.
        591  processing.   Although the GPU
        592  does not directly support the mask and spread operations required
        593  by our long-stream addition model,
        594  we are able to simulate them using shared memory.
    592 595  Each thread maintains
    593 596  its own carry and bubble values in shared memory and performs
    594 597  synchronized updates with the other threads using a six-step
    595 598  parallel-prefix style process.  Others have implemented
    596      long-stream addition on the GPU using similar techniques.
    597
        599  long-stream addition on the GPU using similar techniques,
        600  as noted previously.
    598 601
    599 602  We performed our test on an AMD Radeon HD A10-6800K APU machine.
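The six-step shared-memory exchange can be modeled sequentially as follows. This is a sketch assuming a 64-item work group (`WORK_GROUP_SIZE`, as in the removed `adc` listing); the function name `or_allreduce` is hypothetical, and copying the array into `prev` before each step stands in for `barrier(CLK_LOCAL_MEM_FENCE)`. Each step pairs item `idx` with item `idx ^ offset`, so after log2(64) = 6 steps every item holds the OR of all 64 inputs. This is how every thread obtains the complete carry and bubble masks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORK_GROUP_SIZE 64 /* one 64-bit field per work item: 4096 bits */

/* Hypothetical sequential model of the six-step exchange: at each step,
 * work item idx ORs in the value held by its partner idx ^ offset.
 * Copying slot into prev before each step plays the role of the barrier. */
void or_allreduce(uint64_t slot[WORK_GROUP_SIZE]) {
    uint64_t prev[WORK_GROUP_SIZE];
    for (int offset = WORK_GROUP_SIZE / 2; offset > 0; offset >>= 1) {
        memcpy(prev, slot, sizeof prev);
        for (int idx = 0; idx < WORK_GROUP_SIZE; idx++)
            slot[idx] = prev[idx] | prev[idx ^ offset];
    }
    /* After 6 steps (offsets 32, 16, 8, 4, 2, 1) all items agree. */
    for (int idx = 1; idx < WORK_GROUP_SIZE; idx++)
        assert(slot[idx] == slot[0]);
}
```

Seeding item i with its own carry bit shifted to position i yields the full 64-bit carry mask in every item. The removed OpenCL loop updates shared memory in place instead of double-buffering. That is still safe: only item `idx` writes slot `idx`, and OR is idempotent, so reading a partner's already-updated value gives the same result.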
     
    624 627  of a given input.
    625 628
    626
    627      \begin{figure*}[tbh]
    628      \begin{center}\small
    629      \begin{verbatim}
    630      inline BitBlock adc(int idx, BitBlock a, BitBlock b, __local BitBlock *carry,
    631                          __local BitBlock *bubble, BitBlock *group_carry, const int carryno){
    632              BitBlock carry_mask;
    633              BitBlock bubble_mask;
    634
    635              BitBlock partial_sum = a+b;
    636              BitBlock gen = a&b;
    637              BitBlock prop = a^b;
    638              carry[idx] = ((gen | (prop & ~partial_sum))&CARRY_BIT_MASK)>>(WORK_GROUP_SIZE-1-idx);
    639              bubble[idx] = (partial_sum + 1)? 0:(((BitBlock)1)<<idx);
    640
    641              barrier(CLK_LOCAL_MEM_FENCE);
    642              for(int offset=WORK_GROUP_SIZE/2; offset>0; offset=offset>>1){
    643                      carry[idx] = carry[idx]|carry[idx^offset];
    644                      bubble[idx] = bubble[idx]|bubble[idx^offset];
    645                      barrier(CLK_LOCAL_MEM_FENCE);
    646              }
    647
    648              carry_mask = (carry[0]<<1)|group_carry[carryno];
    649              bubble_mask = bubble[0];
    650
    651              BitBlock s = (carry_mask + bubble_mask) & ~bubble_mask;
    652              BitBlock inc = s | (s-carry_mask);
    653              BitBlock rslt = partial_sum + ((inc>>idx)&0x1);
    654              group_carry[carryno] = (carry[0]|(bubble_mask & inc))>>63;
    655              return rslt;
    656      }
    657      \end{verbatim}
    658
    659      \end{center}
    660      \caption{OpenCL 4096-bit Addition}
    661      \label{fig:GPUadd}
    662      \end{figure*}
    663
    664
    665 629  \begin{figure}
    666 630  \begin{center}
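The increment mask `inc = s | (s-carry_mask)` in the removed `adc` listing (old line 652) can be justified as follows. Here c is shorthand for `carry_mask` and b for `bubble_mask`. R is the set of maximal runs of bubble fields [r, q) whose lowest field r receives a carry; these symbols are introduced only for this note, and all arithmetic is modulo 2^64. A field that overflows cannot also be all ones, so inside a bubble run c can have a set bit only at the run's lowest position r. Adding c to b therefore ripples each such carry to position q, just past the run:

```latex
\begin{align*}
c &= (c \land \lnot b) + \sum_{[r,q) \in R} 2^{r} \\
s &= (c + b) \land \lnot b = (c \land \lnot b) + \sum_{[r,q) \in R} 2^{q} \\
s - c &= \sum_{[r,q) \in R} \left(2^{q} - 2^{r}\right) \\
\mathit{inc} &= s \lor (s - c)
\end{align*}
```

Each term 2^q - 2^r is a block of ones covering bits r through q-1, which is exactly the bubble run. The mask thus marks the following fields: those with a direct carry, every field of a bubble run that receives a carry, and the field just past each such run. When a run reaches the top field (q = 64), the 2^q term vanishes modulo 2^64. The listing then recovers the overflow through `bubble_mask & inc` when it computes the carry out.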