# Changeset 3649

Timestamp:
Feb 24, 2014, 2:39:02 PM (6 years ago)
Message:

mCleannups

Location:
docs/Working/re
Files:
2 edited

    --- r3645
    +++ r3649

     We also have adapted our long-stream addition technique to perform 4096-bit additions using 64 threads working in lock-step
    -SIMT fashion.  A similar technique is known to the GPGPU programming community\cite{}.
    +SIMT fashion.
     \begin{figure}[tbh]

     We use the following general model using SIMD methods for constant-time
    -long-stream addition up to 4096 bits.   Related solutions have been independently developed on GPUs (\verbhttp://stackoverflow.com/questions/12957116/ verb-integer-addition-with-cuda), however our model is intended to be a more broady applicable abstraction.
    +long-stream addition up to 4096 bits.   Related GPGPU solutions have been independently developed\cite{Crovella2012}, however our model is intended to be a more broadly applicable abstraction.
     We assume the availability of the following SIMD/SIMT operations operating on vectors of $f$ 64-bit fields.

     the parallel units.   There are a variety of ways in which these facilities may be implemented depending on the
    -underlying architecture; details of our AVX2 and GPU implementations
    +underlying architecture; details of our AVX2 and GPGPU implementations
     are presented later.

     expression matching as shown herein, it seems reasonable to expect such instructions to become available.    Alternatively, it may
    -be worthwhile to simply ensure that the \verb#hsimd<64>::mask(X)#
    +be worthwhile to simply ensure that the \verb#hsimd<64>::mask(X)#
     and \verb#simd<64>::spread(X)# operations are efficiently supported.

     file {data/gputime.dat};
    -\legend{SSE2,AVX2,GPU,Annot}
    +\legend{SSE2,AVX2,GPGPU,Annot}
     \end{axis} \end{tikzpicture}
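
The edited text refers to a long-stream addition model built on `hsimd<64>::mask(X)` and `simd<64>::spread(X)` over vectors of $f$ 64-bit fields, but the changeset does not show the algorithm itself. The following is a minimal scalar sketch of one way such a model could work, not the authors' implementation. The names `F`, `Vec`, `mask`, `spread`, `add_fields`, `long_add` and `demo_carry_ripple` are hypothetical, and plain loops stand in for the SIMD/SIMT operations. It assumes `F = 8` fields (512 bits) so that every mask fits in one 64-bit word; the 4096-bit case with 64 fields would need a wider mask type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 8 fields of 64 bits each (512 bits total), chosen for illustration.
constexpr std::size_t F = 8;
static_assert(F < 64, "the F-bit masks plus one carry-out bit must fit in 64 bits");
using Vec = std::array<std::uint64_t, F>;
constexpr std::uint64_t ALL_ONES = ~std::uint64_t{0};

// Stand-in for a mask operation: gather the high bit of each field into an F-bit integer.
std::uint64_t mask(const Vec& v) {
    std::uint64_t m = 0;
    for (std::size_t k = 0; k < F; ++k) m |= (v[k] >> 63) << k;
    return m;
}

// Stand-in for a spread operation: bit k of m becomes the value (0 or 1) of field k.
Vec spread(std::uint64_t m) {
    Vec v{};
    for (std::size_t k = 0; k < F; ++k) v[k] = (m >> k) & 1;
    return v;
}

// Field-wise 64-bit addition; carries do not cross field boundaries.
Vec add_fields(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t k = 0; k < F; ++k) r[k] = a[k] + b[k];
    return r;
}

// Adds x + y + carry_in as (64 * F)-bit integers, field 0 least significant.
// Writes the sum to `sum` and returns the carry out of the top field.
bool long_add(const Vec& x, const Vec& y, bool carry_in, Vec& sum) {
    const Vec r = add_fields(x, y);
    const std::uint64_t xm = mask(x), ym = mask(y), rm = mask(r);
    // Field k produced a carry if both high bits were set, or if one was set
    // and the high bit of the field-wise result is clear.
    const std::uint64_t c = (xm & ym) | ((xm | ym) & ~rm);
    // "Bubble" fields are all ones: an incoming carry passes straight through them.
    std::uint64_t b = 0;
    for (std::size_t k = 0; k < F; ++k)
        if (r[k] == ALL_ONES) b |= std::uint64_t{1} << k;
    // Carries enter the next field up; the external carry enters field 0.
    const std::uint64_t in = (c << 1) | (carry_in ? 1u : 0u);
    // One scalar addition ripples each incoming carry through its run of bubbles.
    const std::uint64_t inc = (((in & b) + b) ^ b) | in;
    sum = add_fields(r, spread(inc));  // spread reads only bits 0..F-1
    return ((inc >> F) & 1) != 0;
}

// Usage: (2^512 - 1) + 1 wraps to zero with a carry out.
void demo_carry_ripple() {
    Vec x;
    x.fill(ALL_ONES);
    Vec y{};
    y[0] = 1;
    Vec s{};
    const bool carry_out = long_add(x, y, false, s);
    assert(carry_out);
    for (std::uint64_t field : s) assert(field == 0);
}
```

The point of the sketch is that carry propagation across all fields costs one small scalar addition on the mask values rather than a loop over fields, which is why efficient mask and spread operations matter for this kind of scheme.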