Timestamp: Feb 22, 2014, 11:15:13 AM
Author: lindanl
Message: GPU section

Files: 1 edited

--- docs/Working/re/re-main.tex (r3630)
+++ docs/Working/re/re-main.tex (r3633)
@@ -571,13 +571,29 @@
 using bit-parallel data streams, we implemented a GPGPU version
 in OpenCL.
-We arranged for 64 work groups each having 64
-threads.  Input files are divided in data parallel fashion among
+We arranged for 64 work groups, each having 64 threads.
+The work-group size and the number of work groups were chosen
+to give the best occupancy as calculated by the AMD APP Profiler.
+Input files are divided in data-parallel fashion among
 the 64 work groups.  Each work group carries out the regular
 expression matching operations 4096 bytes at a time using SIMT
 processing.  Figure \ref{fig:GPUadd} shows our implementation
 of long-stream addition on the GPU.  Each thread maintains
-its own carry and bubble values and performs synchronized
-updates with the other threads using a six-step parallel-prefix
-style process.
+its own carry and bubble values in local (shared) memory and performs
+synchronized updates with the other threads using a six-step
+parallel-prefix style process.
+
+
+We performed our tests on an AMD A10-6800K APU with Radeon HD Graphics.
+On AMD Fusion systems, the input buffer is allocated in
+pinned memory to take advantage of the zero-copy memory region:
+the CPU reads data directly into this region, and the GPU
+accesses it in place for further processing. The expensive
+data transfer time incurred by traditional discrete GPUs is
+therefore hidden, so we compare only the kernel execution
+time with our SSE2 and AVX implementations, as shown in Figure
+\ref{fig:SSE-AVX-GPU}. The GPU version gives a 30\% to 55\% performance
+improvement over the SSE2 version and a 10\% to 40\% performance
+improvement over the AVX version.
+
 
 \begin{figure*}[tbh]
@@ -619,8 +635,4 @@
 
 
-Our GPU test machine was an AMD A10-6800K APU with Radeon(tm) HD Graphics.
-Figure \ref{fig:SSE-AVX-GPU} compares the performance of
-our SSE2, AVX and GPU implementations.
-
 \begin{figure}
 \begin{center}
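The work-group arrangement in the added lines (64 work groups of 64 threads, 4096 bytes per step) implies that each thread covers 64 bytes of a block. Since each bit-stream position corresponds to one input byte, those 64 bytes are exactly one 64-bit word of every bit stream. The changeset does not say how a file is split among the work groups, so the Python sketch below assumes contiguous, block-aligned ranges balanced to within one block per group. `partition`, `blocks_of`, and `thread_bytes` are hypothetical names, not functions from the OpenCL code.

```python
# Hypothetical work partitioning for 64 work groups x 64 threads (not taken from the OpenCL code).
NUM_GROUPS = 64            # work groups, as in the text
THREADS_PER_GROUP = 64     # threads per work group, as in the text
BLOCK_BYTES = 4096         # bytes a work group processes per iteration
BYTES_PER_THREAD = BLOCK_BYTES // THREADS_PER_GROUP   # 64 bytes -> one 64-bit word per bit stream


def partition(file_size):
    """Assumed scheme: contiguous, block-aligned ranges, balanced to within one block."""
    blocks = -(-file_size // BLOCK_BYTES)   # ceiling division; the last block may be partial
    ranges = []
    for g in range(NUM_GROUPS):
        first = g * blocks // NUM_GROUPS
        last = (g + 1) * blocks // NUM_GROUPS
        ranges.append((min(first * BLOCK_BYTES, file_size), min(last * BLOCK_BYTES, file_size)))
    return ranges


def blocks_of(start, end):
    """Yield the 4096-byte pieces a work group processes in turn."""
    for block_start in range(start, end, BLOCK_BYTES):
        yield block_start, min(block_start + BLOCK_BYTES, end)


def thread_bytes(block_start, thread_id):
    """Byte slice of one block handled by one thread (assumed: consecutive 64-byte slices)."""
    lo = block_start + thread_id * BYTES_PER_THREAD
    return lo, lo + BYTES_PER_THREAD


if __name__ == "__main__":
    ranges = partition(1_000_000)
    print(ranges[0], ranges[-1], BYTES_PER_THREAD)  # (0, 12288) (987136, 1000000) 64
```

A real implementation must also carry matching state across range boundaries, or overlap the segments, so that matches spanning two work groups are not lost. This sketch ignores that issue.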
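The carry/bubble update for long-stream addition can be simulated in Python. This is a sketch under our own assumptions, not the OpenCL kernel. It assumes that each of the 64 threads owns one 64-bit word of a 4096-bit stream, with word 0 least significant. A thread's bubble is set when its partial sum word is all ones, so an incoming carry would pass straight through it. The six synchronized steps are modelled as a Hillis-Steele inclusive scan over (carry, bubble) pairs, since log2(64) = 6. All function names are hypothetical.

```python
# Simulation of work-group-wide long-stream addition (hypothetical names; not the OpenCL kernel).
WORD_BITS = 64                 # assumption: each simulated thread owns one 64-bit word
MASK = (1 << WORD_BITS) - 1
NUM_THREADS = 64               # 64 threads x 64 bits = one 4096-bit block of a stream


def local_add(x, y):
    """Per-thread step: add two words and report the carry-out and bubble flags."""
    total = x + y
    s = total & MASK
    carry = total >> WORD_BITS      # 1 if this word overflowed
    bubble = int(s == MASK)         # all ones: an incoming carry would pass straight through
    return s, carry, bubble


def prefix_carries(carry, bubble):
    """Inclusive Hillis-Steele scan over (carry, bubble) pairs.

    Returns (g, steps): g[i] == 1 iff a carry leaves word i, and steps is the
    number of synchronized rounds (one barrier each on a GPU).
    """
    g, p = list(carry), list(bubble)
    n, dist, steps = len(g), 1, 0
    while dist < n:
        # All threads read the previous round's values, then update together.
        g = [g[i] | (p[i] & g[i - dist]) if i >= dist else g[i] for i in range(n)]
        p = [p[i] & p[i - dist] if i >= dist else p[i] for i in range(n)]
        dist *= 2
        steps += 1
    return g, steps


def long_stream_add(xs, ys):
    """Add two multi-word streams; return (sum_words, carry_out, steps)."""
    partial = [local_add(x, y) for x, y in zip(xs, ys)]
    g, steps = prefix_carries([c for _, c, _ in partial], [b for _, _, b in partial])
    carry_in = [0] + g[:-1]         # thread i receives the carry leaving thread i - 1
    words = [(s + cin) & MASK for (s, _, _), cin in zip(partial, carry_in)]
    return words, g[-1], steps


def to_words(v, n=NUM_THREADS):
    """Split a non-negative integer into n little-endian 64-bit words."""
    return [(v >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(words):
    """Reassemble little-endian 64-bit words into an integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


if __name__ == "__main__":
    a, b = (1 << 640) - 1, 1   # words 0-9 all ones: word 0 generates a carry, words 1-9 are bubbles
    words, carry_out, steps = long_stream_add(to_words(a), to_words(b))
    print(from_words(words) == a + b, carry_out, steps)  # True 0 6
```

The carry and bubble flags are never both set for one word. A word that overflows has a partial sum of at most 2^64 - 2, so adding the incoming carry cannot overflow it a second time. This is why a single carry-in per thread, computed by the scan, is enough.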