Changeset 1120


Timestamp:
Apr 11, 2011, 4:45:23 PM (8 years ago)
Author:
ksherdy
Message:

Minor edits.

File:
1 edited

  • docs/PACT2011/07-avx.tex

    r1116 r1120  
    11\section{Scaling Parabix2 for AVX}
    22
    3 Parabix2 was originally developed for 128-bit SSE2 technology widely
    4 and is available on all 64-bit Intel and AMD processors.  In this section,
    5 we discuss the scalability and performance of Parabix2 to take
    6 advantage of the new 256-bit AVX (Advanced Vector Extensions)
    7 technology that has just become commercially available in the
    8 latest Intel processors based on the \SB\ microarchitecture.
     3In this section, we discuss the scalability and performance advantages of our 256-bit AVX (Advanced Vector Extensions) Parabix2 port.
     4Parabix2 originally targeted the 128-bit SSE2 SIMD technology available on all modern 64-bit Intel and AMD processors but
     5has recently been ported to AVX. AVX technology is commercially available on the
     6latest Intel processors based on the \SB\ microarchitecture.
    97
    108\begin{figure*}
     
    2624\subsection{Three Operand Form}
    2725
    28 In addition to the introduction of 256-bit operations, AVX technology
    29 also makes a change in the structure of the base SSE instructions,
    30 moving from a destructive 2-operand form long used with SSE technologies
    31 to a nondestructive 3-operand form.   In the 2-operand form,
    32 one register is used as both a source and
    33 destination register, equivalent to the assignment $a = a~\texttt{[op]}~b$.
    34 Thus, whenever the subsequent instructions used the value of both $a$ and $b$,
    35 one of them had to be copied beforehand, or reconstituted or reloaded
    36 afterwards in order to recover the value.
    37 With 3-operand form, output may be directed to a third register independent
    38 of the source operands, as reflected by the assignment $c = a~\texttt{[op]}~b$.
     26In addition to the widening of 128-bit operations to 256-bit operations, AVX technology
     27uses a nondestructive 3-operand instruction format. Previous SSE implementations
     28used a destructive 2-operand instruction format. In the 2-operand format
     29a single register is used as both a source and
     30destination register. For example, $a = a~\texttt{[op]}~b$.
     31As such, when subsequent instructions require the values of both $a$ and $b$,
     32one of the two values must either be copied to an additional register beforehand, or be reconstituted or reloaded
     33afterwards.
     34With the 3-operand format, output may now be directed to the third register independently
     35of the source operands. For example, $c = a~\texttt{[op]}~b$.
    3936By avoiding the copying or reconstituting of operand values, a considerable
    40 reduction in instruction count may be possible.
    41 AVX technology makes available the 3-operand form both with the new 256-bit
    42 operations as well as with base 128-bit operations of SSE.
     37reduction in instruction count in the form of reduced load and store instructions is possible.
     38AVX technology makes available the 3-operand form for both the new 256-bit
     39operations as well as the base 128-bit SSE operations.
    4340
    4441\subsection{256-bit AVX Operations}
    4542
    46 With the introduction of 256-bit SIMD registers with AVX technology,
    47 one might ideally expect up to a 50\% reduction in the instruction
    48 count for the SIMD workload of Parabix2.   However, in the \SB\
    49 implementation, Intel has focused on implementing floating point
    50 operations as opposed to the integer based operations.  That is,
     43With the introduction of 256-bit SIMD registers, and under ideal conditions, one would anticipate a corresponding
     4450\% reduction in the SIMD instruction count of Parabix2 on AVX.  However, in the \SB\ AVX
     45implementation, Intel has focused primarily on floating point operations
     46as opposed to the integer based operations. 
    5147256-bit SIMD is available for loads, stores, bitwise logic and
    52 floating operations, while SIMD integer operations and shifts are
    53 only available in 128-bit form.   Nevertheless, with loads, stores
     48floating-point operations, whereas SIMD integer operations and shifts are
     49only available in the 128-bit form.  Nevertheless, with loads, stores
    5450and bitwise logic comprising a major portion of the Parabix2
    5551SIMD instruction mix, a substantial reduction in instruction count
    56 and consequent performance improvement was anticipated.
     52and consequent performance improvement was anticipated but not achieved.
    5753
    5854\subsection{Performance Results}
    5955
    60 We implemented two versions of Parabix2 using AVX technology.   The first
     56We implemented two versions of Parabix2 using AVX technology.  The first
    6157was simply the recompilation of the existing Parabix2 source code
    62 to take advantage of the 3-operand form of AVX instructions while retaining
    63 a uniform 128-bit SIMD processing width.  The second involved rewriting
    64 core library functions for Parabix2 to use 256-bit AVX operations wherever
     58written to take advantage of the 3-operand form of AVX instructions while retaining
     59a uniform 128-bit SIMD processing width.  The second involved rewriting the
     60core library functions of Parabix2 to leverage the 256-bit AVX operations wherever
    6561possible and to simulate the remaining operations using pairs of 128-bit
    6662operations.   
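
The last revised sentence describes simulating unsupported 256-bit operations with pairs of 128-bit operations. A minimal sketch of that idea in portable C follows, assuming a hypothetical software model of the registers. The names `vec128`, `vec256`, `add64_128`, `add64_256` and `and_256` are illustrative only and are not taken from the Parabix2 library; real code would use SSE2/AVX intrinsics rather than plain structs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types (assumptions, not Parabix2 definitions). */
typedef struct { uint64_t lane[2]; } vec128;  /* models one 128-bit SSE register */
typedef struct { vec128 lo, hi; } vec256;     /* models one 256-bit AVX register */

/* Models a 128-bit integer operation: lane-wise 64-bit add.
   Each lane wraps independently; no carry crosses a lane boundary. */
static inline vec128 add64_128(vec128 a, vec128 b) {
    vec128 r;
    r.lane[0] = a.lane[0] + b.lane[0];
    r.lane[1] = a.lane[1] + b.lane[1];
    return r;
}

/* Integer operations exist only in 128-bit form, so the 256-bit
   version is simulated with a pair of 128-bit operations. */
static inline vec256 add64_256(vec256 a, vec256 b) {
    vec256 r;
    r.lo = add64_128(a.lo, b.lo);  /* low 128 bits */
    r.hi = add64_128(a.hi, b.hi);  /* high 128 bits */
    return r;
}

/* Bitwise logic is available at the full 256-bit width, so it is
   modeled here as a single operation over all four 64-bit lanes. */
static inline vec256 and_256(vec256 a, vec256 b) {
    vec256 r;
    r.lo.lane[0] = a.lo.lane[0] & b.lo.lane[0];
    r.lo.lane[1] = a.lo.lane[1] & b.lo.lane[1];
    r.hi.lane[0] = a.hi.lane[0] & b.hi.lane[0];
    r.hi.lane[1] = a.hi.lane[1] & b.hi.lane[1];
    return r;
}

/* Small self-check showing that lanes do not interact. */
static inline void self_check(void) {
    vec256 a = {{{1, UINT64_MAX}}, {{3, 4}}};
    vec256 b = {{{10, 1}}, {{30, 40}}};
    vec256 s = add64_256(a, b);
    assert(s.lo.lane[0] == 11 && s.lo.lane[1] == 0);
    assert(s.hi.lane[0] == 33 && s.hi.lane[1] == 44);
}
```

In this model the simulated add performs two 128-bit operations where the bitwise AND needs one. This is consistent with the text's point that the instruction count can only drop for the load, store and bitwise logic portion of the instruction mix.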