source: docs/PACT2011/07-avx.tex @ 993

Last change on this file since 993 was 993, checked in by lindanl, 8 years ago

In this section, we briefly highlight the improvements introduced by the Advanced Vector Extensions (AVX) to the x86 instruction set architecture
and discuss the impact of these improvements on Parabix2. As neither Expat nor Xerces-C benefits from AVX, we do not discuss them in this section.
%The results of our experiments with the AVX and Sandy Bridge architecture can be seen in Figure \ref{avx}.
% Following AMD's announcement of their SSE5 architecture, Intel announced their intention to develop the AVX
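\begin{figure}
% (figure body omitted)
\caption{Total CPU cycles per KB on AVX}
\label{avx}
\end{figure}

\begin{figure}
% (figure body omitted)
\caption{Instructions per byte on Sandy Bridge}
\label{insmix}
\end{figure}
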
\subsection{Three Operand Form}

Originally, SIMD SSE instructions used a two-operand form.
For any SIMD instruction $a~\texttt{[op]}~b$, the result overwrote either $a$ or $b$.
Thus, whenever subsequent instructions needed the values of both $a$ and $b$, the overwritten operand had to be either reconstructed
or recovered through an additional store and load.
Using the new VEX instruction coding scheme \textbf{[citation needed]},
Intel now allows non-destructive three-operand operations in both the SSE and AVX instruction sets.
As shown in Figure \ref{insmix}, the number of SIMD operations other than bitwise logic, which include many memory movements, is 32\% to 34\% lower.
Simply enabling the three-operand form on the existing 128-bit SSE instructions reduced the overall cycle count by 11.7\% to 13.5\%, as shown in Figure \ref{avx}.
Although this is a one-time saving, it is a significant performance improvement that traditional parsers cannot leverage: the three-operand form applies only to SIMD instructions, and, as shown in Figure \ref{insmix}, the total number of non-vector instructions does not change.
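
To make the difference between the two forms concrete, suppose that both $a$ and $b$ are still needed after computing $a~\texttt{[op]}~b$. The following is an illustrative sketch rather than a listing taken from Parabix2; the temporary $t$ and the destination $c$ are names introduced here for exposition only.
\[
\begin{array}{lll}
\mbox{two-operand:}   & t \leftarrow a; \quad t \leftarrow t~\texttt{[op]}~b & \mbox{(two instructions; $a$ survives only because of the copy)} \\
\mbox{three-operand:} & c \leftarrow a~\texttt{[op]}~b                      & \mbox{(one instruction; $a$ and $b$ are both preserved)}
\end{array}
\]
Under this reading, each copy that is no longer needed removes one data-movement instruction, which would be consistent with the reduction appearing among the SIMD operations other than bitwise logic.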

\subsection{256-bit Operations}

The AVX instruction set provided on Sandy Bridge allows the use of 256-bit SIMD registers.
Ideally, only half as many SIMD instructions are needed as in the version that uses the SSE instruction set in three-operand form.
Parabix2 should therefore be able to achieve a 50\% performance improvement on SIMD operations, which corresponds to a 26\% to 38\% improvement in total processing time, simply by using the AVX instruction set instead of the SSE instruction set.
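
The ideal-case estimate can be written out explicitly. This is a sketch under two assumptions that the text does not state: that halving the number of SIMD instructions halves the time spent in SIMD operations, and that all remaining time is unchanged. The symbol $f$ is introduced here for illustration and denotes the fraction of the SSE processing time spent in SIMD operations.
\[
T_{\mathrm{AVX}} = \left(1 - f\right) T_{\mathrm{SSE}} + \frac{f}{2}\, T_{\mathrm{SSE}}
\quad\Longrightarrow\quad
\frac{T_{\mathrm{SSE}} - T_{\mathrm{AVX}}}{T_{\mathrm{SSE}}} = \frac{f}{2}
\]
\[
\frac{f}{2} \in [0.26,\ 0.38] \iff f \in [0.52,\ 0.76]
\]
Under these assumptions, the 26\% to 38\% range corresponds to SIMD operations accounting for 52\% to 76\% of the SSE processing time.
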
However, because Intel focused on implementing floating-point operations rather than integer operations, we gain only from the bitwise logic operations and the SIMD load operations.
As shown in Figure \ref{insmix}, the total number of SIMD instructions executed with the AVX instruction set is 71\% to 79\% of the number executed with the SSE instruction set.
The number of bitwise logic operations, which was expected to drop by 50\%, goes down by only 33\% to 39\%, because bitwise logic operations are also used to simulate 256-bit versions of operations that exist in SSE but are not provided by the AVX instruction set.
Since the total number of instructions goes down by 11\% to 23\%, we would expect shorter processing time and better performance.
However, as shown in Figure \ref{avx}, the processing time is longer in every case except the one with 23\% fewer instructions.
The reason is that AVX instructions have longer latency. (cite Agner Fog?)
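
The gap between instruction count and processing time can be quantified with the usual relation between cycles, instruction count and average cycles per instruction. This is a back-of-the-envelope sketch; the symbols $N$ (instructions executed) and $\mathrm{CPI}$ (average cycles per instruction) are introduced here for illustration, and $T$ is taken to be measured in cycles.
\[
T = N \times \mathrm{CPI},
\qquad
\frac{T_{\mathrm{AVX}}}{T_{\mathrm{SSE}}}
= \frac{N_{\mathrm{AVX}}}{N_{\mathrm{SSE}}} \cdot \frac{\mathrm{CPI}_{\mathrm{AVX}}}{\mathrm{CPI}_{\mathrm{SSE}}}
\]
For the case with 11\% fewer instructions, $N_{\mathrm{AVX}}/N_{\mathrm{SSE}} = 0.89$, so a longer processing time ($T_{\mathrm{AVX}} > T_{\mathrm{SSE}}$) implies
\[
\frac{\mathrm{CPI}_{\mathrm{AVX}}}{\mathrm{CPI}_{\mathrm{SSE}}} > \frac{1}{0.89} \approx 1.12,
\]
that is, the average cost per instruction must have risen by more than 12\% to outweigh the reduction in instruction count.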