# Changeset 1783 for docs/HPCA2012/final_ieee/07-avx.tex

Timestamp:
Dec 14, 2011, 2:27:41 PM (8 years ago)
Message:

Final pass

File:
1 edited

r1778 (removed):

Note that, in each workload, the number of non-SIMD instructions remains relatively constant with each implementation. As expected, the number of bitwise SIMD operations remains the same for both SSE and 128-bit AVX while dropping dramatically when operating 256-bits at a time. The reduction was measured at 32\%--39\% depending on markup density of the workload. The ``other SIMD'' class shows a substantial 30\%--35\% reduction with AVX 128-bit technology compared to SSE. This reduction is due to elimination of register unloading and reloading when SIMD operations are compiled using 3-operand AVX form versus 2-operand SSE form. A further 10\%--20\% reduction is also observed when Parabix-XML utilized the AVX runtime library.

r1783 (added):

The number of non-SIMD instructions remains relatively constant with each implementation. The number of bitwise SIMD operations remains the same for both SSE and 128-bit AVX while dropping dramatically when operating 256-bits at a time. The reduction was measured at 32\%--39\% depending on markup density of the workload. The ``other SIMD'' class shows a substantial 30\%--35\% reduction with AVX 128-bit technology compared to SSE. This reduction is due to elimination of register unloading and reloading when SIMD operations are compiled using 3-operand AVX form versus 2-operand SSE form. A further 10\%--20\% reduction is also observed when Parabix-XML utilized the AVX runtime library.

Unmodified:

%[AS] Check numbers.

r1778 (removed):

The reductions in instruction counts are quite dramatic with the AVX extensions in Parabix demonstrating the ability of our runtime framework to exploit the available hardware resources. As shown in Figure \ref{avx}, the benefits of the reduced SIMD instruction count are achieved only in the AVX 128-bit version. In this case, the benefits of 3-operand form seem to fully translate to performance benefits. Based on the reduction of overall Bitwise-SIMD instructions we expected a 11\% improvement in performance.
Surprisingly, the performance of Parabix in the 256-bit AVX implementation does not improve significantly and actually degrades for files with higher markup density ($\sim11\%$). dew.xml, on which bitwise-SIMD instructions were reduced by 39\%, saw a performance improvement of 8\%. We believe that this is primarily due to the intricacies of the first generation AVX implementation in \SB{}, with significant latency in many of the 256-bit instructions in comparison to their 128-bit counterparts. The 256-bit instructions also have different scheduling constraints that seem to reduce overall throughput. If these latency issues can be addressed in future AVX implementations, further performance and energy benefits could be realized in Parabix-XML.

r1783 (added):

The reductions in instruction counts are significant with the AVX extensions demonstrating the ability of Parabix to exploit wider SIMD extensions. Figure \ref{avx} shows the benefits of the reduced SIMD instruction count are achieved only in the AVX 128-bit version; The 3-operand form seems to fully translate to performance benefits. Based on the reduction of overall Bitwise-SIMD instructions we expected a 11\% improvement in performance. Surprisingly, the performance of Parabix in the 256-bit AVX implementation does not improve significantly and actually degrades for files with higher markup density ($\sim11\%$). dew.xml, on which bitwise-SIMD instructions were reduced by 39\%, saw a performance improvement of 8\%. We believe that this is primarily due to the intricacies of the first generation AVX implementation in \SB{}, with significant latency in many of the 256-bit instructions in comparison to their 128-bit counterparts. The 256-bit instructions also have different scheduling constraints that seem to reduce overall throughput. If these latency issues can be addressed in future AVX implementations, further performance and energy benefits could be realized by Parabix.
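
The text attributes the drop in the ``other SIMD'' class to the move from 2-operand SSE form to 3-operand AVX form. The toy model below is only a sketch of that effect, not Parabix code and not the measurement method used in the paper. It assumes a destructive 2-operand form (`dst = dst op src`) that needs one extra register copy whenever neither source may be overwritten, and a non-destructive 3-operand form (`dst = a op b`) that never does. The operation tuples, register names, and function names are hypothetical.

```python
# Toy instruction-count model; all names here are illustrative assumptions.
COMMUTATIVE = {"and", "or", "xor"}

def two_operand_count(ops):
    """Count instructions for destructive 2-operand form: dst = dst op src."""
    total = 0
    for dst, op, a, b in ops:
        # No copy is needed if dst already holds one source in a usable position.
        reuse = dst == a or (op in COMMUTATIVE and dst == b)
        total += 1 if reuse else 2  # 2 = register copy + the operation itself
    return total

def three_operand_count(ops):
    """Count instructions for non-destructive 3-operand form: dst = a op b."""
    return len(ops)

# Hypothetical sequence in which register "a" must survive two uses.
ops = [
    ("t1", "and", "a", "b"),    # t1 = a & b   (a and b stay live)
    ("t2", "or", "a", "c"),     # t2 = a | c   (a and c stay live)
    ("t1", "xor", "t1", "t2"),  # t1 = t1 ^ t2 (t1 may be overwritten)
]

print(two_operand_count(ops), three_operand_count(ops))
```

In this hypothetical sequence the two copies disappear in 3-operand form while the bitwise operations themselves are unchanged, which mirrors the pattern described above: the bitwise SIMD count stays the same between SSE and 128-bit AVX and only the surrounding data-movement instructions shrink.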