=== IDISA Generator Kit ===
The IDISA generator kit is used to generate IDISA
implementations for given source language/compiler/architecture
combinations. For example, we could generate an IDISA
implementation consisting of a C library using GCC vector conventions
for the PowerPC AltiVec instruction set, or a C++ library
using MSVC conventions for the Intel SSE2 instruction set.
The kit should also be flexible enough to support non-SIMD implementations,
such as a Python library using Python
conventions for operations on unbounded bitstreams.
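The following is a minimal sketch of what two generator back ends might emit. One back end produces C functions built on SSE2 intrinsics. The other produces Python functions for unbounded bitstreams. The table and function names (`SSE2_ADD`, `gen_c_sse2_add`, `gen_python_bitstream_ops`, `simd_andc`) are hypothetical and illustrative only; they are not the kit's actual design or API. The only real names used are the SSE2 intrinsics from `<emmintrin.h>`.

```python
# Hypothetical sketch of an IDISA generator kit with two back ends.
# All function and table names here are illustrative assumptions.

# SSE2 intrinsics (declared in <emmintrin.h>) that directly implement simd<n>::add.
SSE2_ADD = {8: "_mm_add_epi8", 16: "_mm_add_epi16",
            32: "_mm_add_epi32", 64: "_mm_add_epi64"}


def gen_c_sse2_add(n: int) -> str:
    """Emit C source for simd<n>::add targeting SSE2 intrinsics."""
    if n not in SSE2_ADD:
        # No single SSE2 instruction exists for this width; a fuller
        # generator would emit a multi-instruction emulation instead.
        raise NotImplementedError(f"simd<{n}>::add has no single SSE2 instruction")
    return (f"static inline __m128i simd{n}_add(__m128i a, __m128i b) {{\n"
            f"    return {SSE2_ADD[n]}(a, b);\n"
            f"}}\n")


def gen_python_bitstream_ops() -> str:
    """Emit Python source for bitwise ops on unbounded bitstreams.

    Python ints have arbitrary precision, so a whole bitstream is one int
    and the generated operations need no register-width loop.
    """
    ops = {"and": "a & b", "or": "a | b", "xor": "a ^ b", "andc": "a & ~b"}
    return "".join(f"def simd_{name}(a, b):\n    return {expr}\n"
                   for name, expr in ops.items())


print(gen_c_sse2_add(16))
print(gen_python_bitstream_ops())
```

Both back ends share one interface: the source for an IDISA operation is a function of the target. Adding a target then means adding a template table rather than rewriting client code.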
=== IDISA Compile-Time Specialization Kit ===
The compile-time
specialization kit is used to provide optimized implementations
of IDISA under known static properties of operand values.
For example, if it is known that the high bit of each 4-bit
field in registers a and b is zero, then a simd<4>::add(a,b)
operation with no direct implementation on a particular
platform can be realized by a wider-width operation that is directly supported,
such as simd<16>::add(a,b) on most platforms. Each 4-bit sum is then
at most 7 + 7 = 14, so no carry crosses a field boundary.
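The equivalence behind this specialization can be checked with a small Python model. The model assumes a 128-bit register represented as a Python int. `simd_add` and `simd4_add_specialized` are hypothetical names for illustration, not functions from an IDISA library.

```python
# Sketch: simd<4>::add realized as simd<16>::add under a static precondition.
# REG_BITS = 128 and all function names are illustrative assumptions.
REG_BITS = 128


def simd_add(n, a, b, reg_bits=REG_BITS):
    """Reference simd<n>::add: add each n-bit field of a and b modulo 2**n."""
    mask = (1 << n) - 1
    result = 0
    for shift in range(0, reg_bits, n):
        field = (((a >> shift) & mask) + ((b >> shift) & mask)) & mask
        result |= field << shift
    return result


# 0x8888...8: the high bit of every 4-bit field in the register.
HIGH_BITS_4 = int("8" * (REG_BITS // 4), 16)


def simd4_add_specialized(a, b):
    """simd<4>::add, assuming the high bit of every 4-bit field of a and b is 0."""
    assert (a | b) & HIGH_BITS_4 == 0, "static precondition violated"
    # Each 4-bit sum is at most 7 + 7 = 14 < 16, so no carry leaves its
    # field and a 16-bit add yields exactly the same bits as a 4-bit add.
    return simd_add(16, a, b)
```

The precondition matters. Without it, for example with 0x8 + 0x8 in one field, the 4-bit add wraps to 0, but the 16-bit add carries into the next nibble and produces 0x10.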

=== IDISA Reverse Instruction Optimizer ===
Various processor
architectures provide combined SIMD operations that correspond
to sequences of IDISA instructions. For example, the Intel
PSADBW instruction computes a packed sum of absolute differences corresponding
to the following five IDISA operations:
t1 = simd<8>::abs(simd<8>::sub(a,b))[[BR]]
psadbw = simd<64>::add(simd<32>::add(simd<16>::add(t1)))[[BR]]
The reverse instruction optimizer uses knowledge of these
available optimized forms to generate optimized implementations
wherever the corresponding IDISA instruction sequences are found.
Recognition may also involve special-case logic. For example,
psadbw(x, 0) efficiently implements the 8-field horizontal
addition simd<64>::add(simd<32>::add(simd<16>::add(x))),
because |x - 0| = x for each unsigned byte field.
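Below is a Python sketch of the semantics above together with a hypothetical rewrite rule. The model makes three assumptions.

* The single-operand simd<n>::add(x) is read as adding the two n/2-bit halves of each n-bit field.
* t1 is computed as the exact unsigned byte difference |a_i - b_i|, which is what PSADBW computes. A wrapping 8-bit subtract followed by a signed abs can differ from this when a difference exceeds 127.
* The tuple expression format and `rewrite` are illustrative only; they are not the optimizer's actual representation.

```python
# Sketch: PSADBW semantics vs. the IDISA sequence, plus a peephole rule.
# 128-bit registers are modeled as Python ints; all names are hypothetical.
REG_BITS = 128


def fields(x, n, reg_bits=REG_BITS):
    """Split x into n-bit fields, least significant first."""
    mask = (1 << n) - 1
    return [(x >> s) & mask for s in range(0, reg_bits, n)]


def join(fs, n):
    """Pack n-bit fields (least significant first) into one integer."""
    out = 0
    for i, f in enumerate(fs):
        out |= (f & ((1 << n) - 1)) << (i * n)
    return out


def hadd(n, x):
    """Single-operand simd<n>::add(x): each n-bit field = sum of its two halves."""
    halves = fields(x, n // 2)
    return join([halves[2 * i] + halves[2 * i + 1]
                 for i in range(len(halves) // 2)], n)


def idisa_sad(a, b):
    """The five-operation IDISA sequence, with t1 = unsigned |a_i - b_i| per byte."""
    t1 = join([abs(x - y) for x, y in zip(fields(a, 8), fields(b, 8))], 8)
    return hadd(64, hadd(32, hadd(16, t1)))


def psadbw(a, b):
    """Reference PSADBW: per 64-bit lane, the sum of |a_i - b_i| over 8 unsigned bytes."""
    lanes = [sum(abs(x - y) for x, y in zip(fields(la, 8, 64), fields(lb, 8, 64)))
             for la, lb in zip(fields(a, 64), fields(b, 64))]
    return join(lanes, 64)


def _is_hadd(e, n):
    """True if e is a single-operand simd<n>::add node ("add", n, operand)."""
    return isinstance(e, tuple) and len(e) == 3 and e[0] == "add" and e[1] == n


def rewrite(expr):
    """Hypothetical rule mapping the horizontal-add chain onto psadbw."""
    if _is_hadd(expr, 64) and _is_hadd(expr[2], 32) and _is_hadd(expr[2][2], 16):
        inner = expr[2][2][2]
        if (isinstance(inner, tuple) and len(inner) == 3 and inner[:2] == ("abs", 8)
                and isinstance(inner[2], tuple) and len(inner[2]) == 4
                and inner[2][:2] == ("sub", 8)):
            return ("psadbw", inner[2][2], inner[2][3])   # full SAD pattern
        return ("psadbw", inner, ("zero",))               # special case: psadbw(x, 0)
    return expr
```

A maximum lane sum of 8 × 255 = 2040 fits in 16 bits. PSADBW's 64-bit lane result, which holds the sum in the low 16 bits and zeros above, therefore equals the 64-bit value produced by the IDISA chain.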