Changes between Version 4 and Version 5 of ParabixTransform


Timestamp:
Apr 22, 2014, 4:08:49 PM (4 years ago)
Author:
cameron
Comment:

--

  • ParabixTransform

    v4 v5  
    8585of 24 operations for the entire process.   
    8686
    87 
    8887=== Implementation of Ideal Packing Using Bit Shuffles ===
    8988
     89Unfortunately, the {{{hsimd<{2,4,8}>::pack{h,l}(e1, e2)}}} family of operations for
     90ideal transposition is not available in the SIMD instruction sets of current
     91commodity processors.  However, where *bit shuffle* operations are available,
     92the pack operations can be expressed directly in terms of equivalent
     93bit shuffles.   Bit shuffles can be found both at the intermediate representation
     94level, with the LLVM {{{shufflevector}}} operation, and at the processor ISA level,
     95in the form of {{{pext}}}, one of the Haswell new instructions.
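
As a rough illustration of the {{{pext}}} route, the sketch below models a parallel bit extract in portable C++ and uses it to express the field-width-2 pack operations on 8-bit values. The names `pext_model`, `packh2_via_pext` and `packl2_via_pext` are hypothetical, and the bit numbering (bit i of a byte is element i, with the first operand occupying positions 0 to 7) is an assumption chosen for the example. On BMI2 hardware the model would presumably be replaced by the `_pext_u32` intrinsic from `<immintrin.h>`.

```cpp
#include <cstdint>

// Portable software model of a parallel bit extract (the behaviour of pext):
// the bits of src at positions where mask has a 1 are gathered, in order,
// into the low-order bits of the result.
constexpr std::uint32_t pext_model(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t result = 0;
    unsigned out = 0;  // next free bit position in the result
    for (unsigned i = 0; i < 32; ++i) {
        if ((mask >> i) & 1u) {
            result |= ((src >> i) & 1u) << out;
            ++out;
        }
    }
    return result;
}

// Hypothetical helpers for 8-bit "registers".
// Assumption: x occupies concatenated positions 0..7 and y positions 8..15.
constexpr std::uint8_t packh2_via_pext(std::uint8_t x, std::uint8_t y) {
    const std::uint32_t cat = (std::uint32_t(y) << 8) | x;
    return std::uint8_t(pext_model(cat, 0xAAAAu));  // odd positions: high bit of each 2-bit field
}

constexpr std::uint8_t packl2_via_pext(std::uint8_t x, std::uint8_t y) {
    const std::uint32_t cat = (std::uint32_t(y) << 8) | x;
    return std::uint8_t(pext_model(cat, 0x5555u));  // even positions: low bit of each 2-bit field
}

static_assert(packh2_via_pext(0xAA, 0x00) == 0x0F, "odd bits of x land in the low nibble");
```

The masks `0xAAAA` and `0x5555` select the odd and even bit positions of the 16-bit concatenation respectively.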
    9096
     97The LLVM {{{shufflevector}}} operation allows a result vector to be populated
     98by directly selecting elements from a concatenated pair of input vectors.
     99A constant vector of {{{i32}}} selectors lets each element of the result be selected
     100from any position within either of the two input vectors.   For example,
     101working with 8-bit input vectors (for simplicity of the example), the
     102{{{hsimd<2>::pack{h,l}(e1, e2)}}} operations may be translated directly into
     103{{{shufflevector}}} operations.
     104{{{
     105define <8 x i1> @hsimd_packh_2(<8 x i1> %x, <8 x i1> %y) {
     106   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     107                <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
     108   ret <8 x i1> %result
     109}
     110define <8 x i1> @hsimd_packl_2(<8 x i1> %x, <8 x i1> %y) {
     111   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     112                <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
     113   ret <8 x i1> %result
     114}
     115}}}
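
To make the selector semantics concrete, here is a minimal C++ model of an `<8 x i1>` shuffle, offered as a sketch rather than as the Parabix implementation. It assumes bit i of a byte stands for vector element i, with selector values 0 to 7 addressing the first operand and 8 to 15 the second, as in {{{shufflevector}}}. The helper `shuffle8` and the `model_` functions are hypothetical names introduced for this example.

```cpp
#include <cstdint>

// Model of shufflevector on two <8 x i1> operands stored as bytes.
// Result element i is taken from concatenated position sel[i], where
// positions 0..7 come from x and positions 8..15 come from y.
constexpr std::uint8_t shuffle8(std::uint8_t x, std::uint8_t y, const int (&sel)[8]) {
    const unsigned cat = (unsigned(y) << 8) | x;
    unsigned result = 0;
    for (int i = 0; i < 8; ++i) {
        result |= ((cat >> sel[i]) & 1u) << i;
    }
    return std::uint8_t(result);
}

// Selector vectors copied from the field-width-2 LLVM definitions.
constexpr int kPackH2[8] = {1, 3, 5, 7, 9, 11, 13, 15};
constexpr int kPackL2[8] = {0, 2, 4, 6, 8, 10, 12, 14};

constexpr std::uint8_t model_packh_2(std::uint8_t x, std::uint8_t y) {
    return shuffle8(x, y, kPackH2);
}

constexpr std::uint8_t model_packl_2(std::uint8_t x, std::uint8_t y) {
    return shuffle8(x, y, kPackL2);
}

// x = 0x09 has elements 0 and 3 set: element 3 is the high bit of field 1,
// element 0 is the low bit of field 0.
static_assert(model_packh_2(0x09, 0x00) == 0x02, "high bit of field 1 moves to result element 1");
static_assert(model_packl_2(0x09, 0x00) == 0x01, "low bit of field 0 moves to result element 0");
```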
     116
     117Similarly, it is straightforward to define the additional {{{hsimd<{4,8}>::pack{h,l}(e1, e2)}}}
     118operations on 8-bit registers as follows.
     119{{{
     120define <8 x i1> @hsimd_packh_4(<8 x i1> %x, <8 x i1> %y) {
     121   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     122                <8 x i32> <i32 2, i32 3, i32 6, i32 7, i32 10, i32 11, i32 14, i32 15>
     123   ret <8 x i1> %result
     124}
     125define <8 x i1> @hsimd_packl_4(<8 x i1> %x, <8 x i1> %y) {
     126   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     127                <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 8, i32 9, i32 12, i32 13>
     128   ret <8 x i1> %result
     129}
     130}}}
     131{{{
     132define <8 x i1> @hsimd_packh_8(<8 x i1> %x, <8 x i1> %y) {
     133   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     134                <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 12, i32 13, i32 14, i32 15>
     135   ret <8 x i1> %result
     136}
     137define <8 x i1> @hsimd_packl_8(<8 x i1> %x, <8 x i1> %y) {
     138   %result = shufflevector <8 x i1> %x, <8 x i1> %y,
     139                <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
     140   ret <8 x i1> %result
     141}
     142}}}
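
The selector vectors for field widths 2, 4 and 8 all appear to follow one pattern: each w-element field of the 16-element concatenation contributes its upper half (packh) or lower half (packl) to the result. The C++ sketch below derives the selectors from that rule and checks them at compile time against the constants written out in the LLVM definitions. The function names `pack_selector` and `matches` are hypothetical, introduced only for this illustration.

```cpp
// Selector for result element k (0..7) of a pack with field width w (2, 4 or 8).
// Each field contributes w/2 elements: the upper half for packh (high == true),
// the lower half for packl (high == false).
constexpr int pack_selector(int w, bool high, int k) {
    const int half = w / 2;
    return (k / half) * w + (high ? half : 0) + (k % half);
}

// Returns true when the generated selectors equal the expected constants.
constexpr bool matches(int w, bool high, const int (&expected)[8]) {
    for (int k = 0; k < 8; ++k) {
        if (pack_selector(w, high, k) != expected[k]) return false;
    }
    return true;
}

// Selector constants copied from the LLVM definitions.
constexpr int kH2[8] = {1, 3, 5, 7, 9, 11, 13, 15};
constexpr int kL2[8] = {0, 2, 4, 6, 8, 10, 12, 14};
constexpr int kH4[8] = {2, 3, 6, 7, 10, 11, 14, 15};
constexpr int kL4[8] = {0, 1, 4, 5, 8, 9, 12, 13};
constexpr int kH8[8] = {4, 5, 6, 7, 12, 13, 14, 15};
constexpr int kL8[8] = {0, 1, 2, 3, 8, 9, 10, 11};

static_assert(matches(2, true, kH2) && matches(2, false, kL2), "field width 2");
static_assert(matches(4, true, kH4) && matches(4, false, kL4), "field width 4");
static_assert(matches(8, true, kH8) && matches(8, false, kL8), "field width 8");
```

If the rule holds for wider registers as well, the same formula could generate selector vectors for, say, 128-bit operands by raising the element count, though that extension is not covered here.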
     143
     144
     145
     146
     147>   %step = zext i32 %y to i128
     148>   %newX = bitcast <2 x i64> %x to i128
     149>   %newX1 = shl i128 %newX, %step
     150>   %result = bitcast i128 %newX1 to <2 x i64>
     151>   ret <2 x i64> %result
     152> }
     153>
     154
     155The LLVM shufflevector operation
     156
     157
     158
     159However, it is possible to model these packing operations
     160using bit shuffle operations.