Changes between Version 1 and Version 2 of ShuffleVector


Timestamp:
Mar 25, 2014, 11:30:55 PM (4 years ago)
Author:
cameron
Comment:

--

  • ShuffleVector

    v1 v2  
    66a byte swap, for example. 
    77{{{
    8 shufflevector <8 x i8> %v1, <8 x i8> %v2,
    9                  <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>  ; yields <8 x i8>
     8%v3 = shufflevector <8 x i8> %v1, <8 x i8> undef,
     9                    <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>  ; yields <8 x i8>
    1010}}}
    11 Transforming this to {{{@llvm.bswap.i64(i64 %v1)}}} may allow efficient implementation
     11Transforming this to {{{
     12%t0 = bitcast <8 x i8> %v1 to <4 x i16>
     13%t1 = call <4 x i16> @llvm.bswap.v4i16(<4 x i16> %t0)
     14}}} may allow efficient implementation
    1215on an architecture supporting byte swap, but not shuffle.
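As a sanity check on which byte swap this mask corresponds to, here is a minimal Python model of the shuffle, not LLVM IR itself. The helper names `shuffle` and `bswap16_lanes` are made up for illustration, and `undef` is modelled as zeros. In this model the mask `<1, 0, 3, 2, 5, 4, 7, 6>` swaps the two bytes inside each 16-bit lane, which is not the same as reversing all eight bytes.

```python
def shuffle(v1, v2, mask):
    """Model of shufflevector: each index selects from the concatenation v1 + v2."""
    src = list(v1) + list(v2)
    return [src[i] for i in mask]


def bswap16_lanes(v):
    """Swap the two bytes inside each 16-bit lane of a list of bytes."""
    out = []
    for i in range(0, len(v), 2):
        out += [v[i + 1], v[i]]
    return out


MASK = [1, 0, 3, 2, 5, 4, 7, 6]
v1 = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
undef = [0] * 8  # assumption: undef lanes are never selected, so any value works

shuffled = shuffle(v1, undef, MASK)
print(shuffled == bswap16_lanes(v1))    # per-lane byte swap
print(shuffled == list(reversed(v1)))   # full 8-byte reversal
```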
     16
     17Generalizing this pattern, we may have arbitrary rotations expressed using shuffle masks.
     18For example, consider this shufflevector of 4-bit fields:
     19{{{
     20%v3 = shufflevector <8 x i4> %v1, <8 x i4> undef,
     21              <8 x i32> <i32 1, i32 2, i32 3, i32 0, i32 5, i32 6, i32 7, i32 4>  ; yields <8 x i4>
     22}}}
     23Shuffles on 4-bit fields are generally not supported by SIMD instruction sets, but this one
     24can be implemented by transforming to 16-bit vector shift operations.
     25{{{
     26%t0 = bitcast <8 x i4> %v1 to <2 x i16>
     27%t1 = shl <2 x i16> %t0, <i16 12, i16 12>
     28%t2 = lshr <2 x i16> %t0, <i16 4, i16 4>
     29%v3 = xor <2 x i16> %t1, %t2  ; <2 x i16>; bitcast to <8 x i4> for the final result
     30}}}
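The equivalence between the 4-bit shuffle and the 16-bit shifts can be checked with a small Python model. This is a sketch under one assumption: field 0 occupies the lowest-order bits of each 16-bit lane (little-endian layout). The function names are hypothetical and only serve the illustration.

```python
MASK = [1, 2, 3, 0, 5, 6, 7, 4]


def shuffle_nibbles(x, mask):
    """Apply a shuffle mask to the eight 4-bit fields of a 32-bit value.

    Field 0 is the lowest nibble (assumed little-endian layout).
    """
    fields = [(x >> (4 * i)) & 0xF for i in range(8)]
    out = 0
    for i, m in enumerate(mask):
        out |= fields[m] << (4 * i)
    return out


def rotate_lanes(x):
    """Same result using shifts on each 16-bit lane: (lane << 12) combined with (lane >> 4)."""
    out = 0
    for lane in range(2):
        t0 = (x >> (16 * lane)) & 0xFFFF
        t1 = (t0 << 12) & 0xFFFF   # lowest nibble moves to the top of the lane
        t2 = t0 >> 4               # remaining nibbles move down by one field
        out |= (t1 ^ t2) << (16 * lane)  # bits are disjoint, so xor acts like or
    return out


print(hex(shuffle_nibbles(0x76543210, MASK)))
print(hex(rotate_lanes(0x76543210)))
```

Because the two shifted values never overlap, `xor` and `or` give the same result here.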
     31
     32Can these examples be turned into general rules that systematically capture
     33these special cases?
     34
     35== Vectorized Sequential Code ==
     36
     37If there is no known fully parallel implementation of a particular case, it may
     38still be possible to partially parallelize by making a vectorized sequential
     39loop.   
    1340
    1441== Shuffling Bit-By-Bit ==
     
    2148{{{
    2249shufflevector <128 x i1> %v1, <128 x i1> %v2,
    23                  <128 x i1> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, ...>  ; yields <8 x i8>
     50                 <128 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, ...>
    2451}}}
    2552
    26 An architecture may not support shuffle at the bit-level, but could bit-level merge can be
     53An architecture may not support shuffle at the bit-level, but bit-level merge can be
    2754implemented using four byte-level shuffles combined with shifting and bitwise logic.
    2855