Version 1 (modified by cameron, 5 years ago)


The ShuffleVector Project

This project investigates code generation for the LLVM shufflevector operation, particularly in the case that the shuffle mask is a compile-time constant.

For example, the constant shuffle mask may simply describe a byte swap.

shufflevector <8 x i8> %v1, <8 x i8> %v2,
                 <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>  ; yields <8 x i8>

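This particular mask swaps the two bytes within each 16-bit field, so it corresponds to a byte swap on <4 x i16> (@llvm.bswap.v4i16 applied to a bitcast of %v1). A mask that reverses all eight bytes, <7, 6, 5, 4, 3, 2, 1, 0>, would correspond to @llvm.bswap.i64(i64 %v1). Transforming such a shuffle to the matching bswap may allow efficient implementation on an architecture that supports byte swap but not shuffle.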
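As a rough illustration of how such a pattern might be recognized, the following Python sketch models shufflevector on lists and checks whether a constant mask reverses the bytes inside equal-sized fields of the first operand. The helper names (shufflevector as a Python function, bswap_field_bytes) are hypothetical and are not part of LLVM; this is a reference model only, not a proposed implementation of the code generator.

```python
def shufflevector(v1, v2, mask):
    """Reference model of shufflevector: an index below len(v1) selects
    from v1, a larger index selects from v2."""
    joined = list(v1) + list(v2)
    return [joined[i] for i in mask]


def bswap_field_bytes(mask, num_elems):
    """Return the field width (in bytes) whose bytes the mask reverses,
    or None if the mask is not a byte swap of the first operand.
    Hypothetical helper for illustration only."""
    if len(mask) != num_elems:
        return None
    width = 2
    while width <= num_elems:
        if num_elems % width == 0:
            # Expected mask: bytes reversed inside each field of `width` bytes.
            expected = [base + width - 1 - j
                        for base in range(0, num_elems, width)
                        for j in range(width)]
            if list(mask) == expected:
                return width
        width *= 2
    return None


# The mask from the example above reverses bytes in 2-byte fields.
print(bswap_field_bytes([1, 0, 3, 2, 5, 4, 7, 6], 8))
# A full reversal of all eight bytes is a swap of one 8-byte field.
print(bswap_field_bytes([7, 6, 5, 4, 3, 2, 1, 0], 8))
```

The returned width indicates which bswap width would apply: 2 for a swap within 16-bit fields, 8 for a swap of the whole 64-bit value.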

Shuffling Bit-By-Bit

There are many interesting possibilities with shuffle masks that perform bit-by-bit selection.

Bit interleave can be expressed by a shuffle mask that selects bits alternately from the two vectors. For example, the IDISA operation simd<1>::mergel(v1, v2) could be expressed as a shufflevector operation.

shufflevector <128 x i1> %v1, <128 x i1> %v2,
                 <128 x i32> <i32 0, i32 128, i32 1, i32 129, i32 2, i32 130, i32 3, i32 131, ...>  ; yields <128 x i1>
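The structure of an interleaving mask can be sketched in Python. For operands of n elements, shufflevector numbers the elements of the second operand from n upward, so the mask alternates i and n + i. The function name mergel_mask and the choice of taking the first operand's element first are assumptions made for illustration; the exact operand order of IDISA mergel is not confirmed here.

```python
def mergel_mask(n):
    """Mask that interleaves the first n/2 elements of two n-element
    operands: 0, n, 1, n+1, ...  (operand order is an assumption)."""
    mask = []
    for i in range(n // 2):
        mask.append(i)        # element i of the first operand
        mask.append(n + i)    # element i of the second operand
    return mask


def shufflevector(v1, v2, mask):
    """Reference model of shufflevector on Python lists."""
    joined = list(v1) + list(v2)
    return [joined[i] for i in mask]


# Small 8-element demonstration of the interleave.
a = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"]
b = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
print(shufflevector(a, b, mergel_mask(8)))
```

With n = 128 the same generator yields the 128-entry mask needed for the bit-level case.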

An architecture may not support shuffle at the bit level, but a bit-level merge could be implemented using four byte-level shuffles combined with shifting and bitwise logic.
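The page does not spell out the scheme, so the following is only one plausible reading: each byte-level shuffle is used as a 16-entry table lookup (in the style of a byte shuffle with a constant table register) that spreads the four bits of a nybble to the even bit positions of a byte. Four lookups (low and high nybble of each operand) are then combined with shifts and ORs. The names SPREAD, merge_bits_bytewise and merge_bits_reference, and the bit ordering (bit 0 is least significant, first operand in even positions), are assumptions for illustration.

```python
# Table: bit k of a nybble moves to bit 2k of the resulting byte.
SPREAD = [sum(((n >> k) & 1) << (2 * k) for k in range(4)) for n in range(16)]


def merge_bits_bytewise(a_bytes, b_bytes):
    """Interleave the bits of corresponding bytes. Each byte pair yields a
    16-bit value with a's bits in even positions and b's in odd positions."""
    out = []
    for a, b in zip(a_bytes, b_bytes):
        a_lo = SPREAD[a & 0x0F]    # lookup 1: low nybble of a
        a_hi = SPREAD[a >> 4]      # lookup 2: high nybble of a
        b_lo = SPREAD[b & 0x0F]    # lookup 3: low nybble of b
        b_hi = SPREAD[b >> 4]      # lookup 4: high nybble of b
        lo = a_lo | (b_lo << 1)    # shift b into the odd positions
        hi = a_hi | (b_hi << 1)
        out.append(lo | (hi << 8))
    return out


def merge_bits_reference(a, b, nbits=8):
    """Bit-by-bit interleave, used only to check the table-based version."""
    r = 0
    for k in range(nbits):
        r |= ((a >> k) & 1) << (2 * k)
        r |= ((b >> k) & 1) << (2 * k + 1)
    return r
```

In a vector implementation the four lookups would each process all bytes of a register at once; the per-byte loop here only models the arithmetic.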

Parallel Extract and Parallel Deposit

Intel's Haswell architecture introduced two instructions, pext and pdep (part of the BMI2 extension, available in 32-bit and 64-bit forms), that can be used for flexible extraction and placement of bits. Code generation for shufflevector in terms of these operations is certainly worth study.
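To make the connection to shuffles concrete, here is a small Python reference model of the two operations, plus a hypothetical helper (pext_mask_for_shuffle, not an LLVM or Intel API) that tests whether a single-operand bit shuffle selects strictly increasing source positions, in which case it could be served by one pext. This is a sketch of the semantics only, with bit 0 taken as the least significant bit.

```python
def pext(x, mask):
    """Parallel extract: gather the bits of x at the set positions of mask
    and pack them into the low bits of the result."""
    result, out_pos = 0, 0
    while mask:
        low = mask & -mask           # lowest set bit of mask
        if x & low:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1             # clear that bit
    return result


def pdep(x, mask):
    """Parallel deposit: scatter the low bits of x to the set positions
    of mask."""
    result, in_pos = 0, 0
    while mask:
        low = mask & -mask
        if (x >> in_pos) & 1:
            result |= low
        in_pos += 1
        mask &= mask - 1
    return result


def pext_mask_for_shuffle(indices):
    """If the shuffle picks strictly increasing bit positions from one
    operand, return the equivalent pext mask; otherwise None.
    Hypothetical helper for illustration only."""
    if any(b <= a for a, b in zip(indices, indices[1:])):
        return None
    m = 0
    for i in indices:
        m |= 1 << i
    return m
```

For instance, a shuffle selecting bits 1, 3, 5, 7 maps to the pext mask 0b10101010, while a shuffle that reorders bits (such as 3 then 1) does not fit a single pext.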