Version 1 (modified by cameron, 4 years ago)

# The ShuffleVector Project

This project investigates code generation for the LLVM `shufflevector` operation, particularly when the shuffle mask is a compile-time constant.

For example, the shuffle mask may encode nothing more than a byte swap:

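shufflevector <8 x i8> %v1, <8 x i8> %v2, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0> ; yields <8 x i8>

Because this mask reverses the eight bytes of `%v1` and never selects from `%v2`, the shuffle can be rewritten as a bitcast of `%v1` to `i64`, a call to `@llvm.bswap.i64(i64 %v1)`, and a bitcast back to `<8 x i8>`.
This may allow an efficient implementation on an architecture that supports byte swap but not general shuffles.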
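To make the pattern-matching step concrete, here is a minimal Python sketch (not taken from the project) that models `shufflevector` on lists and recognizes masks that reverse the bytes within fixed-width lanes. The helper names `shufflevector`, `bswap_lane_bytes` and `bswap64` are hypothetical, poison/undef mask elements are not modelled, and the `<8 x i8>`/`i64` bitcasts are modelled assuming little-endian byte order.

```python
def shufflevector(v1, v2, mask):
    """Reference model of LLVM shufflevector on lists: index i < len(v1)
    selects v1[i], otherwise v2[i - len(v1)]. Poison/undef are not modelled."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def bswap_lane_bytes(mask):
    """Return the lane width w (in bytes) if `mask` reverses the bytes within
    every consecutive w-byte lane of the first operand, else None.
    w == len(mask) is a whole-vector byte swap (@llvm.bswap.i64 for 8 bytes);
    a smaller w is a per-lane swap (e.g. @llvm.bswap.v4i16 for w == 2)."""
    n = len(mask)
    w = 2
    while w <= n:
        if n % w == 0 and all(
            mask[i] == (i // w) * w + (w - 1 - i % w) for i in range(n)
        ):
            return w
        w *= 2
    return None


def bswap64(x):
    """Reverse the byte order of a 64-bit integer, like @llvm.bswap.i64."""
    return int.from_bytes(x.to_bytes(8, "little"), "big")


# Check that a full-reversal mask agrees with bitcast + bswap + bitcast,
# modelling the <8 x i8> <-> i64 bitcasts as little-endian packing.
v1 = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
v2 = [0] * 8
mask = [7, 6, 5, 4, 3, 2, 1, 0]
as_i64 = int.from_bytes(bytes(v1), "little")
via_bswap = list(bswap64(as_i64).to_bytes(8, "little"))
assert via_bswap == shufflevector(v1, v2, mask)

print(bswap_lane_bytes(mask))                      # 8: whole 64-bit byte swap
print(bswap_lane_bytes([1, 0, 3, 2, 5, 4, 7, 6]))  # 2: swap within 16-bit lanes
```

The search only tries power-of-two lane widths, which covers the common `i16`, `i32` and `i64` cases; a mask that matches none of them returns `None` and would fall back to a general shuffle lowering.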

## Shuffling Bit-By-Bit

There are many interesting possibilities with shuffle masks that perform bit-by-bit selection.

Bit interleave can be expressed by a shuffle mask that alternates bits from the two vectors.
For example, the IDISA operation `simd<1>::mergel(v1, v2)` could be expressed as a `shufflevector`
operation.

shufflevector <128 x i1> %v1, <128 x i1> %v2, <128 x i32> <i32 0, i32 128, i32 1, i32 129, i32 2, i32 130, i32 3, i32 131, ...> ; yields <128 x i1>
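Spelling out the full 128-entry mask by hand is tedious, so here is a small Python sketch that generates it. The helper `mergel_mask` is hypothetical: it assumes the merge interleaves the low halves of the two operands, taking `v1` for even result positions and `v2` for odd ones. In `shufflevector`, indices `0..n-1` select from the first operand and `n..2n-1` from the second, which the list-based reference model below reproduces.

```python
def mergel_mask(n):
    """Hypothetical helper: shufflevector mask that interleaves the low halves
    of two <n x i1> operands, v1 in even result positions and v2 in odd ones.
    Indices 0..n-1 select from v1 and n..2n-1 select from v2."""
    mask = []
    for k in range(n // 2):
        mask += [k, n + k]
    return mask


def format_mask(mask):
    """Render a mask as an LLVM IR constant vector of i32."""
    return "<" + ", ".join(f"i32 {i}" for i in mask) + ">"


def shufflevector(v1, v2, mask):
    """Reference model of LLVM shufflevector on lists (no poison/undef)."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


mask = mergel_mask(128)
print(len(mask))               # 128 result elements
print(format_mask(mask[:8]))   # the first eight mask entries in IR syntax
bits = shufflevector([1] * 128, [0] * 128, mask)
print(bits[:8])                # [1, 0, 1, 0, 1, 0, 1, 0]
```

For `n = 8` the same helper yields `0, 8, 1, 9, 2, 10, 3, 11`; for `n = 128` the second operand's indices start at 128.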

An architecture may not support shuffles at the bit level, but a bit-level merge can be implemented using four byte-level shuffles combined with shifting and bitwise logic.
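The byte-shuffle decomposition itself is not reproduced here. As a hedged illustration of the "shifting and bitwise logic" part, the sketch below uses a different, well-known technique: spreading bits apart with a cascade of shifts and masks (the same trick used to build Morton codes). It assumes each `<128 x i1>` vector is a Python integer whose bit `i` is element `i`, and that `v1` supplies the even result bits; all names are hypothetical. A bit-by-bit reference is included to check the result.

```python
import random

WIDTH = 128           # bits in one <128 x i1> vector
HALF = WIDTH // 2     # the merge consumes the low half of each operand


def lane_mask(s, width=WIDTH):
    """Pattern of s one-bits followed by s zero-bits, repeated across width."""
    block = (1 << s) - 1
    return sum(block << (2 * s * j) for j in range(width // (2 * s)))


def spread_bits(x):
    """Move bit k of the low HALF bits of x to bit 2k; odd bits become 0."""
    x &= (1 << HALF) - 1
    s = HALF // 2
    while s >= 1:
        # Shift the upper s bits of each 2s-bit chunk up by s positions,
        # leaving s-bit chunks at a stride of 2s bits.
        x = (x | (x << s)) & lane_mask(s)
        s //= 2
    return x


def merge_low(v1, v2):
    """Interleave the low halves: bit k of v1 -> bit 2k, bit k of v2 -> 2k+1."""
    return spread_bits(v1) | (spread_bits(v2) << 1)


def merge_low_reference(v1, v2):
    """Bit-by-bit definition of the same interleave, for checking."""
    out = 0
    for k in range(HALF):
        out |= ((v1 >> k) & 1) << (2 * k)
        out |= ((v2 >> k) & 1) << (2 * k + 1)
    return out


random.seed(1)
for _ in range(100):
    a, b = random.getrandbits(WIDTH), random.getrandbits(WIDTH)
    assert merge_low(a, b) == merge_low_reference(a, b)
```

Each round is one shift, one OR and one AND, so spreading a 64-bit half takes log2(64) = 6 rounds per operand; a code generator would emit the equivalent SIMD shifts and constant masks rather than operate on Python integers.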

## Parallel Extract and Parallel Deposit

Intel's Haswell architecture introduced two new BMI2 instructions, `pext` (parallel bit extract) and `pdep` (parallel bit deposit), which operate on 32- or 64-bit general-purpose registers
and can be used for flexible extraction and placement of bits. Code generation for `shufflevector` in
terms of these operations is certainly worth studying.
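As a starting point, here is a hedged Python model of the two instructions (simple bit-by-bit loops, not how the hardware computes them), followed by a bit interleave expressed with `pdep`: depositing one operand's bits under the mask `0x5555...` and the other's under `0xAAAA...` places them in alternating positions, and `pext` with the same masks undoes it. In C the instructions are exposed as `_pdep_u64` and `_pext_u64` from `<immintrin.h>` when compiling with BMI2 enabled; the Python names `pext`, `pdep` and `interleave32` are hypothetical.

```python
EVEN = 0x5555555555555555  # bits 0, 2, 4, ..., 62
ODD = 0xAAAAAAAAAAAAAAAA   # bits 1, 3, 5, ..., 63


def pext(src, mask, width=64):
    """Model of BMI2 PEXT: gather the bits of src selected by mask into the
    contiguous low-order bits of the result."""
    result, k = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << k
            k += 1
    return result


def pdep(src, mask, width=64):
    """Model of BMI2 PDEP: scatter the low-order bits of src to the positions
    of the set bits in mask, in order."""
    result, k = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1
    return result


def interleave32(a, b):
    """Interleave the low 32 bits of a (even positions) and b (odd positions)."""
    return pdep(a, EVEN) | pdep(b, ODD)


x = interleave32(0x0000FFFF, 0xFFFF0000)
# pext with the same masks recovers the original operands.
assert pext(x, EVEN) == 0x0000FFFF and pext(x, ODD) == 0xFFFF0000
```

Because one `pdep` produces at most 64 result bits, the 128-bit `simd<1>::mergel` case would need four 64-bit deposits: two per 64-bit half of the result, each taking a 32-bit quarter of `v1` or `v2`.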