Version 5 (modified by cameron, 5 years ago) (diff) |
---|

# Towards a Library of Shufflevector Patterns

The LLVM IR contains a very powerful vector operation
called {{{shufflevector}}}. Unless the target architecture
supports a similarly powerful operation directly, good
code generation for `shufflevector` instructions with arbitrary
shuffle patterns is very difficult.

However, there are many `shufflevector` patterns that
map to simpler operations, both at the LLVM IR level and
at the target architecture level. This document introduces
a library of patterns as well as a systematic approach to
code generation for `shufflevector` operations.

## Pattern Recognition

With a potentially large library of shuffle vector patterns and corresponding implementations, how do we systematically search for patterns without incurring high pattern matching costs for brute-force sequential search? One answer is to abstract the patterns and to define quick, parallel methods of pattern recognition.

### Shuffle Vector Canonical Form

In order to facilitate recognition of shuffle vector patterns, we first perform some transformations to put shuffle vectors in canonical form.

For example, consider the following three `shufflevector` patterns
are all equivalent.

shufflevector <4 x i32> zeroinitializer, <4 x i32> %x, <4 x i32> <i32 4, i32 2, i32 5, i32 1> shufflevector <4 x i32> %x, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 6, i32 1, i32 5> shufflevector <4 x i32> %x, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 1, i32 4>

This example illustrates two points about equivalent forms.
First, it is always possible to produce equivalent results
by exchanging the order of the two input vectors and modifying
the shuffle selection mask, accordingly. Second, when
one of the input vectors is a constant vector (the vector of all zeroes `zeroinitializer`, in this case), then
we can any selection mask index in this vector is equivalent to any other.

This example also illustrates two rules for the conversion to canonical form
in this case. As described below, if `zeroinitializer` occurs
as an input vector, then it should normally occur as the second
input vector. Secondly, if the selection mask values for a constant
vector always refer to the lowest element. In the example, the
selection mask values 5 and 6 are replaced by 4.

In general, in discussing canonical forms, we refer to the
three consecutive arguments of `shufflevector` as
vector 1 or `v1`, vector 2 or `v2` and `mask`.

#### Canonical Order of Vectors

Consider a shufflevector invocation of the following form.

shufflevector <n x <ty>> v1, <n x <ty>> v2, <m x i32> mask

Given such a shufflevector is always possible to reverse the order of the vectors with a suitable transformation of the mask.

shufflevector <n x <ty>> v2, <n x <ty>> v1, <m x i32> mask'

The mask transformation replaces each mask element `mask`[i]
with the value which is either `mask`[i] + n,
if `mask`[i] < n, or `mask`[i] - n, if `mask`[i] >= n.
This may be expressed by the following equation.

`mask'`[i] = (`mask`[i] + n) mod (2n)

The following rules define canonical order for the vector arguments.

- If one of the vectors is
`undef`, then vector 2 must be`undef`in canonical order. - If one of the vectors is
`zeroinitializer`(or the equivalent constant), and the other vector is not`undef`, then vector 2 must be`zeroinitializer`in canonical order. - If one of the vectors is a constant vector and the other vector is neither constant nor
`undef`, then vector 2 must be a constant vector in canonical order. - If all of the selector values in a mask are >= n, then the vector arguments must be exchanged to be placed in canonical order.

## Standard Pattern Examples

### The Rotate Pattern

The double vector rotate pattern is recognized as follows. Form the first-order difference vector of the shuffle index vector.