Changes between Version 6 and Version 7 of ShuffleVector


Ignore:
Timestamp:
Apr 16, 2014, 2:05:09 AM (4 years ago)
Author:
cameron
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • ShuffleVector

    v6 v7  
    4040
    4141There are many interesting possibilities with shuffle masks that perform bit-by-bit selection.
     42See the BitShuffle subproject.
    4243
    43 Bit interleave can be expressed by a shuffle mask with alternating bits from the two vectors.
    44 For example the IDISA operation {{{simd<1>::mergel(v1, v2)}}} could be expressed as a shuffle vector
    45 operation.
    46 {{{
    47 shufflevector <128 x i1> %v1, <128 x i1> %v2,
    48                  <128 x i1> <i32 0, i32 128, i32 1, i32 129, i32 2, i32 130, i32 3, i32 131, ...>
    49 }}}
    50 
    51 An architecture may not support shuffle at the bit-level, but bit-level merge can be
    52 implemented using four byte-level shuffles combined with shifting and bitwise logic.
    53 
    54 == Parallel Extract and Parallel Deposit ==
    55 
    56 Intel's Haswell Architecture has two new {{{pext}}} and {{{pdep}}} that can
    57 be used for flexible extraction and placement of bits.   
    58 These instructions work with either 32-bit or
    59 64-bit general registers.
    60 
    61 {{{pext}}} extracts bits from marked positions, producing a
    62 compressed result. 
    63 
    64 For example, given an 8-bit value {{{hgfedcba}}} and an
    65 8-bit extraction mask {{{01100101}}}, four bits would be
    66 extracted to produce the result {{{0000gfca}}}. 
    67 
    68 {{{pext}}} operates in reverse, depositing bits at marked
    69 positions, placing 0 elsewhere. 
    70 
    71 For example, given an 8-bit value {{{hgfedcba}}} and an
    72 8-bit deposit mask {{{01100101}}}, four bits would be
    73 extracted to produce the result {{{0dc00b0a}}}. 
    74 
    75 In these examples, bit patterns have been shown in
    76 the little-endian order used by in the x86 architecture.
    77 
    78 Both operations can be modeled by shufflevector. 
    79 
    80 For {{{pext}}}, the first argument is the bit vector from
    81 which bits are to be extracted, the second argument
    82 is a vector of all 0 bits, and the selection
    83 vector has the indexes of desired bits.   For the
    84 8-bit extraction mask, {{{01100101}}}, the extraction
    85 positions of the four bits are 6,5,2,0 (little-endian
    86 bit numbering), so the 8-bit vector could be
    87 {{{<0, 2, 5, 6, 8, 8, 8, 8>}}} 
    88 Here the selector 8 specifies the position of a
    89 bit in the second vector, known to be 0.
    90 
    91 For {{{pdep}}}, we again have the bit vector from which bits
    92 are to be taken as the first argument, a constant 0
    93 vector as the second argument, and an index vector
    94 for deposits.   In the example, the vector for deposits is
    95 {{{<0, 2, 5, 6, 8, 8, 8, 8>}}} 
    96 
    97 The code generation problem is thus to recognize these
    98 shufflevector patterns and generate the appropriate {{{pdep}}} or
    99 {{{pext}}} calls.   
    100 
    101 The {{{pext}}} requirements are:
    102   a. the two input vector arguments are the same type, either<32 x i1> or <64 x i1>
    103   a. the second input vector argument is a constant 0 vector
    104   a. the selection pattern consists of two parts:  a strictly increasing sequence of positions from the first argument (<0, 2, 5, 6> in the example), followed by a selectors from the second argument (<8, 8, 8, 8> in the example).
    105 
    106 For {{{pdep}}}, only the selection pattern differs.   In this case, the selection pattern must consist of interspersed
    107 selectors (<0, 1, 2, ...>) from the first vector and arbitrary selectors from the second vector.
    108