Changes between Version 5 and Version 6 of ShuffleVector


Ignore:
Timestamp:
Apr 16, 2014, 2:00:40 AM (4 years ago)
Author:
cameron
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • ShuffleVector

    v5 v6  
    5454== Parallel Extract and Parallel Deposit ==
    5555
    56 Intel's Haswell Architecture has two new 64-bit instructions {{{pext}}} and {{{pdep}}} that can
    57 be used for flexible extraction and placement of bits.   Code generation for {{{shufflevector}}} in
    58 terms of these operations is certainly worth study.
     56Intel's Haswell Architecture has two new {{{pext}}} and {{{pdep}}} that can
     57be used for flexible extraction and placement of bits.   
     58These instructions work with either 32-bit or
     5964-bit general registers.
     60
     61{{{pext}}} extracts bits from marked positions, producing a
     62compressed result. 
     63
     64For example, given an 8-bit value {{{hgfedcba}}} and an
     658-bit extraction mask {{{01100101}}}, four bits would be
     66extracted to produce the result {{{0000gfca}}}. 
     67
     68{{{pext}}} operates in reverse, depositing bits at marked
     69positions, placing 0 elsewhere. 
     70
     71For example, given an 8-bit value {{{hgfedcba}}} and an
     728-bit deposit mask {{{01100101}}}, four bits would be
     73extracted to produce the result {{{0dc00b0a}}}. 
     74
     75In these examples, bit patterns have been shown in
     76the little-endian order used by in the x86 architecture.
     77
     78Both operations can be modeled by shufflevector. 
     79
     80For {{{pext}}}, the first argument is the bit vector from
     81which bits are to be extracted, the second argument
     82is a vector of all 0 bits, and the selection
     83vector has the indexes of desired bits.   For the
     848-bit extraction mask, {{{01100101}}}, the extraction
     85positions of the four bits are 6,5,2,0 (little-endian
     86bit numbering), so the 8-bit vector could be
     87{{{<0, 2, 5, 6, 8, 8, 8, 8>}}} 
     88Here the selector 8 specifies the position of a
     89bit in the second vector, known to be 0.
     90
     91For {{{pdep}}}, we again have the bit vector from which bits
     92are to be taken as the first argument, a constant 0
     93vector as the second argument, and an index vector
     94for deposits.   In the example, the vector for deposits is
     95{{{<0, 2, 5, 6, 8, 8, 8, 8>}}} 
     96
     97The code generation problem is thus to recognize these
     98shufflevector patterns and generate the appropriate {{{pdep}}} or
     99{{{pext}}} calls.   
     100
     101The {{{pext}}} requirements are:
     102  a. the two input vector arguments are the same type, either<32 x i1> or <64 x i1>
     103  a. the second input vector argument is a constant 0 vector
     104  a. the selection pattern consists of two parts:  a strictly increasing sequence of positions from the first argument (<0, 2, 5, 6> in the example), followed by a selectors from the second argument (<8, 8, 8, 8> in the example).
     105
     106For {{{pdep}}}, only the selection pattern differs.   In this case, the selection pattern must consist of interspersed
     107selectors (<0, 1, 2, ...>) from the first vector and arbitrary selectors from the second vector.
     108