56 | | Intel's Haswell Architecture has two new 64-bit instructions {{{pext}}} and {{{pdep}}} that can |
57 | | be used for flexible extraction and placement of bits. Code generation for {{{shufflevector}}} in |
58 | | terms of these operations is certainly worth study. |
| 56 | Intel's Haswell Architecture has two new {{{pext}}} and {{{pdep}}} that can |
| 57 | be used for flexible extraction and placement of bits. |
| 58 | These instructions work with either 32-bit or |
| 59 | 64-bit general registers. |
| 60 | |
| 61 | {{{pext}}} extracts bits from marked positions, producing a |
| 62 | compressed result. |
| 63 | |
| 64 | For example, given an 8-bit value {{{hgfedcba}}} and an |
| 65 | 8-bit extraction mask {{{01100101}}}, four bits would be |
| 66 | extracted to produce the result {{{0000gfca}}}. |
| 67 | |
| 68 | {{{pext}}} operates in reverse, depositing bits at marked |
| 69 | positions, placing 0 elsewhere. |
| 70 | |
| 71 | For example, given an 8-bit value {{{hgfedcba}}} and an |
| 72 | 8-bit deposit mask {{{01100101}}}, four bits would be |
| 73 | extracted to produce the result {{{0dc00b0a}}}. |
| 74 | |
| 75 | In these examples, bit patterns have been shown in |
| 76 | the little-endian order used by in the x86 architecture. |
| 77 | |
| 78 | Both operations can be modeled by shufflevector. |
| 79 | |
| 80 | For {{{pext}}}, the first argument is the bit vector from |
| 81 | which bits are to be extracted, the second argument |
| 82 | is a vector of all 0 bits, and the selection |
| 83 | vector has the indexes of desired bits. For the |
| 84 | 8-bit extraction mask, {{{01100101}}}, the extraction |
| 85 | positions of the four bits are 6,5,2,0 (little-endian |
| 86 | bit numbering), so the 8-bit vector could be |
| 87 | {{{<0, 2, 5, 6, 8, 8, 8, 8>}}} |
| 88 | Here the selector 8 specifies the position of a |
| 89 | bit in the second vector, known to be 0. |
| 90 | |
| 91 | For {{{pdep}}}, we again have the bit vector from which bits |
| 92 | are to be taken as the first argument, a constant 0 |
| 93 | vector as the second argument, and an index vector |
| 94 | for deposits. In the example, the vector for deposits is |
| 95 | {{{<0, 2, 5, 6, 8, 8, 8, 8>}}} |
| 96 | |
| 97 | The code generation problem is thus to recognize these |
| 98 | shufflevector patterns and generate the appropriate {{{pdep}}} or |
| 99 | {{{pext}}} calls. |
| 100 | |
| 101 | The {{{pext}}} requirements are: |
| 102 | a. the two input vector arguments are the same type, either<32 x i1> or <64 x i1> |
| 103 | a. the second input vector argument is a constant 0 vector |
| 104 | a. the selection pattern consists of two parts: a strictly increasing sequence of positions from the first argument (<0, 2, 5, 6> in the example), followed by a selectors from the second argument (<8, 8, 8, 8> in the example). |
| 105 | |
| 106 | For {{{pdep}}}, only the selection pattern differs. In this case, the selection pattern must consist of interspersed |
| 107 | selectors (<0, 1, 2, ...>) from the first vector and arbitrary selectors from the second vector. |
| 108 | |