=== Ideal Three-Stage Parallel Transposition for Byte Streams ===

The Parabix Transform, which converts byte-oriented character stream data into bit-parallel data streams,
can be implemented in parallel using the following
three-stage transposition strategy.

1. Transform the byte stream data into two parallel nybble streams, one for the high nybble of each byte and one for the low nybble of each byte. (A nybble is one-half of a byte, i.e., 4 bits.)
2. Transform each of the nybble streams into two parallel bit-pair streams. Each bit-pair stream consists of a stream of 2-bit units in one-to-one correspondence with the input byte stream. The bit-pairs in the first stream consist of bits 0 and 1 of each byte, those in the second stream consist of bits 2 and 3, those in the third stream consist of bits 4 and 5, and those in the fourth and final stream consist of bits 6 and 7.
3. Transform each of the bit-pair streams into two individual bit streams.


{{{
// Stage 1: transpose the byte data in s0..s7 to 2 nybble streams.
lo_nybble0 = hsimd::packl(s0, s1);
lo_nybble1 = hsimd::packl(s2, s3);
lo_nybble2 = hsimd::packl(s4, s5);
lo_nybble3 = hsimd::packl(s6, s7);
hi_nybble0 = hsimd::packh(s0, s1);
hi_nybble1 = hsimd::packh(s2, s3);
hi_nybble2 = hsimd::packh(s4, s5);
hi_nybble3 = hsimd::packh(s6, s7);
// Stage 2: transpose the 2 nybble streams to 4 bit-pair streams.
bit01pair_0 = hsimd::packl(lo_nybble0, lo_nybble1);
bit01pair_1 = hsimd::packl(lo_nybble2, lo_nybble3);
bit23pair_0 = hsimd::packh(lo_nybble0, lo_nybble1);
bit23pair_1 = hsimd::packh(lo_nybble2, lo_nybble3);
bit45pair_0 = hsimd::packl(hi_nybble0, hi_nybble1);
bit45pair_1 = hsimd::packl(hi_nybble2, hi_nybble3);
bit67pair_0 = hsimd::packh(hi_nybble0, hi_nybble1);
bit67pair_1 = hsimd::packh(hi_nybble2, hi_nybble3);
// Stage 3: transpose the 4 bit-pair streams to 8 bit streams.
bit0 = hsimd::packl(bit01pair_0, bit01pair_1);
bit1 = hsimd::packh(bit01pair_0, bit01pair_1);
bit2 = hsimd::packl(bit23pair_0, bit23pair_1);
bit3 = hsimd::packh(bit23pair_0, bit23pair_1);
bit4 = hsimd::packl(bit45pair_0, bit45pair_1);
bit5 = hsimd::packh(bit45pair_0, bit45pair_1);
bit6 = hsimd::packl(bit67pair_0, bit67pair_1);
bit7 = hsimd::packh(bit67pair_0, bit67pair_1);
}}}

Overall, transposition requires 8 pack operations for each of the three transposition stages, for a total
of 24 pack operations for the entire process.
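
The pack network in the listing can be exercised on a scalar model to check that its intermediate registers compose in stream order. This is a sketch under stated assumptions, not the library's implementation: `Reg`, `model_pack`, `model_packl`, `model_packh` and `transpose_block` are hypothetical names standing in for a SIMD register and for the `hsimd::packl` and `hsimd::packh` operations. Each model pack is assumed to halve the units of its first argument, then those of its second, keeping stream order. The unit widths 8, 4 and 2 for the three stages are also assumed, since the listing does not state them, and bit 0 is taken to be the least significant bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a SIMD register: a sequence of units in stream order,
// each stored in the low-order bits of one element. Halving the unit width
// doubles the number of units, so the modelled register size stays constant.
using Reg = std::vector<std::uint8_t>;

// Model pack: keep the low (or high) half of every `width`-bit unit of `a`,
// followed by the same half of every unit of `b`.
Reg model_pack(const Reg& a, const Reg& b, unsigned width, bool high) {
    assert(width == 8 || width == 4 || width == 2);
    const unsigned half = width / 2;
    const unsigned shift = high ? half : 0u;
    const std::uint8_t mask = static_cast<std::uint8_t>((1u << half) - 1u);
    const Reg* inputs[2] = {&a, &b};
    Reg out;
    for (const Reg* r : inputs)
        for (std::uint8_t unit : *r)
            out.push_back(static_cast<std::uint8_t>((unit >> shift) & mask));
    return out;
}
Reg model_packl(const Reg& a, const Reg& b, unsigned w) { return model_pack(a, b, w, false); }
Reg model_packh(const Reg& a, const Reg& b, unsigned w) { return model_pack(a, b, w, true); }

// Run the 24-pack network on 8 input registers of bytes, s[0..7].
// result[k] holds bit k of every input byte, in stream order.
std::vector<Reg> transpose_block(const std::vector<Reg>& s) {
    assert(s.size() == 8);
    // Stage 1 (8 packs): lo_nybble0..3 and hi_nybble0..3.
    std::vector<Reg> lo_nybble, hi_nybble;
    for (std::size_t i = 0; i < 8; i += 2) {
        lo_nybble.push_back(model_packl(s[i], s[i + 1], 8));
        hi_nybble.push_back(model_packh(s[i], s[i + 1], 8));
    }
    // Stage 2 (8 packs), stored in the order
    // bit01pair_0, bit01pair_1, bit23pair_0, bit23pair_1, bit45pair_0, ...
    std::vector<Reg> pair;
    const std::vector<Reg>* nybbles[2] = {&lo_nybble, &hi_nybble};
    for (const std::vector<Reg>* n : nybbles) {
        pair.push_back(model_packl((*n)[0], (*n)[1], 4));
        pair.push_back(model_packl((*n)[2], (*n)[3], 4));
        pair.push_back(model_packh((*n)[0], (*n)[1], 4));
        pair.push_back(model_packh((*n)[2], (*n)[3], 4));
    }
    // Stage 3 (8 packs): bit0..bit7.
    std::vector<Reg> bit;
    for (std::size_t p = 0; p < 8; p += 2) {
        bit.push_back(model_packl(pair[p], pair[p + 1], 2));
        bit.push_back(model_packh(pair[p], pair[p + 1], 2));
    }
    return bit;
}
```

In this model each stage consumes eight registers and produces eight, which matches the count of 8 packs per stage. The second-stage inputs must be the registers numbered 0 through 3 from the first stage for the output bits to stay in the order of the input bytes.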