Version 2 (modified by cameron, 4 years ago) (diff)


The Parabix Transform

The Parabix Transform transposes ordinary character stream data into sets of bit-parallel data streams, each stream containing one bit per original character code unit position. Most often, byte-oriented streams of data in representations such as ASCII, ISO Latin 1 or UTF-8 are transposed into sets of 8 parallel bit streams, one stream for each bit of the corresponding code units. However, the concept also applies for character data streams using larger code units, such as those using the 16-bit code units of UTF-16 or the 32-bit code units of UTF-32. In each case, the transposed representation consists of one bit stream each for each bit position in the code units. In the case of UTF-32, however, implementations typically produce only 21 bit streams, as the high 11 bits of each 32-bit code unit are always zero.

Implementing the Parabix Transform

Depending on the available processor architecture, there are many ways that the Parabix Transform may be implemented. Although the transformation can be performed in a serial fashion extracting one bit at a time from each byte, the overhead of the serial approach greatly limits its usefulness. In this section, we concentrate instead on parallel methods that impose a relatively small overhead on the processing of character data streams.

Ideal Implementation Using SIMD Packing

Given a processor architecture that supports binary SIMD operations on 2k-bit registers, an ideal implementation of the Parabix Transposition can be based on the availability of a family of horizontal packing operations. The operations required each have the IDISA pattern hsimd<{2,4,8}>::pack{h,l}(e1, e2), in which e1 and e2 are 2k-bit input registers, the field width of the fields processed is either 2, 4, or 8, and the packing operation selects the bits comprising either the high (h) or low (l) half of each field. The result in each case is a single 2k-bit value comprising the packed bits that are selected. For example, hsimd<2>::packh(e1, e2) selects the high bit of each 2-bit field in the concatenation of e1 and e2, returning the packed set of 2k bits as a single 2k-bit value. The following example illustrates this operation working with 16-bit registers.

hsimd<2>::packh(e1, e2)ABCDEFGHJKLMNPQR

Similarly, hsimd<8>::packl(e1, e2) selects the low 4-bits of each 8-bit field in the concatenation of e1 and e2, again returning the result as a single 2k-bit value, as illustrated by the following example.

hsimd<8>::packh(e1, e2)CdDdGgHhLlMmQqRr

Using these operations it is possible to perform transposition in a straightforward fashion. Given a 2k-byte sequence held consecutively in 8 registers s0, s1, … s7, the following 3-step transformation process performs transposition to parallel bit streams.

Implementation of Ideal Packing Using Bit Shuffles