ParabixTransform
on the availability of a family of horizontal packing operations. The operations 
required each have the IDISA pattern {{{hsimd<{2,4,8}>::pack{h,l}(e1, e2)}}}, 
in which {{{e1}}} and {{{e2}}} are 2^k^-bit input registers, 
the width of the fields processed is 2, 4, or 8 bits, and 
the packing operation selects the bits comprising either the high ({{{h}}}) or low ({{{l}}}) half of each field. The result in each case is a single 2^k^-bit value 
comprising the selected bits, packed together. For example, 
{{{hsimd<2>::packh(e1, e2)}}} selects the high bit of each 2-bit field 
in the concatenation of {{{e1}}} and {{{e2}}}, returning the packed set of 
2^k^ bits as a single 2^k^-bit value. 
The following example illustrates this operation working with 16-bit registers. 
||{{{e1}}}||{{{AaBbCcDd}}}||{{{EeFfGgHh}}}||
||{{{e2}}}||{{{JjKkLlMm}}}||{{{NnPpQqRr}}}||
||{{{hsimd<2>::packh(e1, e2)}}}||{{{ABCDEFGH}}}||{{{JKLMNPQR}}}||
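
To make the bit selection concrete, the following is a minimal scalar sketch in C++ of what {{{hsimd<2>::packh}}} computes on 16-bit registers. The function name {{{packh2_16}}} is hypothetical and not part of IDISA, and the sketch assumes that the leftmost bit shown in the example is the most significant bit of the register. A real implementation would use SIMD instructions rather than a loop over fields.

```cpp
#include <cstdint>

// Hypothetical scalar model of hsimd<2>::packh on 16-bit registers.
// Assumption: the leftmost bit in the example is the most significant bit.
std::uint16_t packh2_16(std::uint16_t e1, std::uint16_t e2) {
    // Concatenate e1 and e2 into one 32-bit value, with e1 in the upper half.
    const std::uint32_t both = (static_cast<std::uint32_t>(e1) << 16) | e2;
    std::uint16_t out = 0;
    // Walk the sixteen 2-bit fields from most to least significant and
    // append the high bit of each field to the result.
    for (int field = 15; field >= 0; --field) {
        const std::uint32_t high_bit = (both >> (2 * field + 1)) & 1u;
        out = static_cast<std::uint16_t>((out << 1) | high_bit);
    }
    return out;
}
```

Under that assumption, {{{packh2_16(0xAAAA, 0x5555)}}} yields {{{0xFF00}}}: every 2-bit field of the first operand is {{{10}}} and every field of the second is {{{01}}}.
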
Similarly, {{{hsimd<8>::packl(e1, e2)}}} 
selects the low 4 bits of each 8-bit field in the concatenation of {{{e1}}} and {{{e2}}}, 
again returning the result as a single 2^k^-bit value, as illustrated by the following example. 
||{{{e1}}}||{{{AaBbCcDd}}}||{{{EeFfGgHh}}}||
||{{{e2}}}||{{{JjKkLlMm}}}||{{{NnPpQqRr}}}||
||{{{hsimd<8>::packl(e1, e2)}}}||{{{CcDdGgHh}}}||{{{LlMmQqRr}}}||
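
The whole family can be modelled symbolically. This is only a sketch, under the assumption that a register is written as a string of bit labels in the same left-to-right order as the examples. The names {{{pack_half}}}, {{{packh}}}, and {{{packl}}} below are illustrative C++ templates, not the IDISA implementation, and they model only the selection pattern, not its SIMD realization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical symbolic model (not the IDISA implementation): a register is a
// string of bit labels, leftmost label first, e.g. "AaBbCcDdEeFfGgHh".
template <unsigned fw>
std::string pack_half(const std::string& e1, const std::string& e2, bool high) {
    static_assert(fw == 2 || fw == 4 || fw == 8, "field width must be 2, 4, or 8");
    assert(e1.size() == e2.size() && e1.size() % fw == 0);
    const std::string both = e1 + e2;   // concatenation of e1 and e2
    const std::size_t half = fw / 2;    // number of bits kept from each field
    std::string out;
    for (std::size_t i = 0; i < both.size(); i += fw) {
        // Keep the first (high) or second (low) half of the current field.
        out += both.substr(high ? i : i + half, half);
    }
    return out;                         // same length as a single input register
}

// Models hsimd<fw>::packh(e1, e2).
template <unsigned fw>
std::string packh(const std::string& e1, const std::string& e2) {
    return pack_half<fw>(e1, e2, true);
}

// Models hsimd<fw>::packl(e1, e2).
template <unsigned fw>
std::string packl(const std::string& e1, const std::string& e2) {
    return pack_half<fw>(e1, e2, false);
}
```

With the 16-bit inputs {{{AaBbCcDdEeFfGgHh}}} and {{{JjKkLlMmNnPpQqRr}}}, {{{packh<2>}}} returns {{{ABCDEFGHJKLMNPQR}}} and {{{packl<8>}}} returns {{{CcDdGgHhLlMmQqRr}}}, matching the descriptions of the two operations given in the text.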

Using these operations, it is possible to perform transposition in a 
straightforward fashion. Given a 2^k^-byte sequence held consecutively 
in 8 registers {{{s0}}}, {{{s1}}}, …, {{{s7}}}, the following 
3-step transformation process performs transposition to parallel bit streams.