Changes between Version 7 and Version 8 of ParabixTransform


Timestamp: Apr 22, 2014, 6:20:53 PM
Author: cameron
= The Parabix Transform =

[[PageOutline]]

The Parabix Transform transposes ordinary character stream data into sets of bit-parallel data streams, each stream containing one bit per original character code unit position.
     
…using larger code units, such as those using the 16-bit code units of UTF-16 or the 32-bit code units of UTF-32. In each case, the transposed representation consists of one bit stream for each bit position in the code units. In the case of UTF-32, however, implementations typically produce only 21 bit streams: Unicode code points never exceed U+10FFFF, so the high 11 bits of each 32-bit code unit are always zero.
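
As a minimal sketch of this definition, the C++ below transposes a sequence of code units into {{{width}}} bit streams serially, one bit at a time. The helper {{{transpose_serial}}} and the {{{BitStream}}} type are hypothetical illustrations, not part of the Parabix sources, and numbering bits from the least significant end is an assumption of the sketch. The same function covers byte streams (width 8), UTF-16 (width 16) and UTF-32 (width 21).

{{{
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit stream: bit (i % 64) of word (i / 64) belongs to code unit i.
using BitStream = std::vector<uint64_t>;

// Hypothetical reference helper (not from the Parabix sources): serially
// transpose code units into `width` bit streams (width <= 32), one bit at
// a time. Stream k collects bit k of every code unit, counting bits from
// the least significant end (an assumption made for this sketch).
std::vector<BitStream> transpose_serial(const std::vector<uint32_t>& units,
                                        unsigned width) {
    const std::size_t words = (units.size() + 63) / 64;
    std::vector<BitStream> streams(width, BitStream(words, 0));
    for (std::size_t i = 0; i < units.size(); ++i) {
        for (unsigned k = 0; k < width; ++k) {
            const uint64_t bit = (units[i] >> k) & 1u;  // bit k of unit i
            streams[k][i / 64] |= bit << (i % 64);      // place at position i
        }
    }
    return streams;
}
}}}

For UTF-32 input, a width of 21 suffices because the largest code point, U+10FFFF, fits in 21 bits.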

Depending on the available processor architecture, there are many ways to implement the Parabix
Transform. Although the transformation can be performed serially,
extracting one bit at a time from each byte, the cost of handling every bit of every code unit individually greatly limits the usefulness of the serial approach. In this section, we concentrate instead on parallel methods that impose a relatively small overhead on the processing of character data streams.
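
To illustrate how a parallel method avoids per-bit work, the sketch below transposes eight bytes into eight 8-bit bit-stream segments using 64-bit SWAR (SIMD within a register) arithmetic. It substitutes the well-known 8x8 bit-matrix transpose described in ''Hacker's Delight'' for the SIMD pack operations developed on this page, and the function names are hypothetical. Its three XOR-swap stages exchange off-diagonal 1x1, 2x2 and 4x4 sub-blocks, so eight bytes cost a fixed handful of shift, mask and XOR operations instead of 64 separate bit extractions.

{{{
#include <cstdint>

// Hypothetical helper: transpose an 8x8 bit matrix packed in a 64-bit word.
// Row r is byte r of x; column c is bit c of that byte (bit 0 = least
// significant). Each stage swaps off-diagonal sub-blocks in place.
uint64_t transpose8x8_word(uint64_t x) {
    uint64_t t;
    // Stage 1: in each 2x2 block (r, c even), swap bit (r, c+1) with (r+1, c).
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AAULL;
    x ^= t ^ (t << 7);
    // Stage 2: in each 4x4 block, swap the top-right and bottom-left 2x2 blocks.
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;
    x ^= t ^ (t << 14);
    // Stage 3: swap the top-right and bottom-left 4x4 blocks.
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;
    x ^= t ^ (t << 28);
    return x;
}

// Transpose 8 consecutive bytes into 8 bit-stream bytes:
// bit r of out[k] is bit k of in[r].
void transpose8x8(const uint8_t in[8], uint8_t out[8]) {
    uint64_t x = 0;
    for (int r = 0; r < 8; ++r) x |= uint64_t(in[r]) << (8 * r);
    x = transpose8x8_word(x);
    for (int k = 0; k < 8; ++k) out[k] = uint8_t(x >> (8 * k));
}
}}}

Because a transpose is its own inverse, applying {{{transpose8x8_word}}} twice returns the original word, which makes a convenient self-check.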

== Ideal Three-Stage Parallel Transposition for Byte Streams ==

The Parabix Transform to transform byte-oriented character stream data to bit-parallel data streams
…
}
}}}

== Byte Pack Implementations ==

Although SIMD units typically do not provide direct bit packing operations,
''byte packing'' operations that extract bytes from 16-bit fields are common.
For example, the SSE2 instruction {{{packuswb}}} packs signed 16-bit integers
into bytes using unsigned saturation: values below 0 become 0 and values above
255 become 255. By first clearing the high 8 bits of each 16-bit field (a right
shift by 8 for {{{packh}}}, a mask of 0x00FF for {{{packl}}}), every field holds
a value from 0 to 255, saturation never applies, and this operation can be used
efficiently to implement both the {{{hsimd<16>::packh}}} and
{{{hsimd<16>::packl}}} operations.
     179
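{{{
#include <emmintrin.h>  // SSE2: _mm_packus_epi16, _mm_srli_epi16
// bitblock_t, simd<16>, simd_and and hsimd<16> are assumed to be declared
// by the surrounding SIMD library.

// packh: shift each 16-bit field right by 8 so only its high byte remains,
// then pack with unsigned saturation (no value exceeds 255, so no clamping).
// arg2 supplies the low 8 bytes of the result, arg1 the high 8 bytes.
template <> bitblock_t hsimd<16>::packh(bitblock_t arg1, bitblock_t arg2)
{
  return _mm_packus_epi16(_mm_srli_epi16(arg2, 8), _mm_srli_epi16(arg1, 8));
}

// packl: mask off the high byte of each 16-bit field, then pack the same way.
template <> bitblock_t hsimd<16>::packl(bitblock_t arg1, bitblock_t arg2)
{
  const bitblock_t lomask = simd<16>::constant(0x00FF);
  return _mm_packus_epi16(simd_and(arg2, lomask), simd_and(arg1, lomask));
}
}}}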
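
To check the behaviour of these two specializations without SSE2 hardware, the portable scalar model below reproduces the documented semantics of {{{_mm_packus_epi16}}}: each 16-bit lane is read as a signed integer and saturated to the range 0 to 255, with the first operand filling result bytes 0 to 7 and the second filling bytes 8 to 15. The {{{Lanes16}}} and {{{Lanes8}}} types and the free functions are hypothetical stand-ins for {{{bitblock_t}}} and the {{{hsimd<16>}}} members; this is a sketch for reasoning about the operations, not the library's implementation.

{{{
#include <array>
#include <cstdint>

// Hypothetical model of one 128-bit register: eight 16-bit lanes as input,
// sixteen 8-bit lanes as output. Lane 0 is the lowest-addressed lane.
using Lanes16 = std::array<uint16_t, 8>;
using Lanes8  = std::array<uint8_t, 16>;

// Models SSE2 packuswb / _mm_packus_epi16(a, b): each lane is treated as a
// signed 16-bit value and saturated to [0, 255]; a fills bytes 0..7, b 8..15.
Lanes8 packus_epi16(const Lanes16& a, const Lanes16& b) {
    auto sat = [](uint16_t v) -> uint8_t {
        const int16_t s = static_cast<int16_t>(v);
        return s < 0 ? 0 : (s > 255 ? 255 : static_cast<uint8_t>(s));
    };
    Lanes8 r{};
    for (int i = 0; i < 8; ++i) {
        r[i] = sat(a[i]);
        r[i + 8] = sat(b[i]);
    }
    return r;
}

// packh: keep the high byte of every lane (shift right by 8 before packing).
Lanes8 packh(const Lanes16& arg1, const Lanes16& arg2) {
    Lanes16 h1{}, h2{};
    for (int i = 0; i < 8; ++i) { h1[i] = arg1[i] >> 8; h2[i] = arg2[i] >> 8; }
    return packus_epi16(h2, h1);  // same operand order as the SSE2 code
}

// packl: keep the low byte of every lane (mask with 0x00FF before packing).
Lanes8 packl(const Lanes16& arg1, const Lanes16& arg2) {
    Lanes16 l1{}, l2{};
    for (int i = 0; i < 8; ++i) { l1[i] = arg1[i] & 0x00FF; l2[i] = arg2[i] & 0x00FF; }
    return packus_epi16(l2, l1);
}
}}}

The model also shows why the preliminary shift or mask matters: packing 0x1234 directly saturates to 255, and packing 0xABCD (negative as a signed 16-bit value) yields 0.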