of versions of the \verb#simd<8>::mergeh# and \verb#simd<8>::mergel#
operations that are available with each of the SSE and Altivec instruction
sets. To perform the full inverse transform of 8 parallel
registers of bit stream data into 8 serial registers of byte stream data,
a RefA implementation requires 120 operations, while a RefB
implementation reduces this to 72.
\begin{figure}[tbh]
\end{figure}
An algorithm employing only 24 operations using IDISAA/B is relatively
straightforward. In stage 1, parallel registers for individual bit streams
are first merged with bit-level interleaving
parallel bit stream form can then each be used at will in
character stream applications.
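
The three-stage merging idea can be illustrated with a small software
model. The following is only a sketch under stated assumptions, not the
implementation evaluated here: registers are modelled as Python lists of
128 bits with index 0 as the leftmost bit, stage 1 uses 1-bit fields as
described above, and the use of 2-bit and 4-bit fields in stages 2 and 3,
the pairing of streams, and all helper names (\verb#merge_fields#,
\verb#merge_stage#, \verb#inverse_transpose#, \verb#to_bit_streams#,
\verb#to_bytes#) are hypothetical choices made for illustration.

```python
N = 128  # assumed register width in bits


def merge_fields(w, a, b):
    """Interleave the w-bit fields of two half-registers a and b."""
    out = []
    for i in range(0, len(a), w):
        out += a[i:i + w] + b[i:i + w]
    return out


def mergeh(w, a, b):
    """Model of a merge-high: interleave fields from the high (left) halves."""
    return merge_fields(w, a[:N // 2], b[:N // 2])


def mergel(w, a, b):
    """Model of a merge-low: interleave fields from the low (right) halves."""
    return merge_fields(w, a[N // 2:], b[N // 2:])


def merge_stage(w, xs, ys, ops):
    """Apply mergeh and mergel to each register pair, counting operations."""
    out = []
    for x, y in zip(xs, ys):
        out += [mergeh(w, x, y), mergel(w, x, y)]
        ops[0] += 2
    return out


def inverse_transpose(bits):
    """Turn 8 bit-stream registers into 8 byte-stream registers.

    bits[k] holds bit k of every byte, with bit 0 the most significant
    (an assumed numbering). Returns the registers and the merge count.
    """
    ops = [0]
    # Stage 1: bit-level interleaving yields 2-bit fields.
    p01 = merge_stage(1, [bits[0]], [bits[1]], ops)
    p23 = merge_stage(1, [bits[2]], [bits[3]], ops)
    p45 = merge_stage(1, [bits[4]], [bits[5]], ops)
    p67 = merge_stage(1, [bits[6]], [bits[7]], ops)
    # Stage 2 (assumed): 2-bit fields are interleaved into nybbles.
    hi = merge_stage(2, p01, p23, ops)
    lo = merge_stage(2, p45, p67, ops)
    # Stage 3 (assumed): high and low nybbles are interleaved into bytes.
    return merge_stage(4, hi, lo, ops), ops[0]


def to_bit_streams(data):
    """Forward transposition by brute force, used only to build test input."""
    return [[(byte >> (7 - k)) & 1 for byte in data] for k in range(8)]


def to_bytes(regs):
    """Concatenate registers and pack each group of 8 bits into a byte."""
    flat = [bit for reg in regs for bit in reg]
    return bytes(
        int("".join(map(str, flat[i:i + 8])), 2)
        for i in range(0, len(flat), 8)
    )
```

In this model each stage applies one merge-high and one merge-low to four
register pairs, so the three stages together account for the 24
operations. The sketch can be checked by transposing a block of 128 bytes
with \verb#to_bit_streams#, passing the result to
\verb#inverse_transpose#, and comparing the output of \verb#to_bytes#
with the original block.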

