Index: /docs/ASPLOS09/asplos094cameron.tex
===================================================================
--- /docs/ASPLOS09/asplos094cameron.tex (revision 242)
+++ /docs/ASPLOS09/asplos094cameron.tex (revision 243)
@@ -897,7 +897,8 @@
of versions of the \verb#simd<8>::mergeh# and \verb#simd<8>::mergel#
operations that are available with each of the SSE and Altivec instruction
-sets. These algorithms take 72 operations to perform the
-inverse transposition of 8 parallel registers of bit stream
-data into 8 serial registers of byte stream data.
+sets. To perform the full inverse transform of 8 parallel
+registers of bit stream data into 8 serial registers of byte stream data,
+a RefA implementation requires 120 operations, while a RefB
+implementation reduces this to 72.
\begin{figure}[tbh]
@@ -934,6 +935,5 @@
\end{figure}
-An algorithm employing only 24 operations using the
-inductive doubling instruction set architecture is relatively
+An algorithm employing only 24 operations using IDISA-A/B is relatively
straightforward. In stage 1, parallel registers for individual bit streams
are first merged with bit-level interleaving
@@ -963,6 +963,4 @@
parallel bit stream form can then each be used at will in
character stream applications.

