\documentclass[runningheads]{llncs}
\usepackage{tikz}
\usetikzlibrary{shapes,positioning,arrows,calc,fadings}
\maketitle
\begin{abstract}
Bitwise data parallelism using short vector (SIMD) instructions has recently been shown to hold considerable promise.
In performance comparisons with several contemporary alternatives, speedups of 10$\times$ or better are often observed.
\vskip 10pt
This is the authors' version of the paper published in Algorithms and Architectures for Parallel Processing, Guojun Wang, Albert Zomaya, Gregorio Martinez Perez and Kenli Li (eds.), {\em Lecture Notes in Computer Science} {\bf 9529}, Nov.~2015, pp.~373--387, http://dx.doi.org/10.1007/978-3-319-27122-4\_26.
The final publication is available at link.springer.com.
\keywords{SIMD}
\end{abstract}