mispredictions. Parabix's data reorganization significantly improves the overall cache miss rate. We experience 7$\times$ fewer misses than Expat and 25$\times$ fewer misses than Xerces at the L1 level, and 104$\times$ fewer misses than Expat and 15$\times$ fewer misses than Xerces at the L2 level. The improved cache utilization keeps the SIMD units busy and prevents memory-related stalls. Note that cache misses also cause increased application energy consumption.

\ldots{} eliminates many branches. Further optimizations take advantage of Parabix's data organization and replace conditional branches with {\em bit scan} operations that can process up to 128 characters' worth of branches with one operation. In many cases, we also replace the branches with logical predicate operations. Our predicates are cheaper \ldots{}

As shown in Figure \ref{corei3_BR}, Parabix processing is almost branch free. Parabix exhibits minimal dependence on source XML markup density; it experiences between 19.5 and 30.7 branch mispredictions per thousand XML bytes. The cost of branch mispredictions for the Expat parser can be over 7 cycles per XML byte (see Figure \ref{corei3_BM}); this cost alone is higher

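The replacement of per-character conditional branches with bit scan operations can be illustrated with a minimal sketch. It is not Parabix code. It assumes a single 64-bit word (narrower than a full SIMD register), a GCC- or Clang-compatible compiler for \verb|__builtin_ctzll|, and hypothetical helper names (\verb|marker_stream|, \verb|scan_to_next|, \verb|collect_markers|). The scalar loop that builds the marker stream only stands in for the SIMD bit-stream computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a marker bit stream for one block of at most 64 bytes:
   bit i is set when block[i] == marker. This scalar, branch-free loop
   is a stand-in (assumption) for the SIMD computation of such streams. */
static uint64_t marker_stream(const char *block, size_t len, char marker) {
    assert(len <= 64);
    uint64_t bits = 0;
    for (size_t i = 0; i < len; i++) {
        bits |= (uint64_t)(block[i] == marker) << i;
    }
    return bits;
}

/* Bit scan: return the position of the first marker at or after `from`,
   or -1 if there is none. One count-trailing-zeros operation skips every
   non-marker character in the word, replacing up to 64 per-character tests. */
static int scan_to_next(uint64_t bits, unsigned from) {
    if (from >= 64) return -1;
    uint64_t rest = bits & (~UINT64_C(0) << from);  /* drop positions < from */
    if (rest == 0) return -1;
    return __builtin_ctzll(rest);                   /* GCC/Clang builtin */
}

/* Collect all marker positions in a block; returns how many were found.
   The loop branches once per marker, not once per character. */
static int collect_markers(const char *block, size_t len, char marker, int *out) {
    uint64_t bits = marker_stream(block, len, marker);
    int n = 0;
    for (int pos = scan_to_next(bits, 0); pos >= 0;
         pos = scan_to_next(bits, (unsigned)pos + 1)) {
        out[n++] = pos;
    }
    return n;
}
```

Calling \verb|collect_markers| on the 15-byte input \verb|<a>text</a><b/>| with marker \verb|<| yields positions 0, 7, and 11. The number of loop iterations, and hence of data-dependent branches, tracks the number of markers rather than the number of input bytes.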