source: docs/HPCA2012/11-conclusions.tex @ 5826

Last change on this file since 5826 was 1652, checked in by ksherdy, 8 years ago

Minor edit. Prefer 128-bit over 128 bit, 256-bit over 256 bit.

% In this paper we presented a framework.
% We demonstrated on XML.
% We showed benefits
% We analyzed SIMD
% We stacked multithreading
% We have released it.
% Future research
In this paper we presented Parabix, a software runtime framework for
exploiting the SIMD data units found on commodity processors for text
processing. The Parabix framework allows programmers to focus on exposing the
parallelism in their application, assuming an infinite-resource
abstract SIMD machine, without worrying about or having to change code
to handle processor specifics (e.g., 128-bit SIMD on SSE vs.\ 256-bit SIMD
on AVX). We applied Parabix technology to a widely deployed
application, XML parsing, and demonstrated the efficiency gains that can
be obtained on commodity processors. Compared to the conventional XML
parsers Expat and Xerces, we achieve a 2$\times$--7$\times$
improvement in performance and an average 4$\times$ improvement in
energy. We achieve high compute efficiency, with an overall 9$\times$--15$\times$
reduction in branches, a 7$\times$--15$\times$ reduction in branch mispredictions,
% ?\times$ reduction in LLC misses,
and an increase in data parallelism, processing up to 128 characters with a single operation. We used the
Parabix framework and XML parsers to study the features of the new 256-bit
AVX extension in Intel processors. We find that while the move to
3-operand instructions delivers significant benefit, the wider
operations in some cases have higher overheads than the
existing 128-bit SSE operations. We also compare Intel's SIMD
extensions against ARM \NEON{}. Note that Parabix allowed us to
perform these studies without having to change the application source.
Finally, we parallelized the Parabix XML parser to take advantage of
the SIMD units in every core on the chip. We demonstrate that the
benefits of thread-level parallelism are complementary to the
fine-grain parallelism we exploit; parallelized Parabix achieves a
further 2$\times$ improvement in performance.