% Source: docs/HPCA2012/final_ieee/11-conclusions.tex @ 3121
% Last change on this file since 3121 was 1775, checked in by cameron, 8 years ago
% In this paper we presented a framework.
% We demonstrated on XML.
% We showed benefits
% We analyzed SIMD
% We stacked multithreading
% We have released it.
% Future research

This paper presents Parabix as a software runtime framework for
exploiting the SIMD data units found on commodity processors for text
processing. The Parabix framework allows programmers to focus on exposing the
parallelism in their application by targeting an abstract SIMD machine
with unbounded resources, without worrying about or having to change code
to handle processor specifics (e.g., 128-bit SIMD on SSE vs.\ 256-bit SIMD
on AVX). Parabix technology was applied to XML parsing
to demonstrate the efficiency gains that can
be obtained on commodity processors. Compared to the conventional XML
parsers Expat and Xerces, a 2$\times$ to 7$\times$
improvement in performance and an average 4$\times$ improvement in
energy consumption were achieved. Furthermore, computational efficiency was
greatly increased, with an overall 9$\times$ to 15$\times$
reduction in branches and a 7$\times$ to 15$\times$ reduction in branch mispredictions.

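
The width independence summarized above can be illustrated with a minimal Python sketch. It is not Parabix code: every function name in it is hypothetical, the XML-like input is made up, and Python's arbitrary-precision integers merely stand in for the unbounded abstract SIMD machine. The same bitwise expression is evaluated in 128-bit and in 256-bit blocks, and both agree with the unbounded result.

```python
def to_bitstream(text, predicate):
    """Return an integer whose bit i is set iff predicate(text[i]) holds."""
    stream = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            stream |= 1 << i
    return stream


def and_not_blockwise(a, b, length, width):
    """Compute a & ~b over `length` bits, one `width`-bit block at a time."""
    mask = (1 << width) - 1
    result = 0
    for offset in range(0, length, width):
        block_a = (a >> offset) & mask
        block_b = (b >> offset) & mask
        # One block-sized step: the only place the register width appears.
        result |= (block_a & ~block_b & mask) << offset
    return result


# Hypothetical input: 20 copies of a small XML fragment (360 characters).
text = "<a><b>text</b></a>" * 20

lt = to_bitstream(text, lambda c: c == "<")
slash = to_bitstream(text, lambda c: c == "/")

# Bit i of (slash >> 1) is set iff text[i + 1] == "/", so the expression
# below marks every "<" that is NOT immediately followed by "/".
next_is_slash = slash >> 1

start_tags_unbounded = lt & ~next_is_slash
start_tags_128 = and_not_blockwise(lt, next_is_slash, len(text), 128)
start_tags_256 = and_not_blockwise(lt, next_is_slash, len(text), 256)

start_tag_count = bin(start_tags_unbounded).count("1")
```

The sketch simplifies one point: the shift is applied to the unbounded integer before blocking, whereas a real SIMD implementation would have to carry shifted bits across block boundaries.
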

The Parabix framework and XML parsers were also used to study the
features of the new 256-bit AVX extension in Intel processors. While the move to
3-operand instructions delivers significant benefits, the
advantage of performing loads and bitwise logic on 256 bits at a time was
negated by the need to convert to 128-bit SIMD registers for
integer operations. We expect this will be remedied with AVX2.
Intel's SIMD
extensions were also compared with the ARM \NEON{}. Parabix allowed us to
perform these studies without having to change the application source.
Finally, the Parabix XML parser was parallelized
to take advantage of the SIMD units in every core on the chip, demonstrating that the
benefits of thread-level parallelism are complementary to the
fine-grained parallelism we exploit. In this study, our parallelized Parabix achieves a
further 2$\times$ improvement in performance.