source: docs/PACT2011/00-abstract.tex @ 991

XML is a data format designed for documents as well as the
representation of data structures. The simplicity and generality of
its rules make it widely used in web services and database
systems. Traditional XML parsers have been built around the
byte-at-a-time model, in which they process every character token in
the file in a sequential fashion. Unfortunately, the byte-at-a-time
sequential model is a fundamental hindrance to performance and in
some cases can add up to 100\% overhead to database queries.

In this paper, we propose a new XML parser, Parabix, based on parallel
bit stream technology, which converts character strings into
bit streams and then exploits the SIMD operations prevalent on modern CPUs.
The first generation parser that we developed, Parabix1, uses
bitscan and bit-level sequencing SIMD operations to emulate many of the
parser's functions. Unfortunately, operations like bitscan are
inherently sequential in nature, so Parabix1's speedup is limited. We
present a second generation parser, Parabix2, that fully parallelizes
the parsing operations using the parallel bit-level logic provided by
modern SIMD extensions such as SSE2. We evaluate Parabix1 and Parabix2
against two widely used XML parsers, James Clark's Expat and Apache's Xerces,
on three generations of x86 machines, including the new Intel
Sandy Bridge. We show that Parabix2's speedup is 2$\times$--8$\times$
over Expat and Xerces. Across the different Intel machine generations,
Parabix rides the scalability curve of SIMD operations, whose
performance inherently scales better than traditional sequential
thread performance. Comparing Intel's new Sandy Bridge core with the Core
i3, we observe a performance improvement of 20--60\% for our
Parabix parsers, while sequential parsers like Xerces improve by
$<$20\%. We measure real CPU power to demonstrate that Parabix also
delivers significant energy efficiency. On the Core i3,
Parabix consumes $\simeq$4\,nJ per byte parsed while Xerces consumes
$\simeq$20\,nJ per byte parsed. Finally, we perform a case study of
Intel's new 256-bit wide AVX instructions and demonstrate that they
provide X speedup over the 128-bit SSE2 instruction set.