source: docs/PACT2011/00-abstract.tex @ 954

Last change on this file since 954 was 954, checked in by lindanl, 9 years ago

Add more charts, modified abstract and some other minor changes

XML is a data format designed for documents as well as the
representation of data structures. The simplicity and generality of
its rules make it widely used in web services and database
systems. Traditional XML parsers have been built around the
byte-at-a-time model, in which they process every character token in
the file in a sequential fashion. Unfortunately, the byte-at-a-time
sequential model is a fundamental hindrance to performance and in
some cases can add up to 100\% overhead to the database queries
themselves.

In this paper, we propose a new XML parser, Parabix, based on parallel
bit stream technology, which converts the character strings into
bit streams and then exploits SIMD operations prevalent on modern CPUs.
The first generation parser that we developed, Parabix1, uses the
bitscan and bitlevel sequencing SIMD operations to emulate much of the
parser's functions. Unfortunately, operations like bitscan are
inherently sequential in nature and Parabix1's speedup is limited. We
present a second generation parser, Parabix2, that fully parallelizes
the parsing operations using parallel bitlevel logic provided in
modern SIMD extensions like SSE2. We evaluate Parabix1 and Parabix2
against two widely-used XML parsers, Expat and Apache Xerces,
on three generations of x86 machines, including the new Intel
Sandy Bridge. We show that Parabix2's speedup is 2$\times$--8$\times$
over Expat and Xerces. Across the different Intel machine generations,
Parabix rides the scalability curve of SIMD operations, whose
performance inherently scales better than traditional sequential
thread performance. Comparing Intel's new Sandy Bridge core with the Core
i3, we observed performance improvements of 20--60\% for our
Parabix parsers, while sequential parsers like Xerces improve by
$<$20\%. Measurements of real CPU power show that Parabix is also
significantly more energy efficient. On the Core i3,
Parabix consumes $\simeq$4\,nJ per byte parsed while Xerces consumes
$\simeq$20\,nJ per byte parsed. Finally, we perform a case study of
Intel's new 256-bit wide AVX instructions and demonstrate that they
provide X speedup over the 128-bit SSE2 instruction set.
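As an illustration only, the parallel bit stream idea can be sketched in a few lines of Python. This is not the Parabix implementation: the helper names `transpose`, `match_char`, and `positions` are hypothetical, Python's arbitrary-precision integers stand in for SIMD registers, and the transposition step is written as a plain loop for readability, where a real parser would perform it with SIMD instructions. The sketch splits the input into eight basis bit streams (one per bit of each byte) and then locates every `<` using only bitwise logic applied to whole streams at once.

```python
def transpose(data):
    """Return 8 basis bit streams; bit i of stream k is bit k of data[i]."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_char(streams, ch, length):
    """Return a bit stream with a 1 at every position holding character ch."""
    mask = (1 << length) - 1  # limit the result to valid positions
    result = mask
    code = ord(ch)
    for k in range(8):
        if (code >> k) & 1:
            result &= streams[k]          # this bit of the byte must be 1
        else:
            result &= ~streams[k] & mask  # this bit of the byte must be 0
    return result


def positions(stream):
    """List the indices of the set bits in a bit stream."""
    out = []
    i = 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


text = b"<a><b/></a>"
basis = transpose(text)
lt = match_char(basis, "<", len(text))
print(positions(lt))  # [0, 3, 7]
```

Each `&` and `~` in `match_char` acts on every input position at once, which is the property that 128-bit or 256-bit SIMD registers exploit in hardware.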