# Changeset 1289

Timestamp:
Aug 8, 2011, 1:06:13 PM
Message:

Minor edits to improve readability and flow.

File:
1 edited

XML is a set of rules for encoding documents in machine-readable form. The simplicity and generality of the rules make it widely used in web services and database systems. Traditional XML parsers are built around a byte-at-a-time processing model where each character token of an XML document is examined in sequence. Unfortunately, the byte-at-a-time sequential model is a performance barrier in more demanding applications, is energy-inefficient, and makes poor use of the wide SIMD registers and other parallelism features of modern processors. This paper assesses the energy and performance of a new approach to XML parsing, based on parallel bit stream technology, and as implemented in successive software generations of the Parabix XML parser. This method first converts the character streams into sets of parallel bit streams and then exploits SIMD operations prevalent on commodity-level hardware. The first generation Parabix1 parser exploits the processor's built-in bit-scan instructions over these streams to make multibyte moves but follows an otherwise sequential approach. The second generation Parabix2 technology adds further parallelism by replacing much of the sequential bit scanning with a parallel scanning approach based on bit stream addition. We evaluate Parabix1 and Parabix2 against two widely used XML parsers, James Clark's Expat and Apache's Xerces, and across three generations of x86 machines, including the new Intel \SB{}. We show that Parabix2's speedup is 2$\times$--7$\times$ over Expat and Xerces. In stark contrast to the energy expenditures necessary to realize performance gains through multicore parallelism, we also show that our Parabix parsers deliver energy savings in direct proportion to the gains in performance. In addition, we assess the scalability advantages of SIMD processor improvements across Intel processor generations, culminating with an evaluation of the 256-bit AVX technology in \SB{} versus the now legacy 128-bit SSE technology.
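
To make the idea of scanning by bit stream addition concrete, here is a minimal sketch in Python. It is not the Parabix implementation: it uses Python's unbounded integers in place of SIMD registers, it assumes that bit `i` of a stream corresponds to character position `i`, and the function names (`char_class_stream`, `scan_thru`, `positions`) are hypothetical names chosen for this illustration. The point it shows is that one addition moves every marker past its run of matching characters at once, rather than stepping one character at a time.

```python
def char_class_stream(data: bytes, chars: bytes) -> int:
    """Return an integer whose bit i is 1 when data[i] is one of chars."""
    stream = 0
    for i, byte in enumerate(data):
        if byte in chars:
            stream |= 1 << i
    return stream


def scan_thru(markers: int, char_class: int) -> int:
    """Advance each marker past the run of char_class positions it sits on.

    The addition carries a marker bit through its run of 1 bits, so every
    marker moves in a single step. A marker that is not on a char_class
    position stays where it is.
    """
    return (markers + char_class) & ~char_class


def positions(stream: int) -> list:
    """List the bit positions set in stream, lowest first."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


text = b"<doc><item/></doc>"
name_chars = char_class_stream(text, b"abcdefghijklmnopqrstuvwxyz")
open_angle = char_class_stream(text, b"<")

# A marker one position after each '<' sits on the first character of a tag
# name, or on '/' for a closing tag (in which case it does not move).
name_starts = open_angle << 1
name_ends = scan_thru(name_starts, name_chars)

print(positions(name_starts))  # [1, 6, 13]
print(positions(name_ends))    # [4, 10, 13]
```

In this sketch the per-character loop in `char_class_stream` stands in for the transposition and character-class steps, which a real implementation would perform with SIMD operations; only `scan_thru` illustrates the addition-based scanning itself.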