Changeset 981 for docs/PACT2011

Mar 23, 2011, 7:08:29 PM


Edited: docs/PACT2011/01-intro.tex (r968 to r981)
or back-room, that doesn't process XML sometimes."

With all this XML processing, a substantial literature has arisen
addressing XML processing performance in general and the
performance of XML parsers in particular.  Nicola and John
specifically identified XML parsing as a threat to database
performance and outlined a number of potential directions for
performance improvements \cite{NicolaJohn03}.  The nature of the XML
API was found to have a significant effect on performance, with
event-based SAX (Simple API for XML) parsers avoiding the tree
construction costs of the more flexible DOM (Document Object
Model) parsers \cite{Perkins05}.  The commercial importance
of XML parsing spurred the development of hardware-based approaches,
including a custom XML chip \cite{Leventhal2009}
as well as FPGA-based implementations \cite{DaiNiZhu2010}.
However promising these approaches may be for particular niche applications,
it is still likely that the bulk of the world's XML
processing workload will be carried out on commodity processors
using software-based solutions.

To accelerate XML parsing in software, most recent
work has focused on parallelization.  The use of multicore
parallelism on chip multiprocessors has attracted
the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, 10.1109/PDCAT.2009.64},
while SIMD (single-instruction multiple-data) parallelism
has been of interest both to Intel, in designing new SIMD instructions \cite{XMLSSE42},
and to our group, in developing parallel bit stream technology.
Each of these approaches has shown considerable performance
benefits over traditional sequential parsing following the
byte-at-a-time model.

With a focus on performance, however, relatively little attention
has been paid to reducing energy consumption.  For example, in addressing
performance through multicore parallelism, one generally has to
pay an energy price for performance gains because of the
additional processing required for interthread synchronization.
Reducing energy consumption is a key topic of this
paper, in which we study the energy and performance
characteristics of several XML parsers across three generations
of x86-64 processor technology.  A compelling result is that
the performance benefits of parallel bit stream technology
translate directly and proportionately into substantial energy savings.
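
As a back-of-the-envelope illustration of why speedups can translate
proportionately into energy savings (a simplification on our part,
assuming that the average power $\bar{P}$ drawn while parsing is roughly
the same for the parsers being compared), the energy $E$ consumed by a
parsing run of duration $T$ is
\[
  E = \int_0^{T} P(t)\,dt = \bar{P}\,T ,
\]
so a parser that runs $k$ times faster at the same average power
consumes $E/k$.  Multicore parallelism weakens this assumption: additional
active cores raise $\bar{P}$, and any extra synchronization work adds
to the total processing performed, so the energy saved can be smaller
than the speedup alone would suggest.
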

The remainder of this paper is organized as follows.
Section 2 presents background material on XML parsing
and traditional parsing methods.  Section 3 then reviews
parallel bit stream technology as applied to
XML parsing in our Parabix1 and Parabix2 parsers.
Section 4 introduces the methodology and approach
of the performance and energy study undertaken in the
remainder of the paper.  Section 5 presents a
detailed performance evaluation on our primary evaluation platform,
a Core i3 processor, addressing a number of microarchitectural issues
including cache misses, branch mispredictions, and SIMD instruction counts.
Section 6 then examines scalability and
performance gains across three generations of Intel
architecture, culminating in a performance assessment
on our one-week-old Sandy Bridge test machine.
Section 7 examines specific issues in applying
the new 256-bit AVX instructions to parallel bit stream
technology and notes that the major performance benefit
seen so far results from the change to the 3-operand
instruction form.  Section 8 concludes the paper with
a discussion of ongoing work and further research directions.