% Source: docs/PACT2011/01-intro.tex, revision 985 (last changed by ksherdy)
\section{Introduction}

Extensible Markup Language (XML) is a core technology standard
of the World Wide Web Consortium (W3C) that provides a common
framework for encoding and communicating structured information
of all kinds.   In applications ranging from Office Open XML in
Microsoft Office to NDFD XML of the NOAA National Weather
Service, from KML in Google Earth to Castor XML in the Martian Rovers,
from ebXML for e-commerce data interchange to RSS for news feeds
from web sites everywhere, XML plays a ubiquitous role in providing
a common framework for data interoperability world-wide and beyond.
As XML 1.0 editor Tim Bray is quoted in the W3C celebration of XML at 10 years,
``there is essentially no computer in the world, desk-top, hand-held,
or back-room, that doesn't process XML sometimes.''

With all this XML processing, a substantial literature has arisen
addressing XML processing performance in general and the
performance of XML parsers in particular.   Nicola and John
specifically identified XML parsing as a threat to database
performance and outlined a number of potential directions for
performance improvements \cite{NicolaJohn03}.  The nature of XML
APIs was found to have a significant effect on performance, with
event-based SAX (Simple API for XML) parsers avoiding the tree
construction costs of the more flexible DOM (Document Object
Model) parsers \cite{Perkins05}.  The commercial importance
of XML parsing spurred the development of hardware-based approaches,
including a custom XML chip \cite{Leventhal2009}
as well as FPGA-based implementations \cite{DaiNiZhu2010}.
As promising as these approaches may be for particular niche applications,
however, it is still likely that the bulk of the world's XML
processing workload will be carried out on commodity processors
using software-based solutions.

To accelerate XML parsing performance in software, most recent
work has focused on parallelization.  The use of multicore
parallelism for chip multiprocessors has attracted
the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
while SIMD (single instruction, multiple data) parallelism
has been of interest to Intel in designing new SIMD instructions \cite{XMLSSE42}
as well as to our group in developing parallel bit stream technology
\cite{CameronHerdyLin2008,Cameron2009,Cameron2010}.
Each of these approaches has shown considerable performance
benefits over traditional sequential parsing following the
byte-at-a-time model.

With a focus on performance, however, relatively little attention
has been paid to reducing energy consumption.   For example, in addressing
performance through multicore parallelism, one generally has to
pay an energy price for performance gains because of the
increased processing required for synchronization.
Reducing energy consumption is a key topic of this
paper, in which we study the energy and performance
characteristics of several XML parsers across three generations
of x86-64 processor technology.   A compelling result is that
the performance benefits of parallel bit stream technology
translate directly and proportionately to substantial energy savings.

The remainder of this paper is organized as follows.
Section 2 presents background material on XML parsing
and traditional parsing methods.   Section 3 then reviews
parallel bit stream technology as applied to
XML parsing in our Parabix1 and Parabix2 parsers.
Section 4 then introduces our methodology and approach
for the performance and energy study tackled in the
remainder of the paper.   Section 5 presents a
detailed performance evaluation on a Core i3 processor
as our primary evaluation platform, addressing a
number of microarchitectural issues including cache
misses, branch mispredictions, SIMD instruction counts
and so on.  Section 6 then looks at scalability and
performance gains through three generations of Intel
architecture, culminating in a performance assessment
on our one-week-old Sandy Bridge test machine.
Section 7 looks specifically at issues in applying
the new 256-bit AVX technology to parallel bit stream
technology and notes that the major performance benefit
seen so far is a result of the change to 3-operand
instruction form.   Section 8 concludes the paper with
a discussion of ongoing work and further research directions.


%Traditional measures of performance fail to capture the impact of energy consumption \cite{bellosa2001}.
%In a study done in 2007, it was estimated that in 2005, the annual operating cost\footnote{This figure only included the cost of server power consumption and cooling;
%it did not account for the cost of network traffic, data storage, service and maintenance or system replacement.} of corporate servers
%and data centers alone was over \$7.2 billion---with the expectation that this cost would increase to \$12.7 billion by 2010 \cite{koomey2007}.
%But when it comes to power consumption, corporate costs are not the only concern: in the world of mobile devices, battery life is paramount.
%While the capabilities and users' expectations of mobile devices have rapidly increased, little improvement to battery technology itself is foreseen in the near future \cite{silven2007, walker2007}.

%One area in which both servers and mobile devices devote considerable
%computational effort is the processing of Extensible Markup
%Language (XML) documents.  It was predicted that corporate servers
%would see a ``growth in XML traffic\ldots from 15\% [of overall
%network traffic] in 2004 to just under 48\% by 2008''
%\cite{coyle2005}.  Further, ``from the point of view of server
%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
%computing workload'' \cite{leventhal2009}.  In other words, XML is
%quickly becoming the backbone of most server/server and client/server
%information exchanges.  Similarly, there is growing interest in the
%use of mobile web services for personalization, context-awareness, and
%content-adaptation of mobile web sites---most of which rely on XML
%\cite{canali2009}.  Whether the end user realizes it or not, XML is
%part of their daily life.

%Why are XML parsers important?
%Talk about XML parsers and what they do in general.
%Brief few lines about byte-at-a-time?
%What's new with Parabix style approach?
%Introduce Parabix1 and Parabix2?
%Present overall quantitative improvements compared to other parsers.