\section{Introduction}

Extensible Markup Language (XML) is a core technology standard
of the World Wide Web Consortium (W3C) that provides a common
framework for encoding and communicating structured information
of all kinds.  In applications ranging from Office Open XML in
Microsoft Office to NDFD XML of the NOAA National Weather
Service, from KML in Google Earth to Castor XML in the Martian Rovers,
from ebXML for e-commerce data interchange to RSS for news feeds
from web sites everywhere, XML plays a ubiquitous role in
data interoperability world-wide and beyond.
As XML 1.0 editor Tim Bray is quoted in the W3C celebration of XML at 10 years,
``there is essentially no computer in the world, desk-top, hand-held,
or back-room, that doesn't process XML sometimes.''

With all this XML processing, a substantial literature has arisen
addressing XML processing performance in general and the
performance of XML parsers in particular.  Nicola and John
specifically identified XML parsing as a threat to database
performance and outlined a number of potential directions for
performance improvements \cite{NicolaJohn03}.  The nature of XML
APIs was found to have a significant effect on performance, with
event-based SAX (Simple API for XML) parsers avoiding the tree
construction costs of the more flexible DOM (Document Object
Model) parsers \cite{Perkins05}.  The commercial importance
of XML parsing has spurred hardware-based approaches,
including the development of a custom XML chip \cite{Leventhal2009}
as well as FPGA-based implementations \cite{DaiNiZhu2010}.
As promising as these approaches may be for particular niche applications,
however, it is still likely that the bulk of the world's XML
processing workload will be carried out on commodity processors
using software-based solutions.

To accelerate XML parsing performance in software, most recent
work has focused on parallelization.  The use of multicore
parallelism for chip multiprocessors has attracted
the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
while SIMD (single instruction, multiple data) parallelism
has been of interest to Intel in designing new SIMD instructions~\cite{XMLSSE42}
as well as to our group in developing parallel bit stream technology
\cite{CameronHerdyLin2008,Cameron2009,Cameron2010}.
Each of these approaches has shown considerable performance
benefits over traditional sequential parsing following the
byte-at-a-time model.

With a focus on performance, however, relatively little attention
has been paid to reducing energy consumption.  For example, in addressing
performance through multicore parallelism, one generally has to
pay an energy price for performance gains because of the
increased processing required for synchronization.
Reducing energy consumption is a key topic of this
paper, in which we study the energy and performance
characteristics of several XML parsers across three generations
of x86-64 processor technology.  The parsers we consider are
the widely used byte-at-a-time parsers Expat and Xerces as well as our
own Parabix1 and Parabix2 parsers.
A compelling result is that
the performance benefits of parallel bit stream technology
translate directly and proportionately to substantial energy savings.
Figure~\ref{perf-energy} is an energy-performance scatter plot
showing the results we obtain for the four parsers.

\begin{figure}
\begin{center}
\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
\end{center}
\caption{Energy vs. Performance for Four XML Parsers}
\label{perf-energy}
\end{figure}

The remainder of this paper is organized as follows.
Section 2 presents background material on XML parsing
and traditional parsing methods.  Section 3 then reviews
parallel bit stream technology as applied to
XML parsing in our Parabix1 and Parabix2 parsers.
Section 4 then introduces our methodology and approach
for the performance and energy study tackled in the
remainder of the paper.  Section 5 presents a
detailed performance evaluation on a Core i3 processor
as our primary evaluation platform, addressing a
number of microarchitectural issues including cache
misses, branch mispredictions, and SIMD instruction counts.
Section 6 then looks at scalability and
performance gains through three generations of Intel
architecture, culminating in a performance assessment
on our one-week-old Sandy Bridge test machine.
Section 7 looks specifically at issues in applying
the new 256-bit AVX technology to parallel bit stream
technology and notes that the major performance benefit
seen so far is a result of the change to the 3-operand
instruction form.  Section 8 concludes the paper with
a discussion of ongoing work and further research directions.

%Traditional measures of performance fail to capture the impact of energy consumption \cite{bellosa2001}.
%In a study done in 2007, it was estimated that in 2005, the annual operating cost\footnote{This figure only included the cost of server power consumption and cooling;
%it did not account for the cost of network traffic, data storage, service and maintenance or system replacement.} of corporate servers
%and data centers alone was over \$7.2 billion---with the expectation that this cost would increase to \$12.7 billion by 2010 \cite{koomey2007}.
%But when it comes to power consumption, corporate costs are not the only concern: in the world of mobile devices, battery life is paramount.
%While the capabilities and users' expectations of mobile devices have rapidly increased, little improvement to battery technology itself is foreseen in the near future \cite{silven2007, walker2007}.

%One area in which both servers and mobile devices devote considerable
%computational effort is the processing of Extensible Markup
%Language (XML) documents.  It was predicted that corporate servers
%would see a ``growth in XML traffic\ldots from 15\% [of overall
%network traffic] in 2004 to just under 48\% by 2008''
%\cite{coyle2005}.  Further, ``from the point of view of server
%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
%computing workload'' \cite{leventhal2009}.  In other words, XML is
%quickly becoming the backbone of most server/server and client/server
%information exchanges.  Similarly, there is growing interest in the
%use of mobile web services for personalization, context-awareness, and
%content-adaptation of mobile web sites---most of which rely on XML
%\cite{canali2009}.  Whether the end user realizes it or not, XML is
%part of their daily life.

%Why are XML parsers important?
%Talk about XML parsers and what they do in general.
%Brief few lines about byte-at-a-time?
%What's new with Parabix style approach?
%Introduce Parabix1 and Parabix2?
%Present overall quantitative improvements compared to other parsers.