\section{Introduction}

Extensible Markup Language (XML) is a core technology standard
of the World Wide Web Consortium (W3C) that provides a common
framework for encoding and communicating structured information.
In applications ranging from Office Open XML in
Microsoft Office to NDFD XML of the NOAA National Weather
Service, from KML in Google Earth to Castor XML in the Martian Rovers,
from ebXML for e-commerce data interchange to RSS for news feeds
from web sites everywhere, XML plays a ubiquitous role in providing
a common framework for data interoperability world-wide and beyond.
As XML 1.0 editor Tim Bray was quoted in the W3C celebration of XML at 10 years,
``there is essentially no computer in the world, desk-top, hand-held,
or back-room, that doesn't process XML sometimes.''

With all this XML processing, a substantial literature has arisen
addressing XML processing performance in general and the
performance of XML parsers in particular.  Nicola and John
specifically identified XML parsing as a threat to database
performance and outlined a number of directions for potential
performance improvements \cite{NicolaJohn03}.  The nature of XML
APIs was found to have a significant effect on performance, with
event-based SAX (Simple API for XML) parsers avoiding the tree
construction costs of the more flexible DOM (Document Object
Model) parsers \cite{Perkins05}.  The commercial importance
of XML parsing spurred the development of hardware-based approaches,
including a custom XML chip \cite{Leventhal2009}
as well as FPGA-based implementations \cite{DaiNiZhu2010}.
However promising these approaches may be for particular niche applications,
it is likely that the bulk of the world's XML
processing workload will be carried out on commodity processors
using software-based solutions.

To accelerate XML parsing performance in software, most recent
work has focused on parallelization.  The use of multicore
parallelism for chip multiprocessors has attracted
the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
while SIMD (Single Instruction Multiple Data) parallelism
has been of interest to Intel in designing new SIMD instructions \cite{XMLSSE42},
as well as to the developers of parallel bit stream technology
\cite{CameronHerdyLin2008,Cameron2009,Cameron2010}.
Each of these approaches has shown considerable performance
benefits over traditional sequential parsing techniques that follow the
byte-at-a-time model.

With this focus on performance, however, relatively little attention
has been paid to reducing energy consumption in XML processing.  For example, in addressing
performance through multicore parallelism, one generally must
pay an energy price for performance gains because of the
increased processing required for synchronization.
Reducing energy consumption is a key topic of this
paper. We study the energy and performance
characteristics of several XML parsers across three generations
of x86-64 processor technology.  The parsers we consider are
the widely used byte-at-a-time parsers Expat and Xerces, as well as the
Parabix1 and Parabix2 parsers based on parallel bit stream technology.
A compelling result is that the performance benefits of parallel bit stream technology
translate directly and proportionally to substantial energy savings.
Figure \ref{perf-energy} is an energy-performance scatter plot
showing the results obtained.

\begin{figure}
\begin{center}
\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
\end{center}
\caption{XML Parser Technology Energy vs. Performance}
\label{perf-energy}
\end{figure}

The remainder of this paper is organized as follows.
Section 2 presents background material on XML parsing
and traditional parsing methods.  Section 3 reviews
parallel bit stream technology as applied to
XML parsing in the Parabix1 and Parabix2 parsers.
Section 4 introduces our methodology and approach
for the performance and energy study tackled in the
remainder of the paper.  Section 5 presents a
detailed performance evaluation on a \CITHREE\ processor
as our primary evaluation platform, addressing a
number of microarchitectural issues including cache
misses, branch mispredictions, SIMD instruction counts
and so forth.  Section 6 examines scalability and
performance gains through three generations of Intel
architecture, culminating with a performance assessment
on our two-week-old \SB\ test machine.
Section 7 looks specifically at issues in applying
the new 256-bit AVX technology to parallel bit stream
technology and notes that the major performance benefit
seen so far results from the change to the non-destructive three-operand
instruction format.  Section 8 concludes with
a discussion of ongoing work and further research directions.


%Traditional measures of performance fail to capture the impact of energy consumption \cite{bellosa2001}.
%In a study done in 2007, it was estimated that in 2005, the annual operating cost\footnote{This figure only included the cost of server power consumption and cooling;
%it did not account for the cost of network traffic, data storage, service and maintenance or system replacement.} of corporate servers
%and data centers alone was over \$7.2 billion---with the expectation that this cost would increase to \$12.7 billion by 2010 \cite{koomey2007}.
%But when it comes to power consumption, corporate costs are not the only concern: in the world of mobile devices, battery life is paramount.
%While the capabilities and users' expectations of mobile devices have rapidly increased, little improvement to battery technology itself is foreseen in the near future \cite{silven2007, walker2007}.

%One area to which both servers and mobile devices devote considerable
%computational effort is the processing of Extensible Markup
%Language (XML) documents.  It was predicted that corporate servers
%would see a ``growth in XML traffic\ldots from 15\% [of overall
%network traffic] in 2004 to just under 48\% by 2008''
%\cite{coyle2005}.  Further, ``from the point of view of server
%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
%computing workload'' \cite{leventhal2009}.  In other words, XML is
%quickly becoming the backbone of most server/server and client/server
%information exchanges.  Similarly, there is growing interest in the
%use of mobile web services for personalization, context-awareness, and
%content-adaptation of mobile web sites---most of which rely on XML
%\cite{canali2009}.  Whether the end user realizes it or not, XML is
%part of their daily life.

%Why are XML parsers important?
%Talk about XML parsers and what they do in general.
%Brief few lines about byte-at-a-time?
%What's new with the Parabix-style approach?
%Introduce Parabix1 and Parabix2?
%Present overall quantitative improvements compared to other parsers.