% Paper for PACT
\documentclass[10pt, twocolumn]{article}
\title{Characterization of Performance and Energy Consumption of XML Parsing Technology}
        Ken Herdy, Dan Lin, Nigel Medforth, Arrvindh Shriraman\\
        Simon Fraser University\\
        School of Computing Science\\
        \{ksherdy,lindanl,medfort,arrvindh\_shriraman\}
\section{Statement of Purpose}
The purpose of this research is to investigate the performance and energy consumption characteristics of three C/C++ based, event-driven, stream-oriented XML parsers. The Parabix2 XML parser is compared against the Xerces-C++ parser and the Expat parser. Energy consumption is measured using the Fluke i410 current clamp. Hardware performance event data are gathered to gain further insight into the execution characteristics of each parser.

%General Motivation: energy efficiency -> mobile (battery life), server (cost/cooling)
%IBM research groups

Traditional measures of performance fail to capture the impact of energy consumption \cite{bellosa2001}. A 2007 study estimated that the annual operating cost\footnote{This figure only included the cost of server power consumption and cooling; it did not account for the cost of network traffic, data storage, service and maintenance or system replacement.} of corporate servers and data centers alone exceeded \$7.2 billion in 2005, and projected that this cost would rise to \$12.7 billion by 2010 \cite{koomey2007}. Corporate costs are not the only concern when it comes to power consumption: in the world of mobile devices, battery life is paramount. While the capabilities of mobile devices and users' expectations of them have increased rapidly, little improvement in battery technology itself is foreseen in the near future \cite{silven2007, walker2007}.

One area to which both servers and mobile devices devote considerable computational effort is the processing of Extensible Markup Language (XML) documents. It was predicted that corporate servers would see a ``growth in XML traffic\ldots from 15\% [of overall network traffic] in 2004 to just under 48\% by 2008'' \cite{coyle2005}. Further, ``from the point of view of server efficiency[,] XML\ldots is the closest thing there is to a ubiquitous computing workload'' \cite{leventhal2009}. In other words, XML is quickly becoming the backbone of most server/server and client/server information exchanges. Similarly, there is growing interest in the use of mobile web services for personalization, context-awareness, and content-adaptation of mobile web sites, most of which rely on XML \cite{canali2009}. Whether end users realize it or not, XML is part of their daily lives.

%XML specific: proportion of traffic on XML; [balisaga]
% Performance =/= energy efficient
%Single sentence on parser selection: expat, xerces-c, parabix [cascon]

XML documents tend to be verbose, especially in the case of SOAP and WSDL. Processing these documents typically requires parsing them from a text-based format into an application-specific one. Cameron et al. \cite{CameronHerdyLin2008} show that both parser selection and markup density have a substantial impact on the computational cost of processing XML documents. The foundational work by Bellosa \cite{bellosa2001}, as well as more recent work \cite{bircher2007, bertran2010}, shows that hardware-usage patterns have a significant impact on the energy consumption of a particular application. These studies further show a strong correlation between specific performance events and energy usage, but their authors differ slightly as to which performance monitoring counters (PMCs)\footnote{Performance monitoring counters (PMCs) are special-purpose registers that are included in most modern microprocessors; they store the running count of specific hardware events, such as retired instructions, cache misses, branch mispredictions, and arithmetic-logic unit operations, to name a few. They can be used to capture information about any program at run-time, under any workload, at a very fine granularity.} to use.

To determine which performance factors influence energy consumption, and how, we intend to use the Fluke i410 current clamp in conjunction with PMCs to compare the energy usage, per parser invocation and per source XML byte, of three XML parsers: Expat 2.0.1, Xerces-C++ 3.1.1 (SAX2), and Parabix2. All three parsers are C/C++ based, event-driven, stream-oriented XML parsers. The first two parsers employ traditional byte-at-a-time methods of parsing; these parsers were selected based on their popularity in the marketplace and the availability of source code for deeper analysis. The last parser is a high-performance SIMD-based parallel-bitstream XML parser developed by Cameron et al. \cite{CameronHerdyLin2008}. The Fluke i410 current clamp is a digital multimeter that reads the magnetic field of a live electrical cable to determine the current passing through it without affecting the underlying hardware.

The results of \cite{CameronHerdyLin2008} showed that Parabix, the predecessor of Parabix2, was dramatically faster than both Expat 2.0.1 and Xerces-C++ 2.8.0. Our expectation is that Parabix2 will outperform both Expat 2.0.1 and Xerces-C++ 3.1.1 in terms of energy consumption per source XML byte. This expectation is based on the relatively branchless code composition of Parabix2 and its more efficient utilization of last-level cache resources. The authors of \cite{bellosa2001, bircher2007, bertran2010} indicate that such factors have a considerable effect on overall energy consumption. Hence, one of the foci of our study is the manner in which straight-line SIMD code influences energy usage.
%Hence, a focus of our study is the manner in which straight-line SIMD code influences energy usage.

In this section, we describe our methodology for the measurement and investigation of XML parsing energy consumption and performance. In brief, for each of the XML parsers under study, we propose to measure and evaluate the energy consumed in carrying out XML well-formedness checking, under a variety of workloads, on both mobile device and server hardware.

To begin our study, we propose to first investigate each of the XML parsers in terms of the PMC hardware events listed in Section \ref{events}. Based on previous key works \cite{bellosa2001, bertran2010, bircher2007}, we have chosen several hardware performance events that the authors indicate correlate strongly with energy consumption. From these data, we hope to gain insight into the XML parser execution characteristics that contribute most significantly to overall energy consumption. Second, using the Fluke i410 current clamp meter, we plan to measure the total energy consumed in completing XML well-formedness checking for each XML parser, on each hardware platform, and for each of a number of XML source files.
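
As a minimal sketch of how the total-energy and per-byte quantities discussed above could be derived from current-clamp readings, consider the following. It assumes (our assumption, not a detail fixed by the measurement setup) that the clamp output is sampled at a uniform interval $\Delta t$ and that the supply voltage $V$ is approximately constant over a run; the symbols $I_k$, $n$, $\Delta t$, $V$, and $N_{\mathrm{bytes}}$ are introduced here for illustration only.
\begin{equation}
E \approx V \sum_{k=1}^{n} I_k \, \Delta t, \qquad E_{\mathrm{byte}} = \frac{E}{N_{\mathrm{bytes}}}
\end{equation}
Here $I_k$ is the $k$-th current sample taken during one parser invocation, $E$ is the resulting energy estimate for that invocation, and $N_{\mathrm{bytes}}$ is the size of the source XML file in bytes. Dividing by the source size normalizes runs over files of different sizes, so that parsers can be compared per source XML byte as well as per invocation.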

% The use of performance counters for modeling power is not a new concept.

%Although the microprocessor is typically the largest consumer of power, Bertran et al. found that the chipset, memory, I/O, and disk can account for a significant portion of the total system energy consumption \cite{bertran2010}.

%As such, through the selection of a representative subset of hardware performance events, as based on the combined works of \cite{bellosa2001, bertran2010, bircher2007}, we hope to gain insight into the XML parser execution characteristics which contribute most significantly to overall energy consumption.

The following subsections describe the XML parsers under study, the XML workloads, the mobile device and server hardware architectures, the PMC hardware events selected for measurement, and the Fluke i410 current clamp meter. The expected outcomes of this section are hardware performance counter measurements and total energy consumption measurements for each combination of XML parser, XML source file, and hardware platform.

The XML parsing technologies selected for this study are the Parabix2, Xerces-C++, and Expat XML parsers.
Parabix2 \cite{parabix2} (parallel bit streams for XML) is the second-generation Parabix parser. Parabix2 is an open-source XML parser that leverages the SIMD capabilities of modern commodity processors; it employs new parallelization techniques, namely parallel parsing with bit stream addition, to deliver dramatic performance improvements over traditional byte-at-a-time parsing technology.
Xerces-C++ version 3.1.1 (SAX2) \cite{xerces} is a validating open-source XML parser written in C++ by the Apache project.
Expat version 2.0.1 \cite{expat} is a non-validating XML parser library written in C.

\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}
\hline
File Name         & dewiki.xml & jawiki.xml & roads.gml & po.xml  & soap.xml \\ \hline
File Type         & document   & document   & data      & data    & data     \\ \hline
File Size (kB)    & 66240      & 7343       & 11584     & 76450   & 2717     \\ \hline
Markup Item Count & 406792     & 74882      & 280724    & 4634110 & 18004    \\ \hline
Markup Density    & 0.07       & 0.13       & 0.57      & 0.76    & 0.87     \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

\subsubsection{XML Well-Formedness Checking}
The XML specification defines an XML document as a text that is well-formed; that is, it satisfies a list of syntax rules provided in the specification.
The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML.
In practical terms, a document is considered well-formed if all of its tags are correctly formed and it conforms to the XML syntax rules.

An XML well-formedness checking application is evaluated for each XML parsing technology. The decision to perform XML well-formedness checking is based on the following rationale. First, every XML parser must provide well-formedness checking functionality.
Second, well-formedness checking establishes that an XML document meets the minimum requirements for being readable by computers, while avoiding any additional cost due to computation unrelated to parsing.

Distinguishing between ``document-oriented'' XML and ``data-oriented'' XML is a popular way to describe the two basic classes of XML documents.
Data-oriented XML is used as an interchange format. Document-oriented XML is used to impose structure on information that rarely fits neatly into a relational database, particularly information intended for publishing. Data-oriented XML is characterized by a higher markup density. Markup density is defined as the ratio of the total markup contained within an XML file to the total XML document size. This metric may have a substantial influence on the performance of XML parsing. As such, we choose workloads with clearly distinguishable markup densities.
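
The definition of markup density above can be written compactly as follows. This is a sketch under the assumption that both quantities are measured in bytes; the symbols $\rho$, $B_{\mathrm{markup}}$, and $B_{\mathrm{total}}$ are introduced here for illustration and do not appear in the cited work.
\begin{equation}
\rho = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}}, \qquad 0 \leq \rho \leq 1
\end{equation}
Here $B_{\mathrm{markup}}$ is the total size of the markup in the file and $B_{\mathrm{total}}$ is the total size of the file. Under this reading, a value of $\rho$ close to 1 indicates a file that consists almost entirely of markup, while a value close to 0 indicates a file dominated by character data.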

Table \ref{XMLDocChars} shows the document characteristics of the XML instances selected for this performance study.
The dewiki.xml and jawiki.xml files represent document-oriented XML instances of Wikimedia books,
written in German and Japanese, respectively. The remaining files are data-oriented.
The roads.gml file is an instance of Geography Markup Language (GML), a modeling language for geographic
systems as well as an open interchange format for geographic transactions on the Internet.
The po.xml file is an example of purchase order data, while the soap.xml file contains a large SOAP message.
The markup density metric \cite{CameronHerdyLin2008} is reported for each document.

\subsection{Platform Hardware}
\subsubsection{Mobile - ARM}
The Advanced RISC Machine (ARM) is a 32-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by ARM Holdings.
ARM processors are used extensively in mobile phones. About 98 percent of the more than one billion mobile phones sold in 2005 used at least one ARM processor \cite{arm}.
Table \ref{arm} gives the hardware description of the ARM-based Samsung Galaxy Tablet selected.

\begin{table}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Processor & ARM Cortex-A8 (1.0GHz) \\ \hline
L1 Cache  & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache  & TBD \\ \hline
Memory    & 512MB \\ \hline
Storage   & 16GB \\ \hline
\end{tabular}
\end{center}
\caption{Samsung Galaxy Tablet}
\label{arm}
\end{table}

\subsubsection{Server - Intel Core i3}
The Intel Core i3 is a Nehalem-based processor produced by Intel. In this study it serves as a
low-end server processor. Table \ref{i3} gives the hardware description of the Intel Core i3 based machine selected.

\begin{table}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Processor      & Clarkdale i3-530 (2.93GHz) \\ \hline
L1 Cache       & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache       & 256KB \\ \hline
L3 Cache       & 4MB \\ \hline
Front Side Bus & 1333 MHz \\ \hline
Memory         & 4GB \\ \hline
Hard disk      & SCSI 1TB \\ \hline
\end{tabular}
\end{center}
\caption{Core i3}
\label{i3}
\end{table}

\subsection{PMC Hardware Events}\label{events}

Each of the hardware events selected relates to the energy consumption of one or more hardware units. For example, the total number of branch mispredictions corresponds to the use of the branch prediction unit.

Initial PMC hardware event set:
\begin{itemize}
\item Processor Cycles
\item Retired Instructions
\item Branch Instructions
\item Branch Mispredictions
\item Integer Instructions
\item Integer Loads
\item SIMD Instructions
\item SIMD Loads
\item Last Level Cache Misses
\end{itemize}

Additional candidate PMC hardware events:
\begin{itemize}
\item Translation Lookaside Buffer
\item DMA Accesses
\item I/O Interrupts
\item Last Level Cache Requests
\end{itemize}

\subsection{Measurement Hardware - Fluke i410}
The Fluke i410 current clamp meter is an electrical tester that combines a voltmeter with a clamp-type current meter. Like the multimeter, the clamp meter has transitioned from the analog period into the digital era. Created primarily as single-purpose test tools for electricians,
clamp meters such as the Fluke i410 have since incorporated more measurement functions and greater accuracy \cite{clamp}.

\section{Expected Results}
Through the use of the Fluke i410, we intend to capture the energy usage patterns of each XML parser under a variety of workloads on both server and mobile architectures. We plan to compare those patterns against the PMCs identified by Bellosa, Bertran and Bircher as significant hardware events with respect to energy consumption \cite{bellosa2001, bertran2010, bircher2007}. Based on these results, we expect to gain insight into which XML parser execution characteristics contribute most significantly to overall energy consumption.

This research is limited to the study of coarse-grained inter-parser power consumption on mobile and server hardware architectures. Our work is limited to a single representative mobile hardware instance and a single server hardware instance. We select XML well-formedness checking as our test application since this XML processing task is ubiquitous amongst processors, but we acknowledge that additional higher-level XML processing tasks often follow. Performance monitoring counters are used to gain insight into the general execution characteristics of each XML parser. The hardware events selected for capture are based on the previous works of Bellosa, Bertran and Bircher \cite{bellosa2001, bertran2010, bircher2007}. It is beyond the scope of this work to correlate specific hardware performance events to the power consumption costs of individual hardware units. Further, Bellosa et al. demonstrate that temporal power consumption is complex and highly dependent on the characteristics of the source document \cite{bellosa2001}. As such, this study does not consider the temporal energy consumption behaviour of the XML parser applications.