% source: docs/PACT2011/04-methodology.tex @ 949
% Last change on this file since 949 was 949, checked in by lindanl, 9 years ago: more outline on paper
\section{Methodology}

In this section, we describe our methodology for measuring and investigating the energy consumption and performance of XML parsing. In brief, for each of the XML parsers under study, we propose to measure and evaluate the energy required to carry out XML well-formedness checking under a variety of workloads, as executed on both mobile device and server hardware.

To begin our study, we propose to first investigate each of the XML parsers in terms of the performance monitoring counter (PMC) hardware events listed in Section~\ref{events}. Based on previous key works \cite{bellosa2001, bertran2010, bircher2007}, we have chosen several hardware performance events that the authors report to correlate strongly with energy consumption. From these data, we hope to gain insight into the XML parser execution characteristics that contribute most significantly to overall energy consumption. Second, using the Fluke i410 current clamp meter, we plan to measure the total energy consumption required to complete XML well-formedness checking for each XML parser, on each hardware platform, and for each of a number of XML source files.

The foundational work by Bellosa \cite{bellosa2001}, as well as more recent work \cite{bircher2007, bertran2010},
shows that hardware-usage patterns have a significant impact on the energy consumption of a particular application.
These works further show that there is a strong correlation between
specific performance events and energy usage, but the authors differ slightly as to
which performance monitoring counters\footnote{Performance monitoring counters (PMCs) are special-purpose registers that are included in most modern microprocessors;
they store the running count of specific hardware events, such as retired instructions, cache misses, branch mispredictions, and arithmetic-logic unit operations, to name a few.
They can be used to capture information about any program at run-time, under any workload, at a very fine granularity.} (PMCs) to use.

% The use of performance counters for modeling power is not a new concept.

%Although the microprocessor is typically the largest consumer of power, Bertran et al. found that the chipset, memory, I/O, and disk can account for a significant portion of the total system energy consumption \cite{bertran2010}.

%As such, through the selection of a representative subset of hardware performance events, based on the combined works of \cite{bellosa2001, bertran2010, bircher2007}, we hope to gain insight into the XML parser execution characteristics which contribute most significantly to overall energy consumption.

The following subsections describe the XML parsers under study, the XML workloads, the mobile device and server hardware architectures, the PMC hardware events selected for measurement, and the Fluke i410 current clamp meter. The expected outcomes of this methodology are hardware performance counter measurements and total energy consumption measurements for each combination of XML parser, XML source file, and hardware platform.

\subsection{Parsers}\label{parsers}

The XML parsing technologies selected for this study are the Parabix2, Xerces-C++, and Expat XML parsers.
Parabix2 \cite{parabix2} (parallel bit streams for XML) is the second-generation Parabix parser. It is an open-source XML parser that leverages the SIMD capabilities of modern commodity processors; it employs parallel parsing techniques based on bit stream addition to deliver dramatic performance improvements over traditional byte-at-a-time parsing technology.
Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a validating open-source XML parser written in C++ by the Apache project.
Expat version 2.0.1 \cite{expat} is a non-validating XML parser library written in C.

\begin{table*}
\begin{center}
\begin{tabular}{|c||r|r|r|r|r|}
\hline
File Name               & dewiki.xml            & jawiki.xml            & roads.gml     & po.xml        & soap.xml \\ \hline
File Type               & document              & document              & data          & data          & data \\ \hline
File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87 \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table*}

\subsection{Workloads}\label{workloads}

Distinguishing between ``document-oriented'' XML and ``data-oriented'' XML is a popular way to describe the two basic classes of XML documents.
Data-oriented XML is used as an interchange format. Document-oriented XML is used to impose structure on information that rarely fits neatly into a relational database, particularly information intended for publishing. Data-oriented XML is characterized by a higher markup density. Markup density is defined as the ratio of the total markup contained within an XML file to the total XML document size. This metric may have a substantial influence on the performance of XML parsing. As such, we choose workloads with clearly distinguishable markup densities.

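As a compact restatement of the definition above, markup density can be written as follows. The symbols $d$, $S_{\mathrm{markup}}$, and $S_{\mathrm{doc}}$ are notation introduced here for illustration only, and measuring both sizes in the same unit (for example, bytes) is an assumption.
\[
d = \frac{S_{\mathrm{markup}}}{S_{\mathrm{doc}}}, \qquad 0 \le d \le 1 ,
\]
where $S_{\mathrm{markup}}$ is the total size of the markup contained in the file and $S_{\mathrm{doc}}$ is the total size of the file. Under this reading, a value near 0 indicates a file dominated by text content, while a value near 1 indicates a file dominated by markup.
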
Table~\ref{XMLDocChars} shows the document characteristics of the XML instances selected for this performance study.
The dewiki.xml and jawiki.xml files represent document-oriented XML instances of Wikimedia books,
written in German and Japanese, respectively. The remaining files are data-oriented.
The roads.gml file is an instance of Geography Markup Language (GML), a modeling language for geographic
systems as well as an open interchange format for geographic transactions on the Internet.
The po.xml file is an example of purchase order data, while the soap.xml file contains a large SOAP message.
The markup density metric is reported for each
document \cite{CameronHerdyLin2008}.

Describe parameters; what each parameter means.

\subsection{Platform Hardware}

\subsubsection{Server - Intel Core i3}
The Intel Core i3 is a Nehalem-based processor produced by Intel. It is intended to serve as a
low-end server processor. Table~\ref{i3} gives the hardware description of the Intel Core i3 machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline
Processor & Clarkdale i3-530 (2.93 GHz) \\ \hline
L1 Cache & 32 KB I-Cache, 32 KB D-Cache \\ \hline
L2 Cache & 256 KB \\ \hline
L3 Cache & 4 MB \\ \hline
Front Side Bus & 1333 MHz \\ \hline
Memory & 4 GB \\ \hline
Hard disk & SCSI 1 TB \\ \hline
\end{tabular}
\end{center}
\caption{Core i3}
\label{i3}
\end{table}

\subsubsection{Server - Sandy Bridge}

\subsection{PMC Hardware Events}\label{events}

Each of the hardware events selected relates to the energy consumption due to one or more hardware units. For example, the total number of branch mispredictions corresponds to activity in the branch prediction unit.

Initial PMC hardware event set:
\begin{itemize}
\item Processor Cycles
\item Branch Instructions
\item Branch Mispredictions
\item Integer Instructions
\item *Integer Loads
\item SIMD Instructions
\item *SIMD Loads
\item Last Level Cache Misses
\end{itemize}

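To illustrate how counts of the events above might be related to energy, a simple linear model is sketched here. The model form and all of its symbols are assumptions introduced for illustration; they are not taken from the cited works, and the final analysis may differ.
\[
E \approx E_{0} + \sum_{i=1}^{n} c_i \, N_i ,
\]
where $N_i$ is the measured count of the $i$-th selected event during a parsing run, $c_i$ is an energy cost per occurrence of that event (to be fitted from measurements), $n$ is the number of selected events, and $E_{0}$ captures energy not explained by the selected events.
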
\subsection{Measurement Hardware}
The Fluke i410 current clamp meter is an electrical tester that combines a voltmeter with a clamp-type current meter.
Like the multimeter, the clamp meter has transitioned through the analog period and into the digital era. Originally created as a single-purpose test tool for electricians,
clamp meters such as the Fluke i410 now incorporate more measurement functions and greater accuracy \cite{clamp}.
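
One way the clamp-meter readings could be turned into an energy figure is sketched below. This is an illustrative assumption rather than a description of the final procedure: the supply voltage $V$ is taken to be constant, the current is sampled $N$ times at a fixed interval $\Delta t$, and all symbols are introduced here.
\[
E = \int_{0}^{T} V \, I(t) \, dt \;\approx\; V \, \Delta t \sum_{k=1}^{N} I_k , \qquad T = N \, \Delta t ,
\]
where $I_k$ is the $k$-th current sample taken while the parser runs and $T$ is the duration of the run.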