\section{Methodology}


In this section, we describe our methodology for measuring and
investigating XML parsing energy consumption and performance.  In
brief, for each of the XML parsers under study, we propose to measure
and evaluate the energy consumed in carrying out XML
well-formedness checking under a variety of workloads and as
executed on three different Intel cores.

To begin our study, we propose to first investigate each of the XML
parsers in terms of the PMC hardware events listed in
Section~\ref{events}. Based on the recommendations of previous
proposals \cite{bellosa2001, bertran2010, bircher2007}, we have chosen
several key hardware performance events that the authors indicate
correlate strongly with energy consumption.  We also measure
other runtime counts, such as the number of SIMD instructions and
bitwise operations, using the PIN binary instrumentation
framework. From these data, we hope to gain insight into XML
parser execution characteristics and to compare and contrast different
industrial parsers.

The foundational work by Bellosa~\cite{bellosa2001}, as well as more
recent work~\cite{bircher2007, bertran2010}, shows that
hardware-usage patterns have a significant impact on the energy
consumption of a particular application. These studies further show
that there is a strong
correlation between specific performance events and energy usage, but
the authors differ slightly in opinion as to which performance
monitoring counters\footnote{Performance monitoring counters (PMCs)
  are special-purpose registers that are included in most modern
  microprocessors; they store the running count of specific hardware
  events, such as retired instructions, cache misses, branch
  mispredictions, and arithmetic-logic unit operations, to name a few.
  They can be used to capture information about any program at
  run-time, under any workload, at a very fine granularity.} (PMCs) to
use.


The following subsections describe the XML parsers under study, the XML
workloads, the hardware architectures, the PMC hardware events selected
for measurement, and the energy measurement setup. We analyze the
performance of the different parsers using the hardware performance
counter measurements and contrast their energy consumption
using direct measurement.


\subsection{Parsers}\label{parsers}

The XML parsing technologies selected for this study are the Parabix2,
Xerces-C++, and Expat XML parsers.  Parabix2~\cite{parabix2} (parallel
bit streams for XML) is the second-generation Parabix parser. It
is an open-source XML parser that leverages the SIMD capabilities of
modern commodity processors; it employs new parallelization
techniques, using parallel parsing with bit stream addition, to deliver
dramatic performance improvements over traditional byte-at-a-time
parsing technology.  Xerces-C++ version 3.1.1 (SAX)~\cite{xerces} is a
validating open-source XML parser written in C++ by the Apache
project.  Expat version 2.0.1~\cite{expat} is a non-validating XML
parser library written in C.

\begin{table*}
\begin{center}
\begin{tabular}{|c||r|r|r|r|r|}
\hline
File Name               & dewiki.xml            & jawiki.xml            & roads.gml     & po.xml        & soap.xml \\ \hline
File Type               & document              & document              & data          & data          & data   \\ \hline
File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table*}

\subsection{Workloads}\label{workloads}

Distinguishing between ``document-oriented'' XML and ``data-oriented''
XML is a popular way to describe the two basic classes of XML
documents.  Data-oriented XML is used as an interchange format.
Document-oriented XML is used to impose structure on information that
rarely fits neatly into a relational database, particularly
information intended for publishing.  Data-oriented XML is
characterized by a higher markup density.  Markup density is defined
as the ratio of the total markup contained within an XML file to the
total XML document size.  This metric may have a substantial influence
on the performance of XML parsing.  As such, we choose workloads with a
spectrum of markup densities.
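
As a minimal illustration of this definition, and assuming (the text
does not say) that both quantities are measured in the same unit, such
as bytes, the markup density $\rho$ of a file can be written as
\[
\rho = \frac{\mathrm{markup\ size}}{\mathrm{total\ document\ size}},
\qquad 0 \le \rho \le 1 .
\]
The symbol $\rho$ is introduced here only for illustration. Under this
reading, the po.xml entry of Table~\ref{XMLDocChars}, with a density of
0.76 and a size of 76450 kB, would contain roughly
$0.76 \times 76450 \approx 58100$ kB of markup.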

Table~\ref{XMLDocChars} shows the document characteristics of the XML
input files selected for this performance study.  The jawiki.xml and
dewiki.xml files represent document-oriented XML inputs and
contain three-byte and four-byte UTF-8 sequences.  The remaining
files are data-oriented inputs and consist of only ASCII
characters~\cite{CameronHerdyLin2008}.


\subsection{Platform Hardware}
\paragraph{Intel Core 2}
The Intel Core 2 is a Conroe-based processor produced by
Intel. Table~\ref{core2info} gives the hardware description of the
Intel Core 2 machine selected.
\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline
Processor & Intel Core 2 Duo processor 6400 (2.13GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 2MB \\ \hline
Front Side Bus & 1066MHz \\ \hline
Memory & 2GB \\ \hline
Hard disk & 80GB SCSI \\ \hline
Max TDP & 65W \\ \hline
\end{tabular}
\end{center}
\caption{Core 2}
\label{core2info}
\end{table}

\paragraph{Intel Core i3}
The Intel Core i3 is a Nehalem-based processor produced by Intel. This
processor is intended to serve as an example of a low-end server
processor. Table~\ref{i3info} gives the hardware description of the
Intel Core i3 machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline
Processor & Intel Clarkdale i3-530 (2.93GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 256KB \\ \hline
L3 Cache & 4MB \\ \hline
Front Side Bus & 1333MHz \\ \hline
Memory & 4GB \\ \hline
Hard disk & 1TB SCSI \\ \hline
Max TDP & 73W \\ \hline
\end{tabular}
\end{center}
\caption{Core i3}
\label{i3info}
\end{table}

\paragraph{Intel Core i5}
The Intel Core i5 is a Sandy Bridge-based processor produced by
Intel. Table~\ref{sandybridgeinfo} gives the hardware description of the
Intel Core i5 machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline
Processor & Intel Core i5-2300 (2.80GHz) \\ \hline
L1 Cache & 192KB \\ \hline
L2 Cache & 4 $\times$ 256KB \\ \hline
L3 Cache & 6MB \\ \hline
Front Side Bus & 1333MHz \\ \hline
Memory & 6GB DDR \\ \hline
Hard disk & 1TB SATA \\ \hline
Max TDP & 95W \\ \hline
\end{tabular}
\end{center}
\caption{Sandy Bridge}
\label{sandybridgeinfo}
\end{table}

\subsection{PMC Hardware Events}\label{events}

Each of the hardware events selected relates to the energy consumption
of one or more hardware units. For example, the total number of branch
mispredictions corresponds to use of the branch prediction unit.

Initial PMC hardware event set:
\begin{itemize}
\item Processor Cycles
\item Branch Instructions
\item Branch Mispredictions
\item Integer Instructions
\item SIMD Instructions
\item Cache Misses
\end{itemize}

\subsection{Energy Measurement}
To measure energy, we use a Fluke i410 current
clamp applied to the 12V wires that supply power to the processor
sockets. The clamp detects the magnetic field created by the flowing
current and converts it into voltage levels (1mV per 1A of
current). The voltage levels are then monitored by an Agilent 34410A
multimeter at a granularity of 100 samples per second. This
measurement captures the power delivered to the processor package,
including the cores, caches, Northbridge memory controller, and the
quick-path interconnects~\cite{clamp}.
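
As a rough sketch of how these samples translate into energy, assume
(our simplification, not stated above) that the supply stays at its
nominal 12V and that each sample holds for one sampling interval
$\Delta t = 1/100$\,s. With $v_k$ the $k$-th clamp reading, the
1mV-per-1A conversion gives the current $I_k$, so that
\[
I_k = \frac{v_k}{1\,\mathrm{mV/A}}, \qquad
P_k = 12\,\mathrm{V} \times I_k, \qquad
E \approx \sum_{k=1}^{N} P_k \, \Delta t .
\]
The symbols $v_k$, $I_k$, $P_k$, $E$, and $N$ are introduced here for
illustration only. For example, a constant reading of 5mV corresponds
to 5A, or 60W, and 100 such samples (one second) would amount to about
60J.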