In this section, we describe our methodology for measuring and
investigating XML parsing energy consumption and performance.  In
brief, for each of the XML parsers under study, we propose to measure
and evaluate the energy consumption required to carry out XML
well-formedness checking under a variety of workloads, as executed on
three different Intel cores.

To begin our study, we propose to first investigate each of the XML
parsers in terms of the PMC hardware events listed in
Section~\ref{events}. Based on the recommendations of previous
proposals \cite{bellosa2001, bertran2010, bircher2007}, we have chosen
several key hardware performance events that the authors indicate
correlate strongly with energy consumption.  We also measure other
runtime counts, such as the number of SIMD instructions and bitwise
operations, using the Intel Pin binary instrumentation framework. From
these data, we hope to gain insight into XML parser execution
characteristics and to compare and contrast different industrial
parsers.

The foundational work by Bellosa \cite{bellosa2001}, as well as more
recent work \cite{bircher2007, bertran2010}, shows that
hardware-usage patterns have a significant impact on the energy
consumption of a particular application. These studies further show
a strong correlation between specific performance events and energy
usage, although the authors differ slightly in opinion as to which
performance monitoring counters\footnote{Performance monitoring
  counters (PMCs) are special-purpose registers that are included in
  most modern microprocessors; they store the running count of
  specific hardware events, such as retired instructions, cache
  misses, branch mispredictions, and arithmetic-logic unit
  operations, to name a few.  They can be used to capture information
  about any program at run time, under any workload, at a very fine
  granularity.} (PMCs) to

The following subsections describe the XML parsers under study, the
XML workloads, the hardware architectures, the PMC hardware events
selected for measurement, and the energy measurement setup. We analyze
the performance of the different parsers based on hardware performance
counter measurements and contrast their energy consumption based on
direct measurement.

The XML parsing technologies selected for this study are the Parabix2,
Xerces-C++, and Expat XML parsers.  Parabix2 \cite{parabix2} (parallel
bit streams for XML) is the second-generation Parabix parser. It is an
open-source XML parser that leverages the SIMD capabilities of modern
commodity processors; it employs new parallelization techniques, based
on parallel parsing with bit stream addition, to deliver dramatic
performance improvements over traditional byte-at-a-time parsing
technology.  Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a
validating open-source XML parser written in C++ by the Apache
project.  Expat version 2.0.1 \cite{expat} is a non-validating XML
parser library written in C.
\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}
\hline
File Name               & dewiki.xml            & jawiki.xml            & roads.gml     & po.xml        & soap.xml \\ \hline
File Type               & document              & document              & data          & data          & data   \\ \hline
File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Markup density is defined as the ratio of the total markup contained
within an XML file to the total XML document size.  This metric may
have a substantial influence on the performance of XML parsing.  We
use a mixture of document-oriented and data-oriented XML files in our
study to provide workloads with a spectrum of markup densities.

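
As a sketch, the definition of markup density can be written as a
formula.  We assume here that both quantities are counted in bytes;
the symbols $B_{\mathrm{markup}}$ and $B_{\mathrm{total}}$ are
introduced only for this illustration.
\[
  \mbox{markup density} \;=\; \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}},
  \qquad 0 \le \mbox{markup density} \le 1 .
\]
Under this reading, the density of 0.87 reported for soap.xml would
mean that roughly 87\% of the bytes in that file are markup, whereas
the density of 0.07 for dewiki.xml would mean that the file is
dominated by character data.
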
Table~\ref{XMLDocChars} shows the document characteristics of the XML
input files selected for this performance study.  The jawiki.xml and
dewiki.xml XML files represent document-oriented XML inputs,
containing three-byte and four-byte UTF-8 sequences.  The remaining
files are data-oriented inputs and consist of only ASCII characters.

\subsection{Platform Hardware}
\paragraph{Intel \CO{}}
The Intel \CO\ is a Conroe-based processor produced by
Intel. Table~\ref{core2info} gives the hardware description of the
Intel \CO\ machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|}
\hline
Processor & Intel Core2 Duo processor 6400 (2.13GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 2MB \\ \hline
Front Side Bus & 1066 MHz \\ \hline
Memory & 2GB \\ \hline
Hard disk & 80GB SCSI \\ \hline
Max TDP & 65W \\ \hline
\end{tabular}
\end{center}
\caption{Intel \CO{} Hardware Description}
\label{core2info}
\end{table}

\paragraph{Intel \CI3{}}
The Intel \CI3\ is a Nehalem-based processor produced by Intel. The
intent of this processor is to serve as an example of a low-end server
processor. Table~\ref{i3info} gives the hardware description of the
Intel \CI3\ machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|}
\hline
Processor & Intel i3-530 (2.93GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 256KB \\ \hline
L3 Cache & 4MB \\ \hline
Front Side Bus & 1333 MHz \\ \hline
Memory & 4GB \\ \hline
Hard disk & 1TB SCSI \\ \hline
Max TDP & 73W \\ \hline
\end{tabular}
\end{center}
\caption{Intel \CI3{} Hardware Description}
\label{i3info}
\end{table}

\paragraph{Intel Core i5}
The Intel Core i5 is a \SB{}-based processor produced by
Intel. Table~\ref{sandybridgeinfo} gives the hardware description of the
Intel Core i5 machine selected.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l|l|}
\hline
Processor & Intel Sandy Bridge i5-2300 (2.80GHz) \\ \hline
L1 Cache & 192KB \\ \hline
L2 Cache & 4 $\times$ 256KB \\ \hline
L3 Cache & 6MB \\ \hline
Front Side Bus & 1333 MHz \\ \hline
Memory & 6GB DDR3 \\ \hline
Hard disk & 1TB SATA \\ \hline
Max TDP & 95W \\ \hline
\end{tabular}
\end{center}
\caption{Intel Core i5 Hardware Description}
\label{sandybridgeinfo}
\end{table}

\subsection{PMC Hardware Events}\label{events}

Each of the hardware events selected relates to performance and energy
features associated with one or more hardware units.  For example,
total branch mispredictions relate to the branch predictor and branch
target buffer capacity.

The set of PMC events used includes the following.
\begin{itemize}
\item Processor Cycles
\item Branch Instructions
\item Branch Mispredictions
\item Integer Instructions
\item SIMD Instructions
\item Cache Misses
\end{itemize}

\subsection{Energy Measurement}
To measure energy, we use a Fluke i410 current clamp applied to the
12V wires that supply power to the processor sockets. The clamp
detects the magnetic field created by the flowing current and converts
it into a voltage level (1mV per 1A of current). The voltage levels
are then sampled by an Agilent 34410A multimeter at a rate of 100
samples per second. This measurement captures the power delivered to
the processor package, including the cores, caches, Northbridge memory
controller, and QuickPath interconnects \cite{clamp}.
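
As a worked illustration, the sampled clamp voltages can be converted
into an energy figure as follows.  This is a sketch under two
assumptions that are not stated above: the supply rail is taken to
stay at a constant 12V, and each sample is treated as holding for one
full sampling interval (a rectangle-rule sum), which is not
necessarily the integration used in this study.  The symbols $v_k$,
$I_k$, $P_k$, $N$, and $\Delta t$ are introduced only for this
illustration.
\[
  I_k = \frac{v_k}{1\,\mathrm{mV/A}}, \qquad
  P_k = 12\,\mathrm{V} \times I_k, \qquad
  E \approx \sum_{k=1}^{N} P_k \,\Delta t, \qquad
  \Delta t = \frac{1}{100}\,\mathrm{s} = 10\,\mathrm{ms}.
\]
For example, a clamp reading of $v_k = 3\,\mathrm{mV}$ corresponds to
$I_k = 3\,\mathrm{A}$ and $P_k = 36\,\mathrm{W}$, so that single sample
contributes $36\,\mathrm{W} \times 0.01\,\mathrm{s} = 0.36\,\mathrm{J}$
to the total.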