\section{Methodology}

In this section we describe our methodology for measuring and
investigating XML parser energy consumption and performance.  In
brief, for each of the four XML parsers under study we propose to measure
and evaluate the energy consumed by XML
well-formedness checking under a variety of workloads on three
different Intel processors.

To begin our study we propose to first investigate each of the XML
parsers in terms of the Performance Monitoring Counter (PMC)\footnote{Performance Monitoring Counters
 are special-purpose registers available on most modern
 microprocessors. PMCs store the running counts of specific hardware
 events, such as retired instructions, cache misses, branch
 mispredictions, and arithmetic-logic unit operations.
 PMCs can be used to capture information about any program at
 run time, under any workload, and at a fine granularity.} hardware events listed in
Section \ref{events}. Based on the findings of previous
work \cite{bellosa2001, bertran2010, bircher2007} we have chosen
several key hardware performance events for which the authors indicate
a strong correlation with energy consumption. In addition, we measure
the runtime counts of SIMD instructions and
bitwise operations using the Intel Pin binary instrumentation
framework. Based on these data we gain further insight into XML
parser execution characteristics and compare and contrast each of the Parabix parser versions
against the performance of standard industry parsers.

The foundational work by Bellosa \cite{bellosa2001}, as well as more
recent work \cite{bircher2007, bertran2010}, demonstrates that
hardware-usage patterns have a significant impact on the energy
consumption characteristics of an application. Further, the authors demonstrate a strong
correlation between specific PMC events and energy usage. However, the
authors differ slightly on the exact set of PMCs to use.
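
One simple way to picture such a correlation, offered here only as an
illustrative sketch and not as the model of any of the cited works, is a
linear estimate in which each monitored event contributes a fixed energy
cost. The symbols $E_0$ (baseline energy), $w_i$ (per-event weight), and
$n_i$ (count of event $i$ over a run) are notation assumed for this sketch:
\[
E \approx E_0 + \sum_{i} w_i \, n_i .
\]
Under this reading, a disagreement over the exact set of PMCs amounts to a
disagreement over which events $i$ to include in the sum.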

The following subsections describe the XML parsers under study, the XML
workloads, the hardware architectures, the PMC hardware events selected
for measurement, and the energy measurement instrumentation setup. We analyze the
performance of each XML parser under study using PMC hardware event counts and contrast their energy consumption
using direct measurements.


\subsection{Parsers}\label{parsers}

The XML parsing technologies selected for this study are the Parabix1, Parabix2,
Xerces-C++, and Expat XML parsers. Parabix1 (Parallel Bit Streams for XML) is our first-generation XML parser based on SIMD and Parallel Bit Stream technology \cite{Parabix1}.
Parabix1 leverages the processor's built-in {\em bitscan} operation for high-performance XML character scanning, as well as the
SIMD capabilities of modern commodity processors.
Parabix2 \cite{parabix2} is the second generation of the Parabix parser. Parabix2
is an open-source XML parser that also leverages Parallel Bit Stream technology and the SIMD capabilities of
modern commodity processors. However, Parabix2 differs from Parabix1 in that it employs new parallelization
techniques, such as a multiple-cursor approach to parallel parsing together with bit stream addition techniques that advance multiple cursors independently and in parallel. Parabix2 delivers
dramatic performance improvements over traditional byte-at-a-time
parsing technology.  Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a
validating open-source XML parser written in C++ by the Apache
project.  Expat version 2.0.1 \cite{expat} is a non-validating XML
parser library written in C.
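
The bit stream addition idea can be illustrated with a small sketch. Assume
a marker stream $M$ with a 1 bit at each cursor position and a character-class
stream $C$ with a 1 bit at every position a cursor should skip over. The
name $\mathrm{ScanThru}$ and the convention that streams advance from the
least significant bit upward are assumptions of this sketch, not details
taken from this section. A single addition then moves every cursor past its
run of class characters:
\[
\mathrm{ScanThru}(M, C) = (M + C) \wedge \neg C .
\]
For example, with $C = 01110110_2$ and $M = 00010010_2$ (cursors at bit
positions 1 and 4), $M + C = 10001000_2$, and masking with $\neg C$ leaves
$10001000_2$, so the two cursors land at positions 3 and 7 in one step.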

\begin{table*}
\begin{center}
\begin{tabular}{|l||l|l|l|l|l|}
\hline
File Name               & dewiki.xml            & jawiki.xml            & roads.gml     & po.xml        & soap.xml \\ \hline
File Type               & document              & document              & data          & data          & data   \\ \hline
File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table*}

\subsection{Workloads}\label{workloads}

Markup density is defined
as the ratio of the total markup contained within an XML file to the
total XML document size.  This metric has a substantial influence
on the performance of traditional recursive descent XML parser implementations.
We use a mixture of document-oriented and data-oriented XML
files in our study to provide workloads with a full spectrum of
markup densities.
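
The definition of markup density can also be written as a formula. Assuming
both quantities are measured in bytes (the symbols below are notation
introduced for this sketch), the markup density $d$ of a file is
\[
d = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}}, \qquad 0 \le d \le 1 .
\]
Under that assumption, the density of $0.76$ reported for the 76450\,kB
po.xml file in Table \ref{XMLDocChars} would correspond to roughly
$0.76 \times 76450 \approx 58100$\,kB of markup.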

Table \ref{XMLDocChars} shows the document characteristics of the XML
input files selected for this performance study.  The jawiki.xml and
dewiki.xml files represent document-oriented XML inputs
and contain the multibyte UTF-8 sequences required to encode Japanese and German characters, respectively.  The remaining
files are data-oriented XML documents and consist entirely of single-byte 7-bit ASCII characters.


\subsection{Platform Hardware}
\paragraph{Intel \CO{}}
This platform uses the Intel \CO{} processor, code name Conroe.
Table \ref{core2info} gives the hardware description of the
Intel \CO{} machine.
\begin{table}[h]
\begin{center}
\begin{tabular}{|l||l|}
\hline
Processor & Intel Core2 Duo processor 6400 (2.13GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 2MB \\ \hline
Front Side Bus & 1066MHz \\ \hline
Memory & 2GB \\ \hline
Hard disk & 80GB SCSI \\ \hline
Max TDP & 65W \\ \hline
\end{tabular}
\end{center}
\caption{\CO{}}
\label{core2info}
\end{table}

\paragraph{Intel \CITHREE{}}
This platform uses the Intel \CITHREE\ processor, code name Nehalem. We
selected this processor as an example of a low-end server
processor. Table \ref{i3info} gives the hardware description of the
Intel \CITHREE\ machine.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l||l|}
\hline
Processor & Intel i3-530 (2.93GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 256KB \\ \hline
L3 Cache & 4MB \\ \hline
Front Side Bus & 1333MHz \\ \hline
Memory & 4GB \\ \hline
Hard disk & 1TB SCSI \\ \hline
Max TDP & 73W \\ \hline
\end{tabular}
\end{center}
\caption{\CITHREE{}}
\label{i3info}
\end{table}

\paragraph{Intel \CIFIVE{}}
This platform uses the Intel \CIFIVE\ processor, code name \SB{}.
Table \ref{sandybridgeinfo} gives the hardware description of the
Intel \CIFIVE\ machine.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l||l|}
\hline
Processor & Intel Sandy Bridge i5-2300 (2.80GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 4 $\times$ 256KB \\ \hline
L3 Cache & 6MB \\ \hline
Front Side Bus & 1333MHz \\ \hline
Memory & 6GB DDR \\ \hline
Hard disk & 1TB SATA \\ \hline
Max TDP & 95W \\ \hline
\end{tabular}
\end{center}
\caption{\SB{}}
\label{sandybridgeinfo}
\end{table}

\subsection{PMC Hardware Events}\label{events}

Each of the hardware events selected relates to performance
and energy features associated with
one or more hardware units.  For example, total branch mispredictions
relate to the branch predictor and branch target buffer capacity.

The set of PMC events included in this study is as follows.
\begin{itemize}
\item Processor Cycles
\item Branch Instructions
\item Branch Mispredictions
\item Integer Instructions
\item SIMD Instructions
\item Cache Misses
\end{itemize}

\subsection{Energy Measurement}
We measure energy consumption using a Fluke i410 current
clamp applied to the 12V wires that supply power to the processor
sockets. The clamp detects the magnetic field created by the flowing
current and converts it into voltage levels (1mV per 1A of
current). The voltage levels are then monitored by an Agilent 34410A
multimeter at a rate of 100 samples per second. This
measurement captures the power delivered to the processor package, including
cores, caches, Northbridge memory controller, and the QuickPath
interconnects \cite{clamp}.
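
The conversion from clamp readings to energy can be sketched as follows,
assuming a nominal 12\,V supply and a simple rectangle-rule sum over the
samples (the symbols $v_k$, $I_k$, $P_k$, and $\Delta t$ are notation
introduced for this sketch). At 1\,mV per 1\,A, a multimeter reading $v_k$
corresponds to a current $I_k = v_k / (1\,\mathrm{mV/A})$, so
\[
P_k = 12\,\mathrm{V} \times I_k, \qquad
E \approx \sum_{k} P_k \, \Delta t, \qquad
\Delta t = \frac{1}{100}\,\mathrm{s} .
\]
For instance, a reading of 5\,mV corresponds to 5\,A, or 60\,W, which
contributes 0.6\,J over one sample interval.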