source: docs/HPCA2012/04-methodology.tex @ 1339

Last change on this file since 1339 was 1339, checked in by cameron, 8 years ago

Intro updates; section cross-references

\section{Methodology}
\label{section:methodology}

In this section we describe our methodology for measuring and
investigating XML parser energy consumption and performance.  In
brief, for each of the four XML parsers under study we measure and
evaluate the energy consumed in carrying out XML well-formedness
checking, under a variety of workloads, and as executed on three
different Intel processors.

We begin by investigating each of the XML parsers in terms of the
Performance Monitoring Counter (PMC)\footnote{Performance Monitoring
  Counters are special-purpose registers available in most modern
  microprocessors. PMCs store the running count of specific hardware
  events, such as retired instructions, cache misses, branch
  mispredictions, and arithmetic-logic unit operations.  PMCs can be
  used to capture information about any program at run-time and under
  any workload at a fine granularity.} hardware events listed in the
PMC Hardware Events subsection. Based on the findings of previous work
\cite{bellosa2001, bertran2010, bircher2007} we have chosen several
key hardware performance events for which the authors report a strong
correlation with energy consumption. In addition, we measure the
runtime counts of SIMD instructions and bitwise operations using the
Intel Pin binary instrumentation framework. These data give further
insight into XML parser execution characteristics and allow us to
compare and contrast each of the Parabix parser versions with
standard industry parsers.

The foundational work by Bellosa \cite{bellosa2001} and more recent
work \cite{bircher2007, bertran2010} demonstrate that hardware-usage
patterns have a significant impact on the energy consumption
characteristics of an application. Further, the authors demonstrate a
strong correlation between specific PMC events and energy usage.
However, the authors differ slightly on the exact set of PMCs to use.

The following subsections describe the XML parsers under study, the
XML workloads, the hardware architectures, the PMC hardware events
selected for measurement, and the energy measurement instrumentation
setup. We analyze the performance of each of the XML parsers under
study based on PMC hardware event counts and compare their energy
consumption based on direct measurements.
\subsection{Parsers}\label{parsers}

The XML parsing technologies selected for this study are the Parabix1,
Parabix2, Xerces-C++, and Expat XML parsers. Parabix1 (Parallel Bit
Streams for XML) is our first-generation XML parser based on SIMD and
Parallel Bit Stream technology \cite{Parabix1}.  Parabix1 leverages
the processor's built-in {\em bitscan} operation for high-performance
XML character scanning as well as the SIMD capabilities of modern
commodity processors.  Parabix2 \cite{parabix2} is the second
generation of the Parabix parser.  It is an open-source XML parser
that also leverages Parallel Bit Stream technology and the SIMD
capabilities of modern commodity processors.  However, Parabix2
differs from Parabix1 in that it employs new parallelization
techniques, such as a multiple-cursor approach to parallel parsing
together with bit stream addition techniques to advance multiple
cursors independently and in parallel.  Parabix2 delivers dramatic
performance improvements over traditional byte-at-a-time parsing
technology.  Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a
validating open-source XML parser written in C++ by the Apache
project.  Expat version 2.0.1 \cite{expat} is a non-validating XML
parser library written in C.
\begin{table*}
\begin{center}
{
\footnotesize
\begin{tabular}{|l||l|l|l|l|l|}
\hline
File Name               & dewiki.xml            & jawiki.xml            & roads.gml     & po.xml        & soap.xml \\ \hline
File Type               & document              & document              & data          & data          & data   \\ \hline
File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table*}
\subsection{Workloads}\label{workloads}

Markup density is defined as the ratio of the total markup contained
within an XML file to the total XML document size.  This metric
strongly influences the performance of traditional recursive descent
XML parser implementations.  We use a mixture of document-oriented
and data-oriented XML files in our study to provide workloads that
span a broad spectrum of markup densities.

Table \ref{XMLDocChars} shows the document characteristics of the XML
input files selected for this performance study.  The jawiki.xml and
dewiki.xml files represent document-oriented XML inputs and contain
the multibyte UTF-8 sequences required to encode Japanese and German
characters respectively.  The remaining files are data-oriented XML
documents and consist entirely of single-byte 7-bit ASCII characters.
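
One way to write the markup density defined above, assuming both
quantities are measured in bytes (the symbols here are introduced for
illustration only), is
\begin{equation}
\delta = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}},
\qquad 0 \le \delta \le 1,
\end{equation}
where $B_{\mathrm{markup}}$ is the amount of markup in the file and
$B_{\mathrm{total}}$ is the total file size.  Under this reading, the
density of $0.76$ reported for po.xml in Table \ref{XMLDocChars} means
that roughly three quarters of that file is markup.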
\subsection{Platform Hardware}

We evaluate the parsers on three Intel platforms.  Table
\ref{PlatformHardwareSpecs} gives the hardware description of each
machine.

\paragraph{Intel \CO{}}
The first platform is an Intel \CO{} processor, code name Conroe.

\begin{table*}[h]
\footnotesize
\begin{tabular}{|l||l|l|l|}
\hline
Processor & Core2 Duo (2.13GHz) & i3-530 (2.93GHz) & Sandy Bridge (2.80GHz) \\ \hline
L1 D Cache & 32KB & 32KB & 32KB \\ \hline
L2 Cache & Shared 2MB & 256KB/core & 256KB/core \\ \hline
L3 Cache & --- & 4MB & 6MB \\ \hline
Bus or QPI & 1066MHz Bus & 1333MHz QPI & 1333MHz QPI \\ \hline
Memory & 2GB & 4GB & 6GB \\ \hline
Max TDP & 65W & 73W & 95W \\ \hline
\end{tabular}
\caption{Platform Hardware Specs}
\label{PlatformHardwareSpecs}
\end{table*}

\paragraph{Intel \CITHREE{}}
The second platform is an Intel \CITHREE{} processor, code name
Nehalem.  We selected this processor to serve as an example of a
low-end server processor.

\paragraph{Intel \CIFIVE{}}
The third platform is an Intel \CIFIVE{} processor, code name \SB{}.
\subsection{PMC Hardware Events}

Each of the hardware events selected relates to performance and energy
features associated with one or more hardware units.  For example,
total branch mispredictions relate to the branch predictor and branch
target buffer capacity.

The PMC events included in this study are Processor Cycles, Branch
Instructions, Branch Mispredictions, Integer Instructions, SIMD
Instructions, and Cache Misses.
\subsection{Energy Measurement}
We measure energy consumption using a Fluke i410 current clamp
applied to the 12V wires that supply power to the processor sockets.
The clamp detects the magnetic field created by the flowing current
and converts it into voltage levels (1mV per 1A of current). The
voltage levels are then monitored by an Agilent 34410A multimeter at
a rate of 100 samples per second. This measurement captures the power
delivered to the processor package, including the cores, caches,
Northbridge memory controller, and QuickPath interconnects
\cite{clamp}.
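
As a rough sketch of how these readings translate into energy, assume
that the supply rail stays at its nominal $V_{\mathrm{rail}} = 12$\,V
and that the multimeter samples are evenly spaced at $\Delta t =
1/100$\,s.  The symbols $v_k$, $I_k$, $P_k$, and $E$ are introduced
here for illustration only.  With the clamp producing 1\,mV per ampere,
a reading of $v_k$ volts corresponds to
\begin{equation}
I_k = \frac{v_k}{10^{-3}\,\mathrm{V/A}}, \qquad
P_k = V_{\mathrm{rail}}\, I_k, \qquad
E \approx \sum_{k=1}^{N} P_k\, \Delta t ,
\end{equation}
where $N$ is the number of samples taken during a run.  For example, a
hypothetical steady reading of $3$\,mV would correspond to $3$\,A, or
$36$\,W, and hence to $36$\,J over one second of execution.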