source: docs/HPCA2012/04-methodology.tex @ 1362

Last change on this file since 1362 was 1362, checked in by ashriram, 8 years ago

In this section we describe our methodology for measuring and
investigating XML parser energy consumption and performance.  In
brief, for each of the four XML parsers under study, we measure
and evaluate the energy consumed in carrying out XML
well-formedness checking under a variety of workloads, as
executed on three different Intel processors.

To begin our study, we first investigate each of the XML
parsers in terms of the Performance Monitoring Counter (PMC) hardware
events listed in the PMC Hardware Events subsection. Based on the
findings of previous work \cite{bellosa2001, bertran2010, bircher2007},
we have chosen several key hardware performance events for which the
authors indicate a strong correlation with the overall performance and
energy consumption of an application. In addition, we measure the
runtime counts of SIMD instructions and bitwise operations using the
Intel Pin binary instrumentation framework. Based on these data, we
gain further insight into XML parser execution characteristics and
compare and contrast each of the Parabix parser versions against the
performance of standard industry parsers.

The foundational work by Bellosa \cite{bellosa2001}, as well as more
recent work \cite{bircher2007, bertran2010}, demonstrates that
hardware-usage patterns have a significant impact on the energy
consumption characteristics of an application. Further, the authors
demonstrate a strong correlation between specific PMC events and
energy usage. However, the authors differ slightly on the exact set
of PMCs to use.

The following subsections describe the XML parsers under study, the XML
workloads, the hardware architectures, the PMC hardware events selected
for measurement, and the energy measurement instrumentation setup. We
analyze the performance of each XML parser based on PMC hardware event
counts and contrast their energy consumption based on direct
measurements.

The XML parsing technologies selected for this study are the Parabix1,
Parabix2, Xerces-C++, and Expat XML parsers. Parabix1 (Parallel Bit
Streams for XML) is our first-generation XML parser based on SIMD and
Parallel Bit Stream technology \cite{Parabix1}.  Parabix1 leverages the
processor's built-in {\em bitscan} operation for high-performance XML
character scanning, as well as the SIMD capabilities of modern
commodity processors.  Parabix2 \cite{parabix2} is the
second-generation successor to Parabix1. Parabix2 is an open-source
XML parser that also leverages Parallel Bit Stream technology and the
SIMD capabilities of modern commodity processors. However, Parabix2
differs from Parabix1 in that it employs new parallelization
techniques, such as a multiple-cursor approach to parallel parsing
together with bit stream addition techniques that advance multiple
cursors independently and in parallel. Parabix2 delivers dramatic
performance improvements over traditional byte-at-a-time parsing
technology.  Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a
validating open-source XML parser written in C++ by the Apache
project.  Expat version 2.0.1 \cite{expat} is a non-validating XML
parser library written in C.

\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}
\hline
File Name         & dewiki.xml & jawiki.xml & roads.gml & po.xml  & soap.xml \\ \hline
File Type         & document   & document   & data      & data    & data     \\ \hline
File Size (kB)    & 66240      & 7343       & 11584     & 76450   & 2717     \\ \hline
Markup Item Count & 406792     & 74882      & 280724    & 4634110 & 18004    \\ \hline
Markup Density    & 0.07       & 0.13       & 0.57      & 0.76    & 0.87     \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Markup density is defined as the ratio of the total markup contained
within an XML file to the total XML document size.  This metric has
substantial influence on the performance of traditional recursive
descent XML parser implementations.  We use a mixture of
document-oriented and data-oriented XML files in our study to provide
workloads with a full spectrum of markup densities.

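
As a rough illustration of this definition, markup density can be
written as a simple ratio. The symbols below are illustrative notation
and are not taken from the measurements themselves; the sketch assumes
that both quantities are counted in bytes. Let $B_{\mathrm{markup}}$
denote the number of bytes of markup in a file and
$B_{\mathrm{total}}$ the total file size in bytes. The markup density
$\rho$ is then
\begin{equation}
\rho = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}},
\qquad 0 \le \rho \le 1 .
\end{equation}
For example, a hypothetical 1000\,kB file containing 250\,kB of markup
would have $\rho = 250/1000 = 0.25$.
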
Table \ref{XMLDocChars} shows the document characteristics of the XML
input files selected for this performance study.  The jawiki.xml and
dewiki.xml XML files represent document-oriented XML inputs and
contain the three-byte and two-byte UTF-8 sequences required for the
UTF-8 encoding of Japanese and German characters, respectively.  The
remaining files are data-oriented XML documents and consist
entirely of single-byte 7-bit ASCII characters.

\subsection{Platform Hardware}
\paragraph{Intel \CO{}}
We use an Intel \CO{} processor, code-named Conroe.
Table \ref{core2info} gives the hardware description of the
Intel \CO{} machine.

\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|}
\hline
Processor  & Core2 Duo (2.13GHz) & i3-530 (2.93GHz) & Sandybridge (2.80GHz) \\ \hline
L1 D Cache & 32KB                & 32KB             & 32KB                  \\ \hline
L2 Cache   & Shared 2MB          & 256KB/core       & 256KB/core            \\ \hline
L3 Cache   & ---                 & 4MB              & 6MB                   \\ \hline
Bus or QPI & 1066MHz Bus         & 1333MHz QPI      & 1333MHz QPI           \\ \hline
Memory     & 2GB                 & 4GB              & 6GB                   \\ \hline
Max TDP    & 65W                 & 73W              & 95W                   \\ \hline
\end{tabular}
\end{center}
\caption{Platform Hardware Specs}
\end{table}

We use an Intel \CITHREE\ processor, code-named Nehalem, as an
example of a low-end server
processor. Table \ref{i3info} gives the hardware description of the
Intel \CITHREE\ machine.

We also use an Intel \CIFIVE\ processor, code-named \SB{}.
Table \ref{sandybridgeinfo} gives the hardware description of the
Intel \CIFIVE\ machine.

Each of the hardware events selected relates to performance and energy
features associated with one or more hardware units.  For example,
total branch mispredictions relate to the branch predictor and branch
target buffer capacity.

The set of PMC events included in this study is as follows:
Processor Cycles, Branch Instructions, Branch Mispredictions, Integer
Instructions, SIMD Instructions, and Cache Misses.

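
One common way to relate raw event counts such as these to a hardware
unit is to normalize one count by another. The following is a sketch
only: the normalization and its symbols are illustrative and are not
among the measurements defined in this study. Writing
$N_{\mathrm{mispred}}$ for the Branch Mispredictions count and
$N_{\mathrm{branch}}$ for the Branch Instructions count, a branch
misprediction rate $r$ can be derived as
\begin{equation}
r = \frac{N_{\mathrm{mispred}}}{N_{\mathrm{branch}}} .
\end{equation}
For instance, hypothetical counts of $2 \times 10^{6}$ mispredictions
over $10^{8}$ branch instructions would give $r = 0.02$.
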
\subsection{Energy Measurement}
We measure energy consumption using a Fluke i410 current
clamp applied to the 12V wires that supply power to the processor
sockets. The clamp detects the magnetic field created by the flowing
current and converts it into a voltage level (1mV per 1A of
current). The voltage levels are then sampled by an Agilent 34410a
multimeter at a rate of 100 samples per second. This
measurement captures the power delivered to the processor package,
including the cores, caches, Northbridge memory controller, and the
QuickPath interconnects \cite{clamp}.
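
The conversion from clamp readings to energy can be sketched as
follows. The symbols are illustrative notation, and the sketch assumes
that the supply rail stays at its nominal 12\,V and that power is
constant within each sampling interval. With a clamp output
$V_{\mathrm{clamp}}(t_k)$ at sample $k$, a clamp sensitivity of 1\,mV
per 1\,A, and a sampling interval $\Delta t = 1/100\,\mathrm{s} =
10\,\mathrm{ms}$,
\begin{eqnarray}
I(t_k) & = & \frac{V_{\mathrm{clamp}}(t_k)}{1\,\mathrm{mV/A}}, \\
P(t_k) & = & 12\,\mathrm{V} \times I(t_k), \\
E & \approx & \sum_{k=1}^{N} P(t_k)\,\Delta t .
\end{eqnarray}
For example, a hypothetical clamp reading of 3\,mV corresponds to
3\,A, hence $P = 12\,\mathrm{V} \times 3\,\mathrm{A} = 36\,\mathrm{W}$,
which contributes $36\,\mathrm{W} \times 0.01\,\mathrm{s} =
0.36\,\mathrm{J}$ for that sampling interval.
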