Timestamp:
Aug 24, 2011, 1:55:27 PM (8 years ago)
Author:
ashriram
Message:

Fixed methodology

File:
1 edited

  • docs/HPCA2012/04-methodology.tex

    r1362 r1365  
    1 \section{Methodology}
     1\section{Evaluation Framework}
    22\label{section:methodology}
    33
    4 In this section we describe our methodology for the measurements and
    5 investigation of XML parser energy consumption and performance.  In
    6 brief, for each of the four XML parsers under study we propose to measure
    7 and evaluate the energy consumption required to carry out XML
    8 well-formedness checking, under a variety of workloads, and as
    9 executed on three different Intel processors.
     4\paragraph{XML Parsers}\label{parsers}
    105
    11 To begin our study we propose to first investigate each of the XML
    12 parsers in terms of the Performance Monitoring Counter (PMC) hardware
    13 events listed in the PMC Hardware Events subsection. Based on the
    14 findings of previous work \cite{bellosa2001, bertran2010, bircher2007}
    15 we have chosen several key hardware performance events for which the
    16 authors indicate a strong correlation with overall performance and
    17 energy consumption of the application. In addition, we measure the
    18 runtime counts of SIMD instructions and bitwise operations using the
    19 Intel Pin binary instrumentation framework. Based on these data we
    20 gain further insight into XML parser execution characteristics and
    21 compare and contrast each of the Parabix parser versions against the
    22 performance of standard industry parsers.
    23 
    24 The foundational work by Bellosa in \cite{bellosa2001} as well as more
    25 recent work in \cite {bircher2007, bertran2010} demonstrate that
    26 hardware-usage patterns have a significant impact on the energy
    27 consumption characteristics of an application \cite{bellosa2001,
    28   bircher2007, bertran2010}. Further, the authors demonstrate a strong
    29 correlation between specific PMC events and energy usage. However, each
    30 author differs slightly in their opinion of the exact set of PMCs to use.
    31 
    32 The following subsections describe the XML parsers under study, XML
    33 workloads, the hardware architectures, PMC hardware events selected
    34 for measurement, and the energy measurement instrumentation set up. We analyze the
    35 performance of each of the XML parsers under study based on PMC hardware event counts and contrast their energy consumption
    36 measurements based on direct measurements.
     6We evaluate Parabix against two widely available software XML
     7parsers, Xerces-C++ and Expat. Parabix is our
     8open-source XML parser that leverages Parallel Bit Stream technology
     9and the SIMD capabilities of modern commodity processors.  Xerces-C++
     10version 3.1.1 (SAX) \cite{xerces} is a validating open-source XML
     11parser written in C++ and available as part of the Apache project.
     12Expat version 2.0.1 \cite{expat} is a non-validating XML parser
     13library written in C.
    3714
    3815
    39 \subsection{Parsers}\label{parsers}
     16\paragraph{XML Workloads}\label{workloads}
     17XML is used for a variety of purposes ranging from databases to config
     18files in mobile phones. A key feature of these XML files that affects
     19the overall parsing performance is the \textit{markup density}, which
     20is defined as the ratio of the
     21total markup contained within an XML file to the total XML document
     22size.  This metric has substantial influence on the performance of
     23traditional recursive descent XML parser implementations.  We use a
     24mixture of document-oriented and data-oriented XML files in our study
     25to provide workloads with a full spectrum of markup densities.
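
As a rough illustration of the markup density metric defined above, the following Python sketch treats every byte from a `<` up to and including the next `>` as markup and divides by the total document size. The tag-matching rule and the function name `markup_density` are assumptions made for illustration only, not the measurement procedure used in the study; comments or CDATA sections that contain `>` would be miscounted, and entity references are not counted as markup here.

```python
import re

# Hypothetical helper: approximates markup as everything inside <...>.
_TAG = re.compile(rb"<[^>]*>")

def markup_density(xml_bytes: bytes) -> float:
    """Return (bytes of markup) / (total bytes) for an XML document."""
    if not xml_bytes:
        return 0.0
    markup_bytes = sum(len(m.group(0)) for m in _TAG.finditer(xml_bytes))
    return markup_bytes / len(xml_bytes)

if __name__ == "__main__":
    doc = b"<a>text</a>"
    # 7 markup bytes ("<a>" and "</a>") out of 11 total bytes
    print(markup_density(doc))
```

Under this approximation, a data-oriented file with many short elements would score close to 1, while a document-oriented file dominated by text would score close to 0.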
    4026
    41 The XML parsing technologies selected for this study are the Parabix1,
    42 Parabix2, Xerces-C++, and Expat XML parsers. Parabix1 (parallel bit
    43 Streams for XML) is our first generation SIMD and Parallel Bit Stream
    44 technology based XML parser \cite{Parabix1}.  Parabix1 leverages the
    45 processor built-in {\em bitscan} operation for high-performance XML
    46 character scanning as well as the SIMD capabilities of modern
    47 commodity processors to achieve high performance.  Parabix2
    48 \cite{parabix2} represents the second generation of the Parabix1
    49 parser. Parabix2 is an open-source XML parser that also leverages
    50 Parallel Bit Stream technology and the SIMD capabilities of modern
    51 commodity processors. However, Parabix2 differs from Parabix1 in that
    52 it employs new parallelization techniques, such as a multiple cursor
    53 approach to parallel parsing together with bit stream addition
    54 techniques to advance multiple cursors independently and in
    55 parallel. Parabix2 delivers dramatic performance improvements over
    56 traditional byte-at-a-time parsing technology.  Xerces-C++ version
    57 3.1.1 (SAX) \cite{xerces} is a validating open source XML parser
    58 written in C++ by the Apache project.  Expat version 2.0.1
    59 \cite{expat} is a non-validating XML parser library written in C.
     27Table \ref{XMLDocChars} shows the document characteristics of the XML
     28input files selected for this performance study.  The jawiki.xml and
     29dewiki.xml XML files represent document-oriented XML inputs and
     30contain the three-byte and four-byte UTF-8 sequences required for the
     31UTF-8 encoding of Japanese and German characters respectively.  The
     32remaining data files are data-oriented XML documents and consist
     33entirely of single byte $7$-bit encoded ASCII characters.
    6034
    6135\begin{table*}
     
    7751\end{table*}
    7852
    79 \subsection{Workloads}\label{workloads}
    8053
    81 Markup density is defined as the ratio of the total markup contained
    82 within an XML file to the total XML document size.  This metric has
    83 substantial influence on the performance of traditional recursive
    84 descent XML parser implementations.  We use a mixture of
    85 document-oriented and data-oriented XML files in our study to provide
    86 workloads with a full spectrum of markup densities.
     54\paragraph{Platform Hardware}
     55SSE extensions have been available on commodity Intel processors for
     56over a decade, since the Pentium III. They have steadily evolved with
     57improvements in instruction latency, cache interface, and register
     58resources, and with the addition of domain-specific instructions. Here we
     59investigate SIMD extensions across three different generations of
     60Intel processors. Table \ref{hwinfo} describes the Intel multicores we
     61investigate. We compare the energy and performance profile of
     62Parabix across these platforms.  We also analyze the implementation
     63specifics of SIMD extensions under the various microarchitectures. We
     64evaluate both the legacy SSE and newer AVX extensions supported by
     65Sandybridge.
    8766
    88 Table \ref{XMLDocChars} shows the document characteristics of the XML
    89 input files selected for this performance study.  The jawiki.xml and
    90 dewiki.xml XML files represent document-oriented XML inputs and
    91 contain the three-byte and four-byte UTF-8 sequence required for the
    92 UTF-8 encoding of Japanese and German characters respectively.  The
    93 remaining data files are data-oriented XML documents and consist
    94 entirely of single byte $7$-bit encoded ASCII characters.
     67We propose to investigate the execution profile of each XML parser
     68using the Performance Monitoring Counter (PMC) hardware events
     69found in the processor. We have chosen several key hardware
     70performance events which provide insight into the profile of our
     71application and indicate whether the processor is doing useful
     72work~\cite{bellosa2001, bertran2010}.  The performance counters
     73included in our study are branch instructions, branch mispredictions,
     74integer instructions, SIMD instructions, and cache misses. In
     75addition, we characterize the SIMD operations and study the type and
     76class of SIMD operations using the Intel Pin binary instrumentation
     77framework.
    9578
    9679
    97 \subsection{Platform Hardware}
    98 \paragraph{Intel \CO{}}
    99 Intel \CO{} processor, code name Conroe, produced by
    100 Intel. Table \ref{core2info} gives the hardware description of the
    101 Intel \CO{} machine.
     80
     81
    10282
    10383\begin{table*}[h]
     
    11494\end{tabular}
    11595\caption{Platform Hardware Specs}
     96\label{hwinfo}
    11697\end{table*}
    11798
    118 Intel \CITHREE\ processor, code name Nehalem, produced by Intel. The
    119 intent of the selection of this processor is to serve as an example of a low end server
    120 processor. Table \ref{i3info} gives the hardware description of the
    121 Intel \CITHREE\ machine. Intel \CIFIVE\  processor, code name \SB\, produced by
    122 Intel. Table \ref{sandybridgeinfo} gives the hardware description of the
    123 Intel \CITHREE\ machine.
    124 Each of the hardware events selected relates to performance and energy
    125 features associated with one or more hardware units.  For example,
    126 total branch mispredictions relate to the branch predictor and branch
    127 target buffer capacity.
    12899
    129 The set of PMC events included in this study is as follows.
    130 Processor Cycles, Branch Instructions, Branch Mispredictions, Integer
    131 Instructions, SIMD Instructions and Cache Misses.
    132100
    133 \subsection{Energy Measurement}
    134   We measure energy consumption using the Fluke i410 current
    135 clamp applied on the 12V wires that supply power to the processor
    136 sockets. The clamp detects the magnetic field created by the flowing
    137 current and converts it into voltage levels (1mV per 1A
    138 current). The voltage levels are then monitored by an Agilent 34410a
    139 multimeter at the granularity of 100 samples per second. This
    140 measurement captures the power to the processor package, including
    141 cores, caches, Northbridge memory controller, and the quick-path
    142 interconnects \cite{clamp}.
     101\paragraph{Energy Measurement}
     102
     103A key benefit of the Parabix parser is its more efficient use of the
     104processor pipeline, which is reflected in the overall energy usage.  We
     105measure the energy consumption of the processor directly using a
     106current clamp. We apply the Fluke i410 current clamp \cite{clamp} to the 12V wires
     107that supply power to the processor sockets. The clamp detects the
     108magnetic field created by the flowing current and converts it into
     109voltage levels (1mV per 1A current). The voltage levels are then
     110monitored by an Agilent 34410a digital multimeter at the granularity
     111of 100 samples per second. This measurement captures the instantaneous
     112power to the processor package, including cores, caches, northbridge
     113memory controller, and the quick-path interconnects. We obtain samples
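     114throughout the entire execution of the program and then calculate the total
     115energy as $12\,\mathrm{V} \times \Delta t \times \sum_{i=1}^{N_{samples}} Sample_i$, where $Sample_i$ is the $i$-th current sample in amperes and $\Delta t = 10$\,ms is the sampling interval.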
     116
     117
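
The energy calculation described in the Energy Measurement paragraph can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling: the function name `total_energy_joules` and its parameters are hypothetical, and it assumes each multimeter reading is a clamp voltage in millivolts (1 mV per 1 A), that the supply rail stays at a constant 12 V, and that each sample is weighted by the 10 ms interval implied by the 100 samples per second rate.

```python
def total_energy_joules(clamp_mv, supply_volts=12.0,
                        sample_rate_hz=100.0, amps_per_mv=1.0):
    """Estimate energy (joules) from current-clamp readings.

    clamp_mv       -- iterable of clamp output readings in millivolts
    supply_volts   -- assumed constant supply voltage (12 V rail)
    sample_rate_hz -- multimeter sampling rate (100 samples/second)
    amps_per_mv    -- clamp conversion factor (1 mV per 1 A)
    """
    dt = 1.0 / sample_rate_hz  # seconds covered by each sample
    # Power for each sample is V * I; energy is power times the interval.
    return sum(supply_volts * mv * amps_per_mv * dt for mv in clamp_mv)

if __name__ == "__main__":
    # One second of readings at a steady 5 A draw: about 12 V * 5 A * 1 s
    readings = [5.0] * 100
    print(total_energy_joules(readings))
```

Summing rectangle areas in this way is the simplest integration rule; a trapezoidal rule would be a reasonable alternative if the current varies quickly relative to the sampling rate.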