# Changeset 1399 for docs/HPCA2012

Timestamp:
Aug 30, 2011, 6:54:25 PM (8 years ago)
Message:

edits

File:
1 edited

### Legend:

Unmodified
r1393

Unmodified:
\label{section:methodology}

Removed:
\paragraph{XML Parsers}\label{parsers}

Added:
\paragraph{XML Parsers:}\label{parsers}

Removed:
We evaluate the Parabix XML parser described above against two widely available open-source parsers: Xerces-C \cite{xerces} and Expat \cite{expat}. Each of the parsers is evaluated on the task of implementing the parsing and well-formedness validation requirements of the full XML 1.0 specification\cite{TR:XML}. Xerces-C version 3.1.1 (SAX) is a validating XML parser written in C++ and is available as part of the the Apache project. Expat version 2.0.1 is a stream-oriented non-validating XML parser library written in C. To ensure a fair comparison, we restricted our analysis of Xerces-C to its WFXML scanner to eliminate the cost of non-well-formedness validation and used the SAX interface to avoid the memory cost of DOM tree construction.

Added:
We evaluate the Parabix XML parser described above against two widely available open-source parsers, Xerces-C++ and Expat. Each of the parsers is evaluated on the task of implementing the parsing and well-formedness checking requirements of the full XML 1.0 specification~\cite{TR:XML}. Xerces-C++ version 3.1.1 (SAX) \cite{xerces} is a validating open-source XML parser written in C++, available as part of the Apache project. To ensure a fair comparison, we use the WFXML scanner of Xerces to eliminate the overhead of validation and also use the SAX interface to avoid the overhead of DOM tree construction. Expat version 2.0.1 \cite{expat} is a non-validating XML parser library written in C.

Removed:
\paragraph{XML Workloads}\label{workloads}

Added:
\paragraph{XML Workloads:}\label{workloads}

Unmodified:
XML is used for a variety of purposes ranging from databases to config

Removed:
files in mobile phones. A key feature of these XML files that affects the overall parsing performance is the \textit{Markup density}. \textit{Markup density} is defined as the ratio of the total markup contained within an XML file to the total XML document size.
This metric has substantial influence on the performance of traditional recursive descent XML parser implementations. We use a

Added:
files in mobile phones. A key predictor of the overall parsing performance of an XML file is its \textit{markup density} (i.e., the ratio of markup to the total XML document size). This metric has substantial influence on the performance of traditional recursive descent XML parsers. We use a

Unmodified:
mixture of document-oriented and data-oriented XML files in our study to analyze workloads with a full spectrum of markup densities.

Removed:
\paragraph{Platform Hardware}

Added:
\paragraph{Platform Hardware:}

Unmodified:
SSE extensions have been available on commodity Intel processors for over a decade since the Pentium III. They have steadily evolved with Sandybridge.

Removed:
We propose to investigate each the execution profiles of XML parsers using the the Performance Monitoring Counter (PMC) hardware event found in the processor. We have chosen several key hardware performance events which provide insight into the profile of our application and indicate if the processor is doing useful work~\cite{bellosa2001, bertran2010}. The set of performance counters included in our study are Branch instructions, Branch mispredictions,

Added:
We investigated the execution profiles of each XML parser using the Performance Monitoring Counter (PMC) found in the processor. We chose several key hardware events that provide insight into the profile of each application and indicate if the processor is doing useful work~\cite{bellosa2001, bertran2010}. The events included in our study are: Branch instructions, Branch mispredictions,

Unmodified:
Integer instructions, SIMD instructions, and Cache misses. In addition, we characterize the SIMD operations and study the type and class of SIMD operations using the Intel Pin binary instrumentation framework.
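The markup density metric from the XML Workloads paragraph (markup relative to total document size) can be illustrated with a minimal sketch. This is not the measurement procedure used in the paper: the function name `markup_density` is hypothetical, and the sketch assumes that "markup" means every byte from a `<` through the next `>`, which ignores entity references and mishandles `>` inside comments or CDATA sections.

```python
def markup_density(xml_bytes: bytes) -> float:
    """Approximate markup density of an XML document.

    Assumption (not from the source): markup is every byte from '<'
    through the matching '>', inclusive. Entity references, comments
    containing '>', and CDATA sections are not treated specially.
    """
    if not xml_bytes:
        return 0.0
    markup = 0
    in_tag = False
    for b in xml_bytes:
        if b == 0x3C:  # '<' opens a markup region
            in_tag = True
        if in_tag:
            markup += 1
        if b == 0x3E:  # '>' closes the markup region
            in_tag = False
    return markup / len(xml_bytes)
```

Under this approximation, a data-oriented file with many short elements scores close to 1, while a document-oriented file dominated by text content scores close to 0.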
Unmodified:
\begin{table*}[h]

[...]

Removed:
\paragraph{Energy Measurement}

Added:
\paragraph{Energy Measurement:}

Unmodified:
A key benefit of the Parabix parser is its more efficient use of the processor pipeline which reflects in the overall energy usage. We

[...]

memory controller, and the quick-path interconnects. We obtain samples throughout the entire execution of the program and then calculate overall

Removed:
total energy as $12V*\sigma^{N_{samples}}_{i=1} Sample_i$.

Added:
total energy as $12V*\sum^{N_{samples}}_{i=1} Sample_i$.
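The corrected energy formula, a 12 V factor times the sum of all samples, can be written out as a short sketch. The names `SUPPLY_VOLTAGE`, `total_energy`, and `sample_interval` are hypothetical. The `sample_interval` parameter is an assumption added here: the formula in the text has no time step, so the default of 1.0 reproduces it exactly, and what physical unit the result carries depends on what each sample measures, which the text shown here does not say.

```python
SUPPLY_VOLTAGE = 12.0  # volts; the 12V factor in the formula


def total_energy(samples, sample_interval=1.0):
    """Return SUPPLY_VOLTAGE * sum(samples) * sample_interval.

    sample_interval is an assumption, not part of the source formula;
    with the default of 1.0 this is exactly 12V * sum(Sample_i).
    """
    return SUPPLY_VOLTAGE * sum(samples) * sample_interval


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0]  # made-up sample values for illustration
    print(total_energy(readings))
```

If the samples were current readings taken at a fixed rate, passing the sampling period in seconds as `sample_interval` would give a result in joules; that interpretation is a guess and should be checked against the measurement setup.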