Changeset 1368


Ignore:
Timestamp:
Aug 24, 2011, 3:55:35 PM (8 years ago)
Author:
ksherdy
Message:

large changes to section 2; minor edits to section 3

Location:
docs/HPCA2012
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/HPCA2012/02-background.tex

    r1302 r1368  
    22\label{section:background}
    33
     4% This section provides
     5
     6% Extensible Markup Language (XML) is a core technology standard
     7% of the World Wide Web Consortium (W3C) that provides a common
     8% framework for encoding and communicating structured information. 
     9% In applications ranging from Office Open XML in
     10% Microsoft Office to NDFD XML of the NOAA National Weather
     11% Service, from KML in Google Earth to Castor XML in the Martian Rovers,
     12% from ebXML for e-commerce data interchange to RSS for news feeds
     13% from web sites everywhere, XML plays a ubiquitous role in providing
     14% a common framework for data interoperability world-wide and beyond.
     15% As XML 1.0 editor Tim Bray is quoted in the W3C celebration of XML at 10 years,
     16% "there is essentially no computer in the world, desk-top, hand-held,
     17% or back-room, that doesn't process XML sometimes."
     18
     19
     20
     21%
     22% With this focus on performance however, relatively little attention
     23% has been paid on reducing energy consumption in XML processing.  For example, in addressing
     24% performance through multicore parallelism, one generally must
     25% pay an energy price for performance gains because of the
     26% increased processing required for synchronization.   
     27% This focus on reduction of energy consumption is a key topic in this
     28% paper. We study the energy and performance
     29% characteristics of several XML parsers across three generations
     30% of x86-64 processor technology.  The parsers we consider are
     31% the widely used byte-at-a-time parsers Expat and Xerces, as well the
     32% Parabix1 and Parabix2 parsers based on parallel bit stream technology. 
     33% A compelling result is that the performance benefits of parallel bit stream technology
     34% translate directly and proportionally to substantial energy savings.
     35
     36
    437\subsection{XML}
    5 In 1998, the W3C officially adopted XML as a standard. The defining characteristics of XML are that it can represent virtually any type of information through the use of self-describing markup tags and can easily store semi-structured data in a descriptive fashion. XML markup encodes a description of an XML document's storage layout and logical structure. Because XML was intended to be human-readable, XML markup tags are often verbose by design \cite{TR:XML}.
     38% In 1998, the W3C officially adopted XML as a standard for data interchange.
     39% Today, XML can be found everywhere from mobile websites and cloud computing to server-side database engines.
    640
    7 % <ProductName Language="Chinese">郚件</ProductName> % can't represent in verbose and not really sure if the google auto-translater is correct
    8 
    9 XML files can be classified as ``document-oriented'' or ``data-oriented'' \cite{DuCharme04}. Documented-oriented XML is designed for human readability, such as shown in Figure \ref{fig:sample_xml}; data-oriented XML files are intended to be parsed by machines and omit ``human-friendly'' formatting techniques, such as the use of whitespace and descriptive ``natural language'' naming schemes.  Although the XML specification itself does not distinguish between ``XML for documents'' and ``XML for data'' \cite{TR:XML}, the latter often requires the use of an XML parser to extract the information within. The role of an XML parser is to transform the text-based XML data into application ready data.
     41Extensible Markup Language (XML) is a core technology standard of the World Wide Web Consortium (W3C);
     42it provides a common framework for encoding and communicating structured and semi-structured information. 
     43XML can represent virtually any type of information (i.e., content) in a descriptive fashion.
     44XML markup encodes a description of an XML document's storage layout and logical structure.
     45Because XML is intended to be human-readable, XML markup tags are often verbose by design \cite{TR:XML}.
     46For example, Figure \ref{fig:sample_xml} provides a standard product list encapsulated within an XML document.
     47All content is highlighted in bold. Anything that is not content is considered markup.
    1048
    1149\begin{figure}[h]
    12 {
    13 \scriptsize
    14 \begin{verbatim}
    15  <?xml version="1.0"?>
    16  <Products>
    17   <Product ID="0001">
    18    <ProductName Language="English">Widget</ProductName>
    19    <ProductName Language="French">Bitoniau</ProductName>           
    20    <Company>ABC</Company>
    21    <Price>$19.95</Price>
    22   </Product>
    23  </Products>
    24 \end{verbatim}
    25 }
    26 \caption{Example XML Document}\label{fig:sample_xml}
     50
     51{\scriptsize
     52\begin{center}
     53\begin{tabbing}
     54\hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \\
     55{\ttfamily{\verb`<`Products\verb`>`}}\+ \\
     56{\ttfamily{\verb`<`Product ID=\verb`"`}}{\bfseries\ttfamily{0001}}{\ttfamily{\verb`"`\verb`>`}}\+ \\
     57{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{English}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Widget}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
     58{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{French}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Bitoniau}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
     59{\ttfamily{\verb`<`Company\verb`>`}}{\bfseries\ttfamily{ABC}}{\ttfamily{\verb`<`/Company\verb`>` }}\\
     60{\ttfamily{\verb`<`Price\verb`>`}}{\bfseries\ttfamily{\$19.95}}{\ttfamily{\verb`<`/Price\verb`>` }}\- \\
     61{\ttfamily{\verb`<`/Product\verb`>` }}\- \\
     62{\ttfamily{\verb`<`/Products\verb`>` }}\\
     63\end{tabbing}
     64\end{center}}
     65
     66\caption{Sample XML Document}\label{fig:sample_xml}
    2767\end{figure}
    2868
    2969
    30 %For example, an XML parser for a web browser may take a XML file, apply a style sheet to it, and display it to the end user in an attractive yet informative way; an XML database parser may take a XML file and construct indexes and/or compress the tree into a proprietary format to provide the end user with efficient relational, hierarchical, and/or object-based query access to it.
    3170
    32 %Cameron et al.'s work in \cite{CameronHerdyLin2008} demonstrates that both XML parser selection and XML document markup density have considerable impact on the computational costs of processing XML documents. Computational costs are reported in processor cycles per XML byte; markup density is defined as the ratio of XML markup bytes to the total number of XML document bytes.
     71\subsection{XML Parsers}
     72% However, textual data tends to consist of variable-length items in generally unpredictable patterns \cite{Cameron2010}.
     73Traditional XML parsers process an XML document sequentially, a single byte-at-a-time,
     74from the first to the last character in the source text.
     75Each character is examined to distinguish between the XML-specific markup,
     76such as an opening angle bracket `\verb`<`', and the content held within the document.
     77The character that the parser is currently processing is commonly referred to its {\it cursor position}. As the
     78parser moves its cursor through the source document, it alternates between markup scanning / validation and
     79content processing operations.
     80In other words,
     81traditional XML parsers operate as finite-state machines that use byte comparisons to transition
     82between data and metadata states. Each state transition indicates the context in which to interpret the
     83subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in
     84generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until
     85deemed otherwise.
    3386
    34 \subsection{Traditional XML Parsers}
    35 % However, textual data tends to consist of variable-length items in generally unpredictable patterns \cite{Cameron2010}.
    36 Traditional XML parsers process XML sequentially a single byte-at-a-time. Following this approach, an XML parser processes a source document serially, from the first to the last byte of the source file. Each character of the source text is examined in turn to distinguish between the XML-specific markup, such as an opening angle bracket `<', and the content held within the document. The current character that the parser is processing is commonly referred to using the concept of a current cursor position. As the parser moves the cursor through the source document, the parser alternates between markup scanning, and data validation and processing operations. At each processing step, the parser scans the source document and either locates the expected markup, or reports an error condition and terminates. In other words, traditional XML parsers operate as complex finite-state machines that use byte comparisons to transition between data and metadata states. Each state transition indicates the context in which to interpret the subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until deemed otherwise.
     87A major disadvantage of the sequential byte-at-a-time approach to XML parsing is that processing an XML document
     88requires at least one conditional branch per byte of source text.
     89For example, Xerces-C, which is the most popular open-source C++ based XML parser and the foundation of the
     90Apache XML project \cite{xerces},
     91uses a series of nested switch statements and state-dependent flag tests to control the parsing logic of the program.
     92Our analysis, which we detail in {\bf SECTION XXX}, found that Xerces-C requires between 6 - 13 branches per byte
     93of XML to support this form of control structure (depending on the ratio of markup vs. content).
     94Cache utilization is also significantly reduced due to the manner in which markup and content must be scanned and buffered
     95for future use.
     96For instance, Xerces-C incurs $\sim$100 L1 cache misses per 1000 bytes of XML.
     97% Branch mispredictions are known
     98% to signficantly degrade XML parsing performance in proportion to the markup density of the source document
     99% \cite{CameronHerdyLin2008} (i.e., the ratio of markup vs. content).
    37100
    38 Expat and Xerces-C are popular byte-a-time sequential parsers. Both are C/C++ based and open-source. Expat was originally released in 1998; it is currently used in Mozilla Firefox and provides the core functionality of many additional XML processing tools \cite{expat}. Xerces-C was released in 1999 and is the foundation of the Apache XML project \cite{xerces}.
    39101
    40 % For example, the main loop of Xerces-C well-formedness scanner contains:
    41 %\begin{verbatim}
    42 %   XXXXXXXXXX   XERCES CODE   XXXXXXXXXX
    43 %\end{verbatim}
    44102
    45 A major disadvantage of the sequential byte-at-a-time approach to XML parsing is that each XML character incurs at least one conditional branch. The cumulative effect of branch mispredictions penalties are known to degrade XML parsing performance in proportion to the markup density of the source document \cite{CameronHerdyLin2008} (i.e., the proportion of XML-markup to XML-data).
    46103
    47 \subsection {Parallel XML Parsing}
    48 In general, parallel XML acceleration methods come in one of two forms: multithreaded approaches and SIMD-based techniques.
    49 Multithreaded XML parsers take advantage of multiple cores via number of strategies.
    50 Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09} and speculative p-DFAs \cite{ZhangPanChiu09}.
    51 SIMD XML parsers leverage the SIMD registers to overcome the performance limitations of the sequential byte-at-a-time processing model and its
    52 inherently data dependent branch misprediction rates.  Further, data parallel SIMD instructions allow the processor to perform the same
    53 operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
    54 fall under the SIMD classification. The Parabix parser versions studied are described in further detail in Section \ref{section:parabix}.
     104
     105
     106% Switch block logical
     107% Branches
     108% Cache utilization
     109
     110% \subsection {Parallel XML Parsing}
     111%
     112%
     113% In general, parallel acceleration methods come in two forms: multithreaded approaches and
     114% SIMD-based techniques. Multithreaded XML parsers take advantage of multiple cores via number of strategies.
     115% Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09}
     116% and speculative p-DFAs \cite{ZhangPanChiu09}.
     117
     118% To accelerate XML parsing performance, most of the recent
     119% work has focused on parallelization through the use of multicore
     120% parallelism for chip multiprocessors \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
     121
     122
     123% SIMD XML parsers leverage the SIMD registers to overcome the
     124% performance limitations of the sequential byte-at-a-time processing model and its inherently data dependent
     125% branch misprediction rates.  Further, data parallel SIMD instructions allow the processor to perform the same
     126% operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
     127% fall under the SIMD classification. The Parabix parser versions studied are described in further detail in
     128% Section \ref{section:parabix}.
     129
     130% while SIMD (Single Instruction Multiple Data) parallelism
     131% has been of interest to Intel in designing new SIMD instructions\cite{XMLSSE42}
     132% , as well as to the developers of parallel bit stream technology
     133% \cite{CameronHerdyLin2008, Cameron2009, Cameron2010}.
     134% Each of these approaches has shown considerable performance
     135% benefits over traditional sequential parsing techniques that follow the
     136% byte-at-a-time model.
    55137
    56138%\subsection {SIMD Operations}
  • docs/HPCA2012/03-research.tex

    r1355 r1368  
    4545Rather than viewing it as a stream of bytes or characters and performing per-byte comparisons to
    4646determine when to transition between parsing states, Parabix views data as a set of \emph{bit streams} and uses a mixture of
    47 boolean-logic and integer-based math on those streams to effectively parse many bytes in parallel.
    48 In this paper, we use the notation $\land$, $\lor$ and $\lnot$ to denote the AND, OR and NOT boolean operations.
     47boolean-logic\footnote{In this paper, we use the notation $\land$, $\lor$ and $\lnot$ to denote the AND, OR and NOT boolean operations.} and integer-based math on those streams to effectively parse many bytes in parallel.
    4948Bit streams are the foundation of the Parabix technology. In order to understand how it is possible to
    5049perform complex state-transition-based text processing operations in parallel, understanding what bit streams are,
     
    7271The significance of each bit value is dependent on the type of bit stream.
    7372We view bit streams in one of two ways: $n$ 1-bit values for boolean-logic operations and 1 $n$-bit value for integer-based math.
    74 For simplicity, assume that $n$ is infinitely long w.r.t. the size of the source data. In reality, each bit stream is divided into
    75 a series of $w$-bit blocks, where $w$ is equal to the width of the SIMD registers within the system
    76 (e.g., 64-bits for MMX, 128-bits for SSE/NEON, and 256-bits for AVX).
    77 We discuss how these $\frac{n}{w}$ bit stream segments can be used as infinitely-long bit stream in Section \ref{parallel bit stream compilation}.
     73For simplicity, assume that $n$ is infinitely long (i.e., unbounded) w.r.t. the size of the source data. In reality, each bit stream is divided into
     74a series of $w$-bit blocks, where $w$ is equal to the SIMD register width of the system (e.g., 64, 128, or 256).
     75We discuss how these $\lceil \frac{n}{w} \rceil$ bit blocks can be viewed as an unbounded bit stream in Section \ref{parabix tool chain}.
    7876
    7977The first type of bit streams used in Parabix are refered to as \emph{basis bit streams}, which contain the input source data.
     
    144142% However, a `<' is legal within an XML comment so not every `<' necessarily means that we are opening a new tag.
    145143
    146 Character-class bit streams allow us to perform up to 128 ``comparisons'' in parallel in a single operation (using Intel SSE/ARM NEON)
     144Character-class bit streams allow us to perform up to 128 ``comparisons'' in parallel with a single operation
    147145by using a series of boolean-logic operations to merge multiple basis bit streams into
    148146a single character-class stream that marks the positions of key characters with a $1$.
     
    219217Using this parallel bit stream approach, conditional branch statements used to identify key positions and/or syntax errors at each
    220218each parsing position are mostly eliminated, which, as Section XXX shows, minimizes branch misprediction penalties.
    221 Accurate parsing and parallel lexical analysis is done through processor-friendly equations that requires no speculation nor multithreading.
     219Accurate parsing and parallel lexical analysis is done through processor-friendly equations and requires neither speculation nor multithreading.
    222220
    223221
    224222\subsection{Parabix Tool Chain}
     223\label{parabix tool chain}
    225224
    226225To support the Parabix framework, we are developing tool technology to automate various aspects
     
    269268previous subsection as well as if and while control structures.
    270269Pablo translates these operations to block-at-a-time
    271 code in C/C++, where the block size is the register
    272 width for SIMD operations on the selected target architecture.
     270code in C/C++.
     271%, where the block size is the register width for SIMD operations on the selected target architecture.
    273272The key functionality of Pablo is to arrange for block-to-block
    274273carry bit propagation to implement the long bitstream shift
     
    286285an abstract SIMD machine based on the Inductive Doubling
    287286Instruction Set Architecture (IDISA) \cite{CameronLin2009}.
    288 Originally developed for 128-bit Altivec operations on Power PC
    289 as well as 64-bit MMX and 128-bit SSE operations on Intel,
    290 we have recently extended our library support to include
     287These operations were originally developed for 128-bit Altivec operations on Power PC
     288as well as 64-bit MMX and 128-bit SSE operations on Intel
     289but have recently extended to support
    291290the new 256-bit AVX operations on Intel as well as the 128-bit
    292291NEON operations on the ARM architecture. Further details
Note: See TracChangeset for help on using the changeset viewer.