Ignore:
Timestamp:
Apr 19, 2013, 3:12:47 PM (6 years ago)
Author:
lindanl
Message:

Add figures

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3047 r3048  
    1212<div lang="en" class="article">
    1313<div class="titlepage">
    14 <h2 class="article-title" id="idp75552"></h2>
     14<h2 class="article-title" id="idp118304"></h2>
    1515<div class="author">
    1616<h3 class="author">Nigel Medforth</h3>
     
    6363<div class="abstract">
    6464<p class="title"><b>Abstract</b></p>
    65 <p id="idp76976">Prior research on the acceleration of XML processing using SIMD and multi-core
     65<p id="idp119728">Prior research on the acceleration of XML processing using SIMD and multi-core
    6666            parallelism has lead to a number of interesting research prototypes. This work
    6767            investigates the extent to which the techniques underlying these prototypes could result
     
    7979<p><b>Table of Contents</b></p>
    8080<dl>
    81 <dt><span class="section"><a href="#idp285120" class="toc">Introduction</a></span></dt>
    82 <dt><span class="section"><a href="#idp286912" class="toc">Background</a></span></dt>
     81<dt><span class="section"><a href="#idp328048" class="toc">Introduction</a></span></dt>
     82<dt><span class="section"><a href="#idp329840" class="toc">Background</a></span></dt>
    8383<dd><dl>
    84 <dt><span class="section"><a href="#idp287552" class="toc">Xerces C++ Structure</a></span></dt>
    85 <dt><span class="section"><a href="#idp331216" class="toc">The Parabix Framework</a></span></dt>
    86 <dt><span class="section"><a href="#idp40512" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     84<dt><span class="section"><a href="#idp330480" class="toc">Xerces C++ Structure</a></span></dt>
     85<dt><span class="section"><a href="#idp374240" class="toc">The Parabix Framework</a></span></dt>
     86<dt><span class="section"><a href="#idp83728" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    8787</dl></dd>
    88 <dt><span class="section"><a href="#idp44272" class="toc">Architecture</a></span></dt>
     88<dt><span class="section"><a href="#idp87488" class="toc">Architecture</a></span></dt>
    8989<dd><dl>
    90 <dt><span class="section"><a href="#idp366784" class="toc">Overview</a></span></dt>
    91 <dt><span class="section"><a href="#idp390240" class="toc">Character Set Adapters</a></span></dt>
    92 <dt><span class="section"><a href="#idp398192" class="toc">Combined Parallel Filtering</a></span></dt>
    93 <dt><span class="section"><a href="#idp416848" class="toc">Content Stream</a></span></dt>
    94 <dt><span class="section"><a href="#idp427456" class="toc">Namespace Handling</a></span></dt>
    95 <dt><span class="section"><a href="#idp447600" class="toc">Error Handling</a></span></dt>
     90<dt><span class="section"><a href="#idp410064" class="toc">Overview</a></span></dt>
     91<dt><span class="section"><a href="#idp437488" class="toc">Character Set Adapters</a></span></dt>
     92<dt><span class="section"><a href="#idp446128" class="toc">Combined Parallel Filtering</a></span></dt>
     93<dt><span class="section"><a href="#idp464816" class="toc">Content Stream</a></span></dt>
     94<dt><span class="section"><a href="#idp475424" class="toc">Namespace Handling</a></span></dt>
     95<dt><span class="section"><a href="#idp494768" class="toc">Error Handling</a></span></dt>
    9696</dl></dd>
    97 <dt><span class="section"><a href="#idp458640" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
    98 <dt><span class="section"><a href="#idp475920" class="toc">Performance</a></span></dt>
     97<dt><span class="section"><a href="#idp505808" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
     98<dt><span class="section"><a href="#idp528816" class="toc">Performance</a></span></dt>
    9999<dd><dl>
    100 <dt><span class="section"><a href="#idp478640" class="toc">Xerces C++ SAXCount</a></span></dt>
    101 <dt><span class="section"><a href="#idp499872" class="toc">GML2SVG</a></span></dt>
     100<dt><span class="section"><a href="#idp531536" class="toc">Xerces C++ SAXCount</a></span></dt>
     101<dt><span class="section"><a href="#idp555632" class="toc">GML2SVG</a></span></dt>
    102102</dl></dd>
    103 <dt><span class="section"><a href="#idp501024" class="toc">Conclusion and Future Work</a></span></dt>
     103<dt><span class="section"><a href="#idp556816" class="toc">Conclusion and Future Work</a></span></dt>
    104104</dl>
    105105</div>
    106 <div class="section" id="idp285120">
     106<div class="section" id="idp328048">
    107107<h2 class="title" style="clear: both">Introduction</h2>
    108 <p id="idp285760"></p>
    109 <p id="idp286016"></p>
    110 <p id="idp286272"></p>
    111 <p id="idp286528"></p>
    112 </div>
    113 <div class="section" id="idp286912">
     108<p id="idp328688"></p>
     109<p id="idp328944"></p>
     110<p id="idp329200"></p>
     111<p id="idp329456"></p>
     112</div>
     113<div class="section" id="idp329840">
    114114<h2 class="title" style="clear: both">Background</h2>
    115 <div class="section" id="idp287552">
     115<div class="section" id="idp330480">
    116116<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    117 <p id="idp288192"> The Xerces C++ parser
     117<p id="idp331120"> The Xerces C++ parser
    118118           
    119119           
     
    126126            parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM
    127127            tree-based parsing interface. </p>
    128 <p id="idp290320">
     128<p id="idp333280">
    129129           
    130130           
     
    145145            expected that a comprehensive restructuring is required, involving all aspects of the
    146146            parser. </p>
    147 <div class="table-wrapper" id="idp293248">
     147<div class="table-wrapper" id="idp336272">
    148148<p class="title">Table I</p>
    149 <div class="caption"><p id="idp8080">Execution Time of Top 10 Xerces Functions</p></div>
     149<div class="caption"><p id="idp51264">Execution Time of Top 10 Xerces Functions</p></div>
    150150<table class="table">
    151151<colgroup span="1">
     
    202202</div>
    203203</div>
    204 <div class="section" id="idp331216">
     204<div class="section" id="idp374240">
    205205<h3 class="title" style="clear: both">The Parabix Framework</h3>
    206 <p id="idp331888"> The Parabix (parallel bit stream) framework is a transformative approach to XML
     206<p id="idp374912"> The Parabix (parallel bit stream) framework is a transformative approach to XML
    207207            parsing (and other forms of text processing.) The key idea is to exploit the
    208208            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
     
    229229             multiple
    230230            classes can share the classification cost. </p>
    231 <p id="idp343456">
     231<p id="idp386480">
    232232           
    233233         </p>
    234 <p id="idp347392"> Consider, for example, the XML source data stream shown in the first line of
     234<p id="idp390544"> Consider, for example, the XML source data stream shown in the first line of
    235235            . The remaining lines of this figure show
    236236            several parallel bit streams that are computed in Parabix-style parsing, with each bit
     
    244244            (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
    245245            marking the element names, attribute names and attribute values of tags. </p>
    246 <p id="idp37152"> Two intuitions may help explain how the Parabix approach can lead to improved XML
     246<p id="idp80368"> Two intuitions may help explain how the Parabix approach can lead to improved XML
    247247            parsing performance. The first is that the use of the full register width offers a
    248248            considerable information advantage over sequential byte-at-a-time parsing. That is,
     
    253253            individual decision-bits, an approach that computes many of them in parallel (e.g., 128
    254254            bytes at a time using 128-bit registers) should provide substantial benefit. </p>
    255 <p id="idp38400"> Previous studies have shown that the Parabix approach improves many aspects of XML
     255<p id="idp81616"> Previous studies have shown that the Parabix approach improves many aspects of XML
    256256            processing, including transcoding \cite{Cameron2008}, character classification and
    257257            validation, tag parsing and well-formedness checking. The first Parabix parser used
     
    262262            \cite{HPCA2012}. Although these research prototypes handled the full syntax of
    263263            schema-less XML documents, they lacked the functionality required by full XML parsers. </p>
    264 <p id="idp39664"> Commercial XML processors support transcoding of multiple character sets and can
     264<p id="idp82880"> Commercial XML processors support transcoding of multiple character sets and can
    265265            parse and validate against multiple document vocabularies. Additionally, they provide
    266266            API facilities beyond those found in research prototypes, including the widely used SAX,
    267267            SAX2 and DOM interfaces. </p>
    268268</div>
    269 <div class="section" id="idp40512">
     269<div class="section" id="idp83728">
    270270<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    271 <p id="idp41152"> Xerces—like all traditional XML parsers—processes XML documents
     271<p id="idp84368"> Xerces—like all traditional XML parsers—processes XML documents
    272272            sequentially. Each character is examined to distinguish between the XML-specific markup,
    273273            such as a left angle bracket <code class="code">&lt;</code>, and the content held within the
    274274            document. As the parser progresses through the document, it alternates between markup
    275275            scanning, validation and content processing modes. </p>
    276 <p id="idp42720"> In other words, Xerces belongs to an equivalent class applications termed FSM
     276<p id="idp85936"> In other words, Xerces belongs to an equivalent class applications termed FSM
    277277            applications\footnote{ Herein FSM applications are considered software systems whose
    278278            behaviour is defined by the inputs, current state and the events associated with
     
    280280            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
    281281            character could induce a state transition. </p>
    282 <p id="idp43632"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
     282<p id="idp86848"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
    283283            text is transformed into a set of lexical bitstreams, which undergo a series of
    284284            operations that can be grouped into logical layers, e.g., transposition, character
     
    289289</div>
    290290</div>
    291 <div class="section" id="idp44272">
     291<div class="section" id="idp87488">
    292292<h2 class="title" style="clear: both">Architecture</h2>
    293 <div class="section" id="idp366784">
     293<div class="section" id="idp410064">
    294294<h3 class="title" style="clear: both">Overview</h3>
    295 <p id="idp367680"> icXML is more than an optimized version of Xerces. Many components were grouped,
     295<p id="idp410960"> icXML is more than an optimized version of Xerces. Many components were grouped,
    296296            restructured and rearchitected with pipeline parallelism in mind. In this section, we
    297297            highlight the core differences between the two systems. As shown in Figure
     
    316316            and assesses whether the final output matches the user-defined DTD and schema grammar(s)
    317317            before passing it to the end-user. </p>
    318 <p id="idp373536">
    319            
    320          </p>
    321 <p id="idp374400"> In icXML functions are grouped into logical components. As shown in Figure
     318<div class="figure" id="xerces-arch">
     319<p class="title">Figure 1: Xerces Architecture</p>
     320<div class="figure-contents">
     321<div class="mediaobject" id="idp417584"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
     322<div class="caption"></div>
     323</div>
     324</div>
     325<p id="idp419840"> In icXML functions are grouped into logical components. As shown in Figure
    322326            \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the
    323327            Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which
     
    338342            described in Section \ref{section:arch:errorhandling}. From here, two data-independent
    339343            branches exist: the Symbol Resolver and Content Preparation Unit. </p>
    340 <p id="idp379008"> A typical XML file contains few unique element and attribute names—but
     344<p id="idp423856"> A typical XML file contains few unique element and attribute names—but
    341345            each of them will occur frequently. icXML stores these as distinct data structures,
    342346            called symbols, each with their own global identifier (GID). Using the symbol marker
     
    344348               Resolver</span> scans through the raw data to produce a sequence of GIDs, called
    345349            the <span class="ital">symbol stream</span>. </p>
    346 <p id="idp381504"> The final components of the Parabix Subsystem are the <span class="ital">Content
     350<p id="idp426304"> The final components of the Parabix Subsystem are the <span class="ital">Content
    347351               Preparation Unit</span> and <span class="ital">Content Stream
    348352            Generator</span>. The former takes the (transposed) basis bitstreams and selectively
    349353            filters them, according to the information provided by the Parallel Markup Parser, and
    350354            the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p>
    351 <p id="idp384416"> Combined, the symbol and content stream form icXML's compressed IR of the XML
     355<p id="idp429216"> Combined, the symbol and content stream form icXML's compressed IR of the XML
    352356            document. The <span class="ital">Markup Processor</span>~parses the IR to
    353357            validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that
     
    358362            discussed in Section \ref{section:arch:namespacehandling}. Finally, the <span class="ital">Validation</span> layer implements the Xerces's validator. However,
    359363            preprocessing associated with each symbol greatly reduces the work of this stage. </p>
    360 <p id="idp389200">
    361            
    362          </p>
    363 </div>
    364 <div class="section" id="idp390240">
     364<div class="figure" id="icxml-arch">
     365<p class="title">Figure 2: icXML Architecture</p>
     366<div class="figure-contents">
     367<div class="mediaobject" id="idp435040"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
     368<div class="caption"></div>
     369</div>
     370</div>
     371</div>
     372<div class="section" id="idp437488">
    365373<h3 class="title" style="clear: both">Character Set Adapters</h3>
    366 <p id="idp390880"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
     374<p id="idp438160"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
    367375            Xerces itself and provide the end-consumer with a single encoding format. In the
    368376            important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant,
     
    371379            other cases, transcoding may involve table look-up operations for each byte of input. In
    372380            any case, transcoding imposes at least a cost of buffer copying. </p>
    373 <p id="idp391936"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
     381<p id="idp439872"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
    374382            transcoding costs. Given a specified input encoding, a CSA is responsible for checking
    375383            that input code units represent valid characters, mapping the characters of the encoding
     
    377385            item streams), as well as supporting ultimate transcoding requirements. All of this work
    378386            is performed using the parallel bitstream representation of the source input. </p>
    379 <p id="idp392912"> An important observation is that many character sets are an extension to the legacy
     387<p id="idp440848"> An important observation is that many character sets are an extension to the legacy
    380388            7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8,
    381389            UTF-16 and many others. Furthermore, all significant characters for parsing XML are
    382390            confined to the ASCII repertoire. Thus, a single common set of lexical item calculations
    383391            serves to compute lexical item streams for all such ASCII-based character sets. </p>
    384 <p id="idp393792"> A second observation is that—regardless of which character set is
     392<p id="idp441728"> A second observation is that—regardless of which character set is
    385393            used—quite often all of the characters in a particular block of input will be
    386394            within the ASCII range. This is a very simple test to perform using the bitstream
     
    389397            be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    390398            UTF-16 form are each set to zero in this case. </p>
    391 <p id="idp395712"> A third observation is that repeated transcoding of the names of XML elements,
     399<p id="idp443648"> A third observation is that repeated transcoding of the names of XML elements,
    392400            attributes and so on can be avoided by using a look-up mechanism. That is, the first
    393401            occurrence of each symbol is stored in a look-up table mapping the input encoding to a
     
    396404            symbol look up is required to apply various XML validation rules, there is achieves the
    397405            effect of transcoding each occurrence without additional cost. </p>
    398 <p id="idp396768"> The cost of individual character transcoding is avoided whenever a block of input is
     406<p id="idp444704"> The cost of individual character transcoding is avoided whenever a block of input is
    399407            confined to the ASCII subset and for all but the first occurrence of any XML element or
    400408            attribute name. Furthermore, when transcoding is required, the parallel bitstream
     
    407415            using bit scan operations. </p>
    408416</div>
    409 <div class="section" id="idp398192">
     417<div class="section" id="idp446128">
    410418<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    411 <p id="idp398880"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
     419<p id="idp446816"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
    412420            bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two
    413421            Chinese characters <code class="code">䜠奜</code> are represented as two
     
    423431            may then be completed by applying parallel deletion and inverse transposition of the
    424432            UTF-16 bitstreams\cite{Cameron2008}. </p>
    425 <p id="idp403040">
     433<p id="idp450976">
    426434           
    427435           
     
    434442           
    435443         </p>
    436 <p id="idp406976"> Rather than immediately paying the costs of deletion and transposition just for
     444<p id="idp454912"> Rather than immediately paying the costs of deletion and transposition just for
    437445            transcoding, however, icXML defers these steps so that the deletion masks for several
    438446            stages of processing may be combined. In particular, this includes core XML requirements
     
    447455           
    448456         </p>
    449 <p id="idp409616"> In essence, the deletion masks for transcoding and for line break normalization each
     457<p id="idp457552"> In essence, the deletion masks for transcoding and for line break normalization each
    450458            represent a bitwise filter; these filters can be combined using bitwise-or so that the
    451459            parallel deletion algorithm need only be applied once. </p>
    452 <p id="idp410800"> A further application of combined filtering is the processing of XML character and
     460<p id="idp458672"> A further application of combined filtering is the processing of XML character and
    453461            entity references. Consider, for example, the references <code class="code">&amp;</code> or
    454462               <code class="code">&lt;</code>. which must be replaced in XML processing with the single
     
    463471            UTF-16 code unit. In the case, that this is not true, it is addressed in
    464472            post-processing. </p>
    465 <p id="idp415520"> The final step of combined filtering occurs during the process of reducing markup
     473<p id="idp463488"> The final step of combined filtering occurs during the process of reducing markup
    466474            data to tag bytes preceding each significant XML transition as described in
    467475            section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying
     
    473481            Haswell architecture. </p>
    474482</div>
    475 <div class="section" id="idp416848">
     483<div class="section" id="idp464816">
    476484<h3 class="title" style="clear: both">Content Stream</h3>
    477 <p id="idp417520"> A relatively-unique concept for icXML is the use of a filtered content stream.
     485<p id="idp465488"> A relatively-unique concept for icXML is the use of a filtered content stream.
    478486            Rather that parsing an XML document in its original format, the input is transformed
    479487            into one that is easier for the parser to iterate through and produce the sequential
     
    483491           
    484492            through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p>
    485 <p id="idp419920"> Combined with the symbol stream, the parser traverses the content stream to
     493<p id="idp467888"> Combined with the symbol stream, the parser traverses the content stream to
    486494            effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following
    487495               <code class="code">&gt;</code> indicates that a start tag without any attributes is the first
     
    495503            null character in the content stream in parallel, which in turn means the parser can
    496504            directly jump to the end of every string without scanning for it. </p>
    497 <p id="idp423312"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
     505<p id="idp471280"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
    498506            existence of an attribute. Because all of the intra-element was performed in the Parabix
    499507            Subsystem, this must be a legal attribute. Since attributes can only occur within start
     
    509517            that the appropriate scope-nesting rules have been applied. </p>
    510518</div>
    511 <div class="section" id="idp427456">
     519<div class="section" id="idp475424">
    512520<h3 class="title" style="clear: both">Namespace Handling</h3>
    513 <p id="idp428544"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
     521<p id="idp476512"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
    514522            together. It is especially important when a vocabulary application-dependant meaning,
    515523            such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound
     
    528536            uniquely-named items because the current vocabulary is determined by the namespace(s)
    529537            that are in-scope. </p>
    530 <p id="idp435712">
     538<p id="idp483632">
    531539           
    532540         </p>
    533 <p id="idp436256"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
     541<p id="idp484176"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
    534542            persist for the lifetime of the application through the use of a global URI pool. Xerces
    535543            maintains a stack of namespace scopes that is pushed (popped) every time a start tag
     
    539547            those that declare a set of namespaces upfront and never change them, and (2) those that
    540548            repeatedly modify the namespaces in predictable patterns. </p>
    541 <p id="idp438720"> For that reason, icXML contains an independent namespace stack and utilizes bit
     549<p id="idp485888"> For that reason, icXML contains an independent namespace stack and utilizes bit
    542550            vectors to cheaply perform
    543551             When a prefix is
     
    553561            found using a bit-scan intrinsic. A namespace binding table, similar to Table
    554562            \ref{tbl:namespace1}, provides the actual URI ID. </p>
    555 <p id="idp443152">
     563<p id="idp490320">
    556564           
    557565         </p>
    558 <p id="idp443696">
     566<p id="idp490864">
    559567           
    560568           
     
    562570           
    563571         </p>
    564 <p id="idp446144"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
     572<p id="idp493312"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
    565573            any modification to the currently visible namespaces is calculated and stored within a
    566574            stack of bit vectors denoting the locally modified namespace bindings. When an end tag
     
    571579         </p>
    572580</div>
    573 <div class="section" id="idp447600">
     581<div class="section" id="idp494768">
    574582<h3 class="title" style="clear: both">Error Handling</h3>
    575 <p id="idp448272">
     583<p id="idp495440">
    576584           
    577585            Xerces outputs error messages in two ways: through the programmer API and as thrown
     
    582590            \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and
    583591            Markup Processor, each with its own system for detecting and producing error messages. </p>
    584 <p id="idp449904"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
     592<p id="idp497072"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
    585593            a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking
    586594            the byte-position of an error within a block, and the type of error is determined by the
     
    615623            detected, the sum of those skipped positions is subtracted from the distance to
    616624            determine the actual column number. </p>
    617 <p id="idp455392"> The Markup Processor is a state-driven machine. As such, error detection within it
     625<p id="idp502560"> The Markup Processor is a state-driven machine. As such, error detection within it
    618626            is very similar to Xerces. However, reporting the correct line/column is a much more
    619627            difficult problem. The Markup Processor parses the content stream, which is a series of
     
    629637</div>
    630638</div>
    631 <div class="section" id="idp458640">
     639<div class="section" id="idp505808">
    632640<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    633 <p id="idp459280"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
     641<p id="idp506448"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
    634642         application. These are "embarrassingly
    635643         sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to
     
    639647         well into the general model of pipeline parallelism, in which each thread is in charge of a
    640648         single module or group of modules. </p>
    641 <p id="idp461136"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
     649<p id="idp508304"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
    642650         and the Markup Processor into distinct logical layers into two separate stages. The
    643651         resultant application, <span class="ital">icXML-p</span>, is a course-grained
     
    660668            <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can
    661669         reuse the memory space. </p>
    662 <p id="idp470208">
    663          
     670<p id="idp517424">
     671        <div class="figure" id="threads_timeline1">
     672<p class="title">Figure 3: Thread Balance in Two-Stage Pipelines</p>
     673<div class="figure-contents">
     674<div class="mediaobject" id="idp518816"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>
     675<div class="caption"></div>
     676</div>
     677</div>
     678        <div class="figure" id="threads_timeline2">
     679<p class="title">Figure 4: Thread Balance in Two-Stage Pipelines</p>
     680<div class="figure-contents">
     681<div class="mediaobject" id="idp522240"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>
     682<div class="caption"></div>
     683</div>
     684</div>
    664685      </p>
    665 <p id="idp470752"> Overall, our design is intended to benefit a range of applications. Conceptually, we
     686<p id="idp524656"> Overall, our design is intended to benefit a range of applications. Conceptually, we
    666687         consider two design points. The first, the parsing performed by the Parabix Subsystem
    667688         dominates at 67% of the overall cost, with the cost of application processing (including
     
    669690         scenario, the cost of application processing dominates at 60%, while the cost of XML
    670691         parsing represents an overhead of 40%. </p>
    671 <p id="idp472672"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
     692<p id="idp525568"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
    672693         100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement
    673694         of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the
     
    677698         about 33% of the original work. In this case, Amdahl's law predicts that we could expect up
    678699         to a 3x speedup at best. </p>
    679 <p id="idp473792"> At the other extreme of our design range, we consider an application in which core
     700<p id="idp526688"> At the other extreme of our design range, we consider an application in which core
    680701         parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the
    681702         corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most
     
    683704         the entire latency of parsing within the serial time required by the application. In this
    684705         case, we achieve an overall speedup in processing time by 1.67x. </p>
    685 <p id="idp474736"> Although the structure of the Parabix Subsystem allows division of the work into
     706<p id="idp527632"> Although the structure of the Parabix Subsystem allows division of the work into
    686707         several pipeline stages and has been demonstrated to be effective for four pipeline stages
    687708         in a research prototype \cite{HPCA2012}, our analysis here suggests that the further
     
    691712         the cost of application logic that could match reductions in core parsing cost. </p>
    692713</div>
    693 <div class="section" id="idp475920">
     714<div class="section" id="idp528816">
    694715<h2 class="title" style="clear: both">Performance</h2>
    695 <p id="idp476592"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
     716<p id="idp529488"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
    696717         Xerces C++ SAXCount sample application, and a real world GML to SVG transformation
    697718         application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy
     
    699720         L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu
    700721         12.04 (Linux). </p>
    701 <p id="idp477504"> We analyzed the execution profiles of each XML parser using the performance counters
     722<p id="idp530400"> We analyzed the execution profiles of each XML parser using the performance counters
    702723         found in the processor. We chose several key hardware events that provide insight into the
    703724         profile of each application and indicate if the processor is doing useful work. The set of
     
    707728         collection of hardware performance monitoring statistics. In addition, we used the Linux
    708729         perf \cite{perf} utility to collect per core hardware events. </p>
    709 <div class="section" id="idp478640">
     730<div class="section" id="idp531536">
    710731<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    711 <p id="idp479312"> Xerces comes with sample applications that demonstrate salient features of the
     732<p id="idp532208"> Xerces comes with sample applications that demonstrate salient features of the
    712733            parser. SAXCount is the simplest such application: it counts the elements, attributes
    713734            and characters of a given XML file using the (event based) SAX API and prints out the
    714735            totals. </p>
    715 <p id="idp480016"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
     736<p id="idp532912"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
    716737            selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented
    717738            XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the
    718739            UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML
    719740            documents and consist entirely of single byte encoded ASCII characters.
    720   <div class="table-wrapper" id="idp480752">
     741  <div class="table-wrapper" id="idp533648">
    721742<p class="title">Table II</p>
    722 <div class="caption"><p id="idp481264">XML Document Characteristics</p></div>
     743<div class="caption"><p id="idp534160">XML Document Characteristics</p></div>
    723744<table class="table">
    724745<colgroup span="1">
     
    769790</div>           
    770791</p>
    771 <p id="idp496800"> A key predictor of the overall parsing performance of an XML file is markup
     792<p id="idp549696"> A key predictor of the overall parsing performance of an XML file is markup
    772793            density\footnote{ Markup Density: the ratio of markup bytes used to define the structure
    773794            of the document vs. its file size.}. This metric has substantial influence on the
     
    776797            of document-oriented and data-oriented XML files to analyze performance over a spectrum
    777798            of markup densities. </p>
    778 <p id="idp497808"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
     799<p id="idp550704"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
    779800            in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over
    780801            Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve
     
    783804            icXML-p performs better as markup-density increases because the work performed by each
    784805            stage is well balanced in this application. </p>
    785 <p id="idp498848">
    786            
     806<p id="idp551744">
     807        <div class="figure" id="perf_SAX">
     808<p class="title">Figure 5: SAXCount Performance Comparison</p>
     809<div class="figure-contents">
     810<div class="mediaobject" id="idp553088"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
     811<div class="caption"></div>
     812</div>
     813</div>
    787814         </p>
    788815</div>
    789 <div class="section" id="idp499872">
     816<div class="section" id="idp555632">
    790817<h3 class="title" style="clear: both">GML2SVG</h3>
    791 <p id="idp500512"></p>
    792 </div>
    793 </div>
    794 <div class="section" id="idp501024">
     818<p id="idp556304"></p>
     819</div>
     820</div>
     821<div class="section" id="idp556816">
    795822<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    796 <p id="idp501712"> This paper is the first case study documenting the significant performance benefits
     823<p id="idp557504"> This paper is the first case study documenting the significant performance benefits
    797824         that may be realized through the integration of parallel bitstream technology into existing
    798825         widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined
     
    804831         technologies, this is an important case study demonstrating the general feasibility of
    805832         these techniques. </p>
    806 <p id="idp502992"> The further development of icXML to move beyond 2-stage pipeline parallelism is
     833<p id="idp559680"> The further development of icXML to move beyond 2-stage pipeline parallelism is
    807834         ongoing, with realistic prospects for four reasonably balanced stages within the library.
    808835         For applications such as GML2SVG which are dominated by time spent on XML parsing, such a
    809836         multistage pipelined parsing library should offer substantial benefits. </p>
    810 <p id="idp503760"> The example of XML parsing may be considered prototypical of finite-state machines
     837<p id="idp560448"> The example of XML parsing may be considered prototypical of finite-state machines
    811838         applications which have sometimes been considered "embarassingly
    812839         sequential" and so difficult to parallelize that "nothing
     
    814841         point in making the case that parallelization can indeed be helpful across a broad array of
    815842         application types. </p>
    816 <p id="idp505136"> To overcome the software engineering challenges in applying parallel bitstream
     843<p id="idp561824"> To overcome the software engineering challenges in applying parallel bitstream
    817844         technology to existing software systems, it is clear that better library and tool support
    818845         is needed. The techniques used in the implementation of icXML and documented in this paper
     
    821848      </p>
    822849</div>
    823 <div class="bibliography" id="idp506624">
     850<div class="bibliography" id="idp563312">
    824851<h2 class="title" style="clear:both">Bibliography</h2>
    825852<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
Note: See TracChangeset for help on using the changeset viewer.