Changeset 3053 for docs/Balisage13


Ignore:
Timestamp:
Apr 19, 2013, 4:07:55 PM (6 years ago)
Author:
cameron
Message:

LB normalization figure

Location:
docs/Balisage13/Bal2013came0601
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3052 r3053  
    274274</div>
    275275<div id="mast"><div class="content">
    276 <h2 class="article-title" id="idp74240"></h2>
     276<h2 class="article-title" id="idp66752"></h2>
    277277<div class="author">
    278278<h3 class="author">Nigel Medforth</h3>
     
    324324</div>
    325325<div class="mast-box">
    326 <p class="title"><a href="javascript:toggle('idp75360')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75360"></a> <span onclick="javascript:toggle('idp75360');return true">Abstract</span></p>
    327 <div class="folder" id="folder-idp75360" style="display:none"><p id="idp75664">Prior research on the acceleration of XML processing using SIMD and multi-core
     326<p class="title"><a href="javascript:toggle('idp67872')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67872"></a> <span onclick="javascript:toggle('idp67872');return true">Abstract</span></p>
     327<div class="folder" id="folder-idp67872" style="display:none"><p id="idp68176">Prior research on the acceleration of XML processing using SIMD and multi-core
    328328            parallelism has lead to a number of interesting research prototypes. This work
    329329            investigates the extent to which the techniques underlying these prototypes could result
     
    339339<p><b>Table of Contents</b></p>
    340340<dl>
    341 <dt><span class="section"><a href="#idp284144" class="toc">Introduction</a></span></dt>
    342 <dt><span class="section"><a href="#idp285936" class="toc">Background</a></span></dt>
     341<dt><span class="section"><a href="#idp275888" class="toc">Introduction</a></span></dt>
     342<dt><span class="section"><a href="#idp277680" class="toc">Background</a></span></dt>
    343343<dd><dl>
    344 <dt><span class="section"><a href="#idp286576" class="toc">Xerces C++ Structure</a></span></dt>
    345 <dt><span class="section"><a href="#idp330416" class="toc">The Parabix Framework</a></span></dt>
    346 <dt><span class="section"><a href="#idp404448" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     344<dt><span class="section"><a href="#idp278320" class="toc">Xerces C++ Structure</a></span></dt>
     345<dt><span class="section"><a href="#idp321360" class="toc">The Parabix Framework</a></span></dt>
     346<dt><span class="section"><a href="#idp411088" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    347347</dl></dd>
    348 <dt><span class="section"><a href="#idp408832" class="toc">Architecture</a></span></dt>
     348<dt><span class="section"><a href="#idp415472" class="toc">Architecture</a></span></dt>
    349349<dd><dl>
    350 <dt><span class="section"><a href="#idp409472" class="toc">Overview</a></span></dt>
    351 <dt><span class="section"><a href="#idp437120" class="toc">Character Set Adapters</a></span></dt>
    352 <dt><span class="section"><a href="#idp446016" class="toc">Combined Parallel Filtering</a></span></dt>
    353 <dt><span class="section"><a href="#idp474496" class="toc">Content Stream</a></span></dt>
    354 <dt><span class="section"><a href="#idp485104" class="toc">Namespace Handling</a></span></dt>
    355 <dt><span class="section"><a href="#idp528320" class="toc">Error Handling</a></span></dt>
     350<dt><span class="section"><a href="#idp416112" class="toc">Overview</a></span></dt>
     351<dt><span class="section"><a href="#idp444144" class="toc">Character Set Adapters</a></span></dt>
     352<dt><span class="section"><a href="#idp451856" class="toc">Combined Parallel Filtering</a></span></dt>
     353<dt><span class="section"><a href="#idp467808" class="toc">Content Stream</a></span></dt>
     354<dt><span class="section"><a href="#idp478528" class="toc">Namespace Handling</a></span></dt>
     355<dt><span class="section"><a href="#idp521984" class="toc">Error Handling</a></span></dt>
    356356</dl></dd>
    357 <dt><span class="section"><a href="#idp538544" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
    358 <dt><span class="section"><a href="#idp560656" class="toc">Performance</a></span></dt>
     357<dt><span class="section"><a href="#idp532208" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
     358<dt><span class="section"><a href="#idp554320" class="toc">Performance</a></span></dt>
    359359<dd><dl>
    360 <dt><span class="section"><a href="#idp563376" class="toc">Xerces C++ SAXCount</a></span></dt>
    361 <dt><span class="section"><a href="#idp588176" class="toc">GML2SVG</a></span></dt>
     360<dt><span class="section"><a href="#idp557040" class="toc">Xerces C++ SAXCount</a></span></dt>
     361<dt><span class="section"><a href="#idp581280" class="toc">GML2SVG</a></span></dt>
    362362</dl></dd>
    363 <dt><span class="section"><a href="#idp605296" class="toc">Conclusion and Future Work</a></span></dt>
     363<dt><span class="section"><a href="#idp598304" class="toc">Conclusion and Future Work</a></span></dt>
    364364</dl>
    365365</div>
    366366<div class="mast-box">
    367 <p class="title"><a href="javascript:toggle('idp77088')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77088"></a> <span onclick="javascript:toggle('idp77088');return true">Nigel Medforth</span></p>
    368 <div class="folder" id="folder-idp77088" style="display:none">
     367<p class="title"><a href="javascript:toggle('idp69600')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp69600"></a> <span onclick="javascript:toggle('idp69600');return true">Nigel Medforth</span></p>
     368<div class="folder" id="folder-idp69600" style="display:none">
    369369<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>&gt;</code></h5>
    370370<div class="affiliation">
     
    377377</div>
    378378<div class="personblurb">
    379 <p id="idp58880">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead
     379<p id="idp51296">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead
    380380               developer of icXML. He earned a Bachelor of Technology in Information Technology at
    381381               Kwantlen Polytechnic University in 2009 and was awarded the Dean’s Medal for
    382382               Outstanding Achievement.</p>
    383 <p id="idp59888">Nigel is currently researching ways to leverage both the Parabix framework and
     383<p id="idp52304">Nigel is currently researching ways to leverage both the Parabix framework and
    384384               stream-processing models to further accelerate XML parsing within icXML.</p>
    385385</div>
     
    387387</div>
    388388<div class="mast-box">
    389 <p class="title"><a href="javascript:toggle('idp63552')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63552"></a> <span onclick="javascript:toggle('idp63552');return true">Dan Lin</span></p>
    390 <div class="folder" id="folder-idp63552" style="display:none">
     389<p class="title"><a href="javascript:toggle('idp55968')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp55968"></a> <span onclick="javascript:toggle('idp55968');return true">Dan Lin</span></p>
     390<div class="folder" id="folder-idp55968" style="display:none">
    391391<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>&gt;</code></h5>
    392392<div class="affiliation">
     
    394394<p class="orgname">Simon Fraser University </p>
    395395</div>
    396 <div class="personblurb"><p id="idp65264">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
     396<div class="personblurb"><p id="idp57680">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
    397397             in Computing Science at Simon Fraser University in 2010. Her research focus on on high
    398398             performance algorithms that exploit parallelization strategies on various multicore platforms.
     
    401401</div>
    402402<div class="mast-box">
    403 <p class="title"><a href="javascript:toggle('idp67824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67824"></a> <span onclick="javascript:toggle('idp67824');return true">Kenneth Herdy</span></p>
    404 <div class="folder" id="folder-idp67824" style="display:none">
     403<p class="title"><a href="javascript:toggle('idp60240')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp60240"></a> <span onclick="javascript:toggle('idp60240');return true">Kenneth Herdy</span></p>
     404<div class="folder" id="folder-idp60240" style="display:none">
    405405<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>&gt;</code></h5>
    406406<div class="affiliation">
     
    409409</div>
    410410<div class="personblurb">
    411 <p id="idp69552"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
     411<p id="idp61968"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
    412412               Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor
    413413               of Science in Computing Science with a Certificate in Spatial Information Systems at
    414414               Simon Fraser University in 2005. </p>
    415 <p id="idp271088"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
     415<p id="idp262752"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
    416416               University with industrial scholarship support from the Natural Sciences and
    417417               Engineering Research Council of Canada, the Mathematics of Information Technology and
     
    423423</div>
    424424<div class="mast-box">
    425 <p class="title"><a href="javascript:toggle('idp273824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp273824"></a> <span onclick="javascript:toggle('idp273824');return true">Rob Cameron</span></p>
    426 <div class="folder" id="folder-idp273824" style="display:none">
     425<p class="title"><a href="javascript:toggle('idp265488')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp265488"></a> <span onclick="javascript:toggle('idp265488');return true">Rob Cameron</span></p>
     426<div class="folder" id="folder-idp265488" style="display:none">
    427427<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
    428428<div class="affiliation">
     
    434434<p class="orgname">International Characters, Inc.</p>
    435435</div>
    436 <div class="personblurb"><p id="idp275488">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
     436<div class="personblurb"><p id="idp267152">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
    437437               Sciences at Simon Fraser University. His research interests include programming
    438438               language and software system technology, with a specific focus on high performance
     
    450450<div id="main">
    451451<div class="article">
    452 <h2 class="article-title" id="idp74240"></h2>
    453 <div class="section" id="idp284144">
     452<h2 class="article-title" id="idp66752"></h2>
     453<div class="section" id="idp275888">
    454454<h2 class="title" style="clear: both">Introduction</h2>
    455 <p id="idp284784"></p>
    456 <p id="idp285040"></p>
    457 <p id="idp285296"></p>
    458 <p id="idp285552"></p>
    459 </div>
    460 <div class="section" id="idp285936">
     455<p id="idp276528"></p>
     456<p id="idp276784"></p>
     457<p id="idp277040"></p>
     458<p id="idp277296"></p>
     459</div>
     460<div class="section" id="idp277680">
    461461<h2 class="title" style="clear: both">Background</h2>
    462 <div class="section" id="idp286576">
     462<div class="section" id="idp278320">
    463463<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    464 <p id="idp287216"> The Xerces C++ parser
    465            
    466            
    467             features comprehensive support for a variety of character encodings both
     464<p id="idp278960"> The Xerces C++ parser is a widely-used standards-conformant
     465            XML parser produced as open-source software
     466             by the Apache Software Foundation.
     467            It features comprehensive support for a variety of character encodings both
    468468            commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple
    469469            XML vocabularies through the XML namespace mechanism, as well as complete
     
    473473            parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM
    474474            tree-based parsing interface. </p>
    475 <p id="idp289440">
    476            
    477            
    478              Xerces,
     475<p id="idp280224">
     476            Xerces,
    479477            like all traditional parsers, processes XML documents sequentially a byte-at-a-time from
    480478            the first to the last byte of input data. Each byte passes through several processing
     
    492490            expected that a comprehensive restructuring is required, involving all aspects of the
    493491            parser. </p>
    494 <div class="table-wrapper" id="idp292400">
     492<div class="table-wrapper" id="idp283344">
    495493<p class="title">Table I</p>
    496 <div class="caption"><p id="idm848128">Execution Time of Top 10 Xerces Functions</p></div>
     494<div class="caption"><p id="idm855552">Execution Time of Top 10 Xerces Functions</p></div>
    497495<table class="table">
    498496<colgroup span="1">
     
    549547</div>
    550548</div>
    551 <div class="section" id="idp330416">
     549<div class="section" id="idp321360">
    552550<h3 class="title" style="clear: both">The Parabix Framework</h3>
    553 <p id="idp331088"> The Parabix (parallel bit stream) framework is a transformative approach to XML
     551<p id="idp322032"> The Parabix (parallel bit stream) framework is a transformative approach to XML
    554552            parsing (and other forms of text processing.) The key idea is to exploit the
    555553            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
    556554            data from long blocks of input data by using one register bit per single input byte. To
    557             facilitate this, the input data is first transposed into a set of basis bit streams. In
     555            facilitate this, the input data is first transposed into a set of basis bit streams.
     556              For example Table II shows  the ASCII bytes for the string "<code class="code">b7&lt;A</code>" with
     557                the corresponding  8 basis bit streams, b<sub>0</sub> through  b<sub>7</sub> shown in Table III.
     558--&gt;
    558559           
    559560            Boolean-logic operations\footnote{∧, \√ and ¬ denote the
     
    576577             multiple
    577578            classes can share the classification cost. </p>
    578 <div class="table-wrapper" id="idp342656">
     579<div class="table-wrapper" id="idp334416">
    579580<p class="title">Table II</p>
    580 <div class="caption"><p id="idp343168">XML Source Data</p></div>
     581<div class="caption"><p id="idp334928">XML Source Data</p></div>
    581582<table class="table">
    582583<colgroup span="1">
     
    605606</table>
    606607</div>
    607 <div class="table-wrapper" id="idp358560">
     608<div class="table-wrapper" id="idp350320">
    608609<p class="title">Table III</p>
    609 <div class="caption"><p id="idp359072">8-bit ASCII Basis Bit Streams</p></div>
     610<div class="caption"><p id="idp350832">8-bit ASCII Basis Bit Streams</p></div>
    610611<table class="table">
    611612<colgroup span="1">
     
    674675</table>
    675676</div>
    676 <p id="idp399232"> Consider, for example, the XML source data stream shown in the first line of Table II.
     677<p id="idp390992"> Consider, for example, the XML source data stream shown in the first line of Table IV.
    677678The remaining lines of this figure show
    678679            several parallel bit streams that are computed in Parabix-style parsing, with each bit
     
    682683            brackets that represent tag openers in XML. The second and third streams show a
    683684            partition of the tag openers into start tag marks and end tag marks depending on the
    684             character immediately following the opener (i.e., <code class="code">"/"</code>) or
     685            character immediately following the opener (i.e., "<code class="code">/</code>") or
    685686            not. The remaining three lines show streams that can be computed in subsequent parsing
    686687            (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
    687688            marking the element names, attribute names and attribute values of tags. </p>
    688 <p id="idp401088"> Two intuitions may help explain how the Parabix approach can lead to improved XML
     689<div class="table-wrapper" id="idp393904">
     690<p class="title">Table IV</p>
     691<div class="caption"><p id="idp394416">XML Source Data and Derived Parallel Bit Streams</p></div>
     692<table class="table">
     693<colgroup span="1">
     694<col align="centre" valign="top" span="1">
     695<col align="left" valign="top" span="1">
     696</colgroup>
     697<tbody>
     698<tr>
     699<td> Source Data </td>
     700<td> <code class="code"> &lt;document&gt;fee&lt;element a1='fie' a2 = 'foe'&gt;&lt;/element&gt;fum&lt;/document&gt; </code>
     701</td>
     702</tr>
     703<tr>
     704<td> Tag Openers </td>
     705<td> <code class="code">1____________1____________________________1____________1__________</code>
     706</td>
     707</tr>
     708<tr>
     709<td> Start Tag Marks </td>
     710<td> <code class="code">_1____________1___________________________________________________</code>
     711</td>
     712</tr>
     713<tr>
     714<td> End Tag Marks </td>
     715<td> <code class="code">___________________________________________1____________1_________</code>
     716</td>
     717</tr>
     718<tr>
     719<td> Empty Tag Marks </td>
     720<td> <code class="code">__________________________________________________________________</code>
     721</td>
     722</tr>
     723<tr>
     724<td> Element Names </td>
     725<td> <code class="code">_11111111_____1111111_____________________________________________</code>
     726</td>
     727</tr>
     728<tr>
     729<td> Attribute Names </td>
     730<td> <code class="code">______________________11_______11_________________________________</code>
     731</td>
     732</tr>
     733<tr>
     734<td> Attribute Values </td>
     735<td> <code class="code">__________________________111________111__________________________</code>
     736</td>
     737</tr>
     738</tbody>
     739</table>
     740</div>
     741<p id="idp406864"> Two intuitions may help explain how the Parabix approach can lead to improved XML
    689742            parsing performance. The first is that the use of the full register width offers a
    690743            considerable information advantage over sequential byte-at-a-time parsing. That is,
     
    695748            individual decision-bits, an approach that computes many of them in parallel (e.g., 128
    696749            bytes at a time using 128-bit registers) should provide substantial benefit. </p>
    697 <p id="idp402336"> Previous studies have shown that the Parabix approach improves many aspects of XML
     750<p id="idp408112"> Previous studies have shown that the Parabix approach improves many aspects of XML
    698751            processing, including transcoding \cite{Cameron2008}, character classification and
    699752            validation, tag parsing and well-formedness checking. The first Parabix parser used
     
    704757            \cite{HPCA2012}. Although these research prototypes handled the full syntax of
    705758            schema-less XML documents, they lacked the functionality required by full XML parsers. </p>
    706 <p id="idp403600"> Commercial XML processors support transcoding of multiple character sets and can
     759<p id="idp410240"> Commercial XML processors support transcoding of multiple character sets and can
    707760            parse and validate against multiple document vocabularies. Additionally, they provide
    708761            API facilities beyond those found in research prototypes, including the widely used SAX,
    709762            SAX2 and DOM interfaces. </p>
    710763</div>
    711 <div class="section" id="idp404448">
     764<div class="section" id="idp411088">
    712765<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    713 <p id="idp405088"> Xerces—like all traditional XML parsers—processes XML documents
     766<p id="idp411728"> Xerces—like all traditional XML parsers—processes XML documents
    714767            sequentially. Each character is examined to distinguish between the XML-specific markup,
    715768            such as a left angle bracket <code class="code">&lt;</code>, and the content held within the
    716769            document. As the parser progresses through the document, it alternates between markup
    717770            scanning, validation and content processing modes. </p>
    718 <p id="idp406624"> In other words, Xerces belongs to an equivalent class applications termed FSM
     771<p id="idp413264"> In other words, Xerces belongs to an equivalent class applications termed FSM
    719772            applications\footnote{ Herein FSM applications are considered software systems whose
    720773            behaviour is defined by the inputs, current state and the events associated with
     
    722775            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
    723776            character could induce a state transition. </p>
    724 <p id="idp407536"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
     777<p id="idp414176"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
    725778            text is transformed into a set of lexical bitstreams, which undergo a series of
    726779            operations that can be grouped into logical layers, e.g., transposition, character
     
    731784</div>
    732785</div>
    733 <div class="section" id="idp408832">
     786<div class="section" id="idp415472">
    734787<h2 class="title" style="clear: both">Architecture</h2>
    735 <div class="section" id="idp409472">
     788<div class="section" id="idp416112">
    736789<h3 class="title" style="clear: both">Overview</h3>
    737 <p id="idp410368"> icXML is more than an optimized version of Xerces. Many components were grouped,
     790<p id="idp417008"> icXML is more than an optimized version of Xerces. Many components were grouped,
    738791            restructured and rearchitected with pipeline parallelism in mind. In this section, we
    739792            highlight the core differences between the two systems. As shown in Figure
     
    761814<p class="title">Figure 1: Xerces Architecture</p>
    762815<div class="figure-contents">
    763 <div class="mediaobject" id="idp417008"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
     816<div class="mediaobject" id="idp423824"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
    764817<div class="caption"></div>
    765818</div>
    766819</div>
    767 <p id="idp419264"> In icXML functions are grouped into logical components. As shown in Figure
     820<p id="idp426144"> In icXML functions are grouped into logical components. As shown in Figure
    768821            \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the
    769822            Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which
     
    784837            described in Section \ref{section:arch:errorhandling}. From here, two data-independent
    785838            branches exist: the Symbol Resolver and Content Preparation Unit. </p>
    786 <p id="idp423344"> A typical XML file contains few unique element and attribute names—but
     839<p id="idp430368"> A typical XML file contains few unique element and attribute names—but
    787840            each of them will occur frequently. icXML stores these as distinct data structures,
    788841            called symbols, each with their own global identifier (GID). Using the symbol marker
     
    790843               Resolver</span> scans through the raw data to produce a sequence of GIDs, called
    791844            the <span class="ital">symbol stream</span>. </p>
    792 <p id="idp425936"> The final components of the Parabix Subsystem are the <span class="ital">Content
     845<p id="idp432960"> The final components of the Parabix Subsystem are the <span class="ital">Content
    793846               Preparation Unit</span> and <span class="ital">Content Stream
    794847            Generator</span>. The former takes the (transposed) basis bitstreams and selectively
    795848            filters them, according to the information provided by the Parallel Markup Parser, and
    796849            the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p>
    797 <p id="idp428848"> Combined, the symbol and content stream form icXML's compressed IR of the XML
     850<p id="idp435872"> Combined, the symbol and content stream form icXML's compressed IR of the XML
    798851            document. The <span class="ital">Markup Processor</span>~parses the IR to
    799852            validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that
     
    807860<p class="title">Figure 2: icXML Architecture</p>
    808861<div class="figure-contents">
    809 <div class="mediaobject" id="idp434672"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
     862<div class="mediaobject" id="idp441696"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
    810863<div class="caption"></div>
    811864</div>
    812865</div>
    813866</div>
    814 <div class="section" id="idp437120">
     867<div class="section" id="idp444144">
    815868<h3 class="title" style="clear: both">Character Set Adapters</h3>
    816 <p id="idp437792"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
     869<p id="idp444816"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
    817870            Xerces itself and provide the end-consumer with a single encoding format. In the
    818871            important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant,
     
    821874            other cases, transcoding may involve table look-up operations for each byte of input. In
    822875            any case, transcoding imposes at least a cost of buffer copying. </p>
    823 <p id="idp438848"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
     876<p id="idp445872"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
    824877            transcoding costs. Given a specified input encoding, a CSA is responsible for checking
    825878            that input code units represent valid characters, mapping the characters of the encoding
     
    827880            item streams), as well as supporting ultimate transcoding requirements. All of this work
    828881            is performed using the parallel bitstream representation of the source input. </p>
    829 <p id="idp439824"> An important observation is that many character sets are an extension to the legacy
     882<p id="idp446848"> An important observation is that many character sets are an extension to the legacy
    830883            7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8,
    831884            UTF-16 and many others. Furthermore, all significant characters for parsing XML are
    832885            confined to the ASCII repertoire. Thus, a single common set of lexical item calculations
    833886            serves to compute lexical item streams for all such ASCII-based character sets. </p>
    834 <p id="idp440704"> A second observation is that—regardless of which character set is
     887<p id="idp447728"> A second observation is that—regardless of which character set is
    835888            used—quite often all of the characters in a particular block of input will be
    836889            within the ASCII range. This is a very simple test to perform using the bitstream
     
    839892            be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    840893            UTF-16 form are each set to zero in this case. </p>
    841 <p id="idp442624"> A third observation is that repeated transcoding of the names of XML elements,
     894<p id="idp449376"> A third observation is that repeated transcoding of the names of XML elements,
    842895            attributes and so on can be avoided by using a look-up mechanism. That is, the first
    843896            occurrence of each symbol is stored in a look-up table mapping the input encoding to a
     
    846899            symbol look up is required to apply various XML validation rules, there is achieves the
    847900            effect of transcoding each occurrence without additional cost. </p>
    848 <p id="idp443680"> The cost of individual character transcoding is avoided whenever a block of input is
     901<p id="idp450432"> The cost of individual character transcoding is avoided whenever a block of input is
    849902            confined to the ASCII subset and for all but the first occurrence of any XML element or
    850903            attribute name. Furthermore, when transcoding is required, the parallel bitstream
     
    857910            using bit scan operations. </p>
    858911</div>
    859 <div class="section" id="idp446016">
     912<div class="section" id="idp451856">
    860913<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    861 <p id="idp446704"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
     914<p id="idp452544"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
    862915            bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two
    863916            Chinese characters <code class="code">䜠奜</code> are represented as two
     
    873926            may then be completed by applying parallel deletion and inverse transposition of the
    874927            UTF-16 bitstreams\cite{Cameron2008}. </p>
    875 <div class="table-wrapper" id="idp450864">
    876 <p class="title">Table IV</p>
    877 <div class="caption"><p id="idp451376">XML Source Data and Derived Parallel Bit Streams</p></div>
    878 <table class="table">
    879 <colgroup span="1">
    880 <col align="centre" valign="top" span="1">
    881 <col align="left" valign="top" span="1">
    882 </colgroup>
    883 <tbody>
    884 <tr>
    885 <td> Source Data </td>
    886 <td> <code class="code"> &lt;document&gt;fee&lt;element a1='fie' a2 = 'foe'&gt;&lt;/element&gt;fum&lt;/document&gt; </code>
    887 </td>
    888 </tr>
    889 <tr>
    890 <td> Tag Openers </td>
    891 <td> <code class="code">1____________1____________________________1____________1__________</code>
    892 </td>
    893 </tr>
    894 <tr>
    895 <td> Start Tag Marks </td>
    896 <td> <code class="code">_1____________1___________________________________________________</code>
    897 </td>
    898 </tr>
    899 <tr>
    900 <td> End Tag Marks </td>
    901 <td> <code class="code">___________________________________________1____________1_________</code>
    902 </td>
    903 </tr>
    904 <tr>
    905 <td> Empty Tag Marks </td>
    906 <td> <code class="code">__________________________________________________________________</code>
    907 </td>
    908 </tr>
    909 <tr>
    910 <td> Element Names </td>
    911 <td> <code class="code">_11111111_____1111111_____________________________________________</code>
    912 </td>
    913 </tr>
    914 <tr>
    915 <td> Attribute Names </td>
    916 <td> <code class="code">______________________11_______11_________________________________</code>
    917 </td>
    918 </tr>
    919 <tr>
    920 <td> Attribute Values </td>
    921 <td> <code class="code">__________________________111________111__________________________</code>
    922 </td>
    923 </tr>
    924 </tbody>
    925 </table>
    926 </div>
    927 <p id="idp464464"> Rather than immediately paying the costs of deletion and transposition just for
     928<p id="idp456704"> Rather than immediately paying the costs of deletion and transposition just for
    928929            transcoding, however, icXML defers these steps so that the deletion masks for several
    929930            stages of processing may be combined. In particular, this includes core XML requirements
     
    936937            after the marked CR as shown by the Pablo source code in Figure
    937938            \ref{fig:LBnormalization}.
    938            
     939              <div class="figure" id="idp458944">
     940<p class="title">Figure 3</p>
     941<div class="figure-contents">
     942<div class="caption">Line Break Normalization Logic</div>
     943<pre class="programlisting" id="idp459632">
     944# XML 1.0 line-break normalization rules.
     945if lex.CR:
     946# Modify CR (#x0D) to LF (#x0A)
     947  u16lo.bit_5 ^= lex.CR
     948  u16lo.bit_6 ^= lex.CR
     949  u16lo.bit_7 ^= lex.CR
     950  CRLF = pablo.Advance(lex.CR) &amp; lex.LF
     951  callouts.delmask |= CRLF
     952# Adjust LF streams for line/column tracker
     953  lex.LF |= lex.CR
     954  lex.LF ^= CRLF
     955</pre>
     956</div>
     957</div>
    939958         </p>
    940 <p id="idp466976"> In essence, the deletion masks for transcoding and for line break normalization each
     959<p id="idp461008"> In essence, the deletion masks for transcoding and for line break normalization each
    941960            represent a bitwise filter; these filters can be combined using bitwise-or so that the
    942961            parallel deletion algorithm need only be applied once. </p>
    943 <p id="idp468288"> A further application of combined filtering is the processing of XML character and
     962<p id="idp461664"> A further application of combined filtering is the processing of XML character and
    944963            entity references. Consider, for example, the references <code class="code">&amp;</code> or
    945964               <code class="code">&lt;</code>. which must be replaced in XML processing with the single
     
    954973            UTF-16 code unit. In the case, that this is not true, it is addressed in
    955974            post-processing. </p>
    956 <p id="idp473168"> The final step of combined filtering occurs during the process of reducing markup
     975<p id="idp466480"> The final step of combined filtering occurs during the process of reducing markup
    957976            data to tag bytes preceding each significant XML transition as described in
    958977            section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying
     
    964983            Haswell architecture. </p>
    965984</div>
    966 <div class="section" id="idp474496">
     985<div class="section" id="idp467808">
    967986<h3 class="title" style="clear: both">Content Stream</h3>
    968 <p id="idp475168"> A relatively-unique concept for icXML is the use of a filtered content stream.
     987<p id="idp468480"> A relatively-unique concept for icXML is the use of a filtered content stream.
    969988            Rather that parsing an XML document in its original format, the input is transformed
    970989            into one that is easier for the parser to iterate through and produce the sequential
     
    974993           
    975994            through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p>
    976 <p id="idp477568"> Combined with the symbol stream, the parser traverses the content stream to
     995<p id="idp470992"> Combined with the symbol stream, the parser traverses the content stream to
    977996            effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following
    978997               <code class="code">&gt;</code> indicates that a start tag without any attributes is the first
     
    9861005            null character in the content stream in parallel, which in turn means the parser can
    9871006            directly jump to the end of every string without scanning for it. </p>
    988 <p id="idp480960"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
     1007<p id="idp474384"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
    9891008            existence of an attribute. Because all of the intra-element was performed in the Parabix
    9901009            Subsystem, this must be a legal attribute. Since attributes can only occur within start
     
    10001019            that the appropriate scope-nesting rules have been applied. </p>
    10011020</div>
    1002 <div class="section" id="idp485104">
     1021<div class="section" id="idp478528">
    10031022<h3 class="title" style="clear: both">Namespace Handling</h3>
    1004 <p id="idp486192"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
     1023<p id="idp479616"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
    10051024            together. It is especially important when a vocabulary application-dependant meaning,
    10061025            such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound
     
    10191038            uniquely-named items because the current vocabulary is determined by the namespace(s)
    10201039            that are in-scope. </p>
    1021 <div class="table-wrapper" id="idp493312">
     1040<div class="table-wrapper" id="idp486784">
    10221041<p class="title">Table V</p>
    1023 <div class="caption"><p id="idp493824">XML Namespace Example</p></div>
     1042<div class="caption"><p id="idp487296">XML Namespace Example</p></div>
    10241043<table class="table">
    10251044<colgroup span="1">
     
    10551074</table>
    10561075</div>
    1057 <p id="idp502800"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
     1076<p id="idp496272"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
    10581077            persist for the lifetime of the application through the use of a global URI pool. Xerces
    10591078            maintains a stack of namespace scopes that is pushed (popped) every time a start tag
     
    10631082            those that declare a set of namespaces upfront and never change them, and (2) those that
    10641083            repeatedly modify the namespaces in predictable patterns. </p>
    1065 <p id="idp503936"> For that reason, icXML contains an independent namespace stack and utilizes bit
     1084<p id="idp497408"> For that reason, icXML contains an independent namespace stack and utilizes bit
    10661085            vectors to cheaply perform
    10671086             When a prefix is
     
    10771096            found using a bit-scan intrinsic. A namespace binding table, similar to Table
    10781097            \ref{tbl:namespace1}, provides the actual URI ID. </p>
    1079 <div class="table-wrapper" id="idp508352">
     1098<div class="table-wrapper" id="idp501872">
    10801099<p class="title">Table VI</p>
    1081 <div class="caption"><p id="idp508864">Namespace Binding Table Example</p></div>
     1100<div class="caption"><p id="idp502384">Namespace Binding Table Example</p></div>
    10821101<table class="table">
    10831102<colgroup span="1">
     
    11201139</table>
    11211140</div>
    1122 <p id="idp524960">
     1141<p id="idp518624">
    11231142           
    11241143           
     
    11261145           
    11271146         </p>
    1128 <p id="idp526864"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
     1147<p id="idp520528"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
    11291148            any modification to the currently visible namespaces is calculated and stored within a
    11301149            stack of bit vectors denoting the locally modified namespace bindings. When an end tag
     
    11351154         </p>
    11361155</div>
    1137 <div class="section" id="idp528320">
     1156<div class="section" id="idp521984">
    11381157<h3 class="title" style="clear: both">Error Handling</h3>
    1139 <p id="idp528992">
     1158<p id="idp522656">
    11401159           
    11411160            Xerces outputs error messages in two ways: through the programmer API and as thrown
     
    11461165            \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and
    11471166            Markup Processor, each with its own system for detecting and producing error messages. </p>
    1148 <p id="idp530624"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
     1167<p id="idp524288"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
    11491168            a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking
    11501169            the byte-position of an error within a block, and the type of error is determined by the
     
    11791198            detected, the sum of those skipped positions is subtracted from the distance to
    11801199            determine the actual column number. </p>
    1181 <p id="idp536112"> The Markup Processor is a state-driven machine. As such, error detection within it
     1200<p id="idp529776"> The Markup Processor is a state-driven machine. As such, error detection within it
    11821201            is very similar to Xerces. However, reporting the correct line/column is a much more
    11831202            difficult problem. The Markup Processor parses the content stream, which is a series of
     
    11931212</div>
    11941213</div>
    1195 <div class="section" id="idp538544">
     1214<div class="section" id="idp532208">
    11961215<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    1197 <p id="idp539248"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
     1216<p id="idp532912"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
    11981217         application. These are "embarrassingly
    11991218         sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to
     
    12031222         well into the general model of pipeline parallelism, in which each thread is in charge of a
    12041223         single module or group of modules. </p>
    1205 <p id="idp541104"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
     1224<p id="idp534768"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
    12061225         and the Markup Processor into distinct logical layers into two separate stages. The
    12071226         resultant application, <span class="ital">icXML-p</span>, is a course-grained
     
    12241243            <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can
    12251244         reuse the memory space. </p>
    1226 <p id="idp550224">
     1245<p id="idp544496">
    12271246        <div class="figure" id="threads_timeline1">
    1228 <p class="title">Figure 3: Thread Balance in Two-Stage Pipelines</p>
     1247<p class="title">Figure 4: Thread Balance in Two-Stage Pipelines</p>
    12291248<div class="figure-contents">
    1230 <div class="mediaobject" id="idp551616"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>
    1231 <div class="mediaobject" id="idp553392"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>
     1249<div class="mediaobject" id="idp545936"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>
     1250<div class="mediaobject" id="idp547712"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>
    12321251<div class="caption"></div>
    12331252</div>
    12341253</div>
    12351254      </p>
    1236 <p id="idp555840"> Overall, our design is intended to benefit a range of applications. Conceptually, we
     1255<p id="idp550160"> Overall, our design is intended to benefit a range of applications. Conceptually, we
    12371256         consider two design points. The first, the parsing performed by the Parabix Subsystem
    12381257         dominates at 67% of the overall cost, with the cost of application processing (including
     
    12401259         scenario, the cost of application processing dominates at 60%, while the cost of XML
    12411260         parsing represents an overhead of 40%. </p>
    1242 <p id="idp556752"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
     1261<p id="idp551072"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
    12431262         100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement
    12441263         of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the
     
    12481267         about 33% of the original work. In this case, Amdahl's law predicts that we could expect up
    12491268         to a 3x speedup at best. </p>
    1250 <p id="idp557872"> At the other extreme of our design range, we consider an application in which core
     1269<p id="idp552192"> At the other extreme of our design range, we consider an application in which core
    12511270         parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the
    12521271         corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most
     
    12541273         the entire latency of parsing within the serial time required by the application. In this
    12551274         case, we achieve an overall speedup in processing time by 1.67x. </p>
    1256 <p id="idp558816"> Although the structure of the Parabix Subsystem allows division of the work into
     1275<p id="idp553136"> Although the structure of the Parabix Subsystem allows division of the work into
    12571276         several pipeline stages and has been demonstrated to be effective for four pipeline stages
    12581277         in a research prototype \cite{HPCA2012}, our analysis here suggests that the further
     
    12621281         the cost of application logic that could match reductions in core parsing cost. </p>
    12631282</div>
    1264 <div class="section" id="idp560656">
     1283<div class="section" id="idp554320">
    12651284<h2 class="title" style="clear: both">Performance</h2>
    1266 <p id="idp561328"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
     1285<p id="idp554992"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
    12671286         Xerces C++ SAXCount sample application, and a real world GML to SVG transformation
    12681287         application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy
     
    12701289         L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu
    12711290         12.04 (Linux). </p>
    1272 <p id="idp562240"> We analyzed the execution profiles of each XML parser using the performance counters
     1291<p id="idp555904"> We analyzed the execution profiles of each XML parser using the performance counters
    12731292         found in the processor. We chose several key hardware events that provide insight into the
    12741293         profile of each application and indicate if the processor is doing useful work. The set of
     
    12781297         collection of hardware performance monitoring statistics. In addition, we used the Linux
    12791298         perf \cite{perf} utility to collect per core hardware events. </p>
    1280 <div class="section" id="idp563376">
     1299<div class="section" id="idp557040">
    12811300<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    1282 <p id="idp564048"> Xerces comes with sample applications that demonstrate salient features of the
     1301<p id="idp557712"> Xerces comes with sample applications that demonstrate salient features of the
    12831302            parser. SAXCount is the simplest such application: it counts the elements, attributes
    12841303            and characters of a given XML file using the (event based) SAX API and prints out the
    12851304            totals. </p>
    1286 <p id="idp564752"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
     1305<p id="idp558416"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
    12871306            selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented
    12881307            XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the
    12891308            UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML
    12901309            documents and consist entirely of single byte encoded ASCII characters.
    1291   <div class="table-wrapper" id="idp565488">
     1310  <div class="table-wrapper" id="idp559152">
    12921311<p class="title">Table VII</p>
    1293 <div class="caption"><p id="idp566000">XML Document Characteristics</p></div>
     1312<div class="caption"><p id="idp559664">XML Document Characteristics</p></div>
    12941313<table class="table">
    12951314<colgroup span="1">
     
    13401359</div>           
    13411360</p>
    1342 <p id="idp581584"> A key predictor of the overall parsing performance of an XML file is markup
     1361<p id="idp575296"> A key predictor of the overall parsing performance of an XML file is markup
    13431362            density\footnote{ Markup Density: the ratio of markup bytes used to define the structure
    13441363            of the document vs. its file size.}. This metric has substantial influence on the
     
    13471366            of document-oriented and data-oriented XML files to analyze performance over a spectrum
    13481367            of markup densities. </p>
    1349 <p id="idp583200"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
     1368<p id="idp576304"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
    13501369            in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over
    13511370            Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve
     
    13541373            icXML-p performs better as markup-density increases because the work performed by each
    13551374            stage is well balanced in this application. </p>
    1356 <p id="idp584240">
     1375<p id="idp577344">
    13571376        <div class="figure" id="perf_SAX">
    1358 <p class="title">Figure 4: SAXCount Performance Comparison</p>
     1377<p class="title">Figure 5: SAXCount Performance Comparison</p>
    13591378<div class="figure-contents">
    1360 <div class="mediaobject" id="idp585632"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
     1379<div class="mediaobject" id="idp578736"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
    13611380<div class="caption"></div>
    13621381</div>
     
    13641383         </p>
    13651384</div>
    1366 <div class="section" id="idp588176">
     1385<div class="section" id="idp581280">
    13671386<h3 class="title" style="clear: both">GML2SVG</h3>
    1368 <p id="idp588848">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
     1387<p id="idp581952">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
    13691388was chosen.   This application transforms geospatially encoded data represented using
    13701389an XML representation in the form of Geography Markup Language (GML) \cite{lake2004geography}
     
    13781397a known XML format for the purpose of analysis and restructuring to meet
    13791398the requirements of an alternative format.</p>
    1380 <p id="idp590224">Our GML to SVG data translations are executed on GML source data
     1399<p id="idp583280">Our GML to SVG data translations are executed on GML source data
    13811400modelling the city of Vancouver, British Columbia, Canada.
    13821401The GML source document set
     
    13861405213.4 MB of source GML data generates 91.9 MB of target SVG data.</p>
    13871406<div class="figure" id="perf_GML2SVG">
    1388 <p class="title">Figure 5: Performance Comparison for GML2SVG</p>
     1407<p class="title">Figure 6: Performance Comparison for GML2SVG</p>
    13891408<div class="figure-contents">
    1390 <div class="mediaobject" id="idp592304"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
     1409<div class="mediaobject" id="idp585312"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
    13911410<div class="caption"></div>
    13921411</div>
    13931412</div>
    1394 <p id="idp594592">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
     1413<p id="idp587600">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
    13951414the Xerces, icXML and icXML-p.   
    13961415On the GML workload with this application, single-thread icXML
     
    13991418Using icXML-p, a further throughput increase to 111 MB/sec was recorded,
    14001419approximately a 2X speedup.</p>
    1401 <p id="idp595408">An important aspect of icXML is the replacement of much branch-laden
     1420<p id="idp588416">An important aspect of icXML is the replacement of much branch-laden
    14021421sequential code inside Xerces with straight-line SIMD code using far
    14031422fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
     
    14081427less overloaded and able to increase the successful branch prediction rate.</p>
    14091428<div class="figure" id="branchmiss_GML2SVG">
    1410 <p class="title">Figure 6: Comparative Branch Misprediction Rate</p>
     1429<p class="title">Figure 7: Comparative Branch Misprediction Rate</p>
    14111430<div class="figure-contents">
    1412 <div class="mediaobject" id="idp597520"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
     1431<div class="mediaobject" id="idp590528"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
    14131432<div class="caption"></div>
    14141433</div>
    14151434</div>
    1416 <p id="idp599808">The behaviour of the three versions with respect to L1 cache misses per kB is shown
     1435<p id="idp592816">The behaviour of the three versions with respect to L1 cache misses per kB is shown
    14171436in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
    14181437and data-cache performance with the improvements in instruction-cache
     
    14241443significant benefit.</p>
    14251444<div class="figure" id="cachemiss_GML2SVG">
    1426 <p class="title">Figure 7: Comparative Cache Miss Rate</p>
     1445<p class="title">Figure 8: Comparative Cache Miss Rate</p>
    14271446<div class="figure-contents">
    1428 <div class="mediaobject" id="idp601936"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
     1447<div class="mediaobject" id="idp594944"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
    14291448<div class="caption"></div>
    14301449</div>
    14311450</div>
    1432 <p id="idp604224">One caveat with this study is that the GML2SVG application did not exhibit
     1451<p id="idp597232">One caveat with this study is that the GML2SVG application did not exhibit
    14331452a relative balance of processing between application code and Xerces library
    14341453code reaching the 33% figure.  This suggests that for this application and
     
    14381457</div>
    14391458</div>
    1440 <div class="section" id="idp605296">
     1459<div class="section" id="idp598304">
    14411460<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    1442 <p id="idp605984"> This paper is the first case study documenting the significant performance benefits
     1461<p id="idp598992"> This paper is the first case study documenting the significant performance benefits
    14431462         that may be realized through the integration of parallel bitstream technology into existing
    14441463         widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined
     
    14501469         technologies, this is an important case study demonstrating the general feasibility of
    14511470         these techniques. </p>
    1452 <p id="idp607264"> The further development of icXML to move beyond 2-stage pipeline parallelism is
     1471<p id="idp600272"> The further development of icXML to move beyond 2-stage pipeline parallelism is
    14531472         ongoing, with realistic prospects for four reasonably balanced stages within the library.
    14541473         For applications such as GML2SVG which are dominated by time spent on XML parsing, such a
    14551474         multistage pipelined parsing library should offer substantial benefits. </p>
    1456 <p id="idp608032"> The example of XML parsing may be considered prototypical of finite-state machines
     1475<p id="idp601040"> The example of XML parsing may be considered prototypical of finite-state machines
    14571476         applications which have sometimes been considered "embarassingly
    14581477         sequential" and so difficult to parallelize that "nothing
     
    14601479         point in making the case that parallelization can indeed be helpful across a broad array of
    14611480         application types. </p>
    1462 <p id="idp609408"> To overcome the software engineering challenges in applying parallel bitstream
     1481<p id="idp602416"> To overcome the software engineering challenges in applying parallel bitstream
    14631482         technology to existing software systems, it is clear that better library and tool support
    14641483         is needed. The techniques used in the implementation of icXML and documented in this paper
     
    14671486      </p>
    14681487</div>
    1469 <div class="bibliography" id="idp611376">
     1488<div class="bibliography" id="idp603904">
    14701489<h2 class="title" style="clear:both">Bibliography</h2>
    14711490<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3052 r3053  
    127127      <!--
    128128      <legalnotice>
    129          <para>Copyright &#x000A9; 2009 Robert D. Cameron, Kenneth S. Herdy and Ehsan Amiri.
     129         <para>Copyright &#x000A9; 2013 Nigel Medforth, Dan Lin, Kenneth S. Herdy, Robert D. Cameron  and Arrvindh Shriraman.
    130130            This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative
    131131            Works 2.5 Canada License.</para>
     
    149149      <section>
    150150         <title>Xerces C++ Structure</title>
    151          <para> The Xerces C++ parser <!-- is a widely-used standards-conformant -->
    152             <!-- XML parser produced as open-source software -->
    153             <!-- by the Apache Software Foundation. -->
    154             <!-- It --> features comprehensive support for a variety of character encodings both
     151         <para> The Xerces C++ parser is a widely-used standards-conformant
     152            XML parser produced as open-source software
     153             by the Apache Software Foundation.
     154            It features comprehensive support for a variety of character encodings both
    155155            commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple
    156156            XML vocabularies through the XML namespace mechanism, as well as complete
     
    161161            tree-based parsing interface. </para>
    162162         <para>
    163             <!--What is the story behind the xerces-profile picture? should it contain one single file or several from our test suite?-->
    164             <!--Our test suite does not have any grammars in it; ergo, processing those files will give a poor indication of the cost of using grammars-->
    165             <!--Should we show a val-grind summary of a few files in a linechart form?--> Xerces,
     163            Xerces,
    166164            like all traditional parsers, processes XML documents sequentially a byte-at-a-time from
    167165            the first to the last byte of input data. Each byte passes through several processing
     
    208206            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
    209207            data from long blocks of input data by using one register bit per single input byte. To
    210             facilitate this, the input data is first transposed into a set of basis bit streams. In <!--FIGURE REF Figure~\ref{fig:BitStreamsExample}, the ASCII string ``{\ttfamily b7\verb|<|A}''
    211 is represented as 8 basis bit streams, $\tt b<subscript>{0 \ldots 7}$.
     208            facilitate this, the input data is first transposed into a set of basis bit streams.
     209              For example Table II shows  the ASCII bytes for the string "<code>b7&lt;A</code>" with
     210                the corresponding  8 basis bit streams, b<subscript>0</subscript> through  b<subscript>7</subscript> shown in Table III.
    212211-->
    213212            <!-- The bits used to construct $\tt <subscript>7</subscript>$ have been highlighted in this example. -->
     
    279278         <!-- process, intra-element well-formedness validation is performed on each block -->
    280279         <!-- of text. -->
    281          <para> Consider, for example, the XML source data stream shown in the first line of Table II.
     280         <para> Consider, for example, the XML source data stream shown in the first line of Table IV.
    282281The remaining lines of this figure show
    283282            several parallel bit streams that are computed in Parabix-style parsing, with each bit
     
    287286            brackets that represent tag openers in XML. The second and third streams show a
    288287            partition of the tag openers into start tag marks and end tag marks depending on the
    289             character immediately following the opener (i.e., <code>&quot;/&quot;</code>) or
     288            character immediately following the opener (i.e., &quot;<code>/</code>&quot;) or
    290289            not. The remaining three lines show streams that can be computed in subsequent parsing
    291290            (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
    292291            marking the element names, attribute names and attribute values of tags. </para>
     292            <table>
     293                  <caption>
     294                     <para>XML Source Data and Derived Parallel Bit Streams</para>
     295                  </caption>
     296                  <colgroup>
     297                     <col align="centre" valign="top"/>
     298                     <col align="left" valign="top"/>
     299                  </colgroup>
     300                  <tbody>
     301          <tr><td> Source Data </td><td> <code> <![CDATA[<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>]]> </code></td></tr>
     302          <tr><td> Tag Openers </td><td> <code>1____________1____________________________1____________1__________</code></td></tr>
     303           <tr><td> Start Tag Marks </td><td> <code>_1____________1___________________________________________________</code></td></tr>
     304           <tr><td> End Tag Marks </td><td> <code>___________________________________________1____________1_________</code></td></tr>
     305           <tr><td> Empty Tag Marks </td><td> <code>__________________________________________________________________</code></td></tr>
     306           <tr><td> Element Names </td><td> <code>_11111111_____1111111_____________________________________________</code></td></tr>
     307           <tr><td> Attribute Names </td><td> <code>______________________11_______11_________________________________</code></td></tr>
     308           <tr><td> Attribute Values </td><td> <code>__________________________111________111__________________________</code></td></tr>
     309                  </tbody>
     310               </table>         
     311
    293312         <para> Two intuitions may help explain how the Parabix approach can lead to improved XML
    294313            parsing performance. The first is that the use of the full register width offers a
     
    491510            may then be completed by applying parallel deletion and inverse transposition of the
    492511            UTF-16 bitstreams\cite{Cameron2008}. </para>
    493 <table>
    494                   <caption>
    495                      <para>XML Source Data and Derived Parallel Bit Streams</para>
    496                   </caption>
    497                   <colgroup>
    498                      <col align="centre" valign="top"/>
    499                      <col align="left" valign="top"/>
    500                   </colgroup>
    501                   <tbody>
    502           <tr><td> Source Data </td><td> <code> <![CDATA[<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>]]> </code></td></tr>
    503           <tr><td> Tag Openers </td><td> <code>1____________1____________________________1____________1__________</code></td></tr>
    504            <tr><td> Start Tag Marks </td><td> <code>_1____________1___________________________________________________</code></td></tr>
    505            <tr><td> End Tag Marks </td><td> <code>___________________________________________1____________1_________</code></td></tr>
    506            <tr><td> Empty Tag Marks </td><td> <code>__________________________________________________________________</code></td></tr>
    507            <tr><td> Element Names </td><td> <code>_11111111_____1111111_____________________________________________</code></td></tr>
    508            <tr><td> Attribute Names </td><td> <code>______________________11_______11_________________________________</code></td></tr>
    509            <tr><td> Attribute Values </td><td> <code>__________________________111________111__________________________</code></td></tr>
    510                   </tbody>
    511                </table>         
    512512         <para> Rather than immediately paying the costs of deletion and transposition just for
    513513            transcoding, however, icXML defers these steps so that the deletion masks for several
     
    521521            after the marked CR as shown by the Pablo source code in Figure
    522522            \ref{fig:LBnormalization}.
    523             <!-- FIGURE
    524 \begin{figure}
    525 \begin{verbatim}
     523              <figure>
     524                <caption>Line Break Normalization Logic</caption>
     525  <programlisting>
    526526# XML 1.0 line-break normalization rules.
    527527if lex.CR:
     
    530530  u16lo.bit_6 ^= lex.CR
    531531  u16lo.bit_7 ^= lex.CR
    532   CRLF = pablo.Advance(lex.CR) & lex.LF
     532  CRLF = pablo.Advance(lex.CR) &amp; lex.LF
    533533  callouts.delmask |= CRLF
    534534# Adjust LF streams for line/column tracker
    535535  lex.LF |= lex.CR
    536536  lex.LF ^= CRLF
    537 \end{verbatim}
    538 \caption{Line Break Normalization Logic}\label{fig:LBnormalization}
    539 \end{figure}
    540 -->
     537</programlisting>
     538</figure>
    541539         </para>
    542540         <para> In essence, the deletion masks for transcoding and for line break normalization each
Note: See TracChangeset for help on using the changeset viewer.