Changeset 3052 for docs


Ignore:
Timestamp:
Apr 19, 2013, 3:56:55 PM (6 years ago)
Author:
lindanl
Message:

changes to GML2SVG section

Location:
docs/Balisage13/Bal2013came0601
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3051 r3052  
    274274</div>
    275275<div id="mast"><div class="content">
    276 <h2 class="article-title" id="idp74624"></h2>
     276<h2 class="article-title" id="idp74240"></h2>
    277277<div class="author">
    278278<h3 class="author">Nigel Medforth</h3>
     
    324324</div>
    325325<div class="mast-box">
    326 <p class="title"><a href="javascript:toggle('idp75744')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75744"></a> <span onclick="javascript:toggle('idp75744');return true">Abstract</span></p>
    327 <div class="folder" id="folder-idp75744" style="display:none"><p id="idp76048">Prior research on the acceleration of XML processing using SIMD and multi-core
     326<p class="title"><a href="javascript:toggle('idp75360')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75360"></a> <span onclick="javascript:toggle('idp75360');return true">Abstract</span></p>
     327<div class="folder" id="folder-idp75360" style="display:none"><p id="idp75664">Prior research on the acceleration of XML processing using SIMD and multi-core
    328328            parallelism has lead to a number of interesting research prototypes. This work
    329329            investigates the extent to which the techniques underlying these prototypes could result
     
    339339<p><b>Table of Contents</b></p>
    340340<dl>
    341 <dt><span class="section"><a href="#idp284352" class="toc">Introduction</a></span></dt>
    342 <dt><span class="section"><a href="#idp286144" class="toc">Background</a></span></dt>
     341<dt><span class="section"><a href="#idp284144" class="toc">Introduction</a></span></dt>
     342<dt><span class="section"><a href="#idp285936" class="toc">Background</a></span></dt>
    343343<dd><dl>
    344 <dt><span class="section"><a href="#idp286784" class="toc">Xerces C++ Structure</a></span></dt>
    345 <dt><span class="section"><a href="#idp330512" class="toc">The Parabix Framework</a></span></dt>
    346 <dt><span class="section"><a href="#idp404544" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     344<dt><span class="section"><a href="#idp286576" class="toc">Xerces C++ Structure</a></span></dt>
     345<dt><span class="section"><a href="#idp330416" class="toc">The Parabix Framework</a></span></dt>
     346<dt><span class="section"><a href="#idp404448" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    347347</dl></dd>
    348 <dt><span class="section"><a href="#idp408928" class="toc">Architecture</a></span></dt>
     348<dt><span class="section"><a href="#idp408832" class="toc">Architecture</a></span></dt>
    349349<dd><dl>
    350 <dt><span class="section"><a href="#idp409568" class="toc">Overview</a></span></dt>
    351 <dt><span class="section"><a href="#idp437296" class="toc">Character Set Adapters</a></span></dt>
    352 <dt><span class="section"><a href="#idp446192" class="toc">Combined Parallel Filtering</a></span></dt>
    353 <dt><span class="section"><a href="#idp474608" class="toc">Content Stream</a></span></dt>
    354 <dt><span class="section"><a href="#idp485264" class="toc">Namespace Handling</a></span></dt>
    355 <dt><span class="section"><a href="#idp528864" class="toc">Error Handling</a></span></dt>
     350<dt><span class="section"><a href="#idp409472" class="toc">Overview</a></span></dt>
     351<dt><span class="section"><a href="#idp437120" class="toc">Character Set Adapters</a></span></dt>
     352<dt><span class="section"><a href="#idp446016" class="toc">Combined Parallel Filtering</a></span></dt>
     353<dt><span class="section"><a href="#idp474496" class="toc">Content Stream</a></span></dt>
     354<dt><span class="section"><a href="#idp485104" class="toc">Namespace Handling</a></span></dt>
     355<dt><span class="section"><a href="#idp528320" class="toc">Error Handling</a></span></dt>
    356356</dl></dd>
    357 <dt><span class="section"><a href="#idp539088" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
    358 <dt><span class="section"><a href="#idp562720" class="toc">Performance</a></span></dt>
     357<dt><span class="section"><a href="#idp538544" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
     358<dt><span class="section"><a href="#idp560656" class="toc">Performance</a></span></dt>
    359359<dd><dl>
    360 <dt><span class="section"><a href="#idp565440" class="toc">Xerces C++ SAXCount</a></span></dt>
    361 <dt><span class="section"><a href="#idp590288" class="toc">GML2SVG</a></span></dt>
     360<dt><span class="section"><a href="#idp563376" class="toc">Xerces C++ SAXCount</a></span></dt>
     361<dt><span class="section"><a href="#idp588176" class="toc">GML2SVG</a></span></dt>
    362362</dl></dd>
    363 <dt><span class="section"><a href="#idp607904" class="toc">Conclusion and Future Work</a></span></dt>
     363<dt><span class="section"><a href="#idp605296" class="toc">Conclusion and Future Work</a></span></dt>
    364364</dl>
    365365</div>
    366366<div class="mast-box">
    367 <p class="title"><a href="javascript:toggle('idp77472')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77472"></a> <span onclick="javascript:toggle('idp77472');return true">Nigel Medforth</span></p>
    368 <div class="folder" id="folder-idp77472" style="display:none">
     367<p class="title"><a href="javascript:toggle('idp77088')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77088"></a> <span onclick="javascript:toggle('idp77088');return true">Nigel Medforth</span></p>
     368<div class="folder" id="folder-idp77088" style="display:none">
    369369<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>&gt;</code></h5>
    370370<div class="affiliation">
     
    377377</div>
    378378<div class="personblurb">
    379 <p id="idp59168">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead
     379<p id="idp58880">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead
    380380               developer of icXML. He earned a Bachelor of Technology in Information Technology at
    381381               Kwantlen Polytechnic University in 2009 and was awarded the Dean’s Medal for
    382382               Outstanding Achievement.</p>
    383 <p id="idp60176">Nigel is currently researching ways to leverage both the Parabix framework and
     383<p id="idp59888">Nigel is currently researching ways to leverage both the Parabix framework and
    384384               stream-processing models to further accelerate XML parsing within icXML.</p>
    385385</div>
     
    387387</div>
    388388<div class="mast-box">
    389 <p class="title"><a href="javascript:toggle('idp63840')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63840"></a> <span onclick="javascript:toggle('idp63840');return true">Dan Lin</span></p>
    390 <div class="folder" id="folder-idp63840" style="display:none">
     389<p class="title"><a href="javascript:toggle('idp63552')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63552"></a> <span onclick="javascript:toggle('idp63552');return true">Dan Lin</span></p>
     390<div class="folder" id="folder-idp63552" style="display:none">
    391391<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>&gt;</code></h5>
    392392<div class="affiliation">
     
    394394<p class="orgname">Simon Fraser University </p>
    395395</div>
    396 <div class="personblurb"><p id="idp65552">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
     396<div class="personblurb"><p id="idp65264">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
    397397             in Computing Science at Simon Fraser University in 2010. Her research focus on on high
    398398             performance algorithms that exploit parallelization strategies on various multicore platforms.
     
    401401</div>
    402402<div class="mast-box">
    403 <p class="title"><a href="javascript:toggle('idp68112')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp68112"></a> <span onclick="javascript:toggle('idp68112');return true">Kenneth Herdy</span></p>
    404 <div class="folder" id="folder-idp68112" style="display:none">
     403<p class="title"><a href="javascript:toggle('idp67824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67824"></a> <span onclick="javascript:toggle('idp67824');return true">Kenneth Herdy</span></p>
     404<div class="folder" id="folder-idp67824" style="display:none">
    405405<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>&gt;</code></h5>
    406406<div class="affiliation">
     
    409409</div>
    410410<div class="personblurb">
    411 <p id="idp270560"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
     411<p id="idp69552"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
    412412               Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor
    413413               of Science in Computing Science with a Certificate in Spatial Information Systems at
    414414               Simon Fraser University in 2005. </p>
    415 <p id="idp271296"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
     415<p id="idp271088"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
    416416               University with industrial scholarship support from the Natural Sciences and
    417417               Engineering Research Council of Canada, the Mathematics of Information Technology and
     
    423423</div>
    424424<div class="mast-box">
    425 <p class="title"><a href="javascript:toggle('idp274032')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp274032"></a> <span onclick="javascript:toggle('idp274032');return true">Rob Cameron</span></p>
    426 <div class="folder" id="folder-idp274032" style="display:none">
     425<p class="title"><a href="javascript:toggle('idp273824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp273824"></a> <span onclick="javascript:toggle('idp273824');return true">Rob Cameron</span></p>
     426<div class="folder" id="folder-idp273824" style="display:none">
    427427<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
    428428<div class="affiliation">
     
    434434<p class="orgname">International Characters, Inc.</p>
    435435</div>
    436 <div class="personblurb"><p id="idp275696">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
     436<div class="personblurb"><p id="idp275488">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
    437437               Sciences at Simon Fraser University. His research interests include programming
    438438               language and software system technology, with a specific focus on high performance
     
    450450<div id="main">
    451451<div class="article">
    452 <h2 class="article-title" id="idp74624"></h2>
    453 <div class="section" id="idp284352">
     452<h2 class="article-title" id="idp74240"></h2>
     453<div class="section" id="idp284144">
    454454<h2 class="title" style="clear: both">Introduction</h2>
    455 <p id="idp284992"></p>
    456 <p id="idp285248"></p>
    457 <p id="idp285504"></p>
    458 <p id="idp285760"></p>
    459 </div>
    460 <div class="section" id="idp286144">
     455<p id="idp284784"></p>
     456<p id="idp285040"></p>
     457<p id="idp285296"></p>
     458<p id="idp285552"></p>
     459</div>
     460<div class="section" id="idp285936">
    461461<h2 class="title" style="clear: both">Background</h2>
    462 <div class="section" id="idp286784">
     462<div class="section" id="idp286576">
    463463<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    464 <p id="idp287424"> The Xerces C++ parser
     464<p id="idp287216"> The Xerces C++ parser
    465465           
    466466           
     
    473473            parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM
    474474            tree-based parsing interface. </p>
    475 <p id="idp289584">
     475<p id="idp289440">
    476476           
    477477           
     
    492492            expected that a comprehensive restructuring is required, involving all aspects of the
    493493            parser. </p>
    494 <div class="table-wrapper" id="idp292544">
     494<div class="table-wrapper" id="idp292400">
    495495<p class="title">Table I</p>
    496 <div class="caption"><p id="idm847680">Execution Time of Top 10 Xerces Functions</p></div>
     496<div class="caption"><p id="idm848128">Execution Time of Top 10 Xerces Functions</p></div>
    497497<table class="table">
    498498<colgroup span="1">
     
    549549</div>
    550550</div>
    551 <div class="section" id="idp330512">
     551<div class="section" id="idp330416">
    552552<h3 class="title" style="clear: both">The Parabix Framework</h3>
    553 <p id="idp331184"> The Parabix (parallel bit stream) framework is a transformative approach to XML
     553<p id="idp331088"> The Parabix (parallel bit stream) framework is a transformative approach to XML
    554554            parsing (and other forms of text processing.) The key idea is to exploit the
    555555            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
     
    576576             multiple
    577577            classes can share the classification cost. </p>
    578 <div class="table-wrapper" id="idp342752">
     578<div class="table-wrapper" id="idp342656">
    579579<p class="title">Table II</p>
    580 <div class="caption"><p id="idp343264">XML Source Data</p></div>
     580<div class="caption"><p id="idp343168">XML Source Data</p></div>
    581581<table class="table">
    582582<colgroup span="1">
     
    605605</table>
    606606</div>
    607 <div class="table-wrapper" id="idp358656">
     607<div class="table-wrapper" id="idp358560">
    608608<p class="title">Table III</p>
    609 <div class="caption"><p id="idp359168">8-bit ASCII Basis Bit Streams</p></div>
     609<div class="caption"><p id="idp359072">8-bit ASCII Basis Bit Streams</p></div>
    610610<table class="table">
    611611<colgroup span="1">
     
    674674</table>
    675675</div>
    676 <p id="idp399328"> Consider, for example, the XML source data stream shown in the first line of Table II.
     676<p id="idp399232"> Consider, for example, the XML source data stream shown in the first line of Table II.
    677677The remaining lines of this figure show
    678678            several parallel bit streams that are computed in Parabix-style parsing, with each bit
     
    686686            (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
    687687            marking the element names, attribute names and attribute values of tags. </p>
    688 <p id="idp401184"> Two intuitions may help explain how the Parabix approach can lead to improved XML
     688<p id="idp401088"> Two intuitions may help explain how the Parabix approach can lead to improved XML
    689689            parsing performance. The first is that the use of the full register width offers a
    690690            considerable information advantage over sequential byte-at-a-time parsing. That is,
     
    695695            individual decision-bits, an approach that computes many of them in parallel (e.g., 128
    696696            bytes at a time using 128-bit registers) should provide substantial benefit. </p>
    697 <p id="idp402432"> Previous studies have shown that the Parabix approach improves many aspects of XML
     697<p id="idp402336"> Previous studies have shown that the Parabix approach improves many aspects of XML
    698698            processing, including transcoding \cite{Cameron2008}, character classification and
    699699            validation, tag parsing and well-formedness checking. The first Parabix parser used
     
    704704            \cite{HPCA2012}. Although these research prototypes handled the full syntax of
    705705            schema-less XML documents, they lacked the functionality required by full XML parsers. </p>
    706 <p id="idp403696"> Commercial XML processors support transcoding of multiple character sets and can
     706<p id="idp403600"> Commercial XML processors support transcoding of multiple character sets and can
    707707            parse and validate against multiple document vocabularies. Additionally, they provide
    708708            API facilities beyond those found in research prototypes, including the widely used SAX,
    709709            SAX2 and DOM interfaces. </p>
    710710</div>
    711 <div class="section" id="idp404544">
     711<div class="section" id="idp404448">
    712712<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    713 <p id="idp405184"> Xerces—like all traditional XML parsers—processes XML documents
     713<p id="idp405088"> Xerces—like all traditional XML parsers—processes XML documents
    714714            sequentially. Each character is examined to distinguish between the XML-specific markup,
    715715            such as a left angle bracket <code class="code">&lt;</code>, and the content held within the
    716716            document. As the parser progresses through the document, it alternates between markup
    717717            scanning, validation and content processing modes. </p>
    718 <p id="idp406720"> In other words, Xerces belongs to an equivalent class applications termed FSM
     718<p id="idp406624"> In other words, Xerces belongs to an equivalent class applications termed FSM
    719719            applications\footnote{ Herein FSM applications are considered software systems whose
    720720            behaviour is defined by the inputs, current state and the events associated with
     
    722722            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
    723723            character could induce a state transition. </p>
    724 <p id="idp407632"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
     724<p id="idp407536"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
    725725            text is transformed into a set of lexical bitstreams, which undergo a series of
    726726            operations that can be grouped into logical layers, e.g., transposition, character
     
    731731</div>
    732732</div>
    733 <div class="section" id="idp408928">
     733<div class="section" id="idp408832">
    734734<h2 class="title" style="clear: both">Architecture</h2>
    735 <div class="section" id="idp409568">
     735<div class="section" id="idp409472">
    736736<h3 class="title" style="clear: both">Overview</h3>
    737 <p id="idp410464"> icXML is more than an optimized version of Xerces. Many components were grouped,
     737<p id="idp410368"> icXML is more than an optimized version of Xerces. Many components were grouped,
    738738            restructured and rearchitected with pipeline parallelism in mind. In this section, we
    739739            highlight the core differences between the two systems. As shown in Figure
     
    761761<p class="title">Figure 1: Xerces Architecture</p>
    762762<div class="figure-contents">
    763 <div class="mediaobject" id="idp417104"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
     763<div class="mediaobject" id="idp417008"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
    764764<div class="caption"></div>
    765765</div>
    766766</div>
    767 <p id="idp419360"> In icXML functions are grouped into logical components. As shown in Figure
     767<p id="idp419264"> In icXML functions are grouped into logical components. As shown in Figure
    768768            \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the
    769769            Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which
     
    784784            described in Section \ref{section:arch:errorhandling}. From here, two data-independent
    785785            branches exist: the Symbol Resolver and Content Preparation Unit. </p>
    786 <p id="idp423520"> A typical XML file contains few unique element and attribute names—but
     786<p id="idp423344"> A typical XML file contains few unique element and attribute names—but
    787787            each of them will occur frequently. icXML stores these as distinct data structures,
    788788            called symbols, each with their own global identifier (GID). Using the symbol marker
     
    790790               Resolver</span> scans through the raw data to produce a sequence of GIDs, called
    791791            the <span class="ital">symbol stream</span>. </p>
    792 <p id="idp426112"> The final components of the Parabix Subsystem are the <span class="ital">Content
     792<p id="idp425936"> The final components of the Parabix Subsystem are the <span class="ital">Content
    793793               Preparation Unit</span> and <span class="ital">Content Stream
    794794            Generator</span>. The former takes the (transposed) basis bitstreams and selectively
    795795            filters them, according to the information provided by the Parallel Markup Parser, and
    796796            the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p>
    797 <p id="idp429024"> Combined, the symbol and content stream form icXML's compressed IR of the XML
     797<p id="idp428848"> Combined, the symbol and content stream form icXML's compressed IR of the XML
    798798            document. The <span class="ital">Markup Processor</span>~parses the IR to
    799799            validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that
     
    807807<p class="title">Figure 2: icXML Architecture</p>
    808808<div class="figure-contents">
    809 <div class="mediaobject" id="idp434848"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
     809<div class="mediaobject" id="idp434672"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
    810810<div class="caption"></div>
    811811</div>
    812812</div>
    813813</div>
    814 <div class="section" id="idp437296">
     814<div class="section" id="idp437120">
    815815<h3 class="title" style="clear: both">Character Set Adapters</h3>
    816 <p id="idp437968"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
     816<p id="idp437792"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
    817817            Xerces itself and provide the end-consumer with a single encoding format. In the
    818818            important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant,
     
    821821            other cases, transcoding may involve table look-up operations for each byte of input. In
    822822            any case, transcoding imposes at least a cost of buffer copying. </p>
    823 <p id="idp439024"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
     823<p id="idp438848"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
    824824            transcoding costs. Given a specified input encoding, a CSA is responsible for checking
    825825            that input code units represent valid characters, mapping the characters of the encoding
     
    827827            item streams), as well as supporting ultimate transcoding requirements. All of this work
    828828            is performed using the parallel bitstream representation of the source input. </p>
    829 <p id="idp440000"> An important observation is that many character sets are an extension to the legacy
     829<p id="idp439824"> An important observation is that many character sets are an extension to the legacy
    830830            7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8,
    831831            UTF-16 and many others. Furthermore, all significant characters for parsing XML are
    832832            confined to the ASCII repertoire. Thus, a single common set of lexical item calculations
    833833            serves to compute lexical item streams for all such ASCII-based character sets. </p>
    834 <p id="idp440880"> A second observation is that—regardless of which character set is
     834<p id="idp440704"> A second observation is that—regardless of which character set is
    835835            used—quite often all of the characters in a particular block of input will be
    836836            within the ASCII range. This is a very simple test to perform using the bitstream
     
    839839            be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    840840            UTF-16 form are each set to zero in this case. </p>
    841 <p id="idp442800"> A third observation is that repeated transcoding of the names of XML elements,
     841<p id="idp442624"> A third observation is that repeated transcoding of the names of XML elements,
    842842            attributes and so on can be avoided by using a look-up mechanism. That is, the first
    843843            occurrence of each symbol is stored in a look-up table mapping the input encoding to a
     
    846846            symbol look up is required to apply various XML validation rules, there is achieves the
    847847            effect of transcoding each occurrence without additional cost. </p>
    848 <p id="idp443856"> The cost of individual character transcoding is avoided whenever a block of input is
     848<p id="idp443680"> The cost of individual character transcoding is avoided whenever a block of input is
    849849            confined to the ASCII subset and for all but the first occurrence of any XML element or
    850850            attribute name. Furthermore, when transcoding is required, the parallel bitstream
     
    857857            using bit scan operations. </p>
    858858</div>
    859 <div class="section" id="idp446192">
     859<div class="section" id="idp446016">
    860860<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    861 <p id="idp446880"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
     861<p id="idp446704"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
    862862            bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two
    863863            Chinese characters <code class="code">䜠奜</code> are represented as two
     
    873873            may then be completed by applying parallel deletion and inverse transposition of the
    874874            UTF-16 bitstreams\cite{Cameron2008}. </p>
    875 <div class="table-wrapper" id="idp451040">
     875<div class="table-wrapper" id="idp450864">
    876876<p class="title">Table IV</p>
    877 <div class="caption"><p id="idp451552">XML Source Data and Derived Parallel Bit Streams</p></div>
     877<div class="caption"><p id="idp451376">XML Source Data and Derived Parallel Bit Streams</p></div>
    878878<table class="table">
    879879<colgroup span="1">
     
    925925</table>
    926926</div>
    927 <p id="idp464640"> Rather than immediately paying the costs of deletion and transposition just for
     927<p id="idp464464"> Rather than immediately paying the costs of deletion and transposition just for
    928928            transcoding, however, icXML defers these steps so that the deletion masks for several
    929929            stages of processing may be combined. In particular, this includes core XML requirements
     
    938938           
    939939         </p>
    940 <p id="idp467152"> In essence, the deletion masks for transcoding and for line break normalization each
     940<p id="idp466976"> In essence, the deletion masks for transcoding and for line break normalization each
    941941            represent a bitwise filter; these filters can be combined using bitwise-or so that the
    942942            parallel deletion algorithm need only be applied once. </p>
    943 <p id="idp468464"> A further application of combined filtering is the processing of XML character and
     943<p id="idp468288"> A further application of combined filtering is the processing of XML character and
    944944            entity references. Consider, for example, the references <code class="code">&amp;</code> or
    945945               <code class="code">&lt;</code>. which must be replaced in XML processing with the single
     
    954954            UTF-16 code unit. In the case, that this is not true, it is addressed in
    955955            post-processing. </p>
    956 <p id="idp473280"> The final step of combined filtering occurs during the process of reducing markup
     956<p id="idp473168"> The final step of combined filtering occurs during the process of reducing markup
    957957            data to tag bytes preceding each significant XML transition as described in
    958958            section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying
     
    964964            Haswell architecture. </p>
    965965</div>
    966 <div class="section" id="idp474608">
     966<div class="section" id="idp474496">
    967967<h3 class="title" style="clear: both">Content Stream</h3>
    968 <p id="idp475280"> A relatively-unique concept for icXML is the use of a filtered content stream.
     968<p id="idp475168"> A relatively-unique concept for icXML is the use of a filtered content stream.
    969969            Rather that parsing an XML document in its original format, the input is transformed
    970970            into one that is easier for the parser to iterate through and produce the sequential
     
    974974           
    975975            through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p>
    976 <p id="idp477680"> Combined with the symbol stream, the parser traverses the content stream to
     976<p id="idp477568"> Combined with the symbol stream, the parser traverses the content stream to
    977977            effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following
    978978               <code class="code">&gt;</code> indicates that a start tag without any attributes is the first
     
    986986            null character in the content stream in parallel, which in turn means the parser can
    987987            directly jump to the end of every string without scanning for it. </p>
    988 <p id="idp481120"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
     988<p id="idp480960"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
    989989            existence of an attribute. Because all of the intra-element was performed in the Parabix
    990990            Subsystem, this must be a legal attribute. Since attributes can only occur within start
     
    10001000            that the appropriate scope-nesting rules have been applied. </p>
    10011001</div>
    1002 <div class="section" id="idp485264">
     1002<div class="section" id="idp485104">
    10031003<h3 class="title" style="clear: both">Namespace Handling</h3>
    1004 <p id="idp486352"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
     1004<p id="idp486192"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
    10051005            together. It is especially important when a vocabulary application-dependant meaning,
    10061006            such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound
     
    10191019            uniquely-named items because the current vocabulary is determined by the namespace(s)
    10201020            that are in-scope. </p>
    1021 <div class="table-wrapper" id="idp493520">
     1021<div class="table-wrapper" id="idp493312">
    10221022<p class="title">Table V</p>
    1023 <div class="caption"><p id="idp494032">XML Namespace Example</p></div>
     1023<div class="caption"><p id="idp493824">XML Namespace Example</p></div>
    10241024<table class="table">
    10251025<colgroup span="1">
     
    10551055</table>
    10561056</div>
    1057 <p id="idp503056"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
     1057<p id="idp502800"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
    10581058            persist for the lifetime of the application through the use of a global URI pool. Xerces
    10591059            maintains a stack of namespace scopes that is pushed (popped) every time a start tag
     
    10631063            those that declare a set of namespaces upfront and never change them, and (2) those that
    10641064            repeatedly modify the namespaces in predictable patterns. </p>
    1065 <p id="idp504192"> For that reason, icXML contains an independent namespace stack and utilizes bit
     1065<p id="idp503936"> For that reason, icXML contains an independent namespace stack and utilizes bit
    10661066            vectors to cheaply perform
    10671067             When a prefix is
     
    10771077            found using a bit-scan intrinsic. A namespace binding table, similar to Table
    10781078            \ref{tbl:namespace1}, provides the actual URI ID. </p>
    1079 <div class="table-wrapper" id="idp508608">
     1079<div class="table-wrapper" id="idp508352">
    10801080<p class="title">Table VI</p>
    1081 <div class="caption"><p id="idp509120">Namespace Binding Table Example</p></div>
     1081<div class="caption"><p id="idp508864">Namespace Binding Table Example</p></div>
    10821082<table class="table">
    10831083<colgroup span="1">
     
    11201120</table>
    11211121</div>
    1122 <p id="idp525504">
     1122<p id="idp524960">
    11231123           
    11241124           
     
    11261126           
    11271127         </p>
    1128 <p id="idp527408"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
     1128<p id="idp526864"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
    11291129            any modification to the currently visible namespaces is calculated and stored within a
    11301130            stack of bit vectors denoting the locally modified namespace bindings. When an end tag
     
    11351135         </p>
    11361136</div>
    1137 <div class="section" id="idp528864">
     1137<div class="section" id="idp528320">
    11381138<h3 class="title" style="clear: both">Error Handling</h3>
    1139 <p id="idp529536">
     1139<p id="idp528992">
    11401140           
    11411141            Xerces outputs error messages in two ways: through the programmer API and as thrown
     
    11461146            \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and
    11471147            Markup Processor, each with its own system for detecting and producing error messages. </p>
    1148 <p id="idp531168"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
     1148<p id="idp530624"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
    11491149            a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking
    11501150            the byte-position of an error within a block, and the type of error is determined by the
     
    11791179            detected, the sum of those skipped positions is subtracted from the distance to
    11801180            determine the actual column number. </p>
    1181 <p id="idp536656"> The Markup Processor is a state-driven machine. As such, error detection within it
     1181<p id="idp536112"> The Markup Processor is a state-driven machine. As such, error detection within it
    11821182            is very similar to Xerces. However, reporting the correct line/column is a much more
    11831183            difficult problem. The Markup Processor parses the content stream, which is a series of
     
    11931193</div>
    11941194</div>
    1195 <div class="section" id="idp539088">
     1195<div class="section" id="idp538544">
    11961196<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    1197 <p id="idp539792"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
     1197<p id="idp539248"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
    11981198         application. These are "embarrassingly
    11991199         sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to
     
    12031203         well into the general model of pipeline parallelism, in which each thread is in charge of a
    12041204         single module or group of modules. </p>
    1205 <p id="idp541648"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
     1205<p id="idp541104"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
    12061206         and the Markup Processor into distinct logical layers into two separate stages. The
    12071207         resultant application, <span class="ital">icXML-p</span>, is a course-grained
     
    12241224            <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can
    12251225         reuse the memory space. </p>
    1226 <p id="idp550720">
     1226<p id="idp550224">
    12271227        <div class="figure" id="threads_timeline1">
    12281228<p class="title">Figure 3: Thread Balance in Two-Stage Pipelines</p>
    12291229<div class="figure-contents">
    1230 <div class="mediaobject" id="idp552112"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>
     1230<div class="mediaobject" id="idp551616"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>
     1231<div class="mediaobject" id="idp553392"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>
    12311232<div class="caption"></div>
    12321233</div>
    12331234</div>
    1234         <div class="figure" id="threads_timeline2">
    1235 <p class="title">Figure 4: Thread Balance in Two-Stage Pipelines</p>
    1236 <div class="figure-contents">
    1237 <div class="mediaobject" id="idp555488"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>
    1238 <div class="caption"></div>
    1239 </div>
    1240 </div>
    12411235      </p>
    1242 <p id="idp557904"> Overall, our design is intended to benefit a range of applications. Conceptually, we
     1236<p id="idp555840"> Overall, our design is intended to benefit a range of applications. Conceptually, we
    12431237         consider two design points. The first, the parsing performed by the Parabix Subsystem
    12441238         dominates at 67% of the overall cost, with the cost of application processing (including
     
    12461240         scenario, the cost of application processing dominates at 60%, while the cost of XML
    12471241         parsing represents an overhead of 40%. </p>
    1248 <p id="idp558816"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
     1242<p id="idp556752"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
    12491243         100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement
    12501244         of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the
     
    12541248         about 33% of the original work. In this case, Amdahl's law predicts that we could expect up
    12551249         to a 3x speedup at best. </p>
    1256 <p id="idp559936"> At the other extreme of our design range, we consider an application in which core
     1250<p id="idp557872"> At the other extreme of our design range, we consider an application in which core
    12571251         parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the
    12581252         corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most
     
    12601254         the entire latency of parsing within the serial time required by the application. In this
    12611255         case, we achieve an overall speedup in processing time by 1.67x. </p>
    1262 <p id="idp560880"> Although the structure of the Parabix Subsystem allows division of the work into
     1256<p id="idp558816"> Although the structure of the Parabix Subsystem allows division of the work into
    12631257         several pipeline stages and has been demonstrated to be effective for four pipeline stages
    12641258         in a research prototype \cite{HPCA2012}, our analysis here suggests that the further
     
    12681262         the cost of application logic that could match reductions in core parsing cost. </p>
    12691263</div>
    1270 <div class="section" id="idp562720">
     1264<div class="section" id="idp560656">
    12711265<h2 class="title" style="clear: both">Performance</h2>
    1272 <p id="idp563392"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
     1266<p id="idp561328"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
    12731267         Xerces C++ SAXCount sample application, and a real world GML to SVG transformation
    12741268         application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy
     
    12761270         L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu
    12771271         12.04 (Linux). </p>
    1278 <p id="idp564304"> We analyzed the execution profiles of each XML parser using the performance counters
     1272<p id="idp562240"> We analyzed the execution profiles of each XML parser using the performance counters
    12791273         found in the processor. We chose several key hardware events that provide insight into the
    12801274         profile of each application and indicate if the processor is doing useful work. The set of
     
    12841278         collection of hardware performance monitoring statistics. In addition, we used the Linux
    12851279         perf \cite{perf} utility to collect per core hardware events. </p>
    1286 <div class="section" id="idp565440">
     1280<div class="section" id="idp563376">
    12871281<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    1288 <p id="idp566112"> Xerces comes with sample applications that demonstrate salient features of the
     1282<p id="idp564048"> Xerces comes with sample applications that demonstrate salient features of the
    12891283            parser. SAXCount is the simplest such application: it counts the elements, attributes
    12901284            and characters of a given XML file using the (event based) SAX API and prints out the
    12911285            totals. </p>
    1292 <p id="idp566864"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
     1286<p id="idp564752"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
    12931287            selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented
    12941288            XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the
    12951289            UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML
    12961290            documents and consist entirely of single byte encoded ASCII characters.
    1297   <div class="table-wrapper" id="idp567600">
     1291  <div class="table-wrapper" id="idp565488">
    12981292<p class="title">Table VII</p>
    1299 <div class="caption"><p id="idp568112">XML Document Characteristics</p></div>
     1293<div class="caption"><p id="idp566000">XML Document Characteristics</p></div>
    13001294<table class="table">
    13011295<colgroup span="1">
     
    13461340</div>           
    13471341</p>
    1348 <p id="idp583696"> A key predictor of the overall parsing performance of an XML file is markup
     1342<p id="idp581584"> A key predictor of the overall parsing performance of an XML file is markup
    13491343            density\footnote{ Markup Density: the ratio of markup bytes used to define the structure
    13501344            of the document vs. its file size.}. This metric has substantial influence on the
     
    13531347            of document-oriented and data-oriented XML files to analyze performance over a spectrum
    13541348            of markup densities. </p>
    1355 <p id="idp585312"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
     1349<p id="idp583200"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
    13561350            in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over
    13571351            Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve
     
    13601354            icXML-p performs better as markup-density increases because the work performed by each
    13611355            stage is well balanced in this application. </p>
    1362 <p id="idp586352">
     1356<p id="idp584240">
    13631357        <div class="figure" id="perf_SAX">
    1364 <p class="title">Figure 5: SAXCount Performance Comparison</p>
     1358<p class="title">Figure 4: SAXCount Performance Comparison</p>
    13651359<div class="figure-contents">
    1366 <div class="mediaobject" id="idp587744"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
     1360<div class="mediaobject" id="idp585632"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
    13671361<div class="caption"></div>
    13681362</div>
     
    13701364         </p>
    13711365</div>
    1372 <div class="section" id="idp590288">
     1366<div class="section" id="idp588176">
    13731367<h3 class="title" style="clear: both">GML2SVG</h3>
    1374 <p id="idp590960">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
     1368<p id="idp588848">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
    13751369was chosen.   This application transforms geospatially encoded data represented using
    13761370an XML representation in the form of Geography Markup Language (GML) \cite{lake2004geography}
     
    13841378a known XML format for the purpose of analysis and restructuring to meet
    13851379the requirements of an alternative format.</p>
    1386 <p id="idp592288">Our GML to SVG data translations are executed on GML source data
     1380<p id="idp590224">Our GML to SVG data translations are executed on GML source data
    13871381modelling the city of Vancouver, British Columbia, Canada.
    13881382The GML source document set
     
    13921386213.4 MB of source GML data generates 91.9 MB of target SVG data.</p>
    13931387<div class="figure" id="perf_GML2SVG">
    1394 <p class="title">Figure 6: Performance Comparison for GML2SVG</p>
     1388<p class="title">Figure 5: Performance Comparison for GML2SVG</p>
    13951389<div class="figure-contents">
    1396 <div class="mediaobject" id="idp594320"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
     1390<div class="mediaobject" id="idp592304"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
    13971391<div class="caption"></div>
    13981392</div>
    13991393</div>
    1400 <p id="idp596608">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
    1401 the Xerces, \icXML{} and \icXMLp{}.   
    1402 On the GML workload with this application, single-thread \icXML{}
    1403 achieved about a 50\% acceleration over Xerces,
     1394<p id="idp594592">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
     1395the Xerces, icXML and icXML-p.   
     1396On the GML workload with this application, single-thread icXML
     1397achieved about a 50% acceleration over Xerces,
    14041398increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.   
    1405 Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,
     1399Using icXML-p, a further throughput increase to 111 MB/sec was recorded,
    14061400approximately a 2X speedup.</p>
    1407 <p id="idp597440">An important aspect of \icXML{} is the replacement of much branch-laden
     1401<p id="idp595408">An important aspect of icXML is the replacement of much branch-laden
    14081402sequential code inside Xerces with straight-line SIMD code using far
    14091403fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
    14101404improvement in branching behaviour, with a dramatic reduction in branch misses per kB.
    1411 It is also interesting to note that \icXMLp{} goes even further.   
     1405It is also interesting to note that icXML-p goes even further.   
    14121406In essence, in using pipeline parallelism to split the instruction
    14131407stream onto separate cores, the branch target buffers on each core are
    14141408less overloaded and able to increase the successful branch prediction rate.</p>
    14151409<div class="figure" id="branchmiss_GML2SVG">
    1416 <p class="title">Figure 7: Comparative Branch Misprediction Rate</p>
     1410<p class="title">Figure 6: Comparative Branch Misprediction Rate</p>
    14171411<div class="figure-contents">
    1418 <div class="mediaobject" id="idp600144"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
     1412<div class="mediaobject" id="idp597520"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
    14191413<div class="caption"></div>
    14201414</div>
    14211415</div>
    1422 <p id="idp602432">The behaviour of the three versions with respect to L1 cache misses per kB is shown
     1416<p id="idp599808">The behaviour of the three versions with respect to L1 cache misses per kB is shown
    14231417in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
    14241418and data-cache performance with the improvements in instruction-cache
    1425 behaviour the most dramatic.   Single-threaded \icXML{} shows substantially improved
     1419behaviour the most dramatic.   Single-threaded icXML shows substantially improved
    14261420performance over Xerces on both measures.   
    1427 Although \icXMLp{} is slightly worse \wrt{} data-cache performance,
     1421Although icXML-p is slightly worse with respect to data-cache performance,
    14281422this is more than offset by a further dramatic reduction in instruction-cache miss rate.
    14291423Again partitioning the instruction stream through the pipeline parallelism model has
    14301424significant benefit.</p>
    14311425<div class="figure" id="cachemiss_GML2SVG">
    1432 <p class="title">Figure 8: Comparative Cache Miss Rate</p>
     1426<p class="title">Figure 7: Comparative Cache Miss Rate</p>
    14331427<div class="figure-contents">
    1434 <div class="mediaobject" id="idp604544"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
     1428<div class="mediaobject" id="idp601936"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
    14351429<div class="caption"></div>
    14361430</div>
    14371431</div>
    1438 <p id="idp606832">One caveat with this study is that the GML2SVG application did not exhibit
     1432<p id="idp604224">One caveat with this study is that the GML2SVG application did not exhibit
    14391433a relative balance of processing between application code and Xerces library
    1440 code reaching the 33\% figure.  This suggests that for this application and
     1434code reaching the 33% figure.  This suggests that for this application and
    14411435possibly others, further separating the logical layers of the
    1442 \icXML{} engine into different pipeline stages could well offer significant benefit.
     1436icXML engine into different pipeline stages could well offer significant benefit.
    14431437This remains an area of ongoing work.</p>
    14441438</div>
    14451439</div>
    1446 <div class="section" id="idp607904">
     1440<div class="section" id="idp605296">
    14471441<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    1448 <p id="idp608592"> This paper is the first case study documenting the significant performance benefits
     1442<p id="idp605984"> This paper is the first case study documenting the significant performance benefits
    14491443         that may be realized through the integration of parallel bitstream technology into existing
    14501444         widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined
     
    14561450         technologies, this is an important case study demonstrating the general feasibility of
    14571451         these techniques. </p>
    1458 <p id="idp609872"> The further development of icXML to move beyond 2-stage pipeline parallelism is
     1452<p id="idp607264"> The further development of icXML to move beyond 2-stage pipeline parallelism is
    14591453         ongoing, with realistic prospects for four reasonably balanced stages within the library.
    14601454         For applications such as GML2SVG which are dominated by time spent on XML parsing, such a
    14611455         multistage pipelined parsing library should offer substantial benefits. </p>
    1462 <p id="idp610640"> The example of XML parsing may be considered prototypical of finite-state machines
     1456<p id="idp608032"> The example of XML parsing may be considered prototypical of finite-state machines
    14631457         applications which have sometimes been considered "embarassingly
    14641458         sequential" and so difficult to parallelize that "nothing
     
    14661460         point in making the case that parallelization can indeed be helpful across a broad array of
    14671461         application types. </p>
    1468 <p id="idp612016"> To overcome the software engineering challenges in applying parallel bitstream
     1462<p id="idp609408"> To overcome the software engineering challenges in applying parallel bitstream
    14691463         technology to existing software systems, it is clear that better library and tool support
    14701464         is needed. The techniques used in the implementation of icXML and documented in this paper
     
    14731467      </p>
    14741468</div>
    1475 <div class="bibliography" id="idp613504">
     1469<div class="bibliography" id="idp611376">
    14761470<h2 class="title" style="clear:both">Bibliography</h2>
    14771471<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3051 r3052  
    798798            </imageobject>
    799799          </mediaobject>
    800           <caption>
    801           </caption>
    802         </figure>
    803         <figure xml:id="threads_timeline2">
    804           <title>Thread Balance in Two-Stage Pipelines</title>
    805800          <mediaobject>
    806801            <imageobject>
     
    952947       
    953948<para>Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
    954 the Xerces, \icXML{} and \icXMLp{}.   
    955 On the GML workload with this application, single-thread \icXML{}
    956 achieved about a 50\% acceleration over Xerces,
     949the Xerces, icXML and icXML-p.   
     950On the GML workload with this application, single-thread icXML
     951achieved about a 50% acceleration over Xerces,
    957952increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.   
    958 Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,
     953Using icXML-p, a further throughput increase to 111 MB/sec was recorded,
    959954approximately a 2X speedup.</para>
    960955
    961 <para>An important aspect of \icXML{} is the replacement of much branch-laden
     956<para>An important aspect of icXML is the replacement of much branch-laden
    962957sequential code inside Xerces with straight-line SIMD code using far
    963958fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
    964959improvement in branching behaviour, with a dramatic reduction in branch misses per kB.
    965 It is also interesting to note that \icXMLp{} goes even further.   
     960It is also interesting to note that icXML-p goes even further.   
    966961In essence, in using pipeline parallelism to split the instruction
    967962stream onto separate cores, the branch target buffers on each core are
     
    982977in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
    983978and data-cache performance with the improvements in instruction-cache
    984 behaviour the most dramatic.   Single-threaded \icXML{} shows substantially improved
     979behaviour the most dramatic.   Single-threaded icXML shows substantially improved
    985980performance over Xerces on both measures.   
    986 Although \icXMLp{} is slightly worse \wrt{} data-cache performance,
     981Although icXML-p is slightly worse with respect to data-cache performance,
    987982this is more than offset by a further dramatic reduction in instruction-cache miss rate.
    988983Again partitioning the instruction stream through the pipeline parallelism model has
     
    1002997<para>One caveat with this study is that the GML2SVG application did not exhibit
    1003998a relative balance of processing between application code and Xerces library
    1004 code reaching the 33\% figure.  This suggests that for this application and
     999code reaching the 33% figure.  This suggests that for this application and
    10051000possibly others, further separating the logical layers of the
    1006 \icXML{} engine into different pipeline stages could well offer significant benefit.
     1001icXML engine into different pipeline stages could well offer significant benefit.
    10071002This remains an area of ongoing work.</para>
    10081003      </section>
Note: See TracChangeset for help on using the changeset viewer.