Changeset 3040 for docs/Balisage13


Ignore:
Timestamp:
Apr 18, 2013, 8:20:23 PM (6 years ago)
Author:
ksherdy
Message:

Added .css files to output directory. Formatting updates.

Location:
docs/Balisage13
Files:
18 added
1 deleted
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3039 r3040  
    1212<div lang="en" class="article">
    1313<div class="titlepage">
    14 <h2 class="article-title" id="idp197760"></h2>
     14<h2 class="article-title" id="idp270784"></h2>
    1515<div class="author">
    1616<h3 class="author">Nigel Medforth</h3>
     
    5555<div class="abstract">
    5656<p class="title"><b>Abstract</b></p>
    57 <p id="idp286432">Prior research on the acceleration of XML processing
     57<p id="idp438976">Prior research on the acceleration of XML processing
    5858using SIMD and multi-core parallelism has lead to
    5959a number of interesting research prototypes.  This work
     
    7676<p><b>Table of Contents</b></p>
    7777<dl>
    78 <dt><span class="section"><a href="#idp273984" class="toc">Introduction</a></span></dt>
    79 <dt><span class="section"><a href="#idp274904" class="toc">Background</a></span></dt>
     78<dt><span class="section"><a href="#idp426544" class="toc">Introduction</a></span></dt>
     79<dt><span class="section"><a href="#idp427464" class="toc">Background</a></span></dt>
    8080<dd><dl>
    81 <dt><span class="section"><a href="#idp275240" class="toc">Xerces C++ Structure</a></span></dt>
    82 <dt><span class="section"><a href="#idp280344" class="toc">The Parabix Framework</a></span></dt>
    83 <dt><span class="section"><a href="#idp416432" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     81<dt><span class="section"><a href="#idp427800" class="toc">Xerces C++ Structure</a></span></dt>
     82<dt><span class="section"><a href="#idp432856" class="toc">The Parabix Framework</a></span></dt>
     83<dt><span class="section"><a href="#idp584560" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    8484</dl></dd>
    85 <dt><span class="section"><a href="#idp419384" class="toc">Architecture</a></span></dt>
     85<dt><span class="section"><a href="#idp587512" class="toc">Architecture</a></span></dt>
    8686<dd><dl>
    87 <dt><span class="section"><a href="#idp419704" class="toc">Overview</a></span></dt>
    88 <dt><span class="section"><a href="#idp429312" class="toc">Character Set Adapters</a></span></dt>
    89 <dt><span class="section"><a href="#idp435504" class="toc">Combined Parallel Filtering</a></span></dt>
    90 <dt><span class="section"><a href="#idp249920" class="toc">Content Stream</a></span></dt>
    91 <dt><span class="section"><a href="#idp254672" class="toc">Namespace Handling</a></span></dt>
    92 <dt><span class="section"><a href="#idp469568" class="toc">Error Handling</a></span></dt>
     87<dt><span class="section"><a href="#idp587832" class="toc">Overview</a></span></dt>
     88<dt><span class="section"><a href="#idp609600" class="toc">Character Set Adapters</a></span></dt>
     89<dt><span class="section"><a href="#idp616344" class="toc">Combined Parallel Filtering</a></span></dt>
     90<dt><span class="section"><a href="#idp628552" class="toc">Content Stream</a></span></dt>
     91<dt><span class="section"><a href="#idp634584" class="toc">Namespace Handling</a></span></dt>
     92<dt><span class="section"><a href="#idp642952" class="toc">Error Handling</a></span></dt>
    9393</dl></dd>
    94 <dt><span class="section"><a href="#idp476680" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
    95 <dt><span class="section"><a href="#idp483472" class="toc">Performance</a></span></dt>
     94<dt><span class="section"><a href="#idp650360" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
     95<dt><span class="section"><a href="#idp657176" class="toc">Performance</a></span></dt>
    9696<dd><dl>
    97 <dt><span class="section"><a href="#idp485352" class="toc">Xerces C++ SAXCount</a></span></dt>
    98 <dt><span class="section"><a href="#idp490336" class="toc">GML2SVG</a></span></dt>
     97<dt><span class="section"><a href="#idp659048" class="toc">Xerces C++ SAXCount</a></span></dt>
     98<dt><span class="section"><a href="#idp663432" class="toc">GML2SVG</a></span></dt>
    9999</dl></dd>
    100 <dt><span class="section"><a href="#idp490912" class="toc">Conclusion and Future Work</a></span></dt>
     100<dt><span class="section"><a href="#idp664008" class="toc">Conclusion and Future Work</a></span></dt>
    101101</dl>
    102102</div>
    103 <div class="section" id="idp273984">
     103<div class="section" id="idp426544">
    104104<h2 class="title" style="clear: both">Introduction</h2>
    105 <p id="idp274328"></p>
    106 <p id="idp274456"></p>
    107 <p id="idp274584"></p>
    108 <p id="idp274712"></p>
    109 </div>
    110 <div class="section" id="idp274904">
     105<p id="idp426888"></p>
     106<p id="idp427016"></p>
     107<p id="idp427144"></p>
     108<p id="idp427272"></p>
     109</div>
     110<div class="section" id="idp427464">
    111111<h2 class="title" style="clear: both">Background</h2>
    112 <div class="section" id="idp275240">
     112<div class="section" id="idp427800">
    113113<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    114 <p id="idp275584">
     114<p id="idp428144">
    115115The Xerces C++ parser
    116116
     
    130130parsing as well as a DOM tree-based parsing interface.
    131131</p>
    132 <p id="idp277232">
     132<p id="idp429744">
    133133
    134134
     
    151151is required, involving all aspects of the parser.
    152152</p>
    153 <p id="idp279408">
    154 
    155 
    156 
    157 
    158 </p>
    159 </div>
    160 <div class="section" id="idp280344">
     153<p id="idp431920">
     154
     155
     156
     157
     158</p>
     159</div>
     160<div class="section" id="idp432856">
    161161<h3 class="title" style="clear: both">The Parabix Framework</h3>
    162 <p id="idp280664">
     162<p id="idp433176">
    163163The Parabix (parallel bit stream) framework is a transformative approach to XML parsing
    164164(and other forms of text processing.) The key idea is to exploit the availability of wide
     
    168168In
    169169
    170 Boolean-logic operations\footnote{$∧$, $\√$ and $¬$ denote the boolean AND, OR and NOT operators.}
    171 are used to classify the input bits into a set of {\it character-class bit streams}, which identify key
     170Boolean-logic operations\footnote{∧, \√ and ¬ denote the boolean AND, OR and NOT operators.}
     171are used to classify the input bits into a set of <span class="ital">character-class bit streams</span>, which identify key
    172172characters (or groups of characters) with a $1$.
    173173For example, one of the fundamental characters in XML is a left-angle bracket.
    174 A character is a<code class="code">&lt; if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land
    175 b_3) \land (b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1</code>.
     174A character is a<code class="code">&lt; if and only if ¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧
     175b<sub>3</sub>) ∧ (b<sub>4</sub> ∧ b<sub>5</sub>) ∧ ¬ (b<sub>6</sub> √ b<sub>7</sub>) = 1</code>.
    176176Similarly, a character is numeric
    177 <code class="code">[0-9] if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))</code>.
     177<code class="code">[0-9] if and only if $¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧ b<sub>3</sub>) ∧ ¬(b<sub>4</sub> ∧ (b<sub>5</sub> √ b<sub>6</sub>))</code>.
    178178An important observation here is that ranges of characters may
    179179require fewer operations than individual characters and
     
    181181multiple classes can share the classification cost.
    182182</p>
    183 <p id="idp409632">
    184 
    185 </p>
    186 <p id="idp411072">
     183<p id="idp577560">
     184
     185</p>
     186<p id="idp579000">
    187187Consider, for example, the XML source data stream shown in the first line of .
    188188The remaining lines of this figure show several parallel bit streams that are computed in Parabix-style
     
    198198lines show streams that can be computed in subsequent
    199199parsing (using the technique
    200 of \bitstream{} addition \cite{cameron-EuroPar2011}), namely streams marking the element names,
     200of bitstream addition \cite{cameron-EuroPar2011}), namely streams marking the element names,
    201201attribute names and attribute values of tags. 
    202202</p>
    203 <p id="idp413944">
     203<p id="idp580288">
    204204Two intuitions may help explain how the Parabix approach can lead
    205205to improved XML parsing performance. The first is that
     
    216216should provide substantial benefit.
    217217</p>
    218 <p id="idp414904">
     218<p id="idp580480">
    219219Previous studies have shown that the Parabix approach improves many aspects of XML processing,
    220220including transcoding \cite{Cameron2008}, character classification and validation,
     
    223223sequential scanning loops for individual characters \cite{CameronHerdyLin2008}.
    224224Recent work has incorporated a method of parallel
    225 scanning using \bitstream{} addition \cite{cameron-EuroPar2011}, as
     225scanning using bitstream addition \cite{cameron-EuroPar2011}, as
    226226well as combining SIMD methods with 4-stage pipeline parallelism to further improve
    227227throughput \cite{HPCA2012}.
     
    229229they lacked the functionality required by full XML parsers.
    230230</p>
    231 <p id="idp415880">
     231<p id="idp580672">
    232232Commercial XML processors support transcoding of multiple character sets and can parse and
    233233validate against multiple document vocabularies.
     
    236236</p>
    237237</div>
    238 <div class="section" id="idp416432">
     238<div class="section" id="idp584560">
    239239<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    240 <p id="idp416752">
     240<p id="idp584880">
    241241Xerces—like all traditional XML parsers—processes XML documents sequentially.
    242242Each character is examined to distinguish between the
     
    246246validation and content processing modes.
    247247</p>
    248 <p id="idp417824">
     248<p id="idp585952">
    249249In other words, Xerces belongs to an equivalent class applications termed FSM applications\footnote{
    250250  Herein FSM applications are considered software systems whose behaviour is defined by the inputs,
     
    253253Unfortunately, textual data tends to be unpredictable and any character could induce a state transition.
    254254</p>
    255 <p id="idp418488">
     255<p id="idp586616">
    256256Parabix-style XML parsers utilize a concept of layered processing.
    257 A block of source text is transformed into a set of lexical \bitstream{}s,
     257A block of source text is transformed into a set of lexical bitstreams,
    258258which undergo a series of operations that can be grouped into logical layers,
    259259e.g., transposition, character classification, and lexical analysis.
     
    264264</div>
    265265</div>
    266 <div class="section" id="idp419384">
     266<div class="section" id="idp587512">
    267267<h2 class="title" style="clear: both">Architecture</h2>
    268 <div class="section" id="idp419704">
     268<div class="section" id="idp587832">
    269269<h3 class="title" style="clear: both">Overview</h3>
    270 <p id="idp420152">
    271 \icXML{} is more than an optimized version of Xerces. Many components were grouped, restructured and
     270<p id="idp588280">
     271icXML is more than an optimized version of Xerces. Many components were grouped, restructured and
    272272rearchitected with pipeline parallelism in mind.
    273273In this section, we highlight the core differences between the two systems.
    274274As shown in Figure \ref{fig:xerces-arch}, Xerces
    275275is comprised of five main modules: the transcoder, reader, scanner, namespace binder, and validator.
    276 The {\it Transcoder} converts source data into UTF-16 before Xerces parses it as XML;
     276The <span class="ital">Transcoder</span> converts source data into UTF-16 before Xerces parses it as XML;
    277277the majority of the character set encoding validation is performed as a byproduct of this process.
    278 The {\it Reader} is responsible for the streaming and buffering of all raw and transcoded (UTF-16) text.
     278The <span class="ital">Reader</span> is responsible for the streaming and buffering of all raw and transcoded (UTF-16) text.
    279279It tracks the current line/column position,
    280280
    281281performs line-break normalization and validates context-specific character set issues,
    282282such as tokenization of qualified-names.
    283 The {\it Scanner} pulls data through the reader and constructs the intermediate representation (IR)
     283The <span class="ital">Scanner</span> pulls data through the reader and constructs the intermediate representation (IR)
    284284of the document; it deals with all issues related to entity expansion, validates
    285285the XML well-formedness constraints and any character set encoding issues that cannot
    286286be completely handled by the reader or transcoder (e.g., surrogate characters, validation
    287287and normalization of character references, etc.)
    288 The {\it Namespace Binder} is a core piece of the element stack.
     288The <span class="ital">Namespace Binder</span> is a core piece of the element stack.
    289289It handles namespace scoping issues between different XML vocabularies.
    290290This allows the scanner to properly select the correct schema grammar structures.
    291 The {\it Validator} takes the IR produced by the Scanner (and
     291The <span class="ital">Validator</span> takes the IR produced by the Scanner (and
    292292potentially annotated by the Namespace Binder) and assesses whether the final output matches
    293293the user-defined DTD and schema grammar(s) before passing it to the end-user.
    294294</p>
    295 <p id="idp422304">
    296 
    297 </p>
    298 <p id="idp422560">
    299 In \icXML{} functions are grouped into logical components.
    300 As shown in Figure \ref{fig:icxml-arch}, two major categories exist: (1) the \PS{} and (2) the \MP{}.
    301 All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which represents data as a set of parallel \bitstream{}s.
    302 The {\it Character Set Adapter}, discussed in Section \ref{arch:character-set-adapter},
     295<p id="idp591904">
     296
     297</p>
     298<p id="idp592160">
     299In icXML functions are grouped into logical components.
     300As shown in Figure \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the Markup Processor.
     301All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which represents data as a set of parallel bitstreams.
     302The <span class="ital">Character Set Adapter</span>, discussed in Section \ref{arch:character-set-adapter},
    303303mirrors Xerces's Transcoder duties; however instead of producing UTF-16 it produces a
    304 set of lexical \bitstream{}s, similar to those shown in Figure \ref{fig:parabix1}.
    305 These lexical \bitstream{}s are later transformed into UTF-16 in the \CSG{},
     304set of lexical bitstreams, similar to those shown in Figure \ref{fig:parabix1}.
     305These lexical bitstreams are later transformed into UTF-16 in the Content Stream Generator,
    306306after additional processing is performed.
    307 The first precursor to producing UTF-16 is the {\it Parallel Markup Parser} phase.
    308 It takes the lexical streams and produces a set of marker \bitstream{}s in which a 1-bit identifies
    309 significant positions within the input data. One \bitstream{} for each of the critical piece of information is created, such as
     307The first precursor to producing UTF-16 is the <span class="ital">Parallel Markup Parser</span> phase.
     308It takes the lexical streams and produces a set of marker bitstreams in which a 1-bit identifies
     309significant positions within the input data. One bitstream for each of the critical piece of information is created, such as
    310310the beginning and ending of start tags, end tags, element names, attribute names, attribute values and content.
    311311Intra-element well-formedness validation is performed as an artifact of this process.
    312 Like Xerces, \icXML{} must provide the Line and Column position of each error.
    313 The {\it Line-Column Tracker} uses the lexical information to keep track of the document position(s) through the use of an
     312Like Xerces, icXML must provide the Line and Column position of each error.
     313The <span class="ital">Line-Column Tracker</span> uses the lexical information to keep track of the document position(s) through the use of an
    314314optimized population count algorithm, described in Section \ref{section:arch:errorhandling}.
    315315From here, two data-independent branches exist: the Symbol Resolver and Content Preparation Unit.
    316316</p>
    317 <p id="idp426184">
     317<p id="idp404616">
    318318A typical XML file contains few unique element and attribute names—but each of them will occur frequently.
    319 \icXML{} stores these as distinct data structures, called symbols, each with their own global identifier (GID).
    320 Using the symbol marker streams produced by the Parallel Markup Parser, the {\it Symbol Resolver} scans through
    321 the raw data to produce a sequence of GIDs, called the {\it symbol stream}.
    322 </p>
    323 <p id="idp427352">
    324 The final components of the \PS{} are the {\it Content Preparation Unit} and {\it \CSG{}}.
    325 The former takes the (transposed) basis \bitstream{}s and selectively filters them, according to the
     319icXML stores these as distinct data structures, called symbols, each with their own global identifier (GID).
     320Using the symbol marker streams produced by the Parallel Markup Parser, the <span class="ital">Symbol Resolver</span> scans through
     321the raw data to produce a sequence of GIDs, called the <span class="ital">symbol stream</span>.
     322</p>
     323<p id="idp406288">
     324The final components of the Parabix Subsystem are the <span class="ital">Content Preparation Unit</span> and <span class="ital">Content Stream Generator</span>.
     325The former takes the (transposed) basis bitstreams and selectively filters them, according to the
    326326information provided by the Parallel Markup Parser, and the latter transforms the
    327 filtered streams into the tagged UTF-16 {\it content stream}, discussed in Section \ref{section:arch:contentstream}.
    328 </p>
    329 <p id="idp427944">
    330 Combined, the symbol and content stream form \icXML{}'s compressed IR of the XML document.
    331 The {\it \MP{}}~parses the IR to validate and produce the sequential output for the end user.
    332 The {\it Final WF checker} performs inter-element well-formedness validation that would be too costly
     327filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}.
     328</p>
     329<p id="idp407840">
     330Combined, the symbol and content stream form icXML's compressed IR of the XML document.
     331The <span class="ital">Markup Processor</span>~parses the IR to validate and produce the sequential output for the end user.
     332The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that would be too costly
    333333to perform in bit space, such as ensuring every start tag has a matching end tag.
    334 Xerces's namespace binding functionality is replaced by the {\it Namespace Processor}. Unlike Xerces,
    335 it is a discrete phase that produces a series of URI identifiers (URI IDs), the {\it URI stream}, which are
     334Xerces's namespace binding functionality is replaced by the <span class="ital">Namespace Processor</span>. Unlike Xerces,
     335it is a discrete phase that produces a series of URI identifiers (URI IDs), the <span class="ital">URI stream</span>, which are
    336336associated with each symbol occurrence.
    337337This is discussed in Section \ref{section:arch:namespacehandling}.
    338 Finally, the {\it Validation} layer implements the Xerces's validator.
     338Finally, the <span class="ital">Validation</span> layer implements the Xerces's validator.
    339339However, preprocessing associated with each symbol greatly reduces the work of this stage.
    340340</p>
    341 <p id="idp428992">
    342 
    343 </p>
    344 </div>
    345 <div class="section" id="idp429312">
     341<p id="idp609280">
     342
     343</p>
     344</div>
     345<div class="section" id="idp609600">
    346346<h3 class="title" style="clear: both">Character Set Adapters</h3>
    347 <p id="idp429936">
     347<p id="idp610216">
    348348In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of Xerces itself and
    349349provide the end-consumer with a single encoding format.
     
    354354transcoding imposes at least a cost of buffer copying.
    355355</p>
    356 <p id="idp430720">
    357 In \icXML{}, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
     356<p id="idp611592">
     357In icXML, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
    358358Given a specified input encoding, a CSA is responsible for checking that
    359359input code units represent valid characters, mapping the characters of the encoding into
    360 the appropriate \bitstream{}s for XML parsing actions (i.e., producing the lexical item
     360the appropriate bitstreams for XML parsing actions (i.e., producing the lexical item
    361361streams), as well as supporting ultimate transcoding requirements.   All of this work
    362 is performed using the parallel \bitstream{} representation of the source input.
    363 </p>
    364 <p id="idp431448">
     362is performed using the parallel bitstream representation of the source input.
     363</p>
     364<p id="idp612312">
    365365An important observation is that many character sets are an
    366366extension to the legacy 7-bit ASCII character set.  This includes the
     
    370370serves to compute lexical item streams for all such ASCII-based character sets.
    371371</p>
    372 <p id="idp432080">
     372<p id="idp612944">
    373373A second observation is that—regardless of which character set is used—quite
    374374often all of the characters in a particular block of input will be within the ASCII range.
    375 This is a very simple test to perform using the \bitstream{} representation, simply confirming that the
     375This is a very simple test to perform using the bitstream representation, simply confirming that the
    376376bit 0 stream is zero for the entire block.   For blocks satisfying this test,
    377377all logic dealing with non-ASCII characters can simply be skipped.
    378 Transcoding to UTF-16 becomes trivial as the high eight \bitstream{}s of the
     378Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    379379UTF-16 form are each set to zero in this case.
    380380</p>
    381 <p id="idp433648">
     381<p id="idp614496">
    382382A third observation is that repeated transcoding of the names of XML
    383383elements, attributes and so on can be avoided by using a look-up mechanism.
     
    390390additional cost.
    391391</p>
    392 <p id="idp434432">
     392<p id="idp615280">
    393393The cost of individual character transcoding is avoided whenever a block of input is
    394394confined to the ASCII subset and for all but the first occurrence of any XML element or attribute name.
    395 Furthermore, when transcoding is required, the parallel \bitstream{} representation
     395Furthermore, when transcoding is required, the parallel bitstream representation
    396396supports efficient transcoding operations.   
    397 In the important case of UTF-8 to UTF-16 transcoding, the corresponding UTF-16 \bitstream{}s
     397In the important case of UTF-8 to UTF-16 transcoding, the corresponding UTF-16 bitstreams
    398398can be calculated in bit parallel fashion based on UTF-8 streams \cite{Cameron2008},
    399399and all but the final bytes of multi-byte sequences can be marked for deletion as
     
    404404</p>
    405405</div>
    406 <div class="section" id="idp435504">
     406<div class="section" id="idp616344">
    407407<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    408 <p id="idp435824">
     408<p id="idp616664">
    409409As just mentioned, UTF-8 to UTF-16 transcoding involves marking
    410410all but the last bytes of multi-byte UTF-8 sequences as
     
    425425\verb'110110'.  Using this approach, transcoding may then be
    426426completed by applying parallel deletion and inverse transposition of the
    427 UTF-16 \bitstream{}s\cite{Cameron2008}.
    428 </p>
    429 <p id="idp438672">
    430 
    431 
    432 
    433 
    434 
    435 
    436 
    437 
    438 
    439 </p>
    440 <p id="idp441152">
     427UTF-16 bitstreams\cite{Cameron2008}.
     428</p>
     429<p id="idp619512">
     430
     431
     432
     433
     434
     435
     436
     437
     438
     439</p>
     440<p id="idp621992">
    441441Rather than immediately paying the
    442442costs of deletion and transposition just for transcoding,
    443 however, \icXML{} defers these steps so that the deletion
     443however, icXML defers these steps so that the deletion
    444444masks for several stages of processing may be combined.
    445445In particular, this includes core XML requirements
     
    450450returns (CR), line feeds (LF) and CR-LF combinations
    451451must be normalized to a single LF character in
    452 each case.   In \icXML{}, this is achieved by
     452each case.   In icXML, this is achieved by
    453453first marking CR positions, performing two
    454454bit parallel operations to transform the marked
     
    458458
    459459</p>
    460 <p id="idp441472">
     460<p id="idp622312">
    461461In essence, the deletion masks for transcoding and
    462462for line break normalization each represent a bitwise
     
    465465applied once.
    466466</p>
    467 <p id="idp443472">
     467<p id="idp624608">
    468468A further application of combined filtering
    469469is the processing of XML character and entity
     
    471471which must be replaced in XML processing with 
    472472the single <code class="code">&amp;</code> and <code class="code">&lt;</code> characters, respectively.
    473 The approach in \icXML{} is to mark all but the first character
     473The approach in icXML is to mark all but the first character
    474474positions of each reference for deletion, leaving a
    475475single character position unmodified.  Thus, for the
     
    484484that this is not true, it is addressed in post-processing.
    485485</p>
    486 <p id="idp446576">
     486<p id="idp627584">
    487487The final step of combined filtering occurs during
    488488the process of reducing markup data to tag bytes
    489489preceding each significant XML transition as described
    490 in section~\ref{section:arch:contentstream}.  Overall, \icXML{}
     490in section~\ref{section:arch:contentstream}.  Overall, icXML
    491491avoids separate buffer copying operations for each of the
    492492these filtering steps, paying the cost of parallel
    493493deletion and inverse transposition only once. 
    494 Currently, \icXML{} employs the parallel-prefix compress algorithm
     494Currently, icXML employs the parallel-prefix compress algorithm
    495495of Steele~\cite{HackersDelight}  Performance
    496496is independent of the number of positions deleted.
    497 Future versions of \icXML{} are expected to
     497Future versions of icXML are expected to
    498498take advantage of the parallel extract operation~\cite{HilewitzLee2006}
    499499that Intel is now providing in its Haswell architecture.
    500500</p>
    501501</div>
    502 <div class="section" id="idp249920">
     502<div class="section" id="idp628552">
    503503<h3 class="title" style="clear: both">Content Stream</h3>
    504 <p id="idp250240">
    505 A relatively-unique concept for \icXML{} is the use of a filtered content stream.
     504<p id="idp628872">
     505A relatively-unique concept for icXML is the use of a filtered content stream.
    506506Rather that parsing an XML document in its original format, the input is transformed
    507507into one that is easier for the parser to iterate through and produce the sequential
     
    514514through the parallel filtering algorithm, described in section \ref{sec:parfilter}.
    515515</p>
    516 <p id="idp251840">
     516<p id="idp630400">
    517517Combined with the symbol stream, the parser traverses the content stream to effectively
    518518reconstructs the input document in its output form.
    519 The initial {\tt\it 0} indicates an empty content string. The following \verb|&gt;|
     519The initial <span class="ital">0</span> indicates an empty content string. The following \verb|&gt;|
    520520indicates that a start tag without any attributes is the first element in this text and
    521521the first unused symbol, <code class="code">document</code>, is the element name.
     
    525525accounts for 6.83% of Xerces's execution time.
    526526Additionally, it is cheap to locate the terminal character of each string:
    527 using the String End \bitstream{}, the \PS{} can effectively calculate the offset of each
     527using the String End bitstream, the Parabix Subsystem can effectively calculate the offset of each
    528528null character in the content stream in parallel, which in turn means the parser can
    529529directly jump to the end of every string without scanning for it.
    530530</p>
    531 <p id="idp253376">
     531<p id="idp632232">
    532532Following ``\verb`fee`'' is a \verb`=`, which marks the existence of an attribute.
    533 Because all of the intra-element was performed in the \PS{}, this must be a legal attribute.
     533Because all of the intra-element was performed in the Parabix Subsystem, this must be a legal attribute.
    534534Since attributes can only occur within start tags and must be accompanied by a textual value,
    535535the next symbol in the symbol stream must be the element name of a start tag,
     
    543543</p>
    544544</div>
    545 <div class="section" id="idp254672">
     545<div class="section" id="idp634584">
    546546<h3 class="title" style="clear: both">Namespace Handling</h3>
    547 <p id="idp255320">
     547<p id="idp635192">
    548548In XML, namespaces prevents naming conflicts when multiple vocabularies are used together.
    549549It is especially important when a vocabulary application-dependant meaning, such as when
     
    560560because the current vocabulary is determined by the namespace(s) that are in-scope.
    561561</p>
    562 <p id="idp257760">
    563 
    564 </p>
    565 <p id="idp463944">
    566 In both Xerces and \icXML{}, every URI has a one-to-one mapping to a URI ID.
     562<p id="idp637032">
     563
     564</p>
     565<p id="idp637288">
     566In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID.
    567567These persist for the lifetime of the application through the use of a global URI pool.
    568568Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs
     
    573573(2) those that repeatedly modify the namespaces in predictable patterns.
    574574</p>
    575 <p id="idp464136">
    576 For that reason, \icXML{} contains an independent namespace stack and utilizes bit vectors to cheaply perform
     575<p id="idp637480">
     576For that reason, icXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
    577577
    578578
     
    587587A namespace binding table, similar to Table \ref{tbl:namespace1}, provides the actual URI ID.
    588588</p>
    589 <p id="idp466872">
    590 
    591 </p>
    592 <p id="idp467128">
    593 
    594 
    595 
    596 
    597 </p>
    598 <p id="idp468584">
     589<p id="idp640280">
     590
     591</p>
     592<p id="idp640536">
     593
     594
     595
     596
     597</p>
     598<p id="idp641968">
    599599To ensure that scoping rules are adhered to,
    600600whenever a start tag is encountered, any modification to the currently visible namespaces is calculated and stored
     
    605605</p>
    606606</div>
    607 <div class="section" id="idp469568">
     607<div class="section" id="idp642952">
    608608<h3 class="title" style="clear: both">Error Handling</h3>
    609 <p id="idp469912">
     609<p id="idp643296">
    610610
    611611Xerces outputs error messages in two ways: through the programmer API and as thrown objects for fatal errors.
    612612As Xerces parses a file, it uses context-dependant logic to assess whether the next character is legal;
    613613if not, the current state determines the type and severity of the error.
    614 \icXML{} emits errors in the similar manner—but how it discovers them is substantially different.
    615 Recall that in Figure \ref{fig:icxml-arch}, \icXML{} is divided into two sections: the \PS{} and \MP{},
     614icXML emits errors in the similar manner—but how it discovers them is substantially different.
     615Recall that in Figure \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and Markup Processor,
    616616each with its own system for detecting and producing error messages.
    617617</p>
    618 <p id="idp471056">
    619 Within the \PS{}, all computations are performed in parallel, a block at a time.
    620 Errors are derived as artifacts of \bitstream{} calculations, with a 1-bit marking the byte-position of an error within a block,
     618<p id="idp644416">
     619Within the Parabix Subsystem, all computations are performed in parallel, a block at a time.
     620Errors are derived as artifacts of bitstream calculations, with a 1-bit marking the byte-position of an error within a block,
    621621and the type of error is determined by the equation that discovered it.
    622622The difficulty of error processing in this section is that in Xerces the line and column number must be given
     
    628628Note that typical XML documents are error-free but the calculation of the
    629629line/column position is a constant overhead in Xerces.
    630 To reduce this, \icXML{} pushes the bulk cost of the line/column calculation to the occurrence of the error and
     630To reduce this, icXML pushes the bulk cost of the line/column calculation to the occurrence of the error and
    631631performs the minimal amount of book-keeping necessary to facilitate it.
    632 \icXML{} leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
     632icXML leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
    633633within the Line Column Tracker (LCT).
    634634One of the CSA's major responsibilities is transcoding an input text.
    635635During this process, white-space normalization rules are applied and multi-code-unit and surrogate characters are detected
    636636and validated.
    637 A {\it line-feed \bitstream{}}, which marks the positions of the normalized new lines characters, is a natural derivative of
     637A <span class="ital">line-feed bitstream</span>, which marks the positions of the normalized new lines characters, is a natural derivative of
    638638this process.
    639639Using an optimized population count algorithm, the line count can be summarized cheaply for each valid block of text.
    640640
    641641Column position is more difficult to calculate.
    642 It is possible to scan backwards through the \bitstream{} of new line characters to determine the distance (in code-units)
     642It is possible to scan backwards through the bitstream of new line characters to determine the distance (in code-units)
    643643between the position between which an error was detected and the last line feed. However, this distance may exceed
    644644than the actual character position for the reasons discussed in (2).
    645 To handle this, the CSA generates a {\it skip mask} \bitstream{} by ORing together many relevant \bitstream{}s,
     645To handle this, the CSA generates a <span class="ital">skip mask</span> bitstream by ORing together many relevant bitstreams,
    646646such as all trailing multi-code-unit and surrogate characters, and any characters that were removed during the
    647647normalization process.
     
    649649column number.
    650650</p>
    651 <p id="idp474368">
    652 The \MP{} is a state-driven machine. As such, error detection within it is very similar to Xerces.
     651<p id="idp648656">
     652The Markup Processor is a state-driven machine. As such, error detection within it is very similar to Xerces.
    653653However, reporting the correct line/column is a much more difficult problem.
    654 The \MP{} parses the content stream, which is a series of tagged UTF-16 strings.
     654The Markup Processor parses the content stream, which is a series of tagged UTF-16 strings.
    655655Each string is normalized in accordance with the XML specification.
    656656All symbol data and unnecessary whitespace is eliminated from the stream;
    657657thus its impossible to derive the current location using only the content stream.
    658 To calculate the location, the \MP{} borrows three additional pieces of information from the \PS{}:
    659 the line-feed, skip mask, and a {\it deletion mask stream}, which is a \bitstream{} denoting the (code-unit) position of every
     658To calculate the location, the Markup Processor borrows three additional pieces of information from the Parabix Subsystem:
     659the line-feed, skip mask, and a <span class="ital">deletion mask stream</span>, which is a bitstream denoting the (code-unit) position of every
    660660datum that was suppressed from the source during the production of the content stream.
    661661Armed with these, it is possible to calculate the actual line/column using
    662 the same system as the \PS{} until the sum of the negated deletion mask stream is equal to the current position.
    663 </p>
    664 </div>
    665 </div>
    666 <div class="section" id="idp476680">
     662the same system as the Parabix Subsystem until the sum of the negated deletion mask stream is equal to the current position.
     663</p>
     664</div>
     665</div>
     666<div class="section" id="idp650360">
    667667<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    668 <p id="idp477000">
     668<p id="idp650728">
    669669As discussed in section \ref{background:xerces}, Xerces can be considered a FSM application.
    670670These are ``embarrassingly sequential.''\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize.
    671 However, \icXML{} is designed to organize processing into logical layers.   
    672 In particular, layers within the \PS{} are designed to operate
     671However, icXML is designed to organize processing into logical layers.   
     672In particular, layers within the Parabix Subsystem are designed to operate
    673673over significant segments of input data before passing their outputs on for
    674674subsequent processing.  This fits well into the general model of pipeline
     
    676676of modules.
    677677</p>
    678 <p id="idp477784">
    679 The most straightforward division of work in \icXML{} is to separate
    680 the \PS{} and the \MP{} into distinct logical layers into two separate stages.
    681 The resultant application, {\it\icXMLp{}}, is a course-grained software-pipeline application.
    682 In this case, the \PS{} thread $T_1$ reads 16k of XML input $I$ at a time and produces the
     678<p id="idp651520">
     679The most straightforward division of work in icXML is to separate
     680the Parabix Subsystem and the Markup Processor into distinct logical layers into two separate stages.
     681The resultant application, <span class="ital">icXML-p</span>, is a course-grained software-pipeline application.
     682In this case, the Parabix Subsystem thread $T_1$ reads 16k of XML input $I$ at a time and produces the
    683683content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$.
    684 The \MP{} thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation,
     684The Markup Processor thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation,
    685685and the provides parsed XML data to the application through the Xerces API. 
    686686The shared data structure is implemented using a ring buffer,
     
    693693and must wait for $T_2$ to finish reading the shared data before it can reuse the memory space.
    694694</p>
    695 <p id="idp479272">
    696 
    697 </p>
    698 <p id="idp479528">
     695<p id="idp653392">
     696
     697</p>
     698<p id="idp653664">
    699699Overall, our design is intended to benefit a range of applications.
    700700Conceptually, we consider two design points.
    701 The first, the parsing performed by the \PS{} dominates at 67% of the overall cost,
    702 with the cost of application processing (including the driver logic within the \MP{}) at 33%.   
     701The first, the parsing performed by the Parabix Subsystem dominates at 67% of the overall cost,
     702with the cost of application processing (including the driver logic within the Markup Processor) at 33%.   
    703703The second is almost the opposite scenario, the cost of application processing dominates at 60%,
    704704while the cost of XML parsing represents an overhead of 40%.
    705705</p>
    706 <p id="idp479720">
     706<p id="idp653856">
    707707Our design is predicated on a goal of using the Parabix
    708708framework to achieve a 50% to 100% improvement in the parsing engine itself.   
    709709In a best case scenario,
    710 a 100% improvement of the \PS{} for the design point in which
     710a 100% improvement of the Parabix Subsystem for the design point in which
    711711XML parsing dominates at 67% of the total application cost.
    712 In this case, the single-threaded \icXML{} should achieve a 1.5x speedup over Xerces
     712In this case, the single-threaded icXML should achieve a 1.5x speedup over Xerces
    713713so that the total application cost reduces to 67% of the original. 
    714 However, in \icXMLp{}, our ideal scenario gives us two well-balanced threads
     714However, in \icXML-p, our ideal scenario gives us two well-balanced threads
    715715each performing about 33% of the original work.   
    716716In this case, Amdahl's law predicts that we could expect up to a 3x speedup at best.
    717717</p>
    718 <p id="idp481424">
     718<p id="idp655592">
    719719At the other extreme of our design range, we consider an application
    720720in which core parsing cost is 40%.   Assuming the 2x speedup of
    721 the \PS{} over the corresponding Xerces core, single-threaded
    722 \icXML{} delivers a 25% speedup.   However, the most significant
     721the Parabix Subsystem over the corresponding Xerces core, single-threaded
     722icXML delivers a 25% speedup.   However, the most significant
    723723aspect of our two-stage multi-threaded design then becomes the
    724724ability to hide the entire latency of parsing within the serial time
     
    726726an overall speedup in processing time by 1.67x.
    727727</p>
    728 <p id="idp482624">
    729 Although the structure of the \PS{} allows division of the work into
     728<p id="idp656304">
     729Although the structure of the Parabix Subsystem allows division of the work into
    730730several pipeline stages and has been demonstrated to be effective
    731731for four pipeline stages in a research prototype \cite{HPCA2012},
    732732our analysis here suggests that the further pipelining of work within
    733 the \PS{} is not worthwhile if the cost of application logic is little as
     733the Parabix Subsystem is not worthwhile if the cost of application logic is little as
    73473433% of the end-to-end cost using Xerces.  To achieve benefits of
    735735further parallelization with multi-core technology, there would
     
    738738</p>
    739739</div>
    740 <div class="section" id="idp483472">
     740<div class="section" id="idp657176">
    741741<h2 class="title" style="clear: both">Performance</h2>
    742 <p id="idp483808">
    743 We evaluate \xerces{}, \icXML{}, \icXMLp{} against two benchmarking applications:
     742<p id="idp657512">
     743We evaluate \xerces{}, icXML, \icXML-p against two benchmarking applications:
    744744the Xerces C++ SAXCount sample application,
    745745and a real world GML to SVG transformation application.
     
    7507508 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
    751751</p>
    752 <p id="idp484472">
     752<p id="idp658168">
    753753We analyzed the execution profiles of each XML parser
    754754using the performance counters found in the processor.
     
    764764to collect per core hardware events.
    765765</p>
    766 <div class="section" id="idp485352">
     766<div class="section" id="idp659048">
    767767<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    768 <p id="idp485696">
     768<p id="idp659392">
    769769Xerces comes with sample applications that demonstrate salient features of the parser.
    770770SAXCount is the simplest such application:
     
    772772and prints out the totals.
    773773</p>
    774 <p id="idp486160">
    775 
    776 </p>
    777 <p id="idp486432">
     774<p id="idp659856">
     775
     776</p>
     777<p id="idp660128">
    778778Table \ref{XMLDocChars} shows the document characteristics of the XML input
    779779files selected for the Xerces C++ SAXCount benchmark. The jaw.xml
     
    782782XML documents and consist entirely of single byte encoded ASCII characters.
    783783</p>
    784 <p id="idp486624">
     784<p id="idp660320">
    785785A key predictor of the overall parsing performance of an XML file is markup density\footnote{
    786786  Markup Density: the ratio of markup bytes used to define the structure of the document vs. its file size.}.
     
    791791of markup densities.
    792792</p>
    793 <p id="idp488376">
    794 Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of
     793<p id="idp662072">
     794Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML in terms of
    795795CPU cycles per byte for the SAXCount application.
    796 The speedup for \icXML{} over Xerces is 1.3x to 1.8x.
    797 With two threads on the multicore machine, \icXMLp{} can achieve speedup up to 2.7x.
     796The speedup for icXML over Xerces is 1.3x to 1.8x.
     797With two threads on the multicore machine, \icXML-p can achieve speedup up to 2.7x.
    798798Xerces is substantially slowed by dense markup
    799 but \icXML{} is less affected through a reduction in branches and the use of parallel-processing techniques.
    800 \icXMLp{} performs better as markup-density increases because the work performed by each stage is
     799but icXML is less affected through a reduction in branches and the use of parallel-processing techniques.
     800\icXML-p performs better as markup-density increases because the work performed by each stage is
    801801well balanced in this application.
    802802</p>
    803 <p id="idp489744">
    804 
    805 </p>
    806 </div>
    807 <div class="section" id="idp490336">
     803<p id="idp662840">
     804
     805</p>
     806</div>
     807<div class="section" id="idp663432">
    808808<h3 class="title" style="clear: both">GML2SVG</h3>
    809 <p id="idp490656"></p>
    810 </div>
    811 </div>
    812 <div class="section" id="idp490912">
     809<p id="idp663752"></p>
     810</div>
     811</div>
     812<div class="section" id="idp664008">
    813813<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    814 <p id="idp491264">
     814<p id="idp664360">
    815815This paper is the first case study documenting the significant
    816816performance benefits that may be realized through the integration
    817 of parallel \bitstream{} technology into existing widely-used software libraries.
     817of parallel bitstream technology into existing widely-used software libraries.
    818818In the case of the Xerces-C++ XML parser, the
    819819combined integration of SIMD and multicore parallelism was
    820820shown capable of dramatic producing dramatic increases in
    821821throughput and reductions in branch mispredictions and cache misses.
    822 The modified parser, going under the name \icXML{} is designed
     822The modified parser, going under the name icXML is designed
    823823to provide the full functionality of the original Xerces library
    824824with complete compatibility of APIs.  Although substantial
     
    828828feasibility of these techniques.
    829829</p>
    830 <p id="idp492280">
    831 The further development of \icXML{} to move beyond 2-stage
     830<p id="idp665368">
     831The further development of icXML to move beyond 2-stage
    832832pipeline parallelism is ongoing, with realistic prospects for
    833833four reasonably balanced stages within the library.  For
     
    836836library should offer substantial benefits. 
    837837</p>
    838 <p id="idp492824">
     838<p id="idp665904">
    839839The example of XML parsing may be considered prototypical
    840840of finite-state machines applications which have sometimes
     
    845845indeed be helpful across a broad array of application types.
    846846</p>
    847 <p id="idp493424">
     847<p id="idp666504">
    848848To overcome the software engineering challenges in applying
    849 parallel \bitstream{} technology to existing software systems,
     849parallel bitstream technology to existing software systems,
    850850it is clear that better library and tool support is needed.
    851 The techniques used in the implementation of \icXML{} and
     851The techniques used in the implementation of icXML and
    852852documented in this paper could well be generalized for
    853853applications in other contexts and automated through
    854854the creation of compiler technology specifically supporting
    855 parallel \bitstream{} programming.
    856 </p>
    857 </div>
    858 <div class="bibliography" id="idp494392">
     855parallel bitstream programming.
     856</p>
     857</div>
     858<div class="bibliography" id="idp667464">
    859859<h2 class="title" style="clear:both">Bibliography</h2>
    860860<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3039 r3040  
    215215To facilitate this, the input data is first transposed into a set of basis bit streams.
    216216In <!--FIGURE REF Figure~\ref{fig:BitStreamsExample}, the ASCII string ``{\ttfamily b7\verb|<|A}''
    217 is represented as 8 basis bit streams, $\tt b_{0 \ldots 7}$.
    218 -->
    219 <!-- The bits used to construct $\tt b_7$ have been highlighted in this example. -->
     217is represented as 8 basis bit streams, $\tt b<subscript>{0 \ldots 7}$.
     218-->
     219<!-- The bits used to construct $\tt <subscript>7</subscript>$ have been highlighted in this example. -->
    220220Boolean-logic operations\footnote{&#8743;, \&#8744; and &#172; denote the boolean AND, OR and NOT operators.}
    221 are used to classify the input bits into a set of {\it character-class bit streams}, which identify key
     221are used to classify the input bits into a set of <emphasis role="ital">character-class bit streams</emphasis>, which identify key
    222222characters (or groups of characters) with a $1$.
    223223For example, one of the fundamental characters in XML is a left-angle bracket.
    224 A character is a<code>&lt; if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land
    225 b_3) \land (b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1</code>.
     224A character is a<code>&lt; if and only if &#172;(b<subscript>0</subscript> &#8744; b<subscript>1</subscript>) &#8743; (b<subscript>2</subscript> &#8743;
     225b<subscript>3</subscript>) &#8743; (b<subscript>4</subscript> &#8743; b<subscript>5</subscript>) &#8743; &#172; (b<subscript>6</subscript> &#8744; b<subscript>7</subscript>) = 1</code>.
    226226Similarly, a character is numeric
    227 <code>[0-9] if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))</code>.
     227<code>[0-9] if and only if $&#172;(b<subscript>0</subscript> &#8744; b<subscript>1</subscript>) &#8743; (b<subscript>2</subscript> &#8743; b<subscript>3</subscript>) &#8743; &#172;(b<subscript>4</subscript> &#8743; (b<subscript>5</subscript> &#8744; b<subscript>6</subscript>))</code>.
    228228An important observation here is that ranges of characters may
    229229require fewer operations than individual characters and
     
    243243\begin{center}
    244244\begin{tabular}{r |c |c |c |c |c |c |c |c |}
    245  & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{0}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{1}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{2}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{3}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{4}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{5}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{6}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{7}$}$ \\
     245 & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{0}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{1}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{2}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{3}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{4}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{5}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{6}</subscript>$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b<subscript>{7}</subscript>$}$ \\
    246246 & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \bfseries\ttfamily{0} \\
    247247 & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \bfseries\ttfamily{1} \\
     
    274274lines show streams that can be computed in subsequent
    275275parsing (using the technique
    276 of \bitstream{} addition \cite{cameron-EuroPar2011}), namely streams marking the element names,
     276of bitstream addition \cite{cameron-EuroPar2011}), namely streams marking the element names,
    277277attribute names and attribute values of tags. 
    278278</para>
     
    299299sequential scanning loops for individual characters \cite{CameronHerdyLin2008}.
    300300Recent work has incorporated a method of parallel
    301 scanning using \bitstream{} addition \cite{cameron-EuroPar2011}, as
     301scanning using bitstream addition \cite{cameron-EuroPar2011}, as
    302302well as combining SIMD methods with 4-stage pipeline parallelism to further improve
    303303throughput \cite{HPCA2012}.
     
    331331<para>
    332332Parabix-style XML parsers utilize a concept of layered processing.
    333 A block of source text is transformed into a set of lexical \bitstream{}s,
     333A block of source text is transformed into a set of lexical bitstreams,
    334334which undergo a series of operations that can be grouped into logical layers,
    335335e.g., transposition, character classification, and lexical analysis.
     
    346346<!--\def \CSG{Content Stream Generator}-->
    347347<para>
    348 \icXML{} is more than an optimized version of Xerces. Many components were grouped, restructured and
     348icXML is more than an optimized version of Xerces. Many components were grouped, restructured and
    349349rearchitected with pipeline parallelism in mind.
    350350In this section, we highlight the core differences between the two systems.
    351351As shown in Figure \ref{fig:xerces-arch}, Xerces
    352352is comprised of five main modules: the transcoder, reader, scanner, namespace binder, and validator.
    353 The {\it Transcoder} converts source data into UTF-16 before Xerces parses it as XML;
     353The <emphasis role="ital">Transcoder</emphasis> converts source data into UTF-16 before Xerces parses it as XML;
    354354the majority of the character set encoding validation is performed as a byproduct of this process.
    355 The {\it Reader} is responsible for the streaming and buffering of all raw and transcoded (UTF-16) text.
     355The <emphasis role="ital">Reader</emphasis> is responsible for the streaming and buffering of all raw and transcoded (UTF-16) text.
    356356It tracks the current line/column position,
    357357<!--(which is reported in the unlikely event that the input contains an error), -->
    358358performs line-break normalization and validates context-specific character set issues,
    359359such as tokenization of qualified-names.
    360 The {\it Scanner} pulls data through the reader and constructs the intermediate representation (IR)
     360The <emphasis role="ital">Scanner</emphasis> pulls data through the reader and constructs the intermediate representation (IR)
    361361of the document; it deals with all issues related to entity expansion, validates
    362362the XML well-formedness constraints and any character set encoding issues that cannot
    363363be completely handled by the reader or transcoder (e.g., surrogate characters, validation
    364364and normalization of character references, etc.)
    365 The {\it Namespace Binder} is a core piece of the element stack.
     365The <emphasis role="ital">Namespace Binder</emphasis> is a core piece of the element stack.
    366366It handles namespace scoping issues between different XML vocabularies.
    367367This allows the scanner to properly select the correct schema grammar structures.
    368 The {\it Validator} takes the IR produced by the Scanner (and
     368The <emphasis role="ital">Validator</emphasis> takes the IR produced by the Scanner (and
    369369potentially annotated by the Namespace Binder) and assesses whether the final output matches
    370370the user-defined DTD and schema grammar(s) before passing it to the end-user.
     
    382382</para>
    383383<para>
    384 In \icXML{} functions are grouped into logical components.
    385 As shown in Figure \ref{fig:icxml-arch}, two major categories exist: (1) the \PS{} and (2) the \MP{}.
    386 All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which represents data as a set of parallel \bitstream{}s.
    387 The {\it Character Set Adapter}, discussed in Section \ref{arch:character-set-adapter},
     384In icXML functions are grouped into logical components.
     385As shown in Figure \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the Markup Processor.
     386All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which represents data as a set of parallel bitstreams.
     387The <emphasis role="ital">Character Set Adapter</emphasis>, discussed in Section \ref{arch:character-set-adapter},
    388388mirrors Xerces's Transcoder duties; however instead of producing UTF-16 it produces a
    389 set of lexical \bitstream{}s, similar to those shown in Figure \ref{fig:parabix1}.
    390 These lexical \bitstream{}s are later transformed into UTF-16 in the \CSG{},
     389set of lexical bitstreams, similar to those shown in Figure \ref{fig:parabix1}.
     390These lexical bitstreams are later transformed into UTF-16 in the Content Stream Generator,
    391391after additional processing is performed.
    392 The first precursor to producing UTF-16 is the {\it Parallel Markup Parser} phase.
    393 It takes the lexical streams and produces a set of marker \bitstream{}s in which a 1-bit identifies
    394 significant positions within the input data. One \bitstream{} for each of the critical piece of information is created, such as
     392The first precursor to producing UTF-16 is the <emphasis role="ital">Parallel Markup Parser</emphasis> phase.
     393It takes the lexical streams and produces a set of marker bitstreams in which a 1-bit identifies
     394significant positions within the input data. One bitstream for each of the critical piece of information is created, such as
    395395the beginning and ending of start tags, end tags, element names, attribute names, attribute values and content.
    396396Intra-element well-formedness validation is performed as an artifact of this process.
    397 Like Xerces, \icXML{} must provide the Line and Column position of each error.
    398 The {\it Line-Column Tracker} uses the lexical information to keep track of the document position(s) through the use of an
     397Like Xerces, icXML must provide the Line and Column position of each error.
     398The <emphasis role="ital">Line-Column Tracker</emphasis> uses the lexical information to keep track of the document position(s) through the use of an
    399399optimized population count algorithm, described in Section \ref{section:arch:errorhandling}.
    400400From here, two data-independent branches exist: the Symbol Resolver and Content Preparation Unit.
     
    402402<para>
    403403A typical XML file contains few unique element and attribute names&#8212;but each of them will occur frequently.
    404 \icXML{} stores these as distinct data structures, called symbols, each with their own global identifier (GID).
    405 Using the symbol marker streams produced by the Parallel Markup Parser, the {\it Symbol Resolver} scans through
    406 the raw data to produce a sequence of GIDs, called the {\it symbol stream}.
    407 </para>
    408 <para>
    409 The final components of the \PS{} are the {\it Content Preparation Unit} and {\it \CSG{}}.
    410 The former takes the (transposed) basis \bitstream{}s and selectively filters them, according to the
     404icXML stores these as distinct data structures, called symbols, each with their own global identifier (GID).
     405Using the symbol marker streams produced by the Parallel Markup Parser, the <emphasis role="ital">Symbol Resolver</emphasis> scans through
     406the raw data to produce a sequence of GIDs, called the <emphasis role="ital">symbol stream</emphasis>.
     407</para>
     408<para>
     409The final components of the Parabix Subsystem are the <emphasis role="ital">Content Preparation Unit</emphasis> and <emphasis role="ital">Content Stream Generator</emphasis>.
     410The former takes the (transposed) basis bitstreams and selectively filters them, according to the
    411411information provided by the Parallel Markup Parser, and the latter transforms the
    412 filtered streams into the tagged UTF-16 {\it content stream}, discussed in Section \ref{section:arch:contentstream}.
    413 </para>
    414 <para>
    415 Combined, the symbol and content stream form \icXML{}'s compressed IR of the XML document.
    416 The {\it \MP{}}~parses the IR to validate and produce the sequential output for the end user.
    417 The {\it Final WF checker} performs inter-element well-formedness validation that would be too costly
     412filtered streams into the tagged UTF-16 <emphasis role="ital">content stream</emphasis>, discussed in Section \ref{section:arch:contentstream}.
     413</para>
     414<para>
     415Combined, the symbol and content stream form icXML's compressed IR of the XML document.
     416The <emphasis role="ital">Markup Processor</emphasis>~parses the IR to validate and produce the sequential output for the end user.
     417The <emphasis role="ital">Final WF checker</emphasis> performs inter-element well-formedness validation that would be too costly
    418418to perform in bit space, such as ensuring every start tag has a matching end tag.
    419 Xerces's namespace binding functionality is replaced by the {\it Namespace Processor}. Unlike Xerces,
    420 it is a discrete phase that produces a series of URI identifiers (URI IDs), the {\it URI stream}, which are
     419Xerces's namespace binding functionality is replaced by the <emphasis role="ital">Namespace Processor</emphasis>. Unlike Xerces,
     420it is a discrete phase that produces a series of URI identifiers (URI IDs), the <emphasis role="ital">URI stream</emphasis>, which are
    421421associated with each symbol occurrence.
    422422This is discussed in Section \ref{section:arch:namespacehandling}.
    423 Finally, the {\it Validation} layer implements the Xerces's validator.
     423Finally, the <emphasis role="ital">Validation</emphasis> layer implements the Xerces's validator.
    424424However, preprocessing associated with each symbol greatly reduces the work of this stage.
    425425</para>
     
    430430\includegraphics[height=0.6\textheight,width=0.5\textwidth]{plots/icxml.pdf}
    431431\end{center}
    432 \caption{\icXML{} Architecture}
     432\caption{icXML Architecture}
    433433\label{fig:icxml-arch}
    434434\end{figure}
     
    448448</para>
    449449<para>
    450 In \icXML{}, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
     450In icXML, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
    451451Given a specified input encoding, a CSA is responsible for checking that
    452452input code units represent valid characters, mapping the characters of the encoding into
    453 the appropriate \bitstream{}s for XML parsing actions (i.e., producing the lexical item
     453the appropriate bitstreams for XML parsing actions (i.e., producing the lexical item
    454454streams), as well as supporting ultimate transcoding requirements.   All of this work
    455 is performed using the parallel \bitstream{} representation of the source input.
     455is performed using the parallel bitstream representation of the source input.
    456456</para>
    457457<para>
     
    466466A second observation is that&#8212;regardless of which character set is used&#8212;quite
    467467often all of the characters in a particular block of input will be within the ASCII range.
    468 This is a very simple test to perform using the \bitstream{} representation, simply confirming that the
     468This is a very simple test to perform using the bitstream representation, simply confirming that the
    469469bit 0 stream is zero for the entire block.   For blocks satisfying this test,
    470470all logic dealing with non-ASCII characters can simply be skipped.
    471 Transcoding to UTF-16 becomes trivial as the high eight \bitstream{}s of the
     471Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    472472UTF-16 form are each set to zero in this case.
    473473</para>
     
    486486The cost of individual character transcoding is avoided whenever a block of input is
    487487confined to the ASCII subset and for all but the first occurrence of any XML element or attribute name.
    488 Furthermore, when transcoding is required, the parallel \bitstream{} representation
     488Furthermore, when transcoding is required, the parallel bitstream representation
    489489supports efficient transcoding operations.   
    490 In the important case of UTF-8 to UTF-16 transcoding, the corresponding UTF-16 \bitstream{}s
     490In the important case of UTF-8 to UTF-16 transcoding, the corresponding UTF-16 bitstreams
    491491can be calculated in bit parallel fashion based on UTF-8 streams \cite{Cameron2008},
    492492and all but the final bytes of multi-byte sequences can be marked for deletion as
     
    518518\verb'110110'.  Using this approach, transcoding may then be
    519519completed by applying parallel deletion and inverse transposition of the
    520 UTF-16 \bitstream{}s\cite{Cameron2008}.
     520UTF-16 bitstreams\cite{Cameron2008}.
    521521</para>
    522522<para>
     
    538538Markup Identifiers & \verb`_________1______________1_________1______1_1____________1_________`\\
    539539Deletion Mask & \verb`_11111111_____1111111111_1____1111_11_______11111111_____111111111`\\
    540 Undeleted Data & \verb``{\tt\it 0}\verb`________>fee`{\tt\it 0}\verb`__________=_fie`{\tt\it 0}\verb`____=__foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/________fum`{\tt\it 0}\verb`/_________`
     540Undeleted Data & \verb``<emphasis role="ital">0</emphasis>\verb`________>fee`<emphasis role="ital">0</emphasis>\verb`__________=_fie`<emphasis role="ital">0</emphasis>\verb`____=__foe`{\tt<emphasis role="ital">0</emphasis>\verb`>`<emphasis role="ital">0</emphasis>\verb`/________fum`<emphasis role="ital">0</emphasis>\verb`/_________`
    541541\end{tabular}
    542542\end{center}
     
    549549Rather than immediately paying the
    550550costs of deletion and transposition just for transcoding,
    551 however, \icXML{} defers these steps so that the deletion
     551however, icXML defers these steps so that the deletion
    552552masks for several stages of processing may be combined.
    553553In particular, this includes core XML requirements
     
    558558returns (CR), line feeds (LF) and CR-LF combinations
    559559must be normalized to a single LF character in
    560 each case.   In \icXML{}, this is achieved by
     560each case.   In icXML, this is achieved by
    561561first marking CR positions, performing two
    562562bit parallel operations to transform the marked
     
    596596which must be replaced in XML processing with 
    597597the single <code>&amp;</code> and <code>&lt;</code> characters, respectively.
    598 The approach in \icXML{} is to mark all but the first character
     598The approach in icXML is to mark all but the first character
    599599positions of each reference for deletion, leaving a
    600600single character position unmodified.  Thus, for the
     
    613613the process of reducing markup data to tag bytes
    614614preceding each significant XML transition as described
    615 in section~\ref{section:arch:contentstream}.  Overall, \icXML{}
     615in section~\ref{section:arch:contentstream}.  Overall, icXML
    616616avoids separate buffer copying operations for each of the
    617617these filtering steps, paying the cost of parallel
    618618deletion and inverse transposition only once. 
    619 Currently, \icXML{} employs the parallel-prefix compress algorithm
     619Currently, icXML employs the parallel-prefix compress algorithm
    620620of Steele~\cite{HackersDelight}  Performance
    621621is independent of the number of positions deleted.
    622 Future versions of \icXML{} are expected to
     622Future versions of icXML are expected to
    623623take advantage of the parallel extract operation~\cite{HilewitzLee2006}
    624624that Intel is now providing in its Haswell architecture.
     
    628628                       <title>Content Stream</title>
    629629<para>
    630 A relatively-unique concept for \icXML{} is the use of a filtered content stream.
     630A relatively-unique concept for icXML is the use of a filtered content stream.
    631631Rather that parsing an XML document in its original format, the input is transformed
    632632into one that is easier for the parser to iterate through and produce the sequential
     
    636636is transformed into
    637637<!-- CODE -->
    638 <!--``{\tt\it 0}\verb`>fee`{\tt\it 0}\verb`=fie`{\tt\it 0}\verb`=foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/fum`{\tt\it 0}\verb`/`''-->
     638<!--``<emphasis role="ital">0</emphasis>\verb`>fee`<emphasis role="ital">0</emphasis>\verb`=fie`<emphasis role="ital">0</emphasis>\verb`=foe`<emphasis role="ital">0</emphasis>\verb`>`<emphasis role="ital">0</emphasis>\verb`/fum`<emphasis role="ital">0</emphasis>\verb`/`''-->
    639639through the parallel filtering algorithm, described in section \ref{sec:parfilter}.
    640640</para>
     
    642642Combined with the symbol stream, the parser traverses the content stream to effectively
    643643reconstructs the input document in its output form.
    644 The initial {\tt\it 0} indicates an empty content string. The following \verb|>|
     644The initial <emphasis role="ital">0</emphasis> indicates an empty content string. The following \verb|>|
    645645indicates that a start tag without any attributes is the first element in this text and
    646646the first unused symbol, <code>document</code>, is the element name.
     
    650650accounts for 6.83% of Xerces's execution time.
    651651Additionally, it is cheap to locate the terminal character of each string:
    652 using the String End \bitstream{}, the \PS{} can effectively calculate the offset of each
     652using the String End bitstream, the Parabix Subsystem can effectively calculate the offset of each
    653653null character in the content stream in parallel, which in turn means the parser can
    654654directly jump to the end of every string without scanning for it.
     
    656656<para>
    657657Following ``\verb`fee`'' is a \verb`=`, which marks the existence of an attribute.
    658 Because all of the intra-element was performed in the \PS{}, this must be a legal attribute.
     658Because all of the intra-element was performed in the Parabix Subsystem, this must be a legal attribute.
    659659Since attributes can only occur within start tags and must be accompanied by a textual value,
    660660the next symbol in the symbol stream must be the element name of a start tag,
     
    703703</para>
    704704<para>
    705 In both Xerces and \icXML{}, every URI has a one-to-one mapping to a URI ID.
     705In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID.
    706706These persist for the lifetime of the application through the use of a global URI pool.
    707707Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs
     
    713713</para>
    714714<para>
    715 For that reason, \icXML{} contains an independent namespace stack and utilizes bit vectors to cheaply perform
     715For that reason, icXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
    716716<!-- speculation and scope resolution options with a single XOR operation &#8212; even if many alterations are performed. -->
    717717<!-- performance advantage figure?? average cycles/byte cost? -->
     
    764764As Xerces parses a file, it uses context-dependant logic to assess whether the next character is legal;
    765765if not, the current state determines the type and severity of the error.
    766 \icXML{} emits errors in the similar manner&#8212;but how it discovers them is substantially different.
    767 Recall that in Figure \ref{fig:icxml-arch}, \icXML{} is divided into two sections: the \PS{} and \MP{},
     766icXML emits errors in the similar manner&#8212;but how it discovers them is substantially different.
     767Recall that in Figure \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and Markup Processor,
    768768each with its own system for detecting and producing error messages.
    769769</para>
    770770<para>
    771 Within the \PS{}, all computations are performed in parallel, a block at a time.
    772 Errors are derived as artifacts of \bitstream{} calculations, with a 1-bit marking the byte-position of an error within a block,
     771Within the Parabix Subsystem, all computations are performed in parallel, a block at a time.
     772Errors are derived as artifacts of bitstream calculations, with a 1-bit marking the byte-position of an error within a block,
    773773and the type of error is determined by the equation that discovered it.
    774774The difficulty of error processing in this section is that in Xerces the line and column number must be given
     
    780780Note that typical XML documents are error-free but the calculation of the
    781781line/column position is a constant overhead in Xerces. <!-- that must be maintained in the case that one occurs. -->
    782 To reduce this, \icXML{} pushes the bulk cost of the line/column calculation to the occurrence of the error and
     782To reduce this, icXML pushes the bulk cost of the line/column calculation to the occurrence of the error and
    783783performs the minimal amount of book-keeping necessary to facilitate it.
    784 \icXML{} leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
     784icXML leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
    785785within the Line Column Tracker (LCT).
    786786One of the CSA's major responsibilities is transcoding an input text. <!-- from some encoding format to near-output-ready UTF-16. -->
    787787During this process, white-space normalization rules are applied and multi-code-unit and surrogate characters are detected
    788788and validated.
    789 A {\it line-feed \bitstream{}}, which marks the positions of the normalized new lines characters, is a natural derivative of
     789A <emphasis role="ital">line-feed bitstream</emphasis>, which marks the positions of the normalized new lines characters, is a natural derivative of
    790790this process.
    791791Using an optimized population count algorithm, the line count can be summarized cheaply for each valid block of text.
    792792<!-- The optimization delays the counting process .... -->
    793793Column position is more difficult to calculate.
    794 It is possible to scan backwards through the \bitstream{} of new line characters to determine the distance (in code-units)
     794It is possible to scan backwards through the bitstream of new line characters to determine the distance (in code-units)
    795795between the position between which an error was detected and the last line feed. However, this distance may exceed
    796796than the actual character position for the reasons discussed in (2).
    797 To handle this, the CSA generates a {\it skip mask} \bitstream{} by ORing together many relevant \bitstream{}s,
     797To handle this, the CSA generates a <emphasis role="ital">skip mask</emphasis> bitstream by ORing together many relevant bitstreams,
    798798such as all trailing multi-code-unit and surrogate characters, and any characters that were removed during the
    799799normalization process.
     
    802802</para>
    803803<para>
    804 The \MP{} is a state-driven machine. As such, error detection within it is very similar to Xerces.
     804The Markup Processor is a state-driven machine. As such, error detection within it is very similar to Xerces.
    805805However, reporting the correct line/column is a much more difficult problem.
    806 The \MP{} parses the content stream, which is a series of tagged UTF-16 strings.
     806The Markup Processor parses the content stream, which is a series of tagged UTF-16 strings.
    807807Each string is normalized in accordance with the XML specification.
    808808All symbol data and unnecessary whitespace is eliminated from the stream;
    809809thus its impossible to derive the current location using only the content stream.
    810 To calculate the location, the \MP{} borrows three additional pieces of information from the \PS{}:
    811 the line-feed, skip mask, and a {\it deletion mask stream}, which is a \bitstream{} denoting the (code-unit) position of every
     810To calculate the location, the Markup Processor borrows three additional pieces of information from the Parabix Subsystem:
     811the line-feed, skip mask, and a <emphasis role="ital">deletion mask stream</emphasis>, which is a bitstream denoting the (code-unit) position of every
    812812datum that was suppressed from the source during the production of the content stream.
    813813Armed with these, it is possible to calculate the actual line/column using
    814 the same system as the \PS{} until the sum of the negated deletion mask stream is equal to the current position.
     814the same system as the Parabix Subsystem until the sum of the negated deletion mask stream is equal to the current position.
    815815</para>
    816816            </section>
     
    822822As discussed in section \ref{background:xerces}, Xerces can be considered a FSM application.
    823823These are ``embarrassingly sequential.''\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize.
    824 However, \icXML{} is designed to organize processing into logical layers.   
    825 In particular, layers within the \PS{} are designed to operate
     824However, icXML is designed to organize processing into logical layers.   
     825In particular, layers within the Parabix Subsystem are designed to operate
    826826over significant segments of input data before passing their outputs on for
    827827subsequent processing.  This fits well into the general model of pipeline
     
    830830</para>
    831831<para>
    832 The most straightforward division of work in \icXML{} is to separate
    833 the \PS{} and the \MP{} into distinct logical layers into two separate stages.
    834 The resultant application, {\it\icXMLp{}}, is a course-grained software-pipeline application.
    835 In this case, the \PS{} thread $T_1$ reads 16k of XML input $I$ at a time and produces the
     832The most straightforward division of work in icXML is to separate
     833the Parabix Subsystem and the Markup Processor into distinct logical layers into two separate stages.
     834The resultant application, <emphasis role="ital">icXML-p</emphasis>, is a course-grained software-pipeline application.
     835In this case, the Parabix Subsystem thread $T_1$ reads 16k of XML input $I$ at a time and produces the
    836836content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$.
    837 The \MP{} thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation,
     837The Markup Processor thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation,
    838838and the provides parsed XML data to the application through the Xerces API. 
    839839The shared data structure is implemented using a ring buffer,
     
    866866Overall, our design is intended to benefit a range of applications.
    867867Conceptually, we consider two design points.
    868 The first, the parsing performed by the \PS{} dominates at 67% of the overall cost,
    869 with the cost of application processing (including the driver logic within the \MP{}) at 33%.   
     868The first, the parsing performed by the Parabix Subsystem dominates at 67% of the overall cost,
     869with the cost of application processing (including the driver logic within the Markup Processor) at 33%.   
    870870The second is almost the opposite scenario, the cost of application processing dominates at 60%,
    871871while the cost of XML parsing represents an overhead of 40%.
     
    875875framework to achieve a 50% to 100% improvement in the parsing engine itself.   
    876876In a best case scenario,
    877 a 100% improvement of the \PS{} for the design point in which
     877a 100% improvement of the Parabix Subsystem for the design point in which
    878878XML parsing dominates at 67% of the total application cost.
    879 In this case, the single-threaded \icXML{} should achieve a 1.5x speedup over Xerces
     879In this case, the single-threaded icXML should achieve a 1.5x speedup over Xerces
    880880so that the total application cost reduces to 67% of the original. 
    881 However, in \icXMLp{}, our ideal scenario gives us two well-balanced threads
     881However, in \icXML-p, our ideal scenario gives us two well-balanced threads
    882882each performing about 33% of the original work.   
    883883In this case, Amdahl's law predicts that we could expect up to a 3x speedup at best.
     
    886886At the other extreme of our design range, we consider an application
    887887in which core parsing cost is 40%.   Assuming the 2x speedup of
    888 the \PS{} over the corresponding Xerces core, single-threaded
    889 \icXML{} delivers a 25% speedup.   However, the most significant
     888the Parabix Subsystem over the corresponding Xerces core, single-threaded
     889icXML delivers a 25% speedup.   However, the most significant
    890890aspect of our two-stage multi-threaded design then becomes the
    891891ability to hide the entire latency of parsing within the serial time
     
    894894</para>
    895895<para>
    896 Although the structure of the \PS{} allows division of the work into
     896Although the structure of the Parabix Subsystem allows division of the work into
    897897several pipeline stages and has been demonstrated to be effective
    898898for four pipeline stages in a research prototype \cite{HPCA2012},
    899899our analysis here suggests that the further pipelining of work within
    900 the \PS{} is not worthwhile if the cost of application logic is little as
     900the Parabix Subsystem is not worthwhile if the cost of application logic is little as
    90190133% of the end-to-end cost using Xerces.  To achieve benefits of
    902902further parallelization with multi-core technology, there would
     
    909909                        <title>Performance</title>             
    910910<para>
    911 We evaluate \xerces{}, \icXML{}, \icXMLp{} against two benchmarking applications:
     911We evaluate \xerces{}, icXML, \icXML-p against two benchmarking applications:
    912912the Xerces C++ SAXCount sample application,
    913913and a real world GML to SVG transformation application.
     
    978978</para>
    979979<para>
    980 Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of
     980Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML in terms of
    981981CPU cycles per byte for the SAXCount application.
    982 The speedup for \icXML{} over Xerces is 1.3x to 1.8x.
    983 With two threads on the multicore machine, \icXMLp{} can achieve speedup up to 2.7x.
     982The speedup for icXML over Xerces is 1.3x to 1.8x.
     983With two threads on the multicore machine, \icXML-p can achieve speedup up to 2.7x.
    984984Xerces is substantially slowed by dense markup
    985 but \icXML{} is less affected through a reduction in branches and the use of parallel-processing techniques.
    986 \icXMLp{} performs better as markup-density increases because the work performed by each stage is
     985but icXML is less affected through a reduction in branches and the use of parallel-processing techniques.
     986\icXML-p performs better as markup-density increases because the work performed by each stage is
    987987well balanced in this application.
    988988</para>
     
    10081008This paper is the first case study documenting the significant
    10091009performance benefits that may be realized through the integration
    1010 of parallel \bitstream{} technology into existing widely-used software libraries.
     1010of parallel bitstream technology into existing widely-used software libraries.
    10111011In the case of the Xerces-C++ XML parser, the
    10121012combined integration of SIMD and multicore parallelism was
    10131013shown capable of dramatic producing dramatic increases in
    10141014throughput and reductions in branch mispredictions and cache misses.
    1015 The modified parser, going under the name \icXML{} is designed
     1015The modified parser, going under the name icXML is designed
    10161016to provide the full functionality of the original Xerces library
    10171017with complete compatibility of APIs.  Although substantial
     
    10221022</para>
    10231023<para>
    1024 The further development of \icXML{} to move beyond 2-stage
     1024The further development of icXML to move beyond 2-stage
    10251025pipeline parallelism is ongoing, with realistic prospects for
    10261026four reasonably balanced stages within the library.  For
     
    10401040<para>
    10411041To overcome the software engineering challenges in applying
    1042 parallel \bitstream{} technology to existing software systems,
     1042parallel bitstream technology to existing software systems,
    10431043it is clear that better library and tool support is needed.
    1044 The techniques used in the implementation of \icXML{} and
     1044The techniques used in the implementation of icXML and
    10451045documented in this paper could well be generalized for
    10461046applications in other contexts and automated through
    10471047the creation of compiler technology specifically supporting
    1048 parallel \bitstream{} programming.
     1048parallel bitstream programming.
    10491049</para>
    10501050                </section>
Note: See TracChangeset for help on using the changeset viewer.