Apr 19, 2013, 12:39:07 AM (6 years ago)

Replaced verbatim with code tags, replaced math escape with appropriate tags, added unicode escapes, added personal blurbs.

1 edited


  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3040 r3041  
    1212<div lang="en" class="article">
    1313<div class="titlepage">
    14 <h2 class="article-title" id="idp270784"></h2>
     14<h2 class="article-title" id="idp26800"></h2>
    1515<div class="author">
    1616<h3 class="author">Nigel Medforth</h3>
    1717<div class="affiliation">
    18 <p class="jobtitle"></p>
    19 <p class="orgname"></p>
    20 </div>
    21 <h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:"></a>&gt;</code></h5>
     18<p class="jobtitle">Developer</p>
     19<p class="orgname">International Characters Inc.</p>
     21<div class="affiliation">
     22<p class="jobtitle">Graduate Student, School of Computing Science</p>
     23<p class="orgname">Simon Fraser University </p>
     25<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>&gt;</code></h5>
    2327<div class="author">
    4347<p class="orgname">Simon Fraser University</p>
     49<div class="affiliation">
     50<p class="jobtitle">Chief Technology Officer</p>
     51<p class="orgname">International Characters, Inc.</p>
    4553<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
    5563<div class="abstract">
    5664<p class="title"><b>Abstract</b></p>
    57 <p id="idp438976">Prior research on the acceleration of XML processing
     65<p id="idp28432">Prior research on the acceleration of XML processing
    5866using SIMD and multi-core parallelism has lead to
    5967a number of interesting research prototypes.  This work
    7684<p><b>Table of Contents</b></p>
    78 <dt><span class="section"><a href="#idp426544" class="toc">Introduction</a></span></dt>
    79 <dt><span class="section"><a href="#idp427464" class="toc">Background</a></span></dt>
     86<dt><span class="section"><a href="#idp15032" class="toc">Introduction</a></span></dt>
     87<dt><span class="section"><a href="#idp15928" class="toc">Background</a></span></dt>
    81 <dt><span class="section"><a href="#idp427800" class="toc">Xerces C++ Structure</a></span></dt>
    82 <dt><span class="section"><a href="#idp432856" class="toc">The Parabix Framework</a></span></dt>
    83 <dt><span class="section"><a href="#idp584560" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     89<dt><span class="section"><a href="#idp16248" class="toc">Xerces C++ Structure</a></span></dt>
     90<dt><span class="section"><a href="#idp22392" class="toc">The Parabix Framework</a></span></dt>
     91<dt><span class="section"><a href="#idp178296" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    85 <dt><span class="section"><a href="#idp587512" class="toc">Architecture</a></span></dt>
     93<dt><span class="section"><a href="#idp181248" class="toc">Architecture</a></span></dt>
    87 <dt><span class="section"><a href="#idp587832" class="toc">Overview</a></span></dt>
    88 <dt><span class="section"><a href="#idp609600" class="toc">Character Set Adapters</a></span></dt>
    89 <dt><span class="section"><a href="#idp616344" class="toc">Combined Parallel Filtering</a></span></dt>
    90 <dt><span class="section"><a href="#idp628552" class="toc">Content Stream</a></span></dt>
    91 <dt><span class="section"><a href="#idp634584" class="toc">Namespace Handling</a></span></dt>
    92 <dt><span class="section"><a href="#idp642952" class="toc">Error Handling</a></span></dt>
     95<dt><span class="section"><a href="#idp181568" class="toc">Overview</a></span></dt>
     96<dt><span class="section"><a href="#idp195064" class="toc">Character Set Adapters</a></span></dt>
     97<dt><span class="section"><a href="#idm9304" class="toc">Combined Parallel Filtering</a></span></dt>
     98<dt><span class="section"><a href="#idp221352" class="toc">Content Stream</a></span></dt>
     99<dt><span class="section"><a href="#idp228528" class="toc">Namespace Handling</a></span></dt>
     100<dt><span class="section"><a href="#idp240160" class="toc">Error Handling</a></span></dt>
    94 <dt><span class="section"><a href="#idp650360" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
    95 <dt><span class="section"><a href="#idp657176" class="toc">Performance</a></span></dt>
     102<dt><span class="section"><a href="#idp247200" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
     103<dt><span class="section"><a href="#idp257920" class="toc">Performance</a></span></dt>
    97 <dt><span class="section"><a href="#idp659048" class="toc">Xerces C++ SAXCount</a></span></dt>
    98 <dt><span class="section"><a href="#idp663432" class="toc">GML2SVG</a></span></dt>
     105<dt><span class="section"><a href="#idp259800" class="toc">Xerces C++ SAXCount</a></span></dt>
     106<dt><span class="section"><a href="#idp264184" class="toc">GML2SVG</a></span></dt>
    100 <dt><span class="section"><a href="#idp664008" class="toc">Conclusion and Future Work</a></span></dt>
     108<dt><span class="section"><a href="#idp264760" class="toc">Conclusion and Future Work</a></span></dt>
    103 <div class="section" id="idp426544">
     111<div class="section" id="idp15032">
    104112<h2 class="title" style="clear: both">Introduction</h2>
    105 <p id="idp426888"></p>
    106 <p id="idp427016"></p>
    107 <p id="idp427144"></p>
    108 <p id="idp427272"></p>
    109 </div>
    110 <div class="section" id="idp427464">
     113<p id="idp15352"></p>
     114<p id="idp15480"></p>
     115<p id="idp15608"></p>
     116<p id="idp15736"></p>
     118<div class="section" id="idp15928">
    111119<h2 class="title" style="clear: both">Background</h2>
    112 <div class="section" id="idp427800">
     120<div class="section" id="idp16248">
    113121<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    114 <p id="idp428144">
     122<p id="idp16568">
    115123The Xerces C++ parser
    130138parsing as well as a DOM tree-based parsing interface.
    132 <p id="idp429744">
     140<p id="idp18112">
    151159is required, involving all aspects of the parser.
    153 <p id="idp431920">
    158 </p>
    159 </div>
    160 <div class="section" id="idp432856">
     161<p id="idp21496">
     168<div class="section" id="idp22392">
    161169<h3 class="title" style="clear: both">The Parabix Framework</h3>
    162 <p id="idp433176">
     170<p id="idp22712">
    163171The Parabix (parallel bit stream) framework is a transformative approach to XML parsing
    164172(and other forms of text processing.) The key idea is to exploit the availability of wide
    170178Boolean-logic operations\footnote{∧, \√ and ¬ denote the boolean AND, OR and NOT operators.}
    171179are used to classify the input bits into a set of <span class="ital">character-class bit streams</span>, which identify key
    172 characters (or groups of characters) with a $1$.
     180characters (or groups of characters) with a <code class="code">1</code>.
    173181For example, one of the fundamental characters in XML is a left-angle bracket.
    174 A character is a<code class="code">&lt; if and only if ¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧
     182A character is an <code class="code">'&lt;' if and only if ¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧
    175183b<sub>3</sub>) ∧ (b<sub>4</sub> ∧ b<sub>5</sub>) ∧ ¬ (b<sub>6</sub> √ b<sub>7</sub>) = 1</code>.
    176 Similarly, a character is numeric 
    177 <code class="code">[0-9] if and only if $¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧ b<sub>3</sub>) ∧ ¬(b<sub>4</sub> ∧ (b<sub>5</sub> √ b<sub>6</sub>))</code>.
     184Similarly, a character is numeric,
     185<code class="code">[0-9] if and only if ¬(b<sub>0</sub> √ b<sub>1</sub>) ∧ (b<sub>2</sub> ∧ b<sub>3</sub>) ∧ ¬(b<sub>4</sub> ∧ (b<sub>5</sub> √ b<sub>6</sub>))</code>.
    178186An important observation here is that ranges of characters may
    179187require fewer operations than individual characters and
    181189multiple classes can share the classification cost.
    183 <p id="idp577560">
    185 </p>
    186 <p id="idp579000">
     191<p id="idp170240">
     194<p id="idp171696">
    187195Consider, for example, the XML source data stream shown in the first line of .
    188196The remaining lines of this figure show several parallel bit streams that are computed in Parabix-style
    195203tag openers into start tag marks and end tag marks
    196204depending on the character immediately following the
    197 opener (i.e., ``\verb:/:'') or not.  The remaining three
     205opener (i.e., <code class="code">"/"</code>) or not.  The remaining three
    198206lines show streams that can be computed in subsequent
    199207parsing (using the technique
    201209attribute names and attribute values of tags. 
    203 <p id="idp580288">
     211<p id="idp175032">
    204212Two intuitions may help explain how the Parabix approach can lead
    205213to improved XML parsing performance. The first is that
    216224should provide substantial benefit.
    218 <p id="idp580480">
     226<p id="idp175992">
    219227Previous studies have shown that the Parabix approach improves many aspects of XML processing,
    220228including transcoding \cite{Cameron2008}, character classification and validation,
    229237they lacked the functionality required by full XML parsers.
    231 <p id="idp580672">
     239<p id="idp177744">
    232240Commercial XML processors support transcoding of multiple character sets and can parse and
    233241validate against multiple document vocabularies.
    238 <div class="section" id="idp584560">
     246<div class="section" id="idp178296">
    239247<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    240 <p id="idp584880">
     248<p id="idp178616">
    241249Xerces—like all traditional XML parsers—processes XML documents sequentially.
    242250Each character is examined to distinguish between the
    246254validation and content processing modes.
    248 <p id="idp585952">
     256<p id="idp179688">
    249257In other words, Xerces belongs to an equivalent class applications termed FSM applications\footnote{
    250258  Herein FSM applications are considered software systems whose behaviour is defined by the inputs,
    253261Unfortunately, textual data tends to be unpredictable and any character could induce a state transition.
    255 <p id="idp586616">
     263<p id="idp180352">
    256264Parabix-style XML parsers utilize a concept of layered processing.
    257265A block of source text is transformed into a set of lexical bitstreams,
    266 <div class="section" id="idp587512">
     274<div class="section" id="idp181248">
    267275<h2 class="title" style="clear: both">Architecture</h2>
    268 <div class="section" id="idp587832">
     276<div class="section" id="idp181568">
    269277<h3 class="title" style="clear: both">Overview</h3>
    270 <p id="idp588280">
     278<p id="idp182016">
    271279icXML is more than an optimized version of Xerces. Many components were grouped, restructured and
    272280rearchitected with pipeline parallelism in mind.
    293301the user-defined DTD and schema grammar(s) before passing it to the end-user.
    295 <p id="idp591904">
    297 </p>
    298 <p id="idp592160">
     303<p id="idp185640">
     306<p id="idp185896">
    299307In icXML functions are grouped into logical components.
    300308As shown in Figure \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the Markup Processor.
    315323From here, two data-independent branches exist: the Symbol Resolver and Content Preparation Unit.
    317 <p id="idp404616">
     325<p id="idp188856">
    318326A typical XML file contains few unique element and attribute names—but each of them will occur frequently.
    319327icXML stores these as distinct data structures, called symbols, each with their own global identifier (GID).
    321329the raw data to produce a sequence of GIDs, called the <span class="ital">symbol stream</span>.
    323 <p id="idp406288">
     331<p id="idp190496">
    324332The final components of the Parabix Subsystem are the <span class="ital">Content Preparation Unit</span> and <span class="ital">Content Stream Generator</span>.
    325333The former takes the (transposed) basis bitstreams and selectively filters them, according to the
    327335filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}.
    329 <p id="idp407840">
     337<p id="idp191984">
    330338Combined, the symbol and content stream form icXML's compressed IR of the XML document.
    331339The <span class="ital">Markup Processor</span>~parses the IR to validate and produce the sequential output for the end user.
    339347However, preprocessing associated with each symbol greatly reduces the work of this stage.
    341 <p id="idp609280">
    343 </p>
    344 </div>
    345 <div class="section" id="idp609600">
     349<p id="idp194728">
     353<div class="section" id="idp195064">
    346354<h3 class="title" style="clear: both">Character Set Adapters</h3>
    347 <p id="idp610216">
     355<p id="idp195712">
    348356In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of Xerces itself and
    349357provide the end-consumer with a single encoding format.
    354362transcoding imposes at least a cost of buffer copying.
    356 <p id="idp611592">
     364<p id="idp196496">
    357365In icXML, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
    358366Given a specified input encoding, a CSA is responsible for checking that
    362370is performed using the parallel bitstream representation of the source input.
    364 <p id="idp612312">
     372<p id="idp197216">
    365373An important observation is that many character sets are an
    366374extension to the legacy 7-bit ASCII character set.  This includes the
    370378serves to compute lexical item streams for all such ASCII-based character sets.
    372 <p id="idp612944">
     380<p id="idp197848">
    373381A second observation is that—regardless of which character set is used—quite
    374382often all of the characters in a particular block of input will be within the ASCII range.
    379387UTF-16 form are each set to zero in this case.
    381 <p id="idp614496">
     389<p id="idp199400">
    382390A third observation is that repeated transcoding of the names of XML
    383391elements, attributes and so on can be avoided by using a look-up mechanism.
    390398additional cost.
    392 <p id="idp615280">
     400<p id="idp200184">
    393401The cost of individual character transcoding is avoided whenever a block of input is
    394402confined to the ASCII subset and for all but the first occurrence of any XML element or attribute name.
    406 <div class="section" id="idp616344">
     414<div class="section" id="idm9304">
    407415<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    408 <p id="idp616664">
     416<p id="idm8952">
    409417As just mentioned, UTF-8 to UTF-16 transcoding involves marking
    410418all but the last bytes of multi-byte UTF-8 sequences as
    411419positions for deletion.   For example, the two
    412 Chinese characters \begin{CJK*}{UTF8}{gbsn}䜠奜\end{CJK*}
    413 are represented as two three-byte UTF-8 sequences \verb'E4 BD A0'
    414 and \verb'E5 A5 BD' while the UTF-16 representation must be
    415 compressed down to the two code units \verb'4F60' and \verb'597D'.
     420Chinese characters <code class="code">䜠奜</code>
     421are represented as two three-byte UTF-8 sequences <code class="code">E4 BD A0</code>
     422and <code class="code">E5 A5 BD</code> while the UTF-16 representation must be
     423compressed down to the two code units <code class="code">4F60</code> and <code class="code">597D</code>.
    416424In the bit parallel representation, this corresponds to a reduction
    417425from six bit positions representing UTF-8 code units (bytes)
    423431for deletion.   In this case, the portion of the mask
    424432corresponding to these input bytes is the bit sequence
    425 \verb'110110'.  Using this approach, transcoding may then be
     433<code class="code">110110</code>.  Using this approach, transcoding may then be
    426434completed by applying parallel deletion and inverse transposition of the
    427435UTF-16 bitstreams\cite{Cameron2008}.
    429 <p id="idp619512">
    439 </p>
    440 <p id="idp621992">
     437<p id="idm6424">
     448<p id="idm3968">
    441449Rather than immediately paying the
    442450costs of deletion and transposition just for transcoding,
    460 <p id="idp622312">
     468<p id="idm1464">
    461469In essence, the deletion masks for transcoding and
    462470for line break normalization each represent a bitwise
    465473applied once.
    467 <p id="idp624608">
     475<p id="idp217752">
    468476A further application of combined filtering
    469477is the processing of XML character and entity
    484492that this is not true, it is addressed in post-processing.
    486 <p id="idp627584">
     494<p id="idp220384">
    487495The final step of combined filtering occurs during
    488496the process of reducing markup data to tag bytes
    502 <div class="section" id="idp628552">
     510<div class="section" id="idp221352">
    503511<h3 class="title" style="clear: both">Content Stream</h3>
    504 <p id="idp628872">
     512<p id="idp221696">
    505513A relatively-unique concept for icXML is the use of a filtered content stream.
    506514Rather that parsing an XML document in its original format, the input is transformed
    514522through the parallel filtering algorithm, described in section \ref{sec:parfilter}.
    516 <p id="idp630400">
     524<p id="idp223408">
    517525Combined with the symbol stream, the parser traverses the content stream to effectively
    518526reconstructs the input document in its output form.
    519 The initial <span class="ital">0</span> indicates an empty content string. The following \verb|&gt;|
     527The initial <span class="ital">0</span> indicates an empty content string. The following <code class="code">&gt;</code>
    520528indicates that a start tag without any attributes is the first element in this text and
    521529the first unused symbol, <code class="code">document</code>, is the element name.
    529537directly jump to the end of every string without scanning for it.
    531 <p id="idp632232">
    532 Following ``\verb`fee`'' is a \verb`=`, which marks the existence of an attribute.
     539<p id="idp226064">
     540Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the existence of an attribute.
    533541Because all of the intra-element was performed in the Parabix Subsystem, this must be a legal attribute.
    534542Since attributes can only occur within start tags and must be accompanied by a textual value,
    535543the next symbol in the symbol stream must be the element name of a start tag,
    536 and the following one must be the name of the attribute and the string that follows the \verb`=` must be its value.
    537 However, the subsequent \verb`=` is not treated as an independent attribute because the parser has yet to
    538 read a \verb`&gt;`, which marks the end of a start tag. Thus only one symbol is taken from the symbol stream and
     544and the following one must be the name of the attribute and the string that follows the <code class="code">=</code> must be its value.
     545However, the subsequent <code class="code">=</code> is not treated as an independent attribute because the parser has yet to
     546read a <code class="code">&gt;</code>, which marks the end of a start tag. Thus only one symbol is taken from the symbol stream and
    539547it (along with the string value) is added to the element.
    540 Eventually the parser reaches a \verb`/`, which marks the existence of an end tag. Every end tag requires an
     548Eventually the parser reaches a <code class="code">/</code>, which marks the existence of an end tag. Every end tag requires an
    541549element name, which means they require a symbol. Inter-element validation whenever an empty tag is detected to
    542550ensure that the appropriate scope-nesting rules have been applied.
    545 <div class="section" id="idp634584">
     553<div class="section" id="idp228528">
    546554<h3 class="title" style="clear: both">Namespace Handling</h3>
    547 <p id="idp635192">
     555<p id="idp229176">
    548556In XML, namespaces prevents naming conflicts when multiple vocabularies are used together.
    549557It is especially important when a vocabulary application-dependant meaning, such as when
    551559Namespaces are bound to uniform resource identifiers (URIs), which are strings used to identify
    552560specific names or resources.
    553 On line 1 of Figure \ref{fig:namespace1}, the \verb|xmlns| attribute instructs the XML
     561On line 1 of Figure \ref{fig:namespace1}, the <code class="code">xmlns</code> attribute instructs the XML
    554562processor to bind the prefix <code class="code">p</code> to the URI '<code class="code">pub.net</code>' and the default (empty)
    555 prefix to <code class="code">book.org</code>. Thus to the XML processor, the \verb|title| on line 2 and
    556 \verb|price| on line 4 both read as \verb|"book.org":title| and \verb|"book.org":price|
    557 respectively, whereas on line 3 and 5, \verb|p:name| and \verb|price| are seen as
    558 \verb|"pub.net":name| and \verb|"pub.net":price|. Even though the actual element name
    559 \verb|price|, due to namespace scoping rules they are viewed as two uniquely-named items
     563prefix to <code class="code">book.org</code>. Thus to the XML processor, the <code class="code">title</code> on line 2 and
     564<code class="code">price</code> on line 4 both read as <code class="code">"book.org":title</code> and <code class="code">"book.org":price</code>
     565respectively, whereas on line 3 and 5, <code class="code">p:name</code> and <code class="code">price</code> are seen as
     566<code class="code">"pub.net":name</code> and <code class="code">"pub.net":price</code>. Even though the actual element name
     567<code class="code">price</code>, due to namespace scoping rules they are viewed as two uniquely-named items
    560568because the current vocabulary is determined by the namespace(s) that are in-scope.
    562 <p id="idp637032">
    564 </p>
    565 <p id="idp637288">
     570<p id="idp233096">
     573<p id="idp233368">
    566574In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID.
    567575These persist for the lifetime of the application through the use of a global URI pool.
    573581(2) those that repeatedly modify the namespaces in predictable patterns.
    575 <p id="idp637480">
     583<p id="idp233560">
    576584For that reason, icXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
    579 When a prefix is declared (e.g., \verb|xmlns:p="pub.net"|), a namespace binding is created that maps
     587When a prefix is declared (e.g., <code class="code">xmlns:p="pub.net"</code>), a namespace binding is created that maps
    580588the prefix (which are assigned Prefix IDs in the symbol resolution process) to the URI.
    581589Each unique namespace binding has a unique namespace id (NSID) and every prefix contains a bit vector marking every
    582590NSID that has ever been associated with it within the document. For example, in Table \ref{tbl:namespace1}, the
    583 prefix binding set of \verb|p| and \verb|xmlns| would be \verb|01| and \verb|11| respectively.
     591prefix binding set of <code class="code">p</code> and <code class="code">xmlns</code> would be <code class="code">01</code> and <code class="code">11</code> respectively.
    584592To resolve the in-scope namespace binding for each prefix, a bit vector of the currently visible namespaces is
    585593maintained by the system. By ANDing the prefix bit vector with the currently visible namespaces, the in-scope
    587595A namespace binding table, similar to Table \ref{tbl:namespace1}, provides the actual URI ID.
    589 <p id="idp640280">
    591 </p>
    592 <p id="idp640536">
    597 </p>
    598 <p id="idp641968">
     597<p id="idp237336">
     600<p id="idp237608">
     606<p id="idp239176">
    599607To ensure that scoping rules are adhered to,
    600608whenever a start tag is encountered, any modification to the currently visible namespaces is calculated and stored
    607 <div class="section" id="idp642952">
     615<div class="section" id="idp240160">
    608616<h3 class="title" style="clear: both">Error Handling</h3>
    609 <p id="idp643296">
     617<p id="idp240504">
    611619Xerces outputs error messages in two ways: through the programmer API and as thrown objects for fatal errors.
    616624each with its own system for detecting and producing error messages.
    618 <p id="idp644416">
     626<p id="idp241640">
    619627Within the Parabix Subsystem, all computations are performed in parallel, a block at a time.
    620628Errors are derived as artifacts of bitstream calculations, with a 1-bit marking the byte-position of an error within a block,
    649657column number.
    651 <p id="idp648656">
     659<p id="idp245496">
    652660The Markup Processor is a state-driven machine. As such, error detection within it is very similar to Xerces.
    653661However, reporting the correct line/column is a much more difficult problem.
    666 <div class="section" id="idp650360">
     674<div class="section" id="idp247200">
    667675<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    668 <p id="idp650728">
     676<p id="idp247568">
    669677As discussed in section \ref{background:xerces}, Xerces can be considered a FSM application.
    670 These are ``embarrassingly sequential.''\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize.
     678These are "embarrassingly sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize.
    671679However, icXML is designed to organize processing into logical layers.   
    672680In particular, layers within the Parabix Subsystem are designed to operate
    676684of modules.
    678 <p id="idp651520">
     686<p id="idp249104">
    679687The most straightforward division of work in icXML is to separate
    680688the Parabix Subsystem and the Markup Processor into distinct logical layers into two separate stages.
    681689The resultant application, <span class="ital">icXML-p</span>, is a course-grained software-pipeline application.
    682 In this case, the Parabix Subsystem thread $T_1$ reads 16k of XML input $I$ at a time and produces the
    683 content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$.
    684 The Markup Processor thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation,
     690In this case, the Parabix Subsystem thread <code class="code">T<sub>1</sub></code> reads 16k of XML input <code class="code">I</code> at a time and produces the
     691content, symbol and URI streams, then stores them in a pre-allocated shared data structure <code class="code">S</code>.
     692The Markup Processor thread <code class="code">T<sub>2</sub></code> consumes <code class="code">S</code>, performs well-formedness and grammar-based validation,
    685693and the provides parsed XML data to the application through the Xerces API. 
    686694The shared data structure is implemented using a ring buffer,
    688696In the examples of Figure \ref{threads_timeline1} and \ref{threads_timeline2}, the ring buffer has four entries.
    689697A lock-free mechanism is applied to ensure that each entry can only be read or written by one thread at the same time.
    690 In Figure \ref{threads_timeline1} the processing time of $T_1$ is longer than $T_2$;
    691 thus $T_2$ always waits for $T_1$ to write to the shared memory.
    692 Figure \ref{threads_timeline2} illustrates the scenario in which $T_1$ is faster
    693 and must wait for $T_2$ to finish reading the shared data before it can reuse the memory space.
    694 </p>
    695 <p id="idp653392">
    697 </p>
    698 <p id="idp653664">
     698In Figure \ref{threads_timeline1} the processing time of <code class="code">T<sub>1</sub></code> is longer than <code class="code">T<sub>2</sub></code>;
     699thus <code class="code">T<sub>2</sub></code> always waits for <code class="code">T<sub>1</sub></code> to write to the shared memory.
     700Figure \ref{threads_timeline2} illustrates the scenario in which <code class="code">T<sub>1</sub></code> is faster
     701and must wait for <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can reuse the memory space.
     703<p id="idp254152">
     706<p id="idp254424">
    699707Overall, our design is intended to benefit a range of applications.
    700708Conceptually, we consider two design points.
    704712while the cost of XML parsing represents an overhead of 40%.
    706 <p id="idp653856">
     714<p id="idp254616">
    707715Our design is predicated on a goal of using the Parabix
    708716framework to achieve a 50% to 100% improvement in the parsing engine itself.   
    712720In this case, the single-threaded icXML should achieve a 1.5x speedup over Xerces
    713721so that the total application cost reduces to 67% of the original. 
    714 However, in \icXML-p, our ideal scenario gives us two well-balanced threads
     722However, in icXML-p, our ideal scenario gives us two well-balanced threads
    715723each performing about 33% of the original work.   
    716724In this case, Amdahl's law predicts that we could expect up to a 3x speedup at best.
    718 <p id="idp655592">
     726<p id="idp256344">
    719727At the other extreme of our design range, we consider an application
    720 in which core parsing cost is 40%.   Assuming the 2x speedup of
     728in which core parsing cost is 40%.  Assuming the 2x speedup of
    721729the Parabix Subsystem over the corresponding Xerces core, single-threaded
    722 icXML delivers a 25% speedup.   However, the most significant
     730icXML delivers a 25% speedup.  However, the most significant
    723731aspect of our two-stage multi-threaded design then becomes the
    724732ability to hide the entire latency of parsing within the serial time
    725 required by the application.   In this case, we achieve
     733required by the application.  In this case, we achieve
    726734an overall speedup in processing time by 1.67x.
    728 <p id="idp656304">
     736<p id="idp257048">
    729737Although the structure of the Parabix Subsystem allows division of the work into
    730738several pipeline stages and has been demonstrated to be effective
    740 <div class="section" id="idp657176">
     748<div class="section" id="idp257920">
    741749<h2 class="title" style="clear: both">Performance</h2>
    742 <p id="idp657512">
    743 We evaluate \xerces{}, icXML, \icXML-p against two benchmarking applications:
     750<p id="idp258256">
     751We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications:
    744752the Xerces C++ SAXCount sample application,
    745753and a real world GML to SVG transformation application.
    7507588 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).
    752 <p id="idp658168">
     760<p id="idp258920">
    753761We analyzed the execution profiles of each XML parser
    754762using the performance counters found in the processor.
    764772to collect per core hardware events.
    766 <div class="section" id="idp659048">
     774<div class="section" id="idp259800">
    767775<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    768 <p id="idp659392">
     776<p id="idp260144">
    769777Xerces comes with sample applications that demonstrate salient features of the parser.
    770778SAXCount is the simplest such application:
    772780and prints out the totals.
    774 <p id="idp659856">
    776 </p>
    777 <p id="idp660128">
     782<p id="idp260608">
     785<p id="idp260880">
    778786Table \ref{XMLDocChars} shows the document characteristics of the XML input
    779787files selected for the Xerces C++ SAXCount benchmark. The jaw.xml
    782790XML documents and consist entirely of single byte encoded ASCII characters.
    784 <p id="idp660320">
     792<p id="idp261072">
    785793A key predictor of the overall parsing performance of an XML file is markup density\footnote{
    786794  Markup Density: the ratio of markup bytes used to define the structure of the document vs. its file size.}.
    791799of markup densities.
    793 <p id="idp662072">
     801<p id="idp262824">
    794802Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML in terms of
    795803CPU cycles per byte for the SAXCount application.
    796804The speedup for icXML over Xerces is 1.3x to 1.8x.
    797 With two threads on the multicore machine, \icXML-p can achieve speedup up to 2.7x.
     805With two threads on the multicore machine, icXML-p can achieve speedup up to 2.7x.
    798806Xerces is substantially slowed by dense markup
    799807but icXML is less affected through a reduction in branches and the use of parallel-processing techniques.
    800 \icXML-p performs better as markup-density increases because the work performed by each stage is
     808icXML-p performs better as markup-density increases because the work performed by each stage is
    801809well balanced in this application.
    803 <p id="idp662840">
    805 </p>
    806 </div>
    807 <div class="section" id="idp663432">
     811<p id="idp263592">
     815<div class="section" id="idp264184">
    808816<h3 class="title" style="clear: both">GML2SVG</h3>
    809 <p id="idp663752"></p>
    810 </div>
    811 </div>
    812 <div class="section" id="idp664008">
     817<p id="idp264504"></p>
     820<div class="section" id="idp264760">
    813821<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    814 <p id="idp664360">
     822<p id="idp265112">
    815823This paper is the first case study documenting the significant
    816824performance benefits that may be realized through the integration
    828836feasibility of these techniques.
    830 <p id="idp665368">
     838<p id="idp266936">
    831839The further development of icXML to move beyond 2-stage
    832840pipeline parallelism is ongoing, with realistic prospects for
    836844library should offer substantial benefits. 
    838 <p id="idp665904">
     846<p id="idp267472">
    839847The example of XML parsing may be considered prototypical
    840848of finite-state machines applications which have sometimes
    841 been considered ``embarassingly sequential'' and so
    842 difficult to parallelize that ``nothing works.''  So the
     849been considered "embarassingly sequential" and so
     850difficult to parallelize that "nothing works."  So the
    843851case study presented here should be considered an important
    844852data point in making the case that parallelization can
    845853indeed be helpful across a broad array of application types.
    847 <p id="idp666504">
     855<p id="idp268584">
    848856To overcome the software engineering challenges in applying
    849857parallel bitstream technology to existing software systems,
    858 <div class="bibliography" id="idp667464">
     866<div class="bibliography" id="idp269544">
    859867<h2 class="title" style="clear:both">Bibliography</h2>
    860868<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
Note: See TracChangeset for help on using the changeset viewer.