source: docs/Balisage13/Bal2013came0601/Bal2013came0601.html @ 3047

Last change on this file since 3047 was 3047, checked in by cameron, 6 years ago

Xerces function and XML document characteristics tables

File size: 60.4 KB
Line 
1<html lang="en">
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
4<title></title>
5<link rel="stylesheet" href="balisage-plain.css" type="text/css">
6<meta name="keywords" content="">
7</head>
8<body>
9<div id="balisage-header"><h1 style="text-align: right; font-family: serif; margin:0.25em">
10<i>Balisage:</i> <small>The Markup Conference</small>
11</h1></div>
12<div lang="en" class="article">
13<div class="titlepage">
14<h2 class="article-title" id="idp75552"></h2>
15<div class="author">
16<h3 class="author">Nigel Medforth</h3>
17<div class="affiliation">
18<p class="jobtitle">Developer</p>
19<p class="orgname">International Characters Inc.</p>
20</div>
21<div class="affiliation">
22<p class="jobtitle">Graduate Student, School of Computing Science</p>
23<p class="orgname">Simon Fraser University </p>
24</div>
25<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>&gt;</code></h5>
26</div>
27<div class="author">
28<h3 class="author">Dan Lin</h3>
29<div class="affiliation">
30<p class="jobtitle">Graduate Student, School of Computing Science</p>
31<p class="orgname">Simon Fraser University </p>
32</div>
33<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>&gt;</code></h5>
34</div>
35<div class="author">
36<h3 class="author">Kenneth Herdy</h3>
37<div class="affiliation">
38<p class="jobtitle">Graduate Student, School of Computing Science</p>
39<p class="orgname">Simon Fraser University </p>
40</div>
41<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>&gt;</code></h5>
42</div>
43<div class="author">
44<h3 class="author">Rob Cameron</h3>
45<div class="affiliation">
46<p class="jobtitle">Professor of Computing Science</p>
47<p class="orgname">Simon Fraser University</p>
48</div>
49<div class="affiliation">
50<p class="jobtitle">Chief Technology Officer</p>
51<p class="orgname">International Characters, Inc.</p>
52</div>
53<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
54</div>
55<div class="author">
56<h3 class="author">Arrvindh Shriraman</h3>
57<div class="affiliation">
58<p class="jobtitle"></p>
59<p class="orgname"></p>
60</div>
61<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:"></a>&gt;</code></h5>
62</div>
63<div class="abstract">
64<p class="title"><b>Abstract</b></p>
65<p id="idp76976">Prior research on the acceleration of XML processing using SIMD and multi-core
66            parallelism has lead to a number of interesting research prototypes. This work
67            investigates the extent to which the techniques underlying these prototypes could result
68            in systematic performance benefits when fully integrated into a commercial XML parser.
69            The widely used Xerces-C++ parser of the Apache Software Foundation was chosen as the
70            foundation for the study. A systematic restructuring of the parser was undertaken, while
71            maintaining the existing API for application programmers. Using SIMD techniques alone,
72            an increase in parsing speed of at least 50% was observed in a range of applications.
73            When coupled with pipeline parallelism on dual core processors, improvements of 2x and
74            beyond were realized. </p>
75</div>
76<hr>
77</div>
78<div class="toc">
79<p><b>Table of Contents</b></p>
80<dl>
81<dt><span class="section"><a href="#idp285120" class="toc">Introduction</a></span></dt>
82<dt><span class="section"><a href="#idp286912" class="toc">Background</a></span></dt>
83<dd><dl>
84<dt><span class="section"><a href="#idp287552" class="toc">Xerces C++ Structure</a></span></dt>
85<dt><span class="section"><a href="#idp331216" class="toc">The Parabix Framework</a></span></dt>
86<dt><span class="section"><a href="#idp40512" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
87</dl></dd>
88<dt><span class="section"><a href="#idp44272" class="toc">Architecture</a></span></dt>
89<dd><dl>
90<dt><span class="section"><a href="#idp366784" class="toc">Overview</a></span></dt>
91<dt><span class="section"><a href="#idp390240" class="toc">Character Set Adapters</a></span></dt>
92<dt><span class="section"><a href="#idp398192" class="toc">Combined Parallel Filtering</a></span></dt>
93<dt><span class="section"><a href="#idp416848" class="toc">Content Stream</a></span></dt>
94<dt><span class="section"><a href="#idp427456" class="toc">Namespace Handling</a></span></dt>
95<dt><span class="section"><a href="#idp447600" class="toc">Error Handling</a></span></dt>
96</dl></dd>
97<dt><span class="section"><a href="#idp458640" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>
98<dt><span class="section"><a href="#idp475920" class="toc">Performance</a></span></dt>
99<dd><dl>
100<dt><span class="section"><a href="#idp478640" class="toc">Xerces C++ SAXCount</a></span></dt>
101<dt><span class="section"><a href="#idp499872" class="toc">GML2SVG</a></span></dt>
102</dl></dd>
103<dt><span class="section"><a href="#idp501024" class="toc">Conclusion and Future Work</a></span></dt>
104</dl>
105</div>
106<div class="section" id="idp285120">
107<h2 class="title" style="clear: both">Introduction</h2>
108<p id="idp285760"></p>
109<p id="idp286016"></p>
110<p id="idp286272"></p>
111<p id="idp286528"></p>
112</div>
113<div class="section" id="idp286912">
114<h2 class="title" style="clear: both">Background</h2>
115<div class="section" id="idp287552">
116<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
117<p id="idp288192"> The Xerces C++ parser
118           
119           
120             features comprehensive support for a variety of character encodings both
121            commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple
122            XML vocabularies through the XML namespace mechanism, as well as complete
123            implementations of structure and data validation through multiple grammars declared
124            using either legacy DTDs (document type definitions) or modern XML Schema facilities.
125            Xerces also supports several APIs for accessing parser services, including event-based
126            parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM
127            tree-based parsing interface. </p>
128<p id="idp290320">
129           
130           
131             Xerces,
132            like all traditional parsers, processes XML documents sequentially a byte-at-a-time from
133            the first to the last byte of input data. Each byte passes through several processing
134            layers and is classified and eventually validated within the context of the document
135            state. This introduces implicit dependencies between the various tasks within the
136            application that make it difficult to optimize for performance. As a complex software
137            system, no one feature dominates the overall parsing performance. Figure
138            \ref{fig:xerces-profile} shows the execution time profile of the top ten functions in a
139            typical run. Even if it were possible, Amdahl's Law dictates that tackling any one of
140            these functions for parallelization in isolation would only produce a minute improvement
141            in performance. Unfortunately, early investigation into these functions found that
142            incorporating speculation-free thread-level parallelization was impossible and they were
143            already performing well in their given tasks; thus only trivial enhancements were
144            attainable. In order to obtain a systematic acceleration of Xerces, it should be
145            expected that a comprehensive restructuring is required, involving all aspects of the
146            parser. </p>
147<div class="table-wrapper" id="idp293248">
148<p class="title">Table I</p>
149<div class="caption"><p id="idp8080">Execution Time of Top 10 Xerces Functions</p></div>
150<table class="table">
151<colgroup span="1">
152<col align="left" valign="top" span="1">
153<col align="left" valign="top" span="1">
154</colgroup>
155<thead><tr>
156<th>Time (%) </th>
157<th> Function Name </th>
158</tr></thead>
159<tbody>
160<tr valign="top">
161<td>13.29       </td>
162<td>XMLUTF8Transcoder::transcodeFrom </td>
163</tr>
164<tr valign="top">
165<td>7.45        </td>
166<td>IGXMLScanner::scanCharData </td>
167</tr>
168<tr valign="top">
169<td>6.83        </td>
170<td>memcpy </td>
171</tr>
172<tr valign="top">
173<td>5.83        </td>
174<td>XMLReader::getNCName </td>
175</tr>
176<tr valign="top">
177<td>4.67        </td>
178<td>IGXMLScanner::buildAttList </td>
179</tr>
180<tr valign="top">
181<td>4.54        </td>
182<td>RefHashTableO&lt;&gt;::findBucketElem </td>
183</tr>
184<tr valign="top">
185<td>4.20        </td>
186<td>IGXMLScanner::scanStartTagNS </td>
187</tr>
188<tr valign="top">
189<td>3.75        </td>
190<td>ElemStack::mapPrefixToURI </td>
191</tr>
192<tr valign="top">
193<td>3.58        </td>
194<td>ReaderMgr::getNextChar </td>
195</tr>
196<tr valign="top">
197<td>3.20        </td>
198<td>IGXMLScanner::basicAttrValueScan </td>
199</tr>
200</tbody>
201</table>
202</div>
203</div>
204<div class="section" id="idp331216">
205<h3 class="title" style="clear: both">The Parabix Framework</h3>
206<p id="idp331888"> The Parabix (parallel bit stream) framework is a transformative approach to XML
207            parsing (and other forms of text processing.) The key idea is to exploit the
208            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
209            data from long blocks of input data by using one register bit per single input byte. To
210            facilitate this, the input data is first transposed into a set of basis bit streams. In
211           
212            Boolean-logic operations\footnote{∧, \√ and ¬ denote the
213            boolean AND, OR and NOT operators.} are used to classify the input bits into a set of
214               <span class="ital">character-class bit streams</span>, which identify key
215            characters (or groups of characters) with a <code class="code">1</code>. For example, one of the
216            fundamental characters in XML is a left-angle bracket. A character is an
217               <code class="code">'&lt;' if and only if
218               Â¬(b<sub>0</sub> âˆš b<sub>1</sub>)
219               âˆ§ (b<sub>2</sub> âˆ§ b<sub>3</sub>)
220               âˆ§ (b<sub>4</sub> âˆ§ b<sub>5</sub>)
221               âˆ§ ¬ (b<sub>6</sub> âˆš
222               b<sub>7</sub>) = 1</code>. Similarly, a character is numeric, <code class="code">[0-9]
223               if and only if ¬(b<sub>0</sub> âˆš
224               b<sub>1</sub>) ∧ (b<sub>2</sub> âˆ§
225                  b<sub>3</sub>) ∧ ¬(b<sub>4</sub>
226               âˆ§ (b<sub>5</sub> âˆš
227            b<sub>6</sub>))</code>. An important observation here is that ranges of
228            characters may require fewer operations than individual characters and
229             multiple
230            classes can share the classification cost. </p>
231<p id="idp343456">
232           
233         </p>
234<p id="idp347392"> Consider, for example, the XML source data stream shown in the first line of
235            . The remaining lines of this figure show
236            several parallel bit streams that are computed in Parabix-style parsing, with each bit
237            of each stream in one-to-one correspondence to the source character code units of the
238            input stream. For clarity, 1 bits are denoted with 1 in each stream and 0 bits are
239            represented as underscores. The first bit stream shown is that for the opening angle
240            brackets that represent tag openers in XML. The second and third streams show a
241            partition of the tag openers into start tag marks and end tag marks depending on the
242            character immediately following the opener (i.e., <code class="code">"/"</code>) or
243            not. The remaining three lines show streams that can be computed in subsequent parsing
244            (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
245            marking the element names, attribute names and attribute values of tags. </p>
246<p id="idp37152"> Two intuitions may help explain how the Parabix approach can lead to improved XML
247            parsing performance. The first is that the use of the full register width offers a
248            considerable information advantage over sequential byte-at-a-time parsing. That is,
249            sequential processing of bytes uses just 8 bits of each register, greatly limiting the
250            processor resources that are effectively being used at any one time. The second is that
251            byte-at-a-time loop scanning loops are actually often just computing a single bit of
252            information per iteration: is the scan complete yet? Rather than computing these
253            individual decision-bits, an approach that computes many of them in parallel (e.g., 128
254            bytes at a time using 128-bit registers) should provide substantial benefit. </p>
255<p id="idp38400"> Previous studies have shown that the Parabix approach improves many aspects of XML
256            processing, including transcoding \cite{Cameron2008}, character classification and
257            validation, tag parsing and well-formedness checking. The first Parabix parser used
258            processor bit scan instructions to considerably accelerate sequential scanning loops for
259            individual characters \cite{CameronHerdyLin2008}. Recent work has incorporated a method
260            of parallel scanning using bitstream addition \cite{cameron-EuroPar2011}, as well as
261            combining SIMD methods with 4-stage pipeline parallelism to further improve throughput
262            \cite{HPCA2012}. Although these research prototypes handled the full syntax of
263            schema-less XML documents, they lacked the functionality required by full XML parsers. </p>
264<p id="idp39664"> Commercial XML processors support transcoding of multiple character sets and can
265            parse and validate against multiple document vocabularies. Additionally, they provide
266            API facilities beyond those found in research prototypes, including the widely used SAX,
267            SAX2 and DOM interfaces. </p>
268</div>
269<div class="section" id="idp40512">
270<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
271<p id="idp41152"> Xerces—like all traditional XML parsers—processes XML documents
272            sequentially. Each character is examined to distinguish between the XML-specific markup,
273            such as a left angle bracket <code class="code">&lt;</code>, and the content held within the
274            document. As the parser progresses through the document, it alternates between markup
275            scanning, validation and content processing modes. </p>
276<p id="idp42720"> In other words, Xerces belongs to an equivalent class applications termed FSM
277            applications\footnote{ Herein FSM applications are considered software systems whose
278            behaviour is defined by the inputs, current state and the events associated with
279            transitions of states.}. Each state transition indicates the processing context of
280            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
281            character could induce a state transition. </p>
282<p id="idp43632"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
283            text is transformed into a set of lexical bitstreams, which undergo a series of
284            operations that can be grouped into logical layers, e.g., transposition, character
285            classification, and lexical analysis. Each layer is pipeline parallel and require
286            neither speculation nor pre-parsing stages\cite{HPCA2012}. To meet the API requirements
287            of the document-ordered Xerces output, the results of the Parabix processing layers must
288            be interleaved to produce the equivalent behaviour. </p>
289</div>
290</div>
291<div class="section" id="idp44272">
292<h2 class="title" style="clear: both">Architecture</h2>
293<div class="section" id="idp366784">
294<h3 class="title" style="clear: both">Overview</h3>
295<p id="idp367680"> icXML is more than an optimized version of Xerces. Many components were grouped,
296            restructured and rearchitected with pipeline parallelism in mind. In this section, we
297            highlight the core differences between the two systems. As shown in Figure
298            \ref{fig:xerces-arch}, Xerces is comprised of five main modules: the transcoder, reader,
299            scanner, namespace binder, and validator. The <span class="ital">Transcoder</span> converts source data into UTF-16 before Xerces parses it as XML;
300            the majority of the character set encoding validation is performed as a byproduct of
301            this process. The <span class="ital">Reader</span> is responsible for the
302            streaming and buffering of all raw and transcoded (UTF-16) text. It tracks the current
303            line/column position,
304           
305            performs line-break normalization and validates context-specific character set issues,
306            such as tokenization of qualified-names. The <span class="ital">Scanner</span>
307            pulls data through the reader and constructs the intermediate representation (IR) of the
308            document; it deals with all issues related to entity expansion, validates the XML
309            well-formedness constraints and any character set encoding issues that cannot be
310            completely handled by the reader or transcoder (e.g., surrogate characters, validation
311            and normalization of character references, etc.) The <span class="ital">Namespace
312               Binder</span> is a core piece of the element stack. It handles namespace scoping
313            issues between different XML vocabularies. This allows the scanner to properly select
314            the correct schema grammar structures. The <span class="ital">Validator</span>
315            takes the IR produced by the Scanner (and potentially annotated by the Namespace Binder)
316            and assesses whether the final output matches the user-defined DTD and schema grammar(s)
317            before passing it to the end-user. </p>
318<p id="idp373536">
319           
320         </p>
321<p id="idp374400"> In icXML functions are grouped into logical components. As shown in Figure
322            \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the
323            Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which
324            represents data as a set of parallel bitstreams. The <span class="ital">Character Set
325               Adapter</span>, discussed in Section \ref{arch:character-set-adapter}, mirrors
326            Xerces's Transcoder duties; however instead of producing UTF-16 it produces a set of
327            lexical bitstreams, similar to those shown in Figure \ref{fig:parabix1}. These lexical
328            bitstreams are later transformed into UTF-16 in the Content Stream Generator, after
329            additional processing is performed. The first precursor to producing UTF-16 is the
330               <span class="ital">Parallel Markup Parser</span> phase. It takes the lexical
331            streams and produces a set of marker bitstreams in which a 1-bit identifies significant
332            positions within the input data. One bitstream for each of the critical piece of
333            information is created, such as the beginning and ending of start tags, end tags,
334            element names, attribute names, attribute values and content. Intra-element
335            well-formedness validation is performed as an artifact of this process. Like Xerces,
336            icXML must provide the Line and Column position of each error. The <span class="ital">Line-Column Tracker</span> uses the lexical information to keep track of the
337            document position(s) through the use of an optimized population count algorithm,
338            described in Section \ref{section:arch:errorhandling}. From here, two data-independent
339            branches exist: the Symbol Resolver and Content Preparation Unit. </p>
340<p id="idp379008"> A typical XML file contains few unique element and attribute names—but
341            each of them will occur frequently. icXML stores these as distinct data structures,
342            called symbols, each with their own global identifier (GID). Using the symbol marker
343            streams produced by the Parallel Markup Parser, the <span class="ital">Symbol
344               Resolver</span> scans through the raw data to produce a sequence of GIDs, called
345            the <span class="ital">symbol stream</span>. </p>
346<p id="idp381504"> The final components of the Parabix Subsystem are the <span class="ital">Content
347               Preparation Unit</span> and <span class="ital">Content Stream
348            Generator</span>. The former takes the (transposed) basis bitstreams and selectively
349            filters them, according to the information provided by the Parallel Markup Parser, and
350            the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p>
351<p id="idp384416"> Combined, the symbol and content stream form icXML's compressed IR of the XML
352            document. The <span class="ital">Markup Processor</span>~parses the IR to
353            validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that
354            would be too costly to perform in bit space, such as ensuring every start tag has a
355            matching end tag. Xerces's namespace binding functionality is replaced by the <span class="ital">Namespace Processor</span>. Unlike Xerces, it is a discrete phase
356            that produces a series of URI identifiers (URI IDs), the <span class="ital">URI
357               stream</span>, which are associated with each symbol occurrence. This is
358            discussed in Section \ref{section:arch:namespacehandling}. Finally, the <span class="ital">Validation</span> layer implements the Xerces's validator. However,
359            preprocessing associated with each symbol greatly reduces the work of this stage. </p>
360<p id="idp389200">
361           
362         </p>
363</div>
364<div class="section" id="idp390240">
365<h3 class="title" style="clear: both">Character Set Adapters</h3>
366<p id="idp390880"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
367            Xerces itself and provide the end-consumer with a single encoding format. In the
368            important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant,
369            because of the need to decode and classify each byte of input, mapping variable-length
370            UTF-8 byte sequences into 16-bit UTF-16 code units with bit manipulation operations. In
371            other cases, transcoding may involve table look-up operations for each byte of input. In
372            any case, transcoding imposes at least a cost of buffer copying. </p>
373<p id="idp391936"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
374            transcoding costs. Given a specified input encoding, a CSA is responsible for checking
375            that input code units represent valid characters, mapping the characters of the encoding
376            into the appropriate bitstreams for XML parsing actions (i.e., producing the lexical
377            item streams), as well as supporting ultimate transcoding requirements. All of this work
378            is performed using the parallel bitstream representation of the source input. </p>
379<p id="idp392912"> An important observation is that many character sets are an extension to the legacy
380            7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8,
381            UTF-16 and many others. Furthermore, all significant characters for parsing XML are
382            confined to the ASCII repertoire. Thus, a single common set of lexical item calculations
383            serves to compute lexical item streams for all such ASCII-based character sets. </p>
384<p id="idp393792"> A second observation is that—regardless of which character set is
385            used—quite often all of the characters in a particular block of input will be
386            within the ASCII range. This is a very simple test to perform using the bitstream
387            representation, simply confirming that the bit 0 stream is zero for the entire block.
388            For blocks satisfying this test, all logic dealing with non-ASCII characters can simply
389            be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
390            UTF-16 form are each set to zero in this case. </p>
391<p id="idp395712"> A third observation is that repeated transcoding of the names of XML elements,
392            attributes and so on can be avoided by using a look-up mechanism. That is, the first
393            occurrence of each symbol is stored in a look-up table mapping the input encoding to a
394            numeric symbol ID. Transcoding of the symbol is applied at this time. Subsequent look-up
395            operations can avoid transcoding by simply retrieving the stored representation. As
396            symbol look up is required to apply various XML validation rules, there is achieves the
397            effect of transcoding each occurrence without additional cost. </p>
398<p id="idp396768"> The cost of individual character transcoding is avoided whenever a block of input is
399            confined to the ASCII subset and for all but the first occurrence of any XML element or
400            attribute name. Furthermore, when transcoding is required, the parallel bitstream
401            representation supports efficient transcoding operations. In the important case of UTF-8
402            to UTF-16 transcoding, the corresponding UTF-16 bitstreams can be calculated in bit
403            parallel fashion based on UTF-8 streams \cite{Cameron2008}, and all but the final bytes
404            of multi-byte sequences can be marked for deletion as discussed in the following
405            subsection. In other cases, transcoding within a block only need be applied for
406            non-ASCII bytes, which are conveniently identified by iterating through the bit 0 stream
407            using bit scan operations. </p>
408</div>
409<div class="section" id="idp398192">
410<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
411<p id="idp398880"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
412            bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two
413            Chinese characters <code class="code">䜠奜</code> are represented as two
414            three-byte UTF-8 sequences <code class="code">E4 BD A0</code> and <code class="code">E5 A5 BD</code> while the
415            UTF-16 representation must be compressed down to the two code units <code class="code">4F60</code>
416            and <code class="code">597D</code>. In the bit parallel representation, this corresponds to a
417            reduction from six bit positions representing UTF-8 code units (bytes) down to just two
418            bit positions representing UTF-16 code units (double bytes). This compression may be
419            achieved by arranging to calculate the correct UTF-16 bits at the final position of each
420            sequence and creating a deletion mask to mark the first two bytes of each 3-byte
421            sequence for deletion. In this case, the portion of the mask corresponding to these
422            input bytes is the bit sequence <code class="code">110110</code>. Using this approach, transcoding
423            may then be completed by applying parallel deletion and inverse transposition of the
424            UTF-16 bitstreams\cite{Cameron2008}. </p>
425<p id="idp403040">
426           
427           
428           
429           
430           
431           
432           
433           
434           
435         </p>
436<p id="idp406976"> Rather than immediately paying the costs of deletion and transposition just for
437            transcoding, however, icXML defers these steps so that the deletion masks for several
438            stages of processing may be combined. In particular, this includes core XML requirements
439            to normalize line breaks and to replace character reference and entity references by
440            their corresponding text. In the case of line break normalization, all forms of line
441            breaks, including bare carriage returns (CR), line feeds (LF) and CR-LF combinations
442            must be normalized to a single LF character in each case. In icXML, this is achieved by
443            first marking CR positions, performing two bit parallel operations to transform the
444            marked CRs into LFs, and then marking for deletion any LF that is found immediately
445            after the marked CR as shown by the Pablo source code in Figure
446            \ref{fig:LBnormalization}.
447           
448         </p>
449<p id="idp409616"> In essence, the deletion masks for transcoding and for line break normalization each
450            represent a bitwise filter; these filters can be combined using bitwise-or so that the
451            parallel deletion algorithm need only be applied once. </p>
452<p id="idp410800"> A further application of combined filtering is the processing of XML character and
453            entity references. Consider, for example, the references <code class="code">&amp;</code> or
454               <code class="code">&lt;</code>. which must be replaced in XML processing with the single
455               <code class="code">&amp;</code> and <code class="code">&lt;</code> characters, respectively. The
456            approach in icXML is to mark all but the first character positions of each reference for
457            deletion, leaving a single character position unmodified. Thus, for the references
458               <code class="code">&amp;</code> or <code class="code">&lt;</code> the masks <code class="code">01111</code> and
459               <code class="code">011111</code> are formed and combined into the overall deletion mask. After the
460            deletion and inverse transposition operations are finally applied, a post-processing
461            step inserts the proper character at these positions. One note about this process is
462            that it is speculative; references are assumed to generally be replaced by a single
463            UTF-16 code unit. In the case, that this is not true, it is addressed in
464            post-processing. </p>
465<p id="idp415520"> The final step of combined filtering occurs during the process of reducing markup
466            data to tag bytes preceding each significant XML transition as described in
467            section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying
468            operations for each of the these filtering steps, paying the cost of parallel deletion
469            and inverse transposition only once. Currently, icXML employs the parallel-prefix
470            compress algorithm of Steele~\cite{HackersDelight} Performance is independent of the
471            number of positions deleted. Future versions of icXML are expected to take advantage of
472            the parallel extract operation~\cite{HilewitzLee2006} that Intel is now providing in its
473            Haswell architecture. </p>
474</div>
475<div class="section" id="idp416848">
476<h3 class="title" style="clear: both">Content Stream</h3>
477<p id="idp417520"> A relatively-unique concept for icXML is the use of a filtered content stream.
478            Rather that parsing an XML document in its original format, the input is transformed
479            into one that is easier for the parser to iterate through and produce the sequential
480            output. In , the source data
481           
482            is transformed into
483           
484            through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p>
485<p id="idp419920"> Combined with the symbol stream, the parser traverses the content stream to
486            effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following
487               <code class="code">&gt;</code> indicates that a start tag without any attributes is the first
488            element in this text and the first unused symbol, <code class="code">document</code>, is the element
489            name. Succeeding that is the content string <code class="code">fee</code>, which is null-terminated
490            in accordance with the Xerces API specification. Unlike Xerces, no memory-copy
491            operations are required to produce these strings, which as
492            Figure~\ref{fig:xerces-profile} shows accounts for 6.83% of Xerces's execution time.
493            Additionally, it is cheap to locate the terminal character of each string: using the
494            String End bitstream, the Parabix Subsystem can effectively calculate the offset of each
495            null character in the content stream in parallel, which in turn means the parser can
496            directly jump to the end of every string without scanning for it. </p>
497<p id="idp423312"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
498            existence of an attribute. Because all of the intra-element was performed in the Parabix
499            Subsystem, this must be a legal attribute. Since attributes can only occur within start
500            tags and must be accompanied by a textual value, the next symbol in the symbol stream
501            must be the element name of a start tag, and the following one must be the name of the
502            attribute and the string that follows the <code class="code">=</code> must be its value. However, the
503            subsequent <code class="code">=</code> is not treated as an independent attribute because the parser
504            has yet to read a <code class="code">&gt;</code>, which marks the end of a start tag. Thus only
505            one symbol is taken from the symbol stream and it (along with the string value) is added
506            to the element. Eventually the parser reaches a <code class="code">/</code>, which marks the
507            existence of an end tag. Every end tag requires an element name, which means they
508            require a symbol. Inter-element validation whenever an empty tag is detected to ensure
509            that the appropriate scope-nesting rules have been applied. </p>
510</div>
511<div class="section" id="idp427456">
512<h3 class="title" style="clear: both">Namespace Handling</h3>
513<p id="idp428544"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
514            together. It is especially important when a vocabulary application-dependant meaning,
515            such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound
516            to uniform resource identifiers (URIs), which are strings used to identify specific
517            names or resources. On line 1 of Figure \ref{fig:namespace1}, the <code class="code">xmlns</code>
518            attribute instructs the XML processor to bind the prefix <code class="code">p</code> to the URI
519               '<code class="code">pub.net</code>' and the default (empty) prefix to
520               <code class="code">book.org</code>. Thus to the XML processor, the <code class="code">title</code> on line 2
521            and <code class="code">price</code> on line 4 both read as
522            <code class="code">"book.org":title</code> and
523               <code class="code">"book.org":price</code> respectively, whereas on line 3 and
524            5, <code class="code">p:name</code> and <code class="code">price</code> are seen as
525               <code class="code">"pub.net":name</code> and
526               <code class="code">"pub.net":price</code>. Even though the actual element name
527               <code class="code">price</code>, due to namespace scoping rules they are viewed as two
528            uniquely-named items because the current vocabulary is determined by the namespace(s)
529            that are in-scope. </p>
530<p id="idp435712">
531           
532         </p>
533<p id="idp436256"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
534            persist for the lifetime of the application through the use of a global URI pool. Xerces
535            maintains a stack of namespace scopes that is pushed (popped) every time a start tag
536            (end tag) occurs in the document. Because a namespace declaration affects the entire
537            element, it must be processed prior to grammar validation. This is a costly process
538            considering that a typical namespaced XML document only comes in one of two forms: (1)
539            those that declare a set of namespaces upfront and never change them, and (2) those that
540            repeatedly modify the namespaces in predictable patterns. </p>
541<p id="idp438720"> For that reason, icXML contains an independent namespace stack and utilizes bit
542            vectors to cheaply perform
543             When a prefix is
544            declared (e.g., <code class="code">xmlns:p="pub.net"</code>), a namespace binding
545            is created that maps the prefix (which are assigned Prefix IDs in the symbol resolution
546            process) to the URI. Each unique namespace binding has a unique namespace id (NSID) and
547            every prefix contains a bit vector marking every NSID that has ever been associated with
548            it within the document. For example, in Table \ref{tbl:namespace1}, the prefix binding
549            set of <code class="code">p</code> and <code class="code">xmlns</code> would be <code class="code">01</code> and
550            <code class="code">11</code> respectively. To resolve the in-scope namespace binding for each prefix,
551            a bit vector of the currently visible namespaces is maintained by the system. By ANDing
552            the prefix bit vector with the currently visible namespaces, the in-scope NSID can be
553            found using a bit-scan intrinsic. A namespace binding table, similar to Table
554            \ref{tbl:namespace1}, provides the actual URI ID. </p>
555<p id="idp443152">
556           
557         </p>
558<p id="idp443696">
559           
560           
561           
562           
563         </p>
564<p id="idp446144"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
565            any modification to the currently visible namespaces is calculated and stored within a
566            stack of bit vectors denoting the locally modified namespace bindings. When an end tag
567            is found, the currently visible namespaces is XORed with the vector at the top of the
568            stack. This allows any number of changes to be performed at each scope-level with a
569            constant time.
570           
571         </p>
572</div>
573<div class="section" id="idp447600">
574<h3 class="title" style="clear: both">Error Handling</h3>
575<p id="idp448272">
576           
577            Xerces outputs error messages in two ways: through the programmer API and as thrown
578            objects for fatal errors. As Xerces parses a file, it uses context-dependant logic to
579            assess whether the next character is legal; if not, the current state determines the
580            type and severity of the error. icXML emits errors in the similar manner—but
581            how it discovers them is substantially different. Recall that in Figure
582            \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and
583            Markup Processor, each with its own system for detecting and producing error messages. </p>
584<p id="idp449904"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
585            a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking
586            the byte-position of an error within a block, and the type of error is determined by the
587            equation that discovered it. The difficulty of error processing in this section is that
588            in Xerces the line and column number must be given with every error production. Two
589            major issues exist because of this: (1) line position adheres to XML white-normalization
590            rules; as such, some sequences of characters, e.g., a carriage return followed by a line
591            feed, are counted as a single new line character. (2) column position is counted in
592            characters, not bytes or code units; thus multi-code-unit code-points and surrogate
593            character pairs are all counted as a single column position. Note that typical XML
594            documents are error-free but the calculation of the line/column position is a constant
595            overhead in Xerces.  To
596            reduce this, icXML pushes the bulk cost of the line/column calculation to the occurrence
597            of the error and performs the minimal amount of book-keeping necessary to facilitate it.
598            icXML leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates
599            the information within the Line Column Tracker (LCT). One of the CSA's major
600            responsibilities is transcoding an input text.
601             During this process,
602            white-space normalization rules are applied and multi-code-unit and surrogate characters
603            are detected and validated. A <span class="ital">line-feed bitstream</span>,
604            which marks the positions of the normalized new lines characters, is a natural
605            derivative of this process. Using an optimized population count algorithm, the line
606            count can be summarized cheaply for each valid block of text.
607             Column position is more
608            difficult to calculate. It is possible to scan backwards through the bitstream of new
609            line characters to determine the distance (in code-units) between the position between
610            which an error was detected and the last line feed. However, this distance may exceed
611            than the actual character position for the reasons discussed in (2). To handle this, the
612            CSA generates a <span class="ital">skip mask</span> bitstream by ORing together
613            many relevant bitstreams, such as all trailing multi-code-unit and surrogate characters,
614            and any characters that were removed during the normalization process. When an error is
615            detected, the sum of those skipped positions is subtracted from the distance to
616            determine the actual column number. </p>
617<p id="idp455392"> The Markup Processor is a state-driven machine. As such, error detection within it
618            is very similar to Xerces. However, reporting the correct line/column is a much more
619            difficult problem. The Markup Processor parses the content stream, which is a series of
620            tagged UTF-16 strings. Each string is normalized in accordance with the XML
621            specification. All symbol data and unnecessary whitespace is eliminated from the stream;
622            thus its impossible to derive the current location using only the content stream. To
623            calculate the location, the Markup Processor borrows three additional pieces of
624            information from the Parabix Subsystem: the line-feed, skip mask, and a <span class="ital">deletion mask stream</span>, which is a bitstream denoting the
625            (code-unit) position of every datum that was suppressed from the source during the
626            production of the content stream. Armed with these, it is possible to calculate the
627            actual line/column using the same system as the Parabix Subsystem until the sum of the
628            negated deletion mask stream is equal to the current position. </p>
629</div>
630</div>
631<div class="section" id="idp458640">
632<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
633<p id="idp459280"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM
634         application. These are "embarrassingly
635         sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to
636         parallelize. However, icXML is designed to organize processing into logical layers. In
637         particular, layers within the Parabix Subsystem are designed to operate over significant
638         segments of input data before passing their outputs on for subsequent processing. This fits
639         well into the general model of pipeline parallelism, in which each thread is in charge of a
640         single module or group of modules. </p>
641<p id="idp461136"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
642         and the Markup Processor into distinct logical layers into two separate stages. The
643         resultant application, <span class="ital">icXML-p</span>, is a course-grained
644         software-pipeline application. In this case, the Parabix Subsystem thread
645               <code class="code">T<sub>1</sub></code> reads 16k of XML input <code class="code">I</code> at a
646         time and produces the content, symbol and URI streams, then stores them in a pre-allocated
647         shared data structure <code class="code">S</code>. The Markup Processor thread
648            <code class="code">T<sub>2</sub></code> consumes <code class="code">S</code>, performs well-formedness
649         and grammar-based validation, and the provides parsed XML data to the application through
650         the Xerces API. The shared data structure is implemented using a ring buffer, where every
651         entry contains an independent set of data streams. In the examples of Figure
652         \ref{threads_timeline1} and \ref{threads_timeline2}, the ring buffer has four entries. A
653         lock-free mechanism is applied to ensure that each entry can only be read or written by one
654         thread at the same time. In Figure \ref{threads_timeline1} the processing time of
655               <code class="code">T<sub>1</sub></code> is longer than
656         <code class="code">T<sub>2</sub></code>; thus <code class="code">T<sub>2</sub></code> always
657         waits for <code class="code">T<sub>1</sub></code> to write to the shared memory. Figure
658         \ref{threads_timeline2} illustrates the scenario in which
659         <code class="code">T<sub>1</sub></code> is faster and must wait for
660            <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can
661         reuse the memory space. </p>
662<p id="idp470208">
663         
664      </p>
665<p id="idp470752"> Overall, our design is intended to benefit a range of applications. Conceptually, we
666         consider two design points. The first, the parsing performed by the Parabix Subsystem
667         dominates at 67% of the overall cost, with the cost of application processing (including
668         the driver logic within the Markup Processor) at 33%. The second is almost the opposite
669         scenario, the cost of application processing dominates at 60%, while the cost of XML
670         parsing represents an overhead of 40%. </p>
671<p id="idp472672"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
672         100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement
673         of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the
674         total application cost. In this case, the single-threaded icXML should achieve a 1.5x
675         speedup over Xerces so that the total application cost reduces to 67% of the original.
676         However, in icXML-p, our ideal scenario gives us two well-balanced threads each performing
677         about 33% of the original work. In this case, Amdahl's law predicts that we could expect up
678         to a 3x speedup at best. </p>
679<p id="idp473792"> At the other extreme of our design range, we consider an application in which core
680         parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the
681         corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most
682         significant aspect of our two-stage multi-threaded design then becomes the ability to hide
683         the entire latency of parsing within the serial time required by the application. In this
684         case, we achieve an overall speedup in processing time by 1.67x. </p>
685<p id="idp474736"> Although the structure of the Parabix Subsystem allows division of the work into
686         several pipeline stages and has been demonstrated to be effective for four pipeline stages
687         in a research prototype \cite{HPCA2012}, our analysis here suggests that the further
688         pipelining of work within the Parabix Subsystem is not worthwhile if the cost of
689         application logic is little as 33% of the end-to-end cost using Xerces. To achieve benefits
690         of further parallelization with multi-core technology, there would need to be reductions in
691         the cost of application logic that could match reductions in core parsing cost. </p>
692</div>
693<div class="section" id="idp475920">
694<h2 class="title" style="clear: both">Performance</h2>
695<p id="idp476592"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
696         Xerces C++ SAXCount sample application, and a real world GML to SVG transformation
697         application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy
698         Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core), 32+32 kB (per core)
699         L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu
700         12.04 (Linux). </p>
701<p id="idp477504"> We analyzed the execution profiles of each XML parser using the performance counters
702         found in the processor. We chose several key hardware events that provide insight into the
703         profile of each application and indicate if the processor is doing useful work. The set of
704         events included in our study are: processor cycles, branch instructions, branch
705         mispredictions, and cache misses. The Performance Application Programming Interface (PAPI)
706         Version 5.5.0 \cite{papi} toolkit was installed on the test system to facilitate the
707         collection of hardware performance monitoring statistics. In addition, we used the Linux
708         perf \cite{perf} utility to collect per core hardware events. </p>
709<div class="section" id="idp478640">
710<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
711<p id="idp479312"> Xerces comes with sample applications that demonstrate salient features of the
712            parser. SAXCount is the simplest such application: it counts the elements, attributes
713            and characters of a given XML file using the (event based) SAX API and prints out the
714            totals. </p>
715<p id="idp480016"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files
716            selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented
717            XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the
718            UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML
719            documents and consist entirely of single byte encoded ASCII characters.
720  <div class="table-wrapper" id="idp480752">
721<p class="title">Table II</p>
722<div class="caption"><p id="idp481264">XML Document Characteristics</p></div>
723<table class="table">
724<colgroup span="1">
725<col align="left" valign="top" span="1">
726<col align="left" valign="top" span="1">
727<col align="left" valign="top" span="1">
728<col align="left" valign="top" span="1">
729<col align="left" valign="top" span="1">
730</colgroup>
731<tbody>
732<tr>
733<td>File Name           </td>
734<td> jaw.xml            </td>
735<td> road.gml   </td>
736<td> po.xml     </td>
737<td> soap.xml </td>
738</tr>
739<tr>
740<td>File Type           </td>
741<td> document           </td>
742<td> data               </td>
743<td> data               </td>
744<td> data        </td>
745</tr>
746<tr>
747<td>File Size (kB)              </td>
748<td> 7343                       </td>
749<td> 11584      </td>
750<td> 76450              </td>
751<td> 2717 </td>
752</tr>
753<tr>
754<td>Markup Item Count   </td>
755<td> 74882              </td>
756<td> 280724     </td>
757<td> 4634110    </td>
758<td> 18004 </td>
759</tr>
760<tr>
761<td>Markup Density              </td>
762<td> 0.13                       </td>
763<td> 0.57       </td>
764<td> 0.76               </td>
765<td> 0.87       </td>
766</tr>
767</tbody>
768</table>
769</div>           
770</p>
771<p id="idp496800"> A key predictor of the overall parsing performance of an XML file is markup
772            density\footnote{ Markup Density: the ratio of markup bytes used to define the structure
773            of the document vs. its file size.}. This metric has substantial influence on the
774            performance of traditional recursive descent XML parsers because it directly corresponds
775            to the number of state transitions that occur when parsing a document. We use a mixture
776            of document-oriented and data-oriented XML files to analyze performance over a spectrum
777            of markup densities. </p>
778<p id="idp497808"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML
779            in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over
780            Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve
781            speedup up to 2.7x. Xerces is substantially slowed by dense markup but icXML is less
782            affected through a reduction in branches and the use of parallel-processing techniques.
783            icXML-p performs better as markup-density increases because the work performed by each
784            stage is well balanced in this application. </p>
785<p id="idp498848">
786           
787         </p>
788</div>
789<div class="section" id="idp499872">
790<h3 class="title" style="clear: both">GML2SVG</h3>
791<p id="idp500512"></p>
792</div>
793</div>
794<div class="section" id="idp501024">
795<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
796<p id="idp501712"> This paper is the first case study documenting the significant performance benefits
797         that may be realized through the integration of parallel bitstream technology into existing
798         widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined
799         integration of SIMD and multicore parallelism was shown capable of dramatic producing
800         dramatic increases in throughput and reductions in branch mispredictions and cache misses.
801         The modified parser, going under the name icXML is designed to provide the full
802         functionality of the original Xerces library with complete compatibility of APIs. Although
803         substantial re-engineering was required to realize the performance potential of parallel
804         technologies, this is an important case study demonstrating the general feasibility of
805         these techniques. </p>
806<p id="idp502992"> The further development of icXML to move beyond 2-stage pipeline parallelism is
807         ongoing, with realistic prospects for four reasonably balanced stages within the library.
808         For applications such as GML2SVG which are dominated by time spent on XML parsing, such a
809         multistage pipelined parsing library should offer substantial benefits. </p>
810<p id="idp503760"> The example of XML parsing may be considered prototypical of finite-state machines
811         applications which have sometimes been considered "embarassingly
812         sequential" and so difficult to parallelize that "nothing
813         works." So the case study presented here should be considered an important data
814         point in making the case that parallelization can indeed be helpful across a broad array of
815         application types. </p>
816<p id="idp505136"> To overcome the software engineering challenges in applying parallel bitstream
817         technology to existing software systems, it is clear that better library and tool support
818         is needed. The techniques used in the implementation of icXML and documented in this paper
819         could well be generalized for applications in other contexts and automated through the
820         creation of compiler technology specifically supporting parallel bitstream programming.
821      </p>
822</div>
823<div class="bibliography" id="idp506624">
824<h2 class="title" style="clear:both">Bibliography</h2>
825<p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and
826         Eric Lemoine 2009. The XML chip at 6 years. Proceedings of International Symposium on
827         Processing XML Efficiently 2009, Montréal.</p>
828<p class="bibliomixed" id="Datapower09">[Salz, Achilles and Maze 2009] Salz, Richard,
829         Heather Achilles, and David Maze. 2009. Hardware and software trade-offs in the IBM
830         DataPower XML XG4 processor card. Proceedings of International Symposium on Processing XML
831         Efficiently 2009, Montréal.</p>
832<p class="bibliomixed" id="PPoPP08">[Cameron 2007] Cameron, Robert D. 2007. A Case Study
833         in SIMD Text Processing with Parallel Bit Streams UTF-8 to UTF-16 Transcoding. Proceedings
834         of 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008, Salt
835         Lake City, Utah. On the Web at <a href="http://research.ihost.com/ppopp08/" class="link" target="_new">http://research.ihost.com/ppopp08/</a>.</p>
836<p class="bibliomixed" id="CASCON08">[Cameron, Herdy and Lin 2008] Cameron, Robert D.,
837         Kenneth S Herdy, and Dan Lin. 2008. High Performance XML Parsing Using Parallel Bit Stream
838         Technology. Proceedings of CASCON 2008. 13th ACM SIGPLAN Symposium on Principles and
839         Practice of Parallel Programming 2008, Toronto.</p>
840<p class="bibliomixed" id="SVGOpen08">[Herdy, Burggraf and Cameron 2008] Herdy, Kenneth
841         S., Robert D. Cameron and David S. Burggraf. 2008. High Performance GML to SVG
842         Transformation for the Visual Presentation of Geographic Data in Web-Based Mapping Systems.
843         Proceedings of SVG Open 6th International Conference on Scalable Vector Graphics,
844         Nuremburg. On the Web at
845            <a href="http://www.svgopen.org/2008/papers/74-HighPerformance_GML_to_SVG_Transformation_for_the_Visual_Presentation_of_Geographic_Data_in_WebBased_Mapping_Systems/" class="link" target="_new">http://www.svgopen.org/2008/papers/74-HighPerformance_GML_to_SVG_Transformation_for_the_Visual_Presentation_of_Geographic_Data_in_WebBased_Mapping_Systems/</a>.</p>
846<p class="bibliomixed" id="Ross06">[Ross 2006] Ross, Kenneth A. 2006. Efficient hash
847         probes on modern processors. Proceedings of ICDE, 2006. ICDE 2006, Atlanta. On the Web at
848            <a href="www.cs.columbia.edu/~kar/pubsk/icde2007.pdf" class="link" target="_new">www.cs.columbia.edu/~kar/pubsk/icde2007.pdf</a>.</p>
849<p class="bibliomixed" id="ASPLOS09">[Cameron and Lin 2009] Cameron, Robert D. and Dan
850         Lin. 2009. Architectural Support for SWAR Text Processing with Parallel Bit Streams: The
851         Inductive Doubling Principle. Proceedings of ASPLOS 2009, Washington, DC.</p>
852<p class="bibliomixed" id="Wu08">[Wu et al. 2008] Wu, Yu, Qi Zhang, Zhiqiang Yu and
853         Jianhui Li. 2008. A Hybrid Parallel Processing for XML Parsing and Schema Validation.
854         Proceedings of Balisage 2008, Montréal. On the Web at
855            <a href="http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html" class="link" target="_new">http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html</a>.</p>
856<p class="bibliomixed" id="u8u16">[Cameron 2008] u8u16 - A High-Speed UTF-8 to UTF-16
857         Transcoder Using Parallel Bit Streams Technical Report 2007-18. 2007. School of Computing
858         Science Simon Fraser University, June 21 2007.</p>
859<p class="bibliomixed" id="XML10">[XML 1.0] Extensible Markup Language (XML) 1.0 (Fifth
860         Edition) W3C Recommendation 26 November 2008. On the Web at
861            <a href="http://www.w3.org/TR/REC-xml/" class="link" target="_new">http://www.w3.org/TR/REC-xml/</a>.</p>
862<p class="bibliomixed" id="Unicode">[Unicode] The Unicode Consortium. 2009. On the Web at
863            <a href="http://unicode.org/" class="link" target="_new">http://unicode.org/</a>.</p>
864<p class="bibliomixed" id="Pex06">[Hilewitz and Lee 2006]  Hilewitz, Y. and Ruby B. Lee.
865         2006. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit
866         Instructions. Proceedings of the IEEE 17th International Conference on Application-Specific
867         Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006.</p>
868<p class="bibliomixed" id="InfoSet">[XML Infoset] XML Information Set (Second Edition) W3C
869         Recommendation 4 February 2004. On the Web at
870         <a href="http://www.w3.org/TR/xml-infoset/" class="link" target="_new">http://www.w3.org/TR/xml-infoset/</a>.</p>
871<p class="bibliomixed" id="Saxon">[Saxon] SAXON The XSLT and XQuery Processor. On the Web
872         at <a href="http://saxon.sourceforge.net/" class="link" target="_new">http://saxon.sourceforge.net/</a>.</p>
873<p class="bibliomixed" id="Kay08">[Kay 2008]  Kay, Michael Y. 2008. Ten Reasons Why Saxon
874         XQuery is Fast, IEEE Data Engineering Bulletin, December 2008.</p>
875<p class="bibliomixed" id="AElfred">[Ælfred]  The Ælfred XML Parser. On the Web at
876            <a href="http://saxon.sourceforge.net/aelfred.html" class="link" target="_new">http://saxon.sourceforge.net/aelfred.html</a>.</p>
877<p class="bibliomixed" id="JNI">[Hitchens 2002] Hitchens, Ron. Java NIO. O'Reilly, 2002.</p>
878<p class="bibliomixed" id="Expat">[Expat] The Expat XML Parser.
879            <a href="http://expat.sourceforge.net/" class="link" target="_new">http://expat.sourceforge.net/</a>.</p>
880</div>
881</div>
882<div id="balisage-footer"><h3 style="font-family: serif; margin:0.25em">
883<i>Balisage:</i> <small>The Markup Conference</small>
884</h3></div>
885</body>
886</html>
Note: See TracBrowser for help on using the repository browser.