Changeset 3053
- Timestamp:
- Apr 19, 2013, 4:07:55 PM (6 years ago)
- Location:
- docs/Balisage13/Bal2013came0601
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
docs/Balisage13/Bal2013came0601/Bal2013came0601.html
r3052 r3053 274 274 </div> 275 275 <div id="mast"><div class="content"> 276 <h2 class="article-title" id="idp 74240"></h2>276 <h2 class="article-title" id="idp66752"></h2> 277 277 <div class="author"> 278 278 <h3 class="author">Nigel Medforth</h3> … … 324 324 </div> 325 325 <div class="mast-box"> 326 <p class="title"><a href="javascript:toggle('idp 75360')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75360"></a> <span onclick="javascript:toggle('idp75360');return true">Abstract</span></p>327 <div class="folder" id="folder-idp 75360" style="display:none"><p id="idp75664">Prior research on the acceleration of XML processing using SIMD and multi-core326 <p class="title"><a href="javascript:toggle('idp67872')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67872"></a> <span onclick="javascript:toggle('idp67872');return true">Abstract</span></p> 327 <div class="folder" id="folder-idp67872" style="display:none"><p id="idp68176">Prior research on the acceleration of XML processing using SIMD and multi-core 328 328 parallelism has lead to a number of interesting research prototypes. This work 329 329 investigates the extent to which the techniques underlying these prototypes could result … … 339 339 <p><b>Table of Contents</b></p> 340 340 <dl> 341 <dt><span class="section"><a href="#idp2 84144" class="toc">Introduction</a></span></dt>342 <dt><span class="section"><a href="#idp2 85936" class="toc">Background</a></span></dt>341 <dt><span class="section"><a href="#idp275888" class="toc">Introduction</a></span></dt> 342 <dt><span class="section"><a href="#idp277680" class="toc">Background</a></span></dt> 343 343 <dd><dl> 344 <dt><span class="section"><a href="#idp2 86576" class="toc">Xerces C++ Structure</a></span></dt>345 <dt><span class="section"><a href="#idp3 30416" class="toc">The Parabix Framework</a></span></dt>346 <dt><span class="section"><a href="#idp4 04448" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>344 <dt><span class="section"><a href="#idp278320" class="toc">Xerces C++ Structure</a></span></dt> 345 <dt><span class="section"><a href="#idp321360" class="toc">The Parabix Framework</a></span></dt> 346 <dt><span class="section"><a href="#idp411088" class="toc">Sequential vs. Parallel Paradigm</a></span></dt> 347 347 </dl></dd> 348 <dt><span class="section"><a href="#idp4 08832" class="toc">Architecture</a></span></dt>348 <dt><span class="section"><a href="#idp415472" class="toc">Architecture</a></span></dt> 349 349 <dd><dl> 350 <dt><span class="section"><a href="#idp4 09472" class="toc">Overview</a></span></dt>351 <dt><span class="section"><a href="#idp4 37120" class="toc">Character Set Adapters</a></span></dt>352 <dt><span class="section"><a href="#idp4 46016" class="toc">Combined Parallel Filtering</a></span></dt>353 <dt><span class="section"><a href="#idp4 74496" class="toc">Content Stream</a></span></dt>354 <dt><span class="section"><a href="#idp4 85104" class="toc">Namespace Handling</a></span></dt>355 <dt><span class="section"><a href="#idp52 8320" class="toc">Error Handling</a></span></dt>350 <dt><span class="section"><a href="#idp416112" class="toc">Overview</a></span></dt> 351 <dt><span class="section"><a href="#idp444144" class="toc">Character Set Adapters</a></span></dt> 352 <dt><span class="section"><a href="#idp451856" class="toc">Combined Parallel Filtering</a></span></dt> 353 <dt><span class="section"><a href="#idp467808" class="toc">Content Stream</a></span></dt> 354 <dt><span class="section"><a href="#idp478528" class="toc">Namespace Handling</a></span></dt> 355 <dt><span class="section"><a href="#idp521984" class="toc">Error Handling</a></span></dt> 356 356 </dl></dd> 357 <dt><span class="section"><a href="#idp53 8544" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>358 <dt><span class="section"><a href="#idp5 60656" class="toc">Performance</a></span></dt>357 <dt><span class="section"><a href="#idp532208" class="toc">Multithreading with Pipeline Parallelism</a></span></dt> 358 <dt><span class="section"><a href="#idp554320" class="toc">Performance</a></span></dt> 359 359 <dd><dl> 360 <dt><span class="section"><a href="#idp5 63376" class="toc">Xerces C++ SAXCount</a></span></dt>361 <dt><span class="section"><a href="#idp58 8176" class="toc">GML2SVG</a></span></dt>360 <dt><span class="section"><a href="#idp557040" class="toc">Xerces C++ SAXCount</a></span></dt> 361 <dt><span class="section"><a href="#idp581280" class="toc">GML2SVG</a></span></dt> 362 362 </dl></dd> 363 <dt><span class="section"><a href="#idp 605296" class="toc">Conclusion and Future Work</a></span></dt>363 <dt><span class="section"><a href="#idp598304" class="toc">Conclusion and Future Work</a></span></dt> 364 364 </dl> 365 365 </div> 366 366 <div class="mast-box"> 367 <p class="title"><a href="javascript:toggle('idp 77088')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77088"></a> <span onclick="javascript:toggle('idp77088');return true">Nigel Medforth</span></p>368 <div class="folder" id="folder-idp 77088" style="display:none">367 <p class="title"><a href="javascript:toggle('idp69600')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp69600"></a> <span onclick="javascript:toggle('idp69600');return true">Nigel Medforth</span></p> 368 <div class="folder" id="folder-idp69600" style="display:none"> 369 369 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>></code></h5> 370 370 <div class="affiliation"> … … 377 377 </div> 378 378 <div class="personblurb"> 379 <p id="idp5 8880">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead379 <p id="idp51296">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead 380 380 developer of icXML. He earned a Bachelor of Technology in Information Technology at 381 381 Kwantlen Polytechnic University in 2009 and was awarded the Deanâs Medal for 382 382 Outstanding Achievement.</p> 383 <p id="idp5 9888">Nigel is currently researching ways to leverage both the Parabix framework and383 <p id="idp52304">Nigel is currently researching ways to leverage both the Parabix framework and 384 384 stream-processing models to further accelerate XML parsing within icXML.</p> 385 385 </div> … … 387 387 </div> 388 388 <div class="mast-box"> 389 <p class="title"><a href="javascript:toggle('idp 63552')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63552"></a> <span onclick="javascript:toggle('idp63552');return true">Dan Lin</span></p>390 <div class="folder" id="folder-idp 63552" style="display:none">389 <p class="title"><a href="javascript:toggle('idp55968')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp55968"></a> <span onclick="javascript:toggle('idp55968');return true">Dan Lin</span></p> 390 <div class="folder" id="folder-idp55968" style="display:none"> 391 391 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>></code></h5> 392 392 <div class="affiliation"> … … 394 394 <p class="orgname">Simon Fraser University </p> 395 395 </div> 396 <div class="personblurb"><p id="idp 65264">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science396 <div class="personblurb"><p id="idp57680">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science 397 397 in Computing Science at Simon Fraser University in 2010. Her research focus on on high 398 398 performance algorithms that exploit parallelization strategies on various multicore platforms. … … 401 401 </div> 402 402 <div class="mast-box"> 403 <p class="title"><a href="javascript:toggle('idp6 7824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67824"></a> <span onclick="javascript:toggle('idp67824');return true">Kenneth Herdy</span></p>404 <div class="folder" id="folder-idp6 7824" style="display:none">403 <p class="title"><a href="javascript:toggle('idp60240')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp60240"></a> <span onclick="javascript:toggle('idp60240');return true">Kenneth Herdy</span></p> 404 <div class="folder" id="folder-idp60240" style="display:none"> 405 405 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>></code></h5> 406 406 <div class="affiliation"> … … 409 409 </div> 410 410 <div class="personblurb"> 411 <p id="idp6 9552"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information411 <p id="idp61968"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information 412 412 Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor 413 413 of Science in Computing Science with a Certificate in Spatial Information Systems at 414 414 Simon Fraser University in 2005. </p> 415 <p id="idp2 71088"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser415 <p id="idp262752"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser 416 416 University with industrial scholarship support from the Natural Sciences and 417 417 Engineering Research Council of Canada, the Mathematics of Information Technology and … … 423 423 </div> 424 424 <div class="mast-box"> 425 <p class="title"><a href="javascript:toggle('idp2 73824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp273824"></a> <span onclick="javascript:toggle('idp273824');return true">Rob Cameron</span></p>426 <div class="folder" id="folder-idp2 73824" style="display:none">425 <p class="title"><a href="javascript:toggle('idp265488')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp265488"></a> <span onclick="javascript:toggle('idp265488');return true">Rob Cameron</span></p> 426 <div class="folder" id="folder-idp265488" style="display:none"> 427 427 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>></code></h5> 428 428 <div class="affiliation"> … … 434 434 <p class="orgname">International Characters, Inc.</p> 435 435 </div> 436 <div class="personblurb"><p id="idp2 75488">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied436 <div class="personblurb"><p id="idp267152">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied 437 437 Sciences at Simon Fraser University. His research interests include programming 438 438 language and software system technology, with a specific focus on high performance … … 450 450 <div id="main"> 451 451 <div class="article"> 452 <h2 class="article-title" id="idp 74240"></h2>453 <div class="section" id="idp2 84144">452 <h2 class="article-title" id="idp66752"></h2> 453 <div class="section" id="idp275888"> 454 454 <h2 class="title" style="clear: both">Introduction</h2> 455 <p id="idp2 84784"></p>456 <p id="idp2 85040"></p>457 <p id="idp2 85296"></p>458 <p id="idp2 85552"></p>459 </div> 460 <div class="section" id="idp2 85936">455 <p id="idp276528"></p> 456 <p id="idp276784"></p> 457 <p id="idp277040"></p> 458 <p id="idp277296"></p> 459 </div> 460 <div class="section" id="idp277680"> 461 461 <h2 class="title" style="clear: both">Background</h2> 462 <div class="section" id="idp2 86576">462 <div class="section" id="idp278320"> 463 463 <h3 class="title" style="clear: both">Xerces C++ Structure</h3> 464 <p id="idp2 87216"> The Xerces C++ parser465 466 467 features comprehensive support for a variety of character encodings both464 <p id="idp278960"> The Xerces C++ parser is a widely-used standards-conformant 465 XML parser produced as open-source software 466 by the Apache Software Foundation. 467 It features comprehensive support for a variety of character encodings both 468 468 commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple 469 469 XML vocabularies through the XML namespace mechanism, as well as complete … … 473 473 parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM 474 474 tree-based parsing interface. </p> 475 <p id="idp289440"> 476 477 478 Xerces, 475 <p id="idp280224"> 476 Xerces, 479 477 like all traditional parsers, processes XML documents sequentially a byte-at-a-time from 480 478 the first to the last byte of input data. Each byte passes through several processing … … 492 490 expected that a comprehensive restructuring is required, involving all aspects of the 493 491 parser. </p> 494 <div class="table-wrapper" id="idp2 92400">492 <div class="table-wrapper" id="idp283344"> 495 493 <p class="title">Table I</p> 496 <div class="caption"><p id="idm8 48128">Execution Time of Top 10 Xerces Functions</p></div>494 <div class="caption"><p id="idm855552">Execution Time of Top 10 Xerces Functions</p></div> 497 495 <table class="table"> 498 496 <colgroup span="1"> … … 549 547 </div> 550 548 </div> 551 <div class="section" id="idp3 30416">549 <div class="section" id="idp321360"> 552 550 <h3 class="title" style="clear: both">The Parabix Framework</h3> 553 <p id="idp3 31088"> The Parabix (parallel bit stream) framework is a transformative approach to XML551 <p id="idp322032"> The Parabix (parallel bit stream) framework is a transformative approach to XML 554 552 parsing (and other forms of text processing.) The key idea is to exploit the 555 553 availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent 556 554 data from long blocks of input data by using one register bit per single input byte. To 557 facilitate this, the input data is first transposed into a set of basis bit streams. In 555 facilitate this, the input data is first transposed into a set of basis bit streams. 556 For example Table II shows the ASCII bytes for the string "<code class="code">b7<A</code>" with 557 the corresponding 8 basis bit streams, b<sub>0</sub> through b<sub>7</sub> shown in Table III. 558 --> 558 559 559 560 Boolean-logic operations\footnote{â§, \âš and ¬ denote the … … 576 577 multiple 577 578 classes can share the classification cost. </p> 578 <div class="table-wrapper" id="idp3 42656">579 <div class="table-wrapper" id="idp334416"> 579 580 <p class="title">Table II</p> 580 <div class="caption"><p id="idp3 43168">XML Source Data</p></div>581 <div class="caption"><p id="idp334928">XML Source Data</p></div> 581 582 <table class="table"> 582 583 <colgroup span="1"> … … 605 606 </table> 606 607 </div> 607 <div class="table-wrapper" id="idp35 8560">608 <div class="table-wrapper" id="idp350320"> 608 609 <p class="title">Table III</p> 609 <div class="caption"><p id="idp35 9072">8-bit ASCII Basis Bit Streams</p></div>610 <div class="caption"><p id="idp350832">8-bit ASCII Basis Bit Streams</p></div> 610 611 <table class="table"> 611 612 <colgroup span="1"> … … 674 675 </table> 675 676 </div> 676 <p id="idp39 9232"> Consider, for example, the XML source data stream shown in the first line of Table II.677 <p id="idp390992"> Consider, for example, the XML source data stream shown in the first line of Table IV. 677 678 The remaining lines of this figure show 678 679 several parallel bit streams that are computed in Parabix-style parsing, with each bit … … 682 683 brackets that represent tag openers in XML. The second and third streams show a 683 684 partition of the tag openers into start tag marks and end tag marks depending on the 684 character immediately following the opener (i.e., <code class="code">"/"</code>) or685 character immediately following the opener (i.e., "<code class="code">/</code>") or 685 686 not. The remaining three lines show streams that can be computed in subsequent parsing 686 687 (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams 687 688 marking the element names, attribute names and attribute values of tags. </p> 688 <p id="idp401088"> Two intuitions may help explain how the Parabix approach can lead to improved XML 689 <div class="table-wrapper" id="idp393904"> 690 <p class="title">Table IV</p> 691 <div class="caption"><p id="idp394416">XML Source Data and Derived Parallel Bit Streams</p></div> 692 <table class="table"> 693 <colgroup span="1"> 694 <col align="centre" valign="top" span="1"> 695 <col align="left" valign="top" span="1"> 696 </colgroup> 697 <tbody> 698 <tr> 699 <td> Source Data </td> 700 <td> <code class="code"> <document>fee<element a1='fie' a2 = 'foe'></element>fum</document> </code> 701 </td> 702 </tr> 703 <tr> 704 <td> Tag Openers </td> 705 <td> <code class="code">1____________1____________________________1____________1__________</code> 706 </td> 707 </tr> 708 <tr> 709 <td> Start Tag Marks </td> 710 <td> <code class="code">_1____________1___________________________________________________</code> 711 </td> 712 </tr> 713 <tr> 714 <td> End Tag Marks </td> 715 <td> <code class="code">___________________________________________1____________1_________</code> 716 </td> 717 </tr> 718 <tr> 719 <td> Empty Tag Marks </td> 720 <td> <code class="code">__________________________________________________________________</code> 721 </td> 722 </tr> 723 <tr> 724 <td> Element Names </td> 725 <td> <code class="code">_11111111_____1111111_____________________________________________</code> 726 </td> 727 </tr> 728 <tr> 729 <td> Attribute Names </td> 730 <td> <code class="code">______________________11_______11_________________________________</code> 731 </td> 732 </tr> 733 <tr> 734 <td> Attribute Values </td> 735 <td> <code class="code">__________________________111________111__________________________</code> 736 </td> 737 </tr> 738 </tbody> 739 </table> 740 </div> 741 <p id="idp406864"> Two intuitions may help explain how the Parabix approach can lead to improved XML 689 742 parsing performance. The first is that the use of the full register width offers a 690 743 considerable information advantage over sequential byte-at-a-time parsing. That is, … … 695 748 individual decision-bits, an approach that computes many of them in parallel (e.g., 128 696 749 bytes at a time using 128-bit registers) should provide substantial benefit. </p> 697 <p id="idp40 2336"> Previous studies have shown that the Parabix approach improves many aspects of XML750 <p id="idp408112"> Previous studies have shown that the Parabix approach improves many aspects of XML 698 751 processing, including transcoding \cite{Cameron2008}, character classification and 699 752 validation, tag parsing and well-formedness checking. The first Parabix parser used … … 704 757 \cite{HPCA2012}. Although these research prototypes handled the full syntax of 705 758 schema-less XML documents, they lacked the functionality required by full XML parsers. </p> 706 <p id="idp4 03600"> Commercial XML processors support transcoding of multiple character sets and can759 <p id="idp410240"> Commercial XML processors support transcoding of multiple character sets and can 707 760 parse and validate against multiple document vocabularies. Additionally, they provide 708 761 API facilities beyond those found in research prototypes, including the widely used SAX, 709 762 SAX2 and DOM interfaces. </p> 710 763 </div> 711 <div class="section" id="idp4 04448">764 <div class="section" id="idp411088"> 712 765 <h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3> 713 <p id="idp4 05088"> Xercesâlike all traditional XML parsersâprocesses XML documents766 <p id="idp411728"> Xercesâlike all traditional XML parsersâprocesses XML documents 714 767 sequentially. Each character is examined to distinguish between the XML-specific markup, 715 768 such as a left angle bracket <code class="code"><</code>, and the content held within the 716 769 document. As the parser progresses through the document, it alternates between markup 717 770 scanning, validation and content processing modes. </p> 718 <p id="idp4 06624"> In other words, Xerces belongs to an equivalent class applications termed FSM771 <p id="idp413264"> In other words, Xerces belongs to an equivalent class applications termed FSM 719 772 applications\footnote{ Herein FSM applications are considered software systems whose 720 773 behaviour is defined by the inputs, current state and the events associated with … … 722 775 subsequent characters. Unfortunately, textual data tends to be unpredictable and any 723 776 character could induce a state transition. </p> 724 <p id="idp4 07536"> Parabix-style XML parsers utilize a concept of layered processing. A block of source777 <p id="idp414176"> Parabix-style XML parsers utilize a concept of layered processing. A block of source 725 778 text is transformed into a set of lexical bitstreams, which undergo a series of 726 779 operations that can be grouped into logical layers, e.g., transposition, character … … 731 784 </div> 732 785 </div> 733 <div class="section" id="idp4 08832">786 <div class="section" id="idp415472"> 734 787 <h2 class="title" style="clear: both">Architecture</h2> 735 <div class="section" id="idp4 09472">788 <div class="section" id="idp416112"> 736 789 <h3 class="title" style="clear: both">Overview</h3> 737 <p id="idp41 0368"> icXML is more than an optimized version of Xerces. Many components were grouped,790 <p id="idp417008"> icXML is more than an optimized version of Xerces. Many components were grouped, 738 791 restructured and rearchitected with pipeline parallelism in mind. In this section, we 739 792 highlight the core differences between the two systems. As shown in Figure … … 761 814 <p class="title">Figure 1: Xerces Architecture</p> 762 815 <div class="figure-contents"> 763 <div class="mediaobject" id="idp4 17008"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>816 <div class="mediaobject" id="idp423824"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div> 764 817 <div class="caption"></div> 765 818 </div> 766 819 </div> 767 <p id="idp4 19264"> In icXML functions are grouped into logical components. As shown in Figure820 <p id="idp426144"> In icXML functions are grouped into logical components. As shown in Figure 768 821 \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the 769 822 Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which … … 784 837 described in Section \ref{section:arch:errorhandling}. From here, two data-independent 785 838 branches exist: the Symbol Resolver and Content Preparation Unit. </p> 786 <p id="idp4 23344"> A typical XML file contains few unique element and attribute namesâbut839 <p id="idp430368"> A typical XML file contains few unique element and attribute namesâbut 787 840 each of them will occur frequently. icXML stores these as distinct data structures, 788 841 called symbols, each with their own global identifier (GID). Using the symbol marker … … 790 843 Resolver</span> scans through the raw data to produce a sequence of GIDs, called 791 844 the <span class="ital">symbol stream</span>. </p> 792 <p id="idp4 25936"> The final components of the Parabix Subsystem are the <span class="ital">Content845 <p id="idp432960"> The final components of the Parabix Subsystem are the <span class="ital">Content 793 846 Preparation Unit</span> and <span class="ital">Content Stream 794 847 Generator</span>. The former takes the (transposed) basis bitstreams and selectively 795 848 filters them, according to the information provided by the Parallel Markup Parser, and 796 849 the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p> 797 <p id="idp4 28848"> Combined, the symbol and content stream form icXML's compressed IR of the XML850 <p id="idp435872"> Combined, the symbol and content stream form icXML's compressed IR of the XML 798 851 document. The <span class="ital">Markup Processor</span>~parses the IR to 799 852 validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that … … 807 860 <p class="title">Figure 2: icXML Architecture</p> 808 861 <div class="figure-contents"> 809 <div class="mediaobject" id="idp4 34672"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>862 <div class="mediaobject" id="idp441696"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div> 810 863 <div class="caption"></div> 811 864 </div> 812 865 </div> 813 866 </div> 814 <div class="section" id="idp4 37120">867 <div class="section" id="idp444144"> 815 868 <h3 class="title" style="clear: both">Character Set Adapters</h3> 816 <p id="idp4 37792"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of869 <p id="idp444816"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of 817 870 Xerces itself and provide the end-consumer with a single encoding format. In the 818 871 important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant, … … 821 874 other cases, transcoding may involve table look-up operations for each byte of input. In 822 875 any case, transcoding imposes at least a cost of buffer copying. </p> 823 <p id="idp4 38848"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize876 <p id="idp445872"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize 824 877 transcoding costs. Given a specified input encoding, a CSA is responsible for checking 825 878 that input code units represent valid characters, mapping the characters of the encoding … … 827 880 item streams), as well as supporting ultimate transcoding requirements. All of this work 828 881 is performed using the parallel bitstream representation of the source input. </p> 829 <p id="idp4 39824"> An important observation is that many character sets are an extension to the legacy882 <p id="idp446848"> An important observation is that many character sets are an extension to the legacy 830 883 7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8, 831 884 UTF-16 and many others. Furthermore, all significant characters for parsing XML are 832 885 confined to the ASCII repertoire. Thus, a single common set of lexical item calculations 833 886 serves to compute lexical item streams for all such ASCII-based character sets. </p> 834 <p id="idp44 0704"> A second observation is thatâregardless of which character set is887 <p id="idp447728"> A second observation is thatâregardless of which character set is 835 888 usedâquite often all of the characters in a particular block of input will be 836 889 within the ASCII range. This is a very simple test to perform using the bitstream … … 839 892 be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the 840 893 UTF-16 form are each set to zero in this case. </p> 841 <p id="idp44 2624"> A third observation is that repeated transcoding of the names of XML elements,894 <p id="idp449376"> A third observation is that repeated transcoding of the names of XML elements, 842 895 attributes and so on can be avoided by using a look-up mechanism. That is, the first 843 896 occurrence of each symbol is stored in a look-up table mapping the input encoding to a … … 846 899 symbol look up is required to apply various XML validation rules, there is achieves the 847 900 effect of transcoding each occurrence without additional cost. </p> 848 <p id="idp4 43680"> The cost of individual character transcoding is avoided whenever a block of input is901 <p id="idp450432"> The cost of individual character transcoding is avoided whenever a block of input is 849 902 confined to the ASCII subset and for all but the first occurrence of any XML element or 850 903 attribute name. Furthermore, when transcoding is required, the parallel bitstream … … 857 910 using bit scan operations. </p> 858 911 </div> 859 <div class="section" id="idp4 46016">912 <div class="section" id="idp451856"> 860 913 <h3 class="title" style="clear: both">Combined Parallel Filtering</h3> 861 <p id="idp4 46704"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last914 <p id="idp452544"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last 862 915 bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two 863 916 Chinese characters <code class="code">äœ å¥œ</code> are represented as two … … 873 926 may then be completed by applying parallel deletion and inverse transposition of the 874 927 UTF-16 bitstreams\cite{Cameron2008}. </p> 875 <div class="table-wrapper" id="idp450864"> 876 <p class="title">Table IV</p> 877 <div class="caption"><p id="idp451376">XML Source Data and Derived Parallel Bit Streams</p></div> 878 <table class="table"> 879 <colgroup span="1"> 880 <col align="centre" valign="top" span="1"> 881 <col align="left" valign="top" span="1"> 882 </colgroup> 883 <tbody> 884 <tr> 885 <td> Source Data </td> 886 <td> <code class="code"> <document>fee<element a1='fie' a2 = 'foe'></element>fum</document> </code> 887 </td> 888 </tr> 889 <tr> 890 <td> Tag Openers </td> 891 <td> <code class="code">1____________1____________________________1____________1__________</code> 892 </td> 893 </tr> 894 <tr> 895 <td> Start Tag Marks </td> 896 <td> <code class="code">_1____________1___________________________________________________</code> 897 </td> 898 </tr> 899 <tr> 900 <td> End Tag Marks </td> 901 <td> <code class="code">___________________________________________1____________1_________</code> 902 </td> 903 </tr> 904 <tr> 905 <td> Empty Tag Marks </td> 906 <td> <code class="code">__________________________________________________________________</code> 907 </td> 908 </tr> 909 <tr> 910 <td> Element Names </td> 911 <td> <code class="code">_11111111_____1111111_____________________________________________</code> 912 </td> 913 </tr> 914 <tr> 915 <td> Attribute Names </td> 916 <td> <code class="code">______________________11_______11_________________________________</code> 917 </td> 918 </tr> 919 <tr> 920 <td> Attribute Values </td> 921 <td> <code class="code">__________________________111________111__________________________</code> 922 </td> 923 </tr> 924 </tbody> 925 </table> 926 </div> 927 <p id="idp464464"> Rather than immediately paying the costs of deletion and transposition just for 928 <p id="idp456704"> Rather than immediately paying the costs of deletion and transposition just for 928 929 transcoding, however, icXML defers these steps so that the deletion masks for several 929 930 stages of processing may be combined. In particular, this includes core XML requirements … … 936 937 after the marked CR as shown by the Pablo source code in Figure 937 938 \ref{fig:LBnormalization}. 938 939 <div class="figure" id="idp458944"> 940 <p class="title">Figure 3</p> 941 <div class="figure-contents"> 942 <div class="caption">Line Break Normalization Logic</div> 943 <pre class="programlisting" id="idp459632"> 944 # XML 1.0 line-break normalization rules. 945 if lex.CR: 946 # Modify CR (#x0D) to LF (#x0A) 947 u16lo.bit_5 ^= lex.CR 948 u16lo.bit_6 ^= lex.CR 949 u16lo.bit_7 ^= lex.CR 950 CRLF = pablo.Advance(lex.CR) & lex.LF 951 callouts.delmask |= CRLF 952 # Adjust LF streams for line/column tracker 953 lex.LF |= lex.CR 954 lex.LF ^= CRLF 955 </pre> 956 </div> 957 </div> 939 958 </p> 940 <p id="idp46 6976"> In essence, the deletion masks for transcoding and for line break normalization each959 <p id="idp461008"> In essence, the deletion masks for transcoding and for line break normalization each 941 960 represent a bitwise filter; these filters can be combined using bitwise-or so that the 942 961 parallel deletion algorithm need only be applied once. </p> 943 <p id="idp46 8288"> A further application of combined filtering is the processing of XML character and962 <p id="idp461664"> A further application of combined filtering is the processing of XML character and 944 963 entity references. Consider, for example, the references <code class="code">&</code> or 945 964 <code class="code"><</code>. which must be replaced in XML processing with the single … … 954 973 UTF-16 code unit. In the case, that this is not true, it is addressed in 955 974 post-processing. </p> 956 <p id="idp4 73168"> The final step of combined filtering occurs during the process of reducing markup975 <p id="idp466480"> The final step of combined filtering occurs during the process of reducing markup 957 976 data to tag bytes preceding each significant XML transition as described in 958 977 section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying … … 964 983 Haswell architecture. </p> 965 984 </div> 966 <div class="section" id="idp4 74496">985 <div class="section" id="idp467808"> 967 986 <h3 class="title" style="clear: both">Content Stream</h3> 968 <p id="idp4 75168"> A relatively-unique concept for icXML is the use of a filtered content stream.987 <p id="idp468480"> A relatively-unique concept for icXML is the use of a filtered content stream. 969 988 Rather that parsing an XML document in its original format, the input is transformed 970 989 into one that is easier for the parser to iterate through and produce the sequential … … 974 993 975 994 through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p> 976 <p id="idp47 7568"> Combined with the symbol stream, the parser traverses the content stream to995 <p id="idp470992"> Combined with the symbol stream, the parser traverses the content stream to 977 996 effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following 978 997 <code class="code">></code> indicates that a start tag without any attributes is the first … … 986 1005 null character in the content stream in parallel, which in turn means the parser can 987 1006 directly jump to the end of every string without scanning for it. </p> 988 <p id="idp4 80960"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the1007 <p id="idp474384"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the 989 1008 existence of an attribute. Because all of the intra-element was performed in the Parabix 990 1009 Subsystem, this must be a legal attribute. Since attributes can only occur within start … … 1000 1019 that the appropriate scope-nesting rules have been applied. </p> 1001 1020 </div> 1002 <div class="section" id="idp4 85104">1021 <div class="section" id="idp478528"> 1003 1022 <h3 class="title" style="clear: both">Namespace Handling</h3> 1004 <p id="idp4 86192"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used1023 <p id="idp479616"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used 1005 1024 together. It is especially important when a vocabulary application-dependant meaning, 1006 1025 such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound … … 1019 1038 uniquely-named items because the current vocabulary is determined by the namespace(s) 1020 1039 that are in-scope. </p> 1021 <div class="table-wrapper" id="idp4 93312">1040 <div class="table-wrapper" id="idp486784"> 1022 1041 <p class="title">Table V</p> 1023 <div class="caption"><p id="idp4 93824">XML Namespace Example</p></div>1042 <div class="caption"><p id="idp487296">XML Namespace Example</p></div> 1024 1043 <table class="table"> 1025 1044 <colgroup span="1"> … … 1055 1074 </table> 1056 1075 </div> 1057 <p id="idp 502800"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These1076 <p id="idp496272"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These 1058 1077 persist for the lifetime of the application through the use of a global URI pool. Xerces 1059 1078 maintains a stack of namespace scopes that is pushed (popped) every time a start tag … … 1063 1082 those that declare a set of namespaces upfront and never change them, and (2) those that 1064 1083 repeatedly modify the namespaces in predictable patterns. </p> 1065 <p id="idp 503936"> For that reason, icXML contains an independent namespace stack and utilizes bit1084 <p id="idp497408"> For that reason, icXML contains an independent namespace stack and utilizes bit 1066 1085 vectors to cheaply perform 1067 1086 When a prefix is … … 1077 1096 found using a bit-scan intrinsic. A namespace binding table, similar to Table 1078 1097 \ref{tbl:namespace1}, provides the actual URI ID. </p> 1079 <div class="table-wrapper" id="idp50 8352">1098 <div class="table-wrapper" id="idp501872"> 1080 1099 <p class="title">Table VI</p> 1081 <div class="caption"><p id="idp50 8864">Namespace Binding Table Example</p></div>1100 <div class="caption"><p id="idp502384">Namespace Binding Table Example</p></div> 1082 1101 <table class="table"> 1083 1102 <colgroup span="1"> … … 1120 1139 </table> 1121 1140 </div> 1122 <p id="idp5 24960">1141 <p id="idp518624"> 1123 1142 1124 1143 … … 1126 1145 1127 1146 </p> 1128 <p id="idp52 6864"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,1147 <p id="idp520528"> To ensure that scoping rules are adhered to, whenever a start tag is encountered, 1129 1148 any modification to the currently visible namespaces is calculated and stored within a 1130 1149 stack of bit vectors denoting the locally modified namespace bindings. When an end tag … … 1135 1154 </p> 1136 1155 </div> 1137 <div class="section" id="idp52 8320">1156 <div class="section" id="idp521984"> 1138 1157 <h3 class="title" style="clear: both">Error Handling</h3> 1139 <p id="idp52 8992">1158 <p id="idp522656"> 1140 1159 1141 1160 Xerces outputs error messages in two ways: through the programmer API and as thrown … … 1146 1165 \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and 1147 1166 Markup Processor, each with its own system for detecting and producing error messages. </p> 1148 <p id="idp5 30624"> Within the Parabix Subsystem, all computations are performed in parallel, a block at1167 <p id="idp524288"> Within the Parabix Subsystem, all computations are performed in parallel, a block at 1149 1168 a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking 1150 1169 the byte-position of an error within a block, and the type of error is determined by the … … 1179 1198 detected, the sum of those skipped positions is subtracted from the distance to 1180 1199 determine the actual column number. </p> 1181 <p id="idp5 36112"> The Markup Processor is a state-driven machine. As such, error detection within it1200 <p id="idp529776"> The Markup Processor is a state-driven machine. As such, error detection within it 1182 1201 is very similar to Xerces. However, reporting the correct line/column is a much more 1183 1202 difficult problem. The Markup Processor parses the content stream, which is a series of … … 1193 1212 </div> 1194 1213 </div> 1195 <div class="section" id="idp53 8544">1214 <div class="section" id="idp532208"> 1196 1215 <h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2> 1197 <p id="idp53 9248"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM1216 <p id="idp532912"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM 1198 1217 application. These are "embarrassingly 1199 1218 sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to … … 1203 1222 well into the general model of pipeline parallelism, in which each thread is in charge of a 1204 1223 single module or group of modules. </p> 1205 <p id="idp5 41104"> The most straightforward division of work in icXML is to separate the Parabix Subsystem1224 <p id="idp534768"> The most straightforward division of work in icXML is to separate the Parabix Subsystem 1206 1225 and the Markup Processor into distinct logical layers into two separate stages. The 1207 1226 resultant application, <span class="ital">icXML-p</span>, is a course-grained … … 1224 1243 <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can 1225 1244 reuse the memory space. </p> 1226 <p id="idp5 50224">1245 <p id="idp544496"> 1227 1246 <div class="figure" id="threads_timeline1"> 1228 <p class="title">Figure 3: Thread Balance in Two-Stage Pipelines</p>1247 <p class="title">Figure 4: Thread Balance in Two-Stage Pipelines</p> 1229 1248 <div class="figure-contents"> 1230 <div class="mediaobject" id="idp5 51616"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div>1231 <div class="mediaobject" id="idp5 53392"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>1249 <div class="mediaobject" id="idp545936"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div> 1250 <div class="mediaobject" id="idp547712"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div> 1232 1251 <div class="caption"></div> 1233 1252 </div> 1234 1253 </div> 1235 1254 </p> 1236 <p id="idp55 5840"> Overall, our design is intended to benefit a range of applications. Conceptually, we1255 <p id="idp550160"> Overall, our design is intended to benefit a range of applications. Conceptually, we 1237 1256 consider two design points. The first, the parsing performed by the Parabix Subsystem 1238 1257 dominates at 67% of the overall cost, with the cost of application processing (including … … 1240 1259 scenario, the cost of application processing dominates at 60%, while the cost of XML 1241 1260 parsing represents an overhead of 40%. </p> 1242 <p id="idp55 6752"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to1261 <p id="idp551072"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to 1243 1262 100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement 1244 1263 of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the … … 1248 1267 about 33% of the original work. In this case, Amdahl's law predicts that we could expect up 1249 1268 to a 3x speedup at best. </p> 1250 <p id="idp55 7872"> At the other extreme of our design range, we consider an application in which core1269 <p id="idp552192"> At the other extreme of our design range, we consider an application in which core 1251 1270 parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the 1252 1271 corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most … … 1254 1273 the entire latency of parsing within the serial time required by the application. In this 1255 1274 case, we achieve an overall speedup in processing time by 1.67x. </p> 1256 <p id="idp55 8816"> Although the structure of the Parabix Subsystem allows division of the work into1275 <p id="idp553136"> Although the structure of the Parabix Subsystem allows division of the work into 1257 1276 several pipeline stages and has been demonstrated to be effective for four pipeline stages 1258 1277 in a research prototype \cite{HPCA2012}, our analysis here suggests that the further … … 1262 1281 the cost of application logic that could match reductions in core parsing cost. </p> 1263 1282 </div> 1264 <div class="section" id="idp5 60656">1283 <div class="section" id="idp554320"> 1265 1284 <h2 class="title" style="clear: both">Performance</h2> 1266 <p id="idp5 61328"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the1285 <p id="idp554992"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the 1267 1286 Xerces C++ SAXCount sample application, and a real world GML to SVG transformation 1268 1287 application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy … … 1270 1289 L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 1271 1290 12.04 (Linux). </p> 1272 <p id="idp5 62240"> We analyzed the execution profiles of each XML parser using the performance counters1291 <p id="idp555904"> We analyzed the execution profiles of each XML parser using the performance counters 1273 1292 found in the processor. We chose several key hardware events that provide insight into the 1274 1293 profile of each application and indicate if the processor is doing useful work. The set of … … 1278 1297 collection of hardware performance monitoring statistics. In addition, we used the Linux 1279 1298 perf \cite{perf} utility to collect per core hardware events. </p> 1280 <div class="section" id="idp5 63376">1299 <div class="section" id="idp557040"> 1281 1300 <h3 class="title" style="clear: both">Xerces C++ SAXCount</h3> 1282 <p id="idp5 64048"> Xerces comes with sample applications that demonstrate salient features of the1301 <p id="idp557712"> Xerces comes with sample applications that demonstrate salient features of the 1283 1302 parser. SAXCount is the simplest such application: it counts the elements, attributes 1284 1303 and characters of a given XML file using the (event based) SAX API and prints out the 1285 1304 totals. </p> 1286 <p id="idp5 64752"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files1305 <p id="idp558416"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files 1287 1306 selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented 1288 1307 XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the 1289 1308 UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML 1290 1309 documents and consist entirely of single byte encoded ASCII characters. 1291 <div class="table-wrapper" id="idp5 65488">1310 <div class="table-wrapper" id="idp559152"> 1292 1311 <p class="title">Table VII</p> 1293 <div class="caption"><p id="idp5 66000">XML Document Characteristics</p></div>1312 <div class="caption"><p id="idp559664">XML Document Characteristics</p></div> 1294 1313 <table class="table"> 1295 1314 <colgroup span="1"> … … 1340 1359 </div> 1341 1360 </p> 1342 <p id="idp5 81584"> A key predictor of the overall parsing performance of an XML file is markup1361 <p id="idp575296"> A key predictor of the overall parsing performance of an XML file is markup 1343 1362 density\footnote{ Markup Density: the ratio of markup bytes used to define the structure 1344 1363 of the document vs. its file size.}. This metric has substantial influence on the … … 1347 1366 of document-oriented and data-oriented XML files to analyze performance over a spectrum 1348 1367 of markup densities. </p> 1349 <p id="idp5 83200"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML1368 <p id="idp576304"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML 1350 1369 in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over 1351 1370 Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve … … 1354 1373 icXML-p performs better as markup-density increases because the work performed by each 1355 1374 stage is well balanced in this application. </p> 1356 <p id="idp5 84240">1375 <p id="idp577344"> 1357 1376 <div class="figure" id="perf_SAX"> 1358 <p class="title">Figure 4: SAXCount Performance Comparison</p>1377 <p class="title">Figure 5: SAXCount Performance Comparison</p> 1359 1378 <div class="figure-contents"> 1360 <div class="mediaobject" id="idp5 85632"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>1379 <div class="mediaobject" id="idp578736"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div> 1361 1380 <div class="caption"></div> 1362 1381 </div> … … 1364 1383 </p> 1365 1384 </div> 1366 <div class="section" id="idp58 8176">1385 <div class="section" id="idp581280"> 1367 1386 <h3 class="title" style="clear: both">GML2SVG</h3> 1368 <p id="idp58 8848"> As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application1387 <p id="idp581952"> As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application 1369 1388 was chosen. This application transforms geospatially encoded data represented using 1370 1389 an XML representation in the form of Geography Markup Language (GML) \cite{lake2004geography} … … 1378 1397 a known XML format for the purpose of analysis and restructuring to meet 1379 1398 the requirements of an alternative format.</p> 1380 <p id="idp5 90224">Our GML to SVG data translations are executed on GML source data1399 <p id="idp583280">Our GML to SVG data translations are executed on GML source data 1381 1400 modelling the city of Vancouver, British Columbia, Canada. 1382 1401 The GML source document set … … 1386 1405 213.4 MB of source GML data generates 91.9 MB of target SVG data.</p> 1387 1406 <div class="figure" id="perf_GML2SVG"> 1388 <p class="title">Figure 5: Performance Comparison for GML2SVG</p>1407 <p class="title">Figure 6: Performance Comparison for GML2SVG</p> 1389 1408 <div class="figure-contents"> 1390 <div class="mediaobject" id="idp5 92304"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>1409 <div class="mediaobject" id="idp585312"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div> 1391 1410 <div class="caption"></div> 1392 1411 </div> 1393 1412 </div> 1394 <p id="idp5 94592">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against1413 <p id="idp587600">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against 1395 1414 the Xerces, icXML and icXML-p. 1396 1415 On the GML workload with this application, single-thread icXML … … 1399 1418 Using icXML-p, a further throughput increase to 111 MB/sec was recorded, 1400 1419 approximately a 2X speedup.</p> 1401 <p id="idp5 95408">An important aspect of icXML is the replacement of much branch-laden1420 <p id="idp588416">An important aspect of icXML is the replacement of much branch-laden 1402 1421 sequential code inside Xerces with straight-line SIMD code using far 1403 1422 fewer branches. Figure \ref{branchmiss_GML2SVG} shows the corresponding … … 1408 1427 less overloaded and able to increase the successful branch prediction rate.</p> 1409 1428 <div class="figure" id="branchmiss_GML2SVG"> 1410 <p class="title">Figure 6: Comparative Branch Misprediction Rate</p>1429 <p class="title">Figure 7: Comparative Branch Misprediction Rate</p> 1411 1430 <div class="figure-contents"> 1412 <div class="mediaobject" id="idp59 7520"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>1431 <div class="mediaobject" id="idp590528"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div> 1413 1432 <div class="caption"></div> 1414 1433 </div> 1415 1434 </div> 1416 <p id="idp59 9808">The behaviour of the three versions with respect to L1 cache misses per kB is shown1435 <p id="idp592816">The behaviour of the three versions with respect to L1 cache misses per kB is shown 1417 1436 in Figure \ref{cachemiss_GML2SVG}. Improvements are shown in both instruction- 1418 1437 and data-cache performance with the improvements in instruction-cache … … 1424 1443 significant benefit.</p> 1425 1444 <div class="figure" id="cachemiss_GML2SVG"> 1426 <p class="title">Figure 7: Comparative Cache Miss Rate</p>1445 <p class="title">Figure 8: Comparative Cache Miss Rate</p> 1427 1446 <div class="figure-contents"> 1428 <div class="mediaobject" id="idp 601936"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>1447 <div class="mediaobject" id="idp594944"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div> 1429 1448 <div class="caption"></div> 1430 1449 </div> 1431 1450 </div> 1432 <p id="idp 604224">One caveat with this study is that the GML2SVG application did not exhibit1451 <p id="idp597232">One caveat with this study is that the GML2SVG application did not exhibit 1433 1452 a relative balance of processing between application code and Xerces library 1434 1453 code reaching the 33% figure. This suggests that for this application and … … 1438 1457 </div> 1439 1458 </div> 1440 <div class="section" id="idp 605296">1459 <div class="section" id="idp598304"> 1441 1460 <h2 class="title" style="clear: both">Conclusion and Future Work</h2> 1442 <p id="idp 605984"> This paper is the first case study documenting the significant performance benefits1461 <p id="idp598992"> This paper is the first case study documenting the significant performance benefits 1443 1462 that may be realized through the integration of parallel bitstream technology into existing 1444 1463 widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined … … 1450 1469 technologies, this is an important case study demonstrating the general feasibility of 1451 1470 these techniques. </p> 1452 <p id="idp60 7264"> The further development of icXML to move beyond 2-stage pipeline parallelism is1471 <p id="idp600272"> The further development of icXML to move beyond 2-stage pipeline parallelism is 1453 1472 ongoing, with realistic prospects for four reasonably balanced stages within the library. 1454 1473 For applications such as GML2SVG which are dominated by time spent on XML parsing, such a 1455 1474 multistage pipelined parsing library should offer substantial benefits. </p> 1456 <p id="idp60 8032"> The example of XML parsing may be considered prototypical of finite-state machines1475 <p id="idp601040"> The example of XML parsing may be considered prototypical of finite-state machines 1457 1476 applications which have sometimes been considered "embarassingly 1458 1477 sequential" and so difficult to parallelize that "nothing … … 1460 1479 point in making the case that parallelization can indeed be helpful across a broad array of 1461 1480 application types. </p> 1462 <p id="idp60 9408"> To overcome the software engineering challenges in applying parallel bitstream1481 <p id="idp602416"> To overcome the software engineering challenges in applying parallel bitstream 1463 1482 technology to existing software systems, it is clear that better library and tool support 1464 1483 is needed. The techniques used in the implementation of icXML and documented in this paper … … 1467 1486 </p> 1468 1487 </div> 1469 <div class="bibliography" id="idp6 11376">1488 <div class="bibliography" id="idp603904"> 1470 1489 <h2 class="title" style="clear:both">Bibliography</h2> 1471 1490 <p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and -
docs/Balisage13/Bal2013came0601/Bal2013came0601.xml
r3052 r3053 127 127 <!-- 128 128 <legalnotice> 129 <para>Copyright © 20 09 Robert D. Cameron, Kenneth S. Herdy and Ehsan Amiri.129 <para>Copyright © 2013 Nigel Medforth, Dan Lin, Kenneth S. Herdy, Robert D. Cameron and Arrvindh Shriraman. 130 130 This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative 131 131 Works 2.5 Canada License.</para> … … 149 149 <section> 150 150 <title>Xerces C++ Structure</title> 151 <para> The Xerces C++ parser <!-- is a widely-used standards-conformant -->152 <!-- XML parser produced as open-source software -->153 <!-- by the Apache Software Foundation. -->154 <!-- It -->features comprehensive support for a variety of character encodings both151 <para> The Xerces C++ parser is a widely-used standards-conformant 152 XML parser produced as open-source software 153 by the Apache Software Foundation. 154 It features comprehensive support for a variety of character encodings both 155 155 commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple 156 156 XML vocabularies through the XML namespace mechanism, as well as complete … … 161 161 tree-based parsing interface. </para> 162 162 <para> 163 <!--What is the story behind the xerces-profile picture? should it contain one single file or several from our test suite?--> 164 <!--Our test suite does not have any grammars in it; ergo, processing those files will give a poor indication of the cost of using grammars--> 165 <!--Should we show a val-grind summary of a few files in a linechart form?--> Xerces, 163 Xerces, 166 164 like all traditional parsers, processes XML documents sequentially a byte-at-a-time from 167 165 the first to the last byte of input data. Each byte passes through several processing … … 208 206 availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent 209 207 data from long blocks of input data by using one register bit per single input byte. To 210 facilitate this, the input data is first transposed into a set of basis bit streams. In <!--FIGURE REF Figure~\ref{fig:BitStreamsExample}, the ASCII string ``{\ttfamily b7\verb|<|A}'' 211 is represented as 8 basis bit streams, $\tt b<subscript>{0 \ldots 7}$. 208 facilitate this, the input data is first transposed into a set of basis bit streams. 209 For example Table II shows the ASCII bytes for the string "<code>b7<A</code>" with 210 the corresponding 8 basis bit streams, b<subscript>0</subscript> through b<subscript>7</subscript> shown in Table III. 212 211 --> 213 212 <!-- The bits used to construct $\tt <subscript>7</subscript>$ have been highlighted in this example. --> … … 279 278 <!-- process, intra-element well-formedness validation is performed on each block --> 280 279 <!-- of text. --> 281 <para> Consider, for example, the XML source data stream shown in the first line of Table I I.280 <para> Consider, for example, the XML source data stream shown in the first line of Table IV. 282 281 The remaining lines of this figure show 283 282 several parallel bit streams that are computed in Parabix-style parsing, with each bit … … 287 286 brackets that represent tag openers in XML. The second and third streams show a 288 287 partition of the tag openers into start tag marks and end tag marks depending on the 289 character immediately following the opener (i.e., <code>"/"</code>) or288 character immediately following the opener (i.e., "<code>/</code>") or 290 289 not. The remaining three lines show streams that can be computed in subsequent parsing 291 290 (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams 292 291 marking the element names, attribute names and attribute values of tags. </para> 292 <table> 293 <caption> 294 <para>XML Source Data and Derived Parallel Bit Streams</para> 295 </caption> 296 <colgroup> 297 <col align="centre" valign="top"/> 298 <col align="left" valign="top"/> 299 </colgroup> 300 <tbody> 301 <tr><td> Source Data </td><td> <code> <![CDATA[<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>]]> </code></td></tr> 302 <tr><td> Tag Openers </td><td> <code>1____________1____________________________1____________1__________</code></td></tr> 303 <tr><td> Start Tag Marks </td><td> <code>_1____________1___________________________________________________</code></td></tr> 304 <tr><td> End Tag Marks </td><td> <code>___________________________________________1____________1_________</code></td></tr> 305 <tr><td> Empty Tag Marks </td><td> <code>__________________________________________________________________</code></td></tr> 306 <tr><td> Element Names </td><td> <code>_11111111_____1111111_____________________________________________</code></td></tr> 307 <tr><td> Attribute Names </td><td> <code>______________________11_______11_________________________________</code></td></tr> 308 <tr><td> Attribute Values </td><td> <code>__________________________111________111__________________________</code></td></tr> 309 </tbody> 310 </table> 311 293 312 <para> Two intuitions may help explain how the Parabix approach can lead to improved XML 294 313 parsing performance. The first is that the use of the full register width offers a … … 491 510 may then be completed by applying parallel deletion and inverse transposition of the 492 511 UTF-16 bitstreams\cite{Cameron2008}. </para> 493 <table>494 <caption>495 <para>XML Source Data and Derived Parallel Bit Streams</para>496 </caption>497 <colgroup>498 <col align="centre" valign="top"/>499 <col align="left" valign="top"/>500 </colgroup>501 <tbody>502 <tr><td> Source Data </td><td> <code> <![CDATA[<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>]]> </code></td></tr>503 <tr><td> Tag Openers </td><td> <code>1____________1____________________________1____________1__________</code></td></tr>504 <tr><td> Start Tag Marks </td><td> <code>_1____________1___________________________________________________</code></td></tr>505 <tr><td> End Tag Marks </td><td> <code>___________________________________________1____________1_________</code></td></tr>506 <tr><td> Empty Tag Marks </td><td> <code>__________________________________________________________________</code></td></tr>507 <tr><td> Element Names </td><td> <code>_11111111_____1111111_____________________________________________</code></td></tr>508 <tr><td> Attribute Names </td><td> <code>______________________11_______11_________________________________</code></td></tr>509 <tr><td> Attribute Values </td><td> <code>__________________________111________111__________________________</code></td></tr>510 </tbody>511 </table>512 512 <para> Rather than immediately paying the costs of deletion and transposition just for 513 513 transcoding, however, icXML defers these steps so that the deletion masks for several … … 521 521 after the marked CR as shown by the Pablo source code in Figure 522 522 \ref{fig:LBnormalization}. 523 <!-- FIGURE 524 \begin{figure} 525 \begin{verbatim} 523 <figure> 524 <caption>Line Break Normalization Logic</caption> 525 <programlisting> 526 526 # XML 1.0 line-break normalization rules. 527 527 if lex.CR: … … 530 530 u16lo.bit_6 ^= lex.CR 531 531 u16lo.bit_7 ^= lex.CR 532 CRLF = pablo.Advance(lex.CR) & lex.LF532 CRLF = pablo.Advance(lex.CR) & lex.LF 533 533 callouts.delmask |= CRLF 534 534 # Adjust LF streams for line/column tracker 535 535 lex.LF |= lex.CR 536 536 lex.LF ^= CRLF 537 \end{verbatim} 538 \caption{Line Break Normalization Logic}\label{fig:LBnormalization} 539 \end{figure} 540 --> 537 </programlisting> 538 </figure> 541 539 </para> 542 540 <para> In essence, the deletion masks for transcoding and for line break normalization each
Note: See TracChangeset
for help on using the changeset viewer.