Changeset 3052
- Timestamp:
- Apr 19, 2013, 3:56:55 PM (6 years ago)
- Location:
- docs/Balisage13/Bal2013came0601
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
docs/Balisage13/Bal2013came0601/Bal2013came0601.html
r3051 r3052 274 274 </div> 275 275 <div id="mast"><div class="content"> 276 <h2 class="article-title" id="idp74 624"></h2>276 <h2 class="article-title" id="idp74240"></h2> 277 277 <div class="author"> 278 278 <h3 class="author">Nigel Medforth</h3> … … 324 324 </div> 325 325 <div class="mast-box"> 326 <p class="title"><a href="javascript:toggle('idp75 744')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75744"></a> <span onclick="javascript:toggle('idp75744');return true">Abstract</span></p>327 <div class="folder" id="folder-idp75 744" style="display:none"><p id="idp76048">Prior research on the acceleration of XML processing using SIMD and multi-core326 <p class="title"><a href="javascript:toggle('idp75360')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp75360"></a> <span onclick="javascript:toggle('idp75360');return true">Abstract</span></p> 327 <div class="folder" id="folder-idp75360" style="display:none"><p id="idp75664">Prior research on the acceleration of XML processing using SIMD and multi-core 328 328 parallelism has lead to a number of interesting research prototypes. This work 329 329 investigates the extent to which the techniques underlying these prototypes could result … … 339 339 <p><b>Table of Contents</b></p> 340 340 <dl> 341 <dt><span class="section"><a href="#idp284 352" class="toc">Introduction</a></span></dt>342 <dt><span class="section"><a href="#idp28 6144" class="toc">Background</a></span></dt>341 <dt><span class="section"><a href="#idp284144" class="toc">Introduction</a></span></dt> 342 <dt><span class="section"><a href="#idp285936" class="toc">Background</a></span></dt> 343 343 <dd><dl> 344 <dt><span class="section"><a href="#idp286 784" class="toc">Xerces C++ Structure</a></span></dt>345 <dt><span class="section"><a href="#idp330 512" class="toc">The Parabix Framework</a></span></dt>346 <dt><span class="section"><a href="#idp404 544" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>344 <dt><span class="section"><a href="#idp286576" class="toc">Xerces C++ Structure</a></span></dt> 345 <dt><span class="section"><a href="#idp330416" class="toc">The Parabix Framework</a></span></dt> 346 <dt><span class="section"><a href="#idp404448" class="toc">Sequential vs. Parallel Paradigm</a></span></dt> 347 347 </dl></dd> 348 <dt><span class="section"><a href="#idp408 928" class="toc">Architecture</a></span></dt>348 <dt><span class="section"><a href="#idp408832" class="toc">Architecture</a></span></dt> 349 349 <dd><dl> 350 <dt><span class="section"><a href="#idp409 568" class="toc">Overview</a></span></dt>351 <dt><span class="section"><a href="#idp437 296" class="toc">Character Set Adapters</a></span></dt>352 <dt><span class="section"><a href="#idp446 192" class="toc">Combined Parallel Filtering</a></span></dt>353 <dt><span class="section"><a href="#idp474 608" class="toc">Content Stream</a></span></dt>354 <dt><span class="section"><a href="#idp485 264" class="toc">Namespace Handling</a></span></dt>355 <dt><span class="section"><a href="#idp528 864" class="toc">Error Handling</a></span></dt>350 <dt><span class="section"><a href="#idp409472" class="toc">Overview</a></span></dt> 351 <dt><span class="section"><a href="#idp437120" class="toc">Character Set Adapters</a></span></dt> 352 <dt><span class="section"><a href="#idp446016" class="toc">Combined Parallel Filtering</a></span></dt> 353 <dt><span class="section"><a href="#idp474496" class="toc">Content Stream</a></span></dt> 354 <dt><span class="section"><a href="#idp485104" class="toc">Namespace Handling</a></span></dt> 355 <dt><span class="section"><a href="#idp528320" class="toc">Error Handling</a></span></dt> 356 356 </dl></dd> 357 <dt><span class="section"><a href="#idp53 9088" class="toc">Multithreading with Pipeline Parallelism</a></span></dt>358 <dt><span class="section"><a href="#idp56 2720" class="toc">Performance</a></span></dt>357 <dt><span class="section"><a href="#idp538544" class="toc">Multithreading with Pipeline Parallelism</a></span></dt> 358 <dt><span class="section"><a href="#idp560656" class="toc">Performance</a></span></dt> 359 359 <dd><dl> 360 <dt><span class="section"><a href="#idp56 5440" class="toc">Xerces C++ SAXCount</a></span></dt>361 <dt><span class="section"><a href="#idp5 90288" class="toc">GML2SVG</a></span></dt>360 <dt><span class="section"><a href="#idp563376" class="toc">Xerces C++ SAXCount</a></span></dt> 361 <dt><span class="section"><a href="#idp588176" class="toc">GML2SVG</a></span></dt> 362 362 </dl></dd> 363 <dt><span class="section"><a href="#idp60 7904" class="toc">Conclusion and Future Work</a></span></dt>363 <dt><span class="section"><a href="#idp605296" class="toc">Conclusion and Future Work</a></span></dt> 364 364 </dl> 365 365 </div> 366 366 <div class="mast-box"> 367 <p class="title"><a href="javascript:toggle('idp77 472')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77472"></a> <span onclick="javascript:toggle('idp77472');return true">Nigel Medforth</span></p>368 <div class="folder" id="folder-idp77 472" style="display:none">367 <p class="title"><a href="javascript:toggle('idp77088')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77088"></a> <span onclick="javascript:toggle('idp77088');return true">Nigel Medforth</span></p> 368 <div class="folder" id="folder-idp77088" style="display:none"> 369 369 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>></code></h5> 370 370 <div class="affiliation"> … … 377 377 </div> 378 378 <div class="personblurb"> 379 <p id="idp5 9168">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead379 <p id="idp58880">Nigel Medforth is a M.Sc. student at Simon Fraser University and the lead 380 380 developer of icXML. He earned a Bachelor of Technology in Information Technology at 381 381 Kwantlen Polytechnic University in 2009 and was awarded the Deanâs Medal for 382 382 Outstanding Achievement.</p> 383 <p id="idp 60176">Nigel is currently researching ways to leverage both the Parabix framework and383 <p id="idp59888">Nigel is currently researching ways to leverage both the Parabix framework and 384 384 stream-processing models to further accelerate XML parsing within icXML.</p> 385 385 </div> … … 387 387 </div> 388 388 <div class="mast-box"> 389 <p class="title"><a href="javascript:toggle('idp63 840')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63840"></a> <span onclick="javascript:toggle('idp63840');return true">Dan Lin</span></p>390 <div class="folder" id="folder-idp63 840" style="display:none">389 <p class="title"><a href="javascript:toggle('idp63552')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63552"></a> <span onclick="javascript:toggle('idp63552');return true">Dan Lin</span></p> 390 <div class="folder" id="folder-idp63552" style="display:none"> 391 391 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>></code></h5> 392 392 <div class="affiliation"> … … 394 394 <p class="orgname">Simon Fraser University </p> 395 395 </div> 396 <div class="personblurb"><p id="idp65 552">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science396 <div class="personblurb"><p id="idp65264">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science 397 397 in Computing Science at Simon Fraser University in 2010. Her research focus on on high 398 398 performance algorithms that exploit parallelization strategies on various multicore platforms. … … 401 401 </div> 402 402 <div class="mast-box"> 403 <p class="title"><a href="javascript:toggle('idp6 8112')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp68112"></a> <span onclick="javascript:toggle('idp68112');return true">Kenneth Herdy</span></p>404 <div class="folder" id="folder-idp6 8112" style="display:none">403 <p class="title"><a href="javascript:toggle('idp67824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67824"></a> <span onclick="javascript:toggle('idp67824');return true">Kenneth Herdy</span></p> 404 <div class="folder" id="folder-idp67824" style="display:none"> 405 405 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>></code></h5> 406 406 <div class="affiliation"> … … 409 409 </div> 410 410 <div class="personblurb"> 411 <p id="idp 270560"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information411 <p id="idp69552"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information 412 412 Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor 413 413 of Science in Computing Science with a Certificate in Spatial Information Systems at 414 414 Simon Fraser University in 2005. </p> 415 <p id="idp271 296"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser415 <p id="idp271088"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser 416 416 University with industrial scholarship support from the Natural Sciences and 417 417 Engineering Research Council of Canada, the Mathematics of Information Technology and … … 423 423 </div> 424 424 <div class="mast-box"> 425 <p class="title"><a href="javascript:toggle('idp27 4032')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp274032"></a> <span onclick="javascript:toggle('idp274032');return true">Rob Cameron</span></p>426 <div class="folder" id="folder-idp27 4032" style="display:none">425 <p class="title"><a href="javascript:toggle('idp273824')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp273824"></a> <span onclick="javascript:toggle('idp273824');return true">Rob Cameron</span></p> 426 <div class="folder" id="folder-idp273824" style="display:none"> 427 427 <h5 class="author-email"><code class="email"><<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>></code></h5> 428 428 <div class="affiliation"> … … 434 434 <p class="orgname">International Characters, Inc.</p> 435 435 </div> 436 <div class="personblurb"><p id="idp275 696">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied436 <div class="personblurb"><p id="idp275488">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied 437 437 Sciences at Simon Fraser University. His research interests include programming 438 438 language and software system technology, with a specific focus on high performance … … 450 450 <div id="main"> 451 451 <div class="article"> 452 <h2 class="article-title" id="idp74 624"></h2>453 <div class="section" id="idp284 352">452 <h2 class="article-title" id="idp74240"></h2> 453 <div class="section" id="idp284144"> 454 454 <h2 class="title" style="clear: both">Introduction</h2> 455 <p id="idp284 992"></p>456 <p id="idp285 248"></p>457 <p id="idp285 504"></p>458 <p id="idp285 760"></p>459 </div> 460 <div class="section" id="idp28 6144">455 <p id="idp284784"></p> 456 <p id="idp285040"></p> 457 <p id="idp285296"></p> 458 <p id="idp285552"></p> 459 </div> 460 <div class="section" id="idp285936"> 461 461 <h2 class="title" style="clear: both">Background</h2> 462 <div class="section" id="idp286 784">462 <div class="section" id="idp286576"> 463 463 <h3 class="title" style="clear: both">Xerces C++ Structure</h3> 464 <p id="idp287 424"> The Xerces C++ parser464 <p id="idp287216"> The Xerces C++ parser 465 465 466 466 … … 473 473 parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM 474 474 tree-based parsing interface. </p> 475 <p id="idp289 584">475 <p id="idp289440"> 476 476 477 477 … … 492 492 expected that a comprehensive restructuring is required, involving all aspects of the 493 493 parser. </p> 494 <div class="table-wrapper" id="idp292 544">494 <div class="table-wrapper" id="idp292400"> 495 495 <p class="title">Table I</p> 496 <div class="caption"><p id="idm84 7680">Execution Time of Top 10 Xerces Functions</p></div>496 <div class="caption"><p id="idm848128">Execution Time of Top 10 Xerces Functions</p></div> 497 497 <table class="table"> 498 498 <colgroup span="1"> … … 549 549 </div> 550 550 </div> 551 <div class="section" id="idp330 512">551 <div class="section" id="idp330416"> 552 552 <h3 class="title" style="clear: both">The Parabix Framework</h3> 553 <p id="idp331 184"> The Parabix (parallel bit stream) framework is a transformative approach to XML553 <p id="idp331088"> The Parabix (parallel bit stream) framework is a transformative approach to XML 554 554 parsing (and other forms of text processing.) The key idea is to exploit the 555 555 availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent … … 576 576 multiple 577 577 classes can share the classification cost. </p> 578 <div class="table-wrapper" id="idp342 752">578 <div class="table-wrapper" id="idp342656"> 579 579 <p class="title">Table II</p> 580 <div class="caption"><p id="idp343 264">XML Source Data</p></div>580 <div class="caption"><p id="idp343168">XML Source Data</p></div> 581 581 <table class="table"> 582 582 <colgroup span="1"> … … 605 605 </table> 606 606 </div> 607 <div class="table-wrapper" id="idp358 656">607 <div class="table-wrapper" id="idp358560"> 608 608 <p class="title">Table III</p> 609 <div class="caption"><p id="idp359 168">8-bit ASCII Basis Bit Streams</p></div>609 <div class="caption"><p id="idp359072">8-bit ASCII Basis Bit Streams</p></div> 610 610 <table class="table"> 611 611 <colgroup span="1"> … … 674 674 </table> 675 675 </div> 676 <p id="idp399 328"> Consider, for example, the XML source data stream shown in the first line of Table II.676 <p id="idp399232"> Consider, for example, the XML source data stream shown in the first line of Table II. 677 677 The remaining lines of this figure show 678 678 several parallel bit streams that are computed in Parabix-style parsing, with each bit … … 686 686 (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams 687 687 marking the element names, attribute names and attribute values of tags. </p> 688 <p id="idp401 184"> Two intuitions may help explain how the Parabix approach can lead to improved XML688 <p id="idp401088"> Two intuitions may help explain how the Parabix approach can lead to improved XML 689 689 parsing performance. The first is that the use of the full register width offers a 690 690 considerable information advantage over sequential byte-at-a-time parsing. That is, … … 695 695 individual decision-bits, an approach that computes many of them in parallel (e.g., 128 696 696 bytes at a time using 128-bit registers) should provide substantial benefit. </p> 697 <p id="idp402 432"> Previous studies have shown that the Parabix approach improves many aspects of XML697 <p id="idp402336"> Previous studies have shown that the Parabix approach improves many aspects of XML 698 698 processing, including transcoding \cite{Cameron2008}, character classification and 699 699 validation, tag parsing and well-formedness checking. The first Parabix parser used … … 704 704 \cite{HPCA2012}. Although these research prototypes handled the full syntax of 705 705 schema-less XML documents, they lacked the functionality required by full XML parsers. </p> 706 <p id="idp4036 96"> Commercial XML processors support transcoding of multiple character sets and can706 <p id="idp403600"> Commercial XML processors support transcoding of multiple character sets and can 707 707 parse and validate against multiple document vocabularies. Additionally, they provide 708 708 API facilities beyond those found in research prototypes, including the widely used SAX, 709 709 SAX2 and DOM interfaces. </p> 710 710 </div> 711 <div class="section" id="idp404 544">711 <div class="section" id="idp404448"> 712 712 <h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3> 713 <p id="idp405 184"> Xercesâlike all traditional XML parsersâprocesses XML documents713 <p id="idp405088"> Xercesâlike all traditional XML parsersâprocesses XML documents 714 714 sequentially. Each character is examined to distinguish between the XML-specific markup, 715 715 such as a left angle bracket <code class="code"><</code>, and the content held within the 716 716 document. As the parser progresses through the document, it alternates between markup 717 717 scanning, validation and content processing modes. </p> 718 <p id="idp406 720"> In other words, Xerces belongs to an equivalent class applications termed FSM718 <p id="idp406624"> In other words, Xerces belongs to an equivalent class applications termed FSM 719 719 applications\footnote{ Herein FSM applications are considered software systems whose 720 720 behaviour is defined by the inputs, current state and the events associated with … … 722 722 subsequent characters. Unfortunately, textual data tends to be unpredictable and any 723 723 character could induce a state transition. </p> 724 <p id="idp407 632"> Parabix-style XML parsers utilize a concept of layered processing. A block of source724 <p id="idp407536"> Parabix-style XML parsers utilize a concept of layered processing. A block of source 725 725 text is transformed into a set of lexical bitstreams, which undergo a series of 726 726 operations that can be grouped into logical layers, e.g., transposition, character … … 731 731 </div> 732 732 </div> 733 <div class="section" id="idp408 928">733 <div class="section" id="idp408832"> 734 734 <h2 class="title" style="clear: both">Architecture</h2> 735 <div class="section" id="idp409 568">735 <div class="section" id="idp409472"> 736 736 <h3 class="title" style="clear: both">Overview</h3> 737 <p id="idp410 464"> icXML is more than an optimized version of Xerces. Many components were grouped,737 <p id="idp410368"> icXML is more than an optimized version of Xerces. Many components were grouped, 738 738 restructured and rearchitected with pipeline parallelism in mind. In this section, we 739 739 highlight the core differences between the two systems. As shown in Figure … … 761 761 <p class="title">Figure 1: Xerces Architecture</p> 762 762 <div class="figure-contents"> 763 <div class="mediaobject" id="idp417 104"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>763 <div class="mediaobject" id="idp417008"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div> 764 764 <div class="caption"></div> 765 765 </div> 766 766 </div> 767 <p id="idp419 360"> In icXML functions are grouped into logical components. As shown in Figure767 <p id="idp419264"> In icXML functions are grouped into logical components. As shown in Figure 768 768 \ref{fig:icxml-arch}, two major categories exist: (1) the Parabix Subsystem and (2) the 769 769 Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which … … 784 784 described in Section \ref{section:arch:errorhandling}. From here, two data-independent 785 785 branches exist: the Symbol Resolver and Content Preparation Unit. </p> 786 <p id="idp423 520"> A typical XML file contains few unique element and attribute namesâbut786 <p id="idp423344"> A typical XML file contains few unique element and attribute namesâbut 787 787 each of them will occur frequently. icXML stores these as distinct data structures, 788 788 called symbols, each with their own global identifier (GID). Using the symbol marker … … 790 790 Resolver</span> scans through the raw data to produce a sequence of GIDs, called 791 791 the <span class="ital">symbol stream</span>. </p> 792 <p id="idp42 6112"> The final components of the Parabix Subsystem are the <span class="ital">Content792 <p id="idp425936"> The final components of the Parabix Subsystem are the <span class="ital">Content 793 793 Preparation Unit</span> and <span class="ital">Content Stream 794 794 Generator</span>. The former takes the (transposed) basis bitstreams and selectively 795 795 filters them, according to the information provided by the Parallel Markup Parser, and 796 796 the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in Section \ref{section:arch:contentstream}. </p> 797 <p id="idp42 9024"> Combined, the symbol and content stream form icXML's compressed IR of the XML797 <p id="idp428848"> Combined, the symbol and content stream form icXML's compressed IR of the XML 798 798 document. The <span class="ital">Markup Processor</span>~parses the IR to 799 799 validate and produce the sequential output for the end user. The <span class="ital">Final WF checker</span> performs inter-element well-formedness validation that … … 807 807 <p class="title">Figure 2: icXML Architecture</p> 808 808 <div class="figure-contents"> 809 <div class="mediaobject" id="idp434 848"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>809 <div class="mediaobject" id="idp434672"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div> 810 810 <div class="caption"></div> 811 811 </div> 812 812 </div> 813 813 </div> 814 <div class="section" id="idp437 296">814 <div class="section" id="idp437120"> 815 815 <h3 class="title" style="clear: both">Character Set Adapters</h3> 816 <p id="idp437 968"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of816 <p id="idp437792"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of 817 817 Xerces itself and provide the end-consumer with a single encoding format. In the 818 818 important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant, … … 821 821 other cases, transcoding may involve table look-up operations for each byte of input. In 822 822 any case, transcoding imposes at least a cost of buffer copying. </p> 823 <p id="idp43 9024"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize823 <p id="idp438848"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize 824 824 transcoding costs. Given a specified input encoding, a CSA is responsible for checking 825 825 that input code units represent valid characters, mapping the characters of the encoding … … 827 827 item streams), as well as supporting ultimate transcoding requirements. All of this work 828 828 is performed using the parallel bitstream representation of the source input. </p> 829 <p id="idp4 40000"> An important observation is that many character sets are an extension to the legacy829 <p id="idp439824"> An important observation is that many character sets are an extension to the legacy 830 830 7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8, 831 831 UTF-16 and many others. Furthermore, all significant characters for parsing XML are 832 832 confined to the ASCII repertoire. Thus, a single common set of lexical item calculations 833 833 serves to compute lexical item streams for all such ASCII-based character sets. </p> 834 <p id="idp440 880"> A second observation is thatâregardless of which character set is834 <p id="idp440704"> A second observation is thatâregardless of which character set is 835 835 usedâquite often all of the characters in a particular block of input will be 836 836 within the ASCII range. This is a very simple test to perform using the bitstream … … 839 839 be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the 840 840 UTF-16 form are each set to zero in this case. </p> 841 <p id="idp442 800"> A third observation is that repeated transcoding of the names of XML elements,841 <p id="idp442624"> A third observation is that repeated transcoding of the names of XML elements, 842 842 attributes and so on can be avoided by using a look-up mechanism. That is, the first 843 843 occurrence of each symbol is stored in a look-up table mapping the input encoding to a … … 846 846 symbol look up is required to apply various XML validation rules, there is achieves the 847 847 effect of transcoding each occurrence without additional cost. </p> 848 <p id="idp443 856"> The cost of individual character transcoding is avoided whenever a block of input is848 <p id="idp443680"> The cost of individual character transcoding is avoided whenever a block of input is 849 849 confined to the ASCII subset and for all but the first occurrence of any XML element or 850 850 attribute name. Furthermore, when transcoding is required, the parallel bitstream … … 857 857 using bit scan operations. </p> 858 858 </div> 859 <div class="section" id="idp446 192">859 <div class="section" id="idp446016"> 860 860 <h3 class="title" style="clear: both">Combined Parallel Filtering</h3> 861 <p id="idp446 880"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last861 <p id="idp446704"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last 862 862 bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two 863 863 Chinese characters <code class="code">äœ å¥œ</code> are represented as two … … 873 873 may then be completed by applying parallel deletion and inverse transposition of the 874 874 UTF-16 bitstreams\cite{Cameron2008}. </p> 875 <div class="table-wrapper" id="idp45 1040">875 <div class="table-wrapper" id="idp450864"> 876 876 <p class="title">Table IV</p> 877 <div class="caption"><p id="idp451 552">XML Source Data and Derived Parallel Bit Streams</p></div>877 <div class="caption"><p id="idp451376">XML Source Data and Derived Parallel Bit Streams</p></div> 878 878 <table class="table"> 879 879 <colgroup span="1"> … … 925 925 </table> 926 926 </div> 927 <p id="idp464 640"> Rather than immediately paying the costs of deletion and transposition just for927 <p id="idp464464"> Rather than immediately paying the costs of deletion and transposition just for 928 928 transcoding, however, icXML defers these steps so that the deletion masks for several 929 929 stages of processing may be combined. In particular, this includes core XML requirements … … 938 938 939 939 </p> 940 <p id="idp46 7152"> In essence, the deletion masks for transcoding and for line break normalization each940 <p id="idp466976"> In essence, the deletion masks for transcoding and for line break normalization each 941 941 represent a bitwise filter; these filters can be combined using bitwise-or so that the 942 942 parallel deletion algorithm need only be applied once. </p> 943 <p id="idp468 464"> A further application of combined filtering is the processing of XML character and943 <p id="idp468288"> A further application of combined filtering is the processing of XML character and 944 944 entity references. Consider, for example, the references <code class="code">&</code> or 945 945 <code class="code"><</code>. which must be replaced in XML processing with the single … … 954 954 UTF-16 code unit. In the case, that this is not true, it is addressed in 955 955 post-processing. </p> 956 <p id="idp473 280"> The final step of combined filtering occurs during the process of reducing markup956 <p id="idp473168"> The final step of combined filtering occurs during the process of reducing markup 957 957 data to tag bytes preceding each significant XML transition as described in 958 958 section~\ref{section:arch:contentstream}. Overall, icXML avoids separate buffer copying … … 964 964 Haswell architecture. </p> 965 965 </div> 966 <div class="section" id="idp474 608">966 <div class="section" id="idp474496"> 967 967 <h3 class="title" style="clear: both">Content Stream</h3> 968 <p id="idp475 280"> A relatively-unique concept for icXML is the use of a filtered content stream.968 <p id="idp475168"> A relatively-unique concept for icXML is the use of a filtered content stream. 969 969 Rather that parsing an XML document in its original format, the input is transformed 970 970 into one that is easier for the parser to iterate through and produce the sequential … … 974 974 975 975 through the parallel filtering algorithm, described in section \ref{sec:parfilter}. </p> 976 <p id="idp477 680"> Combined with the symbol stream, the parser traverses the content stream to976 <p id="idp477568"> Combined with the symbol stream, the parser traverses the content stream to 977 977 effectively reconstructs the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following 978 978 <code class="code">></code> indicates that a start tag without any attributes is the first … … 986 986 null character in the content stream in parallel, which in turn means the parser can 987 987 directly jump to the end of every string without scanning for it. </p> 988 <p id="idp48 1120"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the988 <p id="idp480960"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the 989 989 existence of an attribute. Because all of the intra-element was performed in the Parabix 990 990 Subsystem, this must be a legal attribute. Since attributes can only occur within start … … 1000 1000 that the appropriate scope-nesting rules have been applied. </p> 1001 1001 </div> 1002 <div class="section" id="idp485 264">1002 <div class="section" id="idp485104"> 1003 1003 <h3 class="title" style="clear: both">Namespace Handling</h3> 1004 <p id="idp486 352"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used1004 <p id="idp486192"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used 1005 1005 together. It is especially important when a vocabulary application-dependant meaning, 1006 1006 such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound … … 1019 1019 uniquely-named items because the current vocabulary is determined by the namespace(s) 1020 1020 that are in-scope. </p> 1021 <div class="table-wrapper" id="idp493 520">1021 <div class="table-wrapper" id="idp493312"> 1022 1022 <p class="title">Table V</p> 1023 <div class="caption"><p id="idp49 4032">XML Namespace Example</p></div>1023 <div class="caption"><p id="idp493824">XML Namespace Example</p></div> 1024 1024 <table class="table"> 1025 1025 <colgroup span="1"> … … 1055 1055 </table> 1056 1056 </div> 1057 <p id="idp50 3056"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These1057 <p id="idp502800"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These 1058 1058 persist for the lifetime of the application through the use of a global URI pool. Xerces 1059 1059 maintains a stack of namespace scopes that is pushed (popped) every time a start tag … … 1063 1063 those that declare a set of namespaces upfront and never change them, and (2) those that 1064 1064 repeatedly modify the namespaces in predictable patterns. </p> 1065 <p id="idp50 4192"> For that reason, icXML contains an independent namespace stack and utilizes bit1065 <p id="idp503936"> For that reason, icXML contains an independent namespace stack and utilizes bit 1066 1066 vectors to cheaply perform 1067 1067 When a prefix is … … 1077 1077 found using a bit-scan intrinsic. A namespace binding table, similar to Table 1078 1078 \ref{tbl:namespace1}, provides the actual URI ID. </p> 1079 <div class="table-wrapper" id="idp508 608">1079 <div class="table-wrapper" id="idp508352"> 1080 1080 <p class="title">Table VI</p> 1081 <div class="caption"><p id="idp50 9120">Namespace Binding Table Example</p></div>1081 <div class="caption"><p id="idp508864">Namespace Binding Table Example</p></div> 1082 1082 <table class="table"> 1083 1083 <colgroup span="1"> … … 1120 1120 </table> 1121 1121 </div> 1122 <p id="idp52 5504">1122 <p id="idp524960"> 1123 1123 1124 1124 … … 1126 1126 1127 1127 </p> 1128 <p id="idp52 7408"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,1128 <p id="idp526864"> To ensure that scoping rules are adhered to, whenever a start tag is encountered, 1129 1129 any modification to the currently visible namespaces is calculated and stored within a 1130 1130 stack of bit vectors denoting the locally modified namespace bindings. When an end tag … … 1135 1135 </p> 1136 1136 </div> 1137 <div class="section" id="idp528 864">1137 <div class="section" id="idp528320"> 1138 1138 <h3 class="title" style="clear: both">Error Handling</h3> 1139 <p id="idp52 9536">1139 <p id="idp528992"> 1140 1140 1141 1141 Xerces outputs error messages in two ways: through the programmer API and as thrown … … 1146 1146 \ref{fig:icxml-arch}, icXML is divided into two sections: the Parabix Subsystem and 1147 1147 Markup Processor, each with its own system for detecting and producing error messages. </p> 1148 <p id="idp53 1168"> Within the Parabix Subsystem, all computations are performed in parallel, a block at1148 <p id="idp530624"> Within the Parabix Subsystem, all computations are performed in parallel, a block at 1149 1149 a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking 1150 1150 the byte-position of an error within a block, and the type of error is determined by the … … 1179 1179 detected, the sum of those skipped positions is subtracted from the distance to 1180 1180 determine the actual column number. </p> 1181 <p id="idp536 656"> The Markup Processor is a state-driven machine. As such, error detection within it1181 <p id="idp536112"> The Markup Processor is a state-driven machine. As such, error detection within it 1182 1182 is very similar to Xerces. However, reporting the correct line/column is a much more 1183 1183 difficult problem. The Markup Processor parses the content stream, which is a series of … … 1193 1193 </div> 1194 1194 </div> 1195 <div class="section" id="idp53 9088">1195 <div class="section" id="idp538544"> 1196 1196 <h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2> 1197 <p id="idp539 792"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM1197 <p id="idp539248"> As discussed in section \ref{background:xerces}, Xerces can be considered a FSM 1198 1198 application. These are "embarrassingly 1199 1199 sequential."\cite{Asanovic:EECS-2006-183} and notoriously difficult to … … 1203 1203 well into the general model of pipeline parallelism, in which each thread is in charge of a 1204 1204 single module or group of modules. </p> 1205 <p id="idp541 648"> The most straightforward division of work in icXML is to separate the Parabix Subsystem1205 <p id="idp541104"> The most straightforward division of work in icXML is to separate the Parabix Subsystem 1206 1206 and the Markup Processor into distinct logical layers into two separate stages. The 1207 1207 resultant application, <span class="ital">icXML-p</span>, is a course-grained … … 1224 1224 <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can 1225 1225 reuse the memory space. </p> 1226 <p id="idp550 720">1226 <p id="idp550224"> 1227 1227 <div class="figure" id="threads_timeline1"> 1228 1228 <p class="title">Figure 3: Thread Balance in Two-Stage Pipelines</p> 1229 1229 <div class="figure-contents"> 1230 <div class="mediaobject" id="idp552112"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div> 1230 <div class="mediaobject" id="idp551616"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div> 1231 <div class="mediaobject" id="idp553392"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div> 1231 1232 <div class="caption"></div> 1232 1233 </div> 1233 1234 </div> 1234 <div class="figure" id="threads_timeline2">1235 <p class="title">Figure 4: Thread Balance in Two-Stage Pipelines</p>1236 <div class="figure-contents">1237 <div class="mediaobject" id="idp555488"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div>1238 <div class="caption"></div>1239 </div>1240 </div>1241 1235 </p> 1242 <p id="idp55 7904"> Overall, our design is intended to benefit a range of applications. Conceptually, we1236 <p id="idp555840"> Overall, our design is intended to benefit a range of applications. Conceptually, we 1243 1237 consider two design points. The first, the parsing performed by the Parabix Subsystem 1244 1238 dominates at 67% of the overall cost, with the cost of application processing (including … … 1246 1240 scenario, the cost of application processing dominates at 60%, while the cost of XML 1247 1241 parsing represents an overhead of 40%. </p> 1248 <p id="idp55 8816"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to1242 <p id="idp556752"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to 1249 1243 100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement 1250 1244 of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the … … 1254 1248 about 33% of the original work. In this case, Amdahl's law predicts that we could expect up 1255 1249 to a 3x speedup at best. </p> 1256 <p id="idp55 9936"> At the other extreme of our design range, we consider an application in which core1250 <p id="idp557872"> At the other extreme of our design range, we consider an application in which core 1257 1251 parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the 1258 1252 corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most … … 1260 1254 the entire latency of parsing within the serial time required by the application. In this 1261 1255 case, we achieve an overall speedup in processing time by 1.67x. </p> 1262 <p id="idp5 60880"> Although the structure of the Parabix Subsystem allows division of the work into1256 <p id="idp558816"> Although the structure of the Parabix Subsystem allows division of the work into 1263 1257 several pipeline stages and has been demonstrated to be effective for four pipeline stages 1264 1258 in a research prototype \cite{HPCA2012}, our analysis here suggests that the further … … 1268 1262 the cost of application logic that could match reductions in core parsing cost. </p> 1269 1263 </div> 1270 <div class="section" id="idp56 2720">1264 <div class="section" id="idp560656"> 1271 1265 <h2 class="title" style="clear: both">Performance</h2> 1272 <p id="idp56 3392"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the1266 <p id="idp561328"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the 1273 1267 Xerces C++ SAXCount sample application, and a real world GML to SVG transformation 1274 1268 application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy … … 1276 1270 L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu 1277 1271 12.04 (Linux). </p> 1278 <p id="idp56 4304"> We analyzed the execution profiles of each XML parser using the performance counters1272 <p id="idp562240"> We analyzed the execution profiles of each XML parser using the performance counters 1279 1273 found in the processor. We chose several key hardware events that provide insight into the 1280 1274 profile of each application and indicate if the processor is doing useful work. The set of … … 1284 1278 collection of hardware performance monitoring statistics. In addition, we used the Linux 1285 1279 perf \cite{perf} utility to collect per core hardware events. </p> 1286 <div class="section" id="idp56 5440">1280 <div class="section" id="idp563376"> 1287 1281 <h3 class="title" style="clear: both">Xerces C++ SAXCount</h3> 1288 <p id="idp56 6112"> Xerces comes with sample applications that demonstrate salient features of the1282 <p id="idp564048"> Xerces comes with sample applications that demonstrate salient features of the 1289 1283 parser. SAXCount is the simplest such application: it counts the elements, attributes 1290 1284 and characters of a given XML file using the (event based) SAX API and prints out the 1291 1285 totals. </p> 1292 <p id="idp56 6864"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files1286 <p id="idp564752"> Table \ref{XMLDocChars} shows the document characteristics of the XML input files 1293 1287 selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented 1294 1288 XML inputs and contains the three-byte and four-byte UTF-8 sequence required for the 1295 1289 UTF-8 encoding of Japanese characters. The remaining data files are data-oriented XML 1296 1290 documents and consist entirely of single byte encoded ASCII characters. 1297 <div class="table-wrapper" id="idp56 7600">1291 <div class="table-wrapper" id="idp565488"> 1298 1292 <p class="title">Table VII</p> 1299 <div class="caption"><p id="idp56 8112">XML Document Characteristics</p></div>1293 <div class="caption"><p id="idp566000">XML Document Characteristics</p></div> 1300 1294 <table class="table"> 1301 1295 <colgroup span="1"> … … 1346 1340 </div> 1347 1341 </p> 1348 <p id="idp58 3696"> A key predictor of the overall parsing performance of an XML file is markup1342 <p id="idp581584"> A key predictor of the overall parsing performance of an XML file is markup 1349 1343 density\footnote{ Markup Density: the ratio of markup bytes used to define the structure 1350 1344 of the document vs. its file size.}. This metric has substantial influence on the … … 1353 1347 of document-oriented and data-oriented XML files to analyze performance over a spectrum 1354 1348 of markup densities. </p> 1355 <p id="idp58 5312"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML1349 <p id="idp583200"> Figure \ref{perf_SAX} compares the performance of Xerces, icXML and pipelined icXML 1356 1350 in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over 1357 1351 Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve … … 1360 1354 icXML-p performs better as markup-density increases because the work performed by each 1361 1355 stage is well balanced in this application. </p> 1362 <p id="idp58 6352">1356 <p id="idp584240"> 1363 1357 <div class="figure" id="perf_SAX"> 1364 <p class="title">Figure 5: SAXCount Performance Comparison</p>1358 <p class="title">Figure 4: SAXCount Performance Comparison</p> 1365 1359 <div class="figure-contents"> 1366 <div class="mediaobject" id="idp58 7744"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>1360 <div class="mediaobject" id="idp585632"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div> 1367 1361 <div class="caption"></div> 1368 1362 </div> … … 1370 1364 </p> 1371 1365 </div> 1372 <div class="section" id="idp5 90288">1366 <div class="section" id="idp588176"> 1373 1367 <h3 class="title" style="clear: both">GML2SVG</h3> 1374 <p id="idp5 90960"> As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application1368 <p id="idp588848"> As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application 1375 1369 was chosen. This application transforms geospatially encoded data represented using 1376 1370 an XML representation in the form of Geography Markup Language (GML) \cite{lake2004geography} … … 1384 1378 a known XML format for the purpose of analysis and restructuring to meet 1385 1379 the requirements of an alternative format.</p> 1386 <p id="idp59 2288">Our GML to SVG data translations are executed on GML source data1380 <p id="idp590224">Our GML to SVG data translations are executed on GML source data 1387 1381 modelling the city of Vancouver, British Columbia, Canada. 1388 1382 The GML source document set … … 1392 1386 213.4 MB of source GML data generates 91.9 MB of target SVG data.</p> 1393 1387 <div class="figure" id="perf_GML2SVG"> 1394 <p class="title">Figure 6: Performance Comparison for GML2SVG</p>1388 <p class="title">Figure 5: Performance Comparison for GML2SVG</p> 1395 1389 <div class="figure-contents"> 1396 <div class="mediaobject" id="idp59 4320"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>1390 <div class="mediaobject" id="idp592304"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div> 1397 1391 <div class="caption"></div> 1398 1392 </div> 1399 1393 </div> 1400 <p id="idp59 6608">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against1401 the Xerces, \icXML{} and \icXMLp{}.1402 On the GML workload with this application, single-thread \icXML{}1403 achieved about a 50 \% acceleration over Xerces,1394 <p id="idp594592">Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against 1395 the Xerces, icXML and icXML-p. 1396 On the GML workload with this application, single-thread icXML 1397 achieved about a 50% acceleration over Xerces, 1404 1398 increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec. 1405 Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,1399 Using icXML-p, a further throughput increase to 111 MB/sec was recorded, 1406 1400 approximately a 2X speedup.</p> 1407 <p id="idp59 7440">An important aspect of \icXML{}is the replacement of much branch-laden1401 <p id="idp595408">An important aspect of icXML is the replacement of much branch-laden 1408 1402 sequential code inside Xerces with straight-line SIMD code using far 1409 1403 fewer branches. Figure \ref{branchmiss_GML2SVG} shows the corresponding 1410 1404 improvement in branching behaviour, with a dramatic reduction in branch misses per kB. 1411 It is also interesting to note that \icXMLp{}goes even further.1405 It is also interesting to note that icXML-p goes even further. 1412 1406 In essence, in using pipeline parallelism to split the instruction 1413 1407 stream onto separate cores, the branch target buffers on each core are 1414 1408 less overloaded and able to increase the successful branch prediction rate.</p> 1415 1409 <div class="figure" id="branchmiss_GML2SVG"> 1416 <p class="title">Figure 7: Comparative Branch Misprediction Rate</p>1410 <p class="title">Figure 6: Comparative Branch Misprediction Rate</p> 1417 1411 <div class="figure-contents"> 1418 <div class="mediaobject" id="idp 600144"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>1412 <div class="mediaobject" id="idp597520"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div> 1419 1413 <div class="caption"></div> 1420 1414 </div> 1421 1415 </div> 1422 <p id="idp 602432">The behaviour of the three versions with respect to L1 cache misses per kB is shown1416 <p id="idp599808">The behaviour of the three versions with respect to L1 cache misses per kB is shown 1423 1417 in Figure \ref{cachemiss_GML2SVG}. Improvements are shown in both instruction- 1424 1418 and data-cache performance with the improvements in instruction-cache 1425 behaviour the most dramatic. Single-threaded \icXML{}shows substantially improved1419 behaviour the most dramatic. Single-threaded icXML shows substantially improved 1426 1420 performance over Xerces on both measures. 1427 Although \icXMLp{} is slightly worse \wrt{}data-cache performance,1421 Although icXML-p is slightly worse with respect to data-cache performance, 1428 1422 this is more than offset by a further dramatic reduction in instruction-cache miss rate. 1429 1423 Again partitioning the instruction stream through the pipeline parallelism model has 1430 1424 significant benefit.</p> 1431 1425 <div class="figure" id="cachemiss_GML2SVG"> 1432 <p class="title">Figure 8: Comparative Cache Miss Rate</p>1426 <p class="title">Figure 7: Comparative Cache Miss Rate</p> 1433 1427 <div class="figure-contents"> 1434 <div class="mediaobject" id="idp60 4544"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>1428 <div class="mediaobject" id="idp601936"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div> 1435 1429 <div class="caption"></div> 1436 1430 </div> 1437 1431 </div> 1438 <p id="idp60 6832">One caveat with this study is that the GML2SVG application did not exhibit1432 <p id="idp604224">One caveat with this study is that the GML2SVG application did not exhibit 1439 1433 a relative balance of processing between application code and Xerces library 1440 code reaching the 33 \% figure. This suggests that for this application and1434 code reaching the 33% figure. This suggests that for this application and 1441 1435 possibly others, further separating the logical layers of the 1442 \icXML{}engine into different pipeline stages could well offer significant benefit.1436 icXML engine into different pipeline stages could well offer significant benefit. 1443 1437 This remains an area of ongoing work.</p> 1444 1438 </div> 1445 1439 </div> 1446 <div class="section" id="idp60 7904">1440 <div class="section" id="idp605296"> 1447 1441 <h2 class="title" style="clear: both">Conclusion and Future Work</h2> 1448 <p id="idp60 8592"> This paper is the first case study documenting the significant performance benefits1442 <p id="idp605984"> This paper is the first case study documenting the significant performance benefits 1449 1443 that may be realized through the integration of parallel bitstream technology into existing 1450 1444 widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined … … 1456 1450 technologies, this is an important case study demonstrating the general feasibility of 1457 1451 these techniques. </p> 1458 <p id="idp60 9872"> The further development of icXML to move beyond 2-stage pipeline parallelism is1452 <p id="idp607264"> The further development of icXML to move beyond 2-stage pipeline parallelism is 1459 1453 ongoing, with realistic prospects for four reasonably balanced stages within the library. 1460 1454 For applications such as GML2SVG which are dominated by time spent on XML parsing, such a 1461 1455 multistage pipelined parsing library should offer substantial benefits. </p> 1462 <p id="idp6 10640"> The example of XML parsing may be considered prototypical of finite-state machines1456 <p id="idp608032"> The example of XML parsing may be considered prototypical of finite-state machines 1463 1457 applications which have sometimes been considered "embarassingly 1464 1458 sequential" and so difficult to parallelize that "nothing … … 1466 1460 point in making the case that parallelization can indeed be helpful across a broad array of 1467 1461 application types. </p> 1468 <p id="idp6 12016"> To overcome the software engineering challenges in applying parallel bitstream1462 <p id="idp609408"> To overcome the software engineering challenges in applying parallel bitstream 1469 1463 technology to existing software systems, it is clear that better library and tool support 1470 1464 is needed. The techniques used in the implementation of icXML and documented in this paper … … 1473 1467 </p> 1474 1468 </div> 1475 <div class="bibliography" id="idp61 3504">1469 <div class="bibliography" id="idp611376"> 1476 1470 <h2 class="title" style="clear:both">Bibliography</h2> 1477 1471 <p class="bibliomixed" id="XMLChip09">[Leventhal and Lemoine 2009] Leventhal, Michael and -
docs/Balisage13/Bal2013came0601/Bal2013came0601.xml
r3051 r3052 798 798 </imageobject> 799 799 </mediaobject> 800 <caption>801 </caption>802 </figure>803 <figure xml:id="threads_timeline2">804 <title>Thread Balance in Two-Stage Pipelines</title>805 800 <mediaobject> 806 801 <imageobject> … … 952 947 953 948 <para>Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against 954 the Xerces, \icXML{} and \icXMLp{}.955 On the GML workload with this application, single-thread \icXML{}956 achieved about a 50 \% acceleration over Xerces,949 the Xerces, icXML and icXML-p. 950 On the GML workload with this application, single-thread icXML 951 achieved about a 50% acceleration over Xerces, 957 952 increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec. 958 Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,953 Using icXML-p, a further throughput increase to 111 MB/sec was recorded, 959 954 approximately a 2X speedup.</para> 960 955 961 <para>An important aspect of \icXML{}is the replacement of much branch-laden956 <para>An important aspect of icXML is the replacement of much branch-laden 962 957 sequential code inside Xerces with straight-line SIMD code using far 963 958 fewer branches. Figure \ref{branchmiss_GML2SVG} shows the corresponding 964 959 improvement in branching behaviour, with a dramatic reduction in branch misses per kB. 965 It is also interesting to note that \icXMLp{}goes even further.960 It is also interesting to note that icXML-p goes even further. 966 961 In essence, in using pipeline parallelism to split the instruction 967 962 stream onto separate cores, the branch target buffers on each core are … … 982 977 in Figure \ref{cachemiss_GML2SVG}. Improvements are shown in both instruction- 983 978 and data-cache performance with the improvements in instruction-cache 984 behaviour the most dramatic. Single-threaded \icXML{}shows substantially improved979 behaviour the most dramatic. Single-threaded icXML shows substantially improved 985 980 performance over Xerces on both measures. 986 Although \icXMLp{} is slightly worse \wrt{}data-cache performance,981 Although icXML-p is slightly worse with respect to data-cache performance, 987 982 this is more than offset by a further dramatic reduction in instruction-cache miss rate. 988 983 Again partitioning the instruction stream through the pipeline parallelism model has … … 1002 997 <para>One caveat with this study is that the GML2SVG application did not exhibit 1003 998 a relative balance of processing between application code and Xerces library 1004 code reaching the 33 \% figure. This suggests that for this application and999 code reaching the 33% figure. This suggests that for this application and 1005 1000 possibly others, further separating the logical layers of the 1006 \icXML{}engine into different pipeline stages could well offer significant benefit.1001 icXML engine into different pipeline stages could well offer significant benefit. 1007 1002 This remains an area of ongoing work.</para> 1008 1003 </section>
Note: See TracChangeset
for help on using the changeset viewer.