Changeset 3397 for docs/Balisage13


Timestamp:
Jul 12, 2013, 3:03:07 PM
Author:
cameron
Message:

Clean up references and links.

Location:
docs/Balisage13
Files:
3 edited

  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3395 r3397  
    185185<body>
    186186<div class="inline-citation" id="cite-CameronHerdyLin2008" style="display:none;width: 240px">
    187 <a class="quiet" href="javascript:hidecite('cite-CameronHerdyLin2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. 2008 New York, NY, USA</p>
     187<a class="quiet" href="javascript:hidecite('cite-CameronHerdyLin2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. Richmond Hill, Ontario, Canada. 2008.</p>
    188188</div>
    189189<div class="inline-citation" id="cite-papi" style="display:none;width: 240px">
     
    191191</div>
    192192<div class="inline-citation" id="cite-perf" style="display:none;width: 240px">
    193 <a class="quiet" href="javascript:hidecite('cite-perf')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf.<a href="https://perf.wiki.kernel.org/index.php/Tutorial" class="link" target="_new">https://perf.wiki.kernel.org/index.php/Tutorial</a></p>
     193<a class="quiet" href="javascript:hidecite('cite-perf')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf. <a href="https://perf.wiki.kernel.org/index.php/Tutorial" class="link" target="_new">https://perf.wiki.kernel.org/index.php/Tutorial</a></p>
    194194</div>
    195195<div class="inline-citation" id="cite-Cameron2008" style="display:none;width: 240px">
    196 <a class="quiet" href="javascript:hidecite('cite-Cameron2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2008 New York, NY, USA</p>
     196<a class="quiet" href="javascript:hidecite('cite-Cameron2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Salt Lake City, USA. 2008.</p>
    197197</div>
    198198<div class="inline-citation" id="cite-ParaDOM2009" style="display:none;width: 240px">
    199 <a class="quiet" href="javascript:hidecite('cite-ParaDOM2009')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009</p>
     199<a class="quiet" href="javascript:hidecite('cite-ParaDOM2009')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009.</p>
    200200</div>
    201201<div class="inline-citation" id="cite-XMLSSE42" style="display:none;width: 240px">
    202 <a class="quiet" href="javascript:hidecite('cite-XMLSSE42')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). 2008<a href="Intel%20Software%20Network" class="link" target="_new">Intel Software Network</a></p>
     202<a class="quiet" href="javascript:hidecite('cite-XMLSSE42')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). <a href="Intel%20Software%20Network" class="link" target="_new">Intel Software Network</a>.  2008.</p>
    203203</div>
    204204<div class="inline-citation" id="cite-Cameron2009" style="display:none;width: 240px">
    205 <a class="quiet" href="javascript:hidecite('cite-Cameron2009')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Rob, Herdy, Ken and Amiri, Ehsan. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. 2009</p>
     205<a class="quiet" href="javascript:hidecite('cite-Cameron2009')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Rob, Herdy, Ken and Amiri, Ehsan. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Montreal, Quebec, Canada.  2009.</p>
    206206</div>
    207207<div class="inline-citation" id="cite-HilewitzLee2006" style="display:none;width: 240px">
    208 <a class="quiet" href="javascript:hidecite('cite-HilewitzLee2006')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. 2006 Washington, DC, USA</p>
     208<a class="quiet" href="javascript:hidecite('cite-HilewitzLee2006')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. Steamboat Springs, Colorado, USA.  2006.</p>
    209209</div>
    210210<div class="inline-citation" id="cite-Asanovic-EECS-2006-183" style="display:none;width: 240px">
    211 <a class="quiet" href="javascript:hidecite('cite-Asanovic-EECS-2006-183')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. 2006</p>
     211<a class="quiet" href="javascript:hidecite('cite-Asanovic-EECS-2006-183')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department, University of California, Berkeley.  2006.</p>
    212212</div>
    213213<div class="inline-citation" id="cite-GRID2006" style="display:none;width: 240px">
    214 <a class="quiet" href="javascript:hidecite('cite-GRID2006')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. 2006 Washington, DC, USA</p>
     214<a class="quiet" href="javascript:hidecite('cite-GRID2006')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. Barcelona, Spain.  2006.</p>
    215215</div>
    216216<div class="inline-citation" id="cite-cameron-EuroPar2011" style="display:none;width: 240px">
    217 <a class="quiet" href="javascript:hidecite('cite-cameron-EuroPar2011')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II. 2011 Berlin, Heidelberg</p>
     217<a class="quiet" href="javascript:hidecite('cite-cameron-EuroPar2011')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II. Bordeaux, France. 2011.</p>
    218218</div>
    219219<div class="inline-citation" id="cite-HPCA2012" style="display:none;width: 240px">
    220 <a class="quiet" href="javascript:hidecite('cite-HPCA2012')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. 2012 Los Alamitos, CA, USA</p>
     220<a class="quiet" href="javascript:hidecite('cite-HPCA2012')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. New Orleans, LA. 2012.</p>
    221221</div>
    222222<div class="inline-citation" id="cite-HPCC2011" style="display:none;width: 240px">
    223 <a class="quiet" href="javascript:hidecite('cite-HPCC2011')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. 2011 Los Alamitos, CA, USA</p>
     223<a class="quiet" href="javascript:hidecite('cite-HPCC2011')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. Banff, Alberta, Canada. 2011.</p>
    224224</div>
    225225<div class="inline-citation" id="cite-E-SCIENCE2007" style="display:none;width: 240px">
    226 <a class="quiet" href="javascript:hidecite('cite-E-SCIENCE2007')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing. 2007 Los Alamitos, CA, USA</p>
     226<a class="quiet" href="javascript:hidecite('cite-E-SCIENCE2007')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing.   Bangalore, India.  2007.</p>
    227227</div>
    228228<div class="inline-citation" id="cite-ICWS2008" style="display:none;width: 240px">
    229 <a class="quiet" href="javascript:hidecite('cite-ICWS2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. 2008 Los Alamitos, CA, USA</p>
     229<a class="quiet" href="javascript:hidecite('cite-ICWS2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. Beijing, China.  2008.</p>
    230230</div>
    231231<div class="inline-citation" id="cite-IPDPS2008" style="display:none;width: 240px">
    232 <a class="quiet" href="javascript:hidecite('cite-IPDPS2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. 2008 Los Alamitos, CA, USA</p>
     232<a class="quiet" href="javascript:hidecite('cite-IPDPS2008')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. Miami, Florida, USA.  2008.</p>
    233233</div>
    234234<div class="inline-citation" id="cite-HackersDelight" style="display:none;width: 240px">
    235 <a class="quiet" href="javascript:hidecite('cite-HackersDelight')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Warren, Henry S.. Hacker's Delight. 2002</p>
     235<a class="quiet" href="javascript:hidecite('cite-HackersDelight')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Warren, Henry S.. Hacker's Delight. Addison-Wesley Professional. 2003.</p>
    236236</div>
    237237<div class="inline-citation" id="cite-lu2007advances" style="display:none;width: 240px">
    238 <a class="quiet" href="javascript:hidecite('cite-lu2007advances')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. 2007</p>
     238<a class="quiet" href="javascript:hidecite('cite-lu2007advances')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. Geoinformatica 11:131-157.  2007.</p>
    239239</div>
    240240<div class="inline-citation" id="cite-lake2004geography" style="display:none;width: 240px">
    241 <a class="quiet" href="javascript:hidecite('cite-lake2004geography')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. 2004</p>
     241<a class="quiet" href="javascript:hidecite('cite-lake2004geography')" style="font-size:90%"><img src="eks.png" alt="[x]" style="float:right;clear:both;margin:1px"></a><p style="margin:0ex">Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. Wiley.  Chichester.  2004.</p>
    242242</div>
    243243<div id="mast"><div class="content">
    244 <h2 class="article-title" id="idp76432">icXML:  Accelerating a Commercial XML
     244<h2 class="article-title" id="idp73008">icXML:  Accelerating a Commercial XML
    245245     Parser Using SIMD and Multicore Technologies</h2>
    246246<div class="author">
     
    292292<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ashriram.cs.sfu.ca">ashriram.cs.sfu.ca</a>&gt;</code></h5>
    293293</div>
    294 <div class="legalnotice-block"><p id="idp284896">Copyright © 2013 Nigel Medforth, Dan Lin, Kenneth S. Herdy, Robert D. Cameron  and Arrvindh Shriraman.
     294<div class="legalnotice-block"><p id="idp282976">Copyright © 2013 Nigel Medforth, Dan Lin, Kenneth S. Herdy, Robert D. Cameron  and Arrvindh Shriraman.
    295295            This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative
    296296            Works 2.5 Canada License.</p></div>
    297297<div class="mast-box">
    298 <p class="title"><a href="javascript:toggle('idp77216')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp77216"></a> <span onclick="javascript:toggle('idp77216');return true">Abstract</span></p>
    299 <div class="folder" id="folder-idp77216" style="display:none"><p id="idp77520">Prior research on the acceleration of XML processing using single-instruction
     298<p class="title"><a href="javascript:toggle('idp73792')" class="quiet"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp73792"></a> <span onclick="javascript:toggle('idp73792');return true">Abstract</span></p>
     299<div class="folder" id="folder-idp73792" style="display:none"><p id="idp74096">Prior research on the acceleration of XML processing using single-instruction
    300300           multiple-data (SIMD) and multi-core
    301301            parallelism has led to a number of interesting research prototypes. This work is
     
    323323<p><b>Table of Contents</b></p>
    324324<dl>
    325 <dt><span class="section"><a href="#idp286832" class="toc">Introduction</a></span></dt>
     325<dt><span class="section"><a href="#idp284912" class="toc">Introduction</a></span></dt>
    326326<dt><span class="section"><a href="#background" class="toc">Background</a></span></dt>
    327327<dd><dl>
    328328<dt><span class="section"><a href="#background-xerces" class="toc">Xerces C++ Structure</a></span></dt>
    329 <dt><span class="section"><a href="#idp361744" class="toc">The Parabix Framework</a></span></dt>
    330 <dt><span class="section"><a href="#idp457376" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
     329<dt><span class="section"><a href="#idp360272" class="toc">The Parabix Framework</a></span></dt>
     330<dt><span class="section"><a href="#idp455536" class="toc">Sequential vs. Parallel Paradigm</a></span></dt>
    331331</dl></dd>
    332332<dt><span class="section"><a href="#architecture" class="toc">Architecture</a></span></dt>
    333333<dd><dl>
    334 <dt><span class="section"><a href="#idp465008" class="toc">Overview</a></span></dt>
     334<dt><span class="section"><a href="#idp463168" class="toc">Overview</a></span></dt>
    335335<dt><span class="section"><a href="#character-set-adapter" class="toc">Character Set Adapters</a></span></dt>
    336336<dt><span class="section"><a href="#par-filter" class="toc">Combined Parallel Filtering</a></span></dt>
     
    342342<dt><span class="section"><a href="#performance" class="toc">Performance</a></span></dt>
    343343<dd><dl>
    344 <dt><span class="section"><a href="#idp654464" class="toc">Xerces C++ SAXCount</a></span></dt>
    345 <dt><span class="section"><a href="#idp680992" class="toc">GML2SVG</a></span></dt>
     344<dt><span class="section"><a href="#idp669920" class="toc">Xerces C++ SAXCount</a></span></dt>
     345<dt><span class="section"><a href="#idp696544" class="toc">GML2SVG</a></span></dt>
    346346</dl></dd>
    347347<dt><span class="section"><a href="#conclusion" class="toc">Conclusion and Future Work</a></span></dt>
     
    349349</div>
    350350<div class="mast-box">
    351 <p class="title"><a href="javascript:toggle('idp79584')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp79584"></a> <span onclick="javascript:toggle('idp79584');return true">Nigel Medforth</span></p>
    352 <div class="folder" id="folder-idp79584" style="display:none">
     351<p class="title"><a href="javascript:toggle('idp76160')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp76160"></a> <span onclick="javascript:toggle('idp76160');return true">Nigel Medforth</span></p>
     352<div class="folder" id="folder-idp76160" style="display:none">
    353353<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:nmedfort@sfu.ca">nmedfort@sfu.ca</a>&gt;</code></h5>
    354354<div class="affiliation">
     
    361361</div>
    362362<div class="personblurb">
    363 <p id="idp61840">Nigel Medforth is an M.Sc. student at Simon Fraser University and the lead
     363<p id="idp58512">Nigel Medforth is an M.Sc. student at Simon Fraser University and the lead
    364364               developer of icXML. He earned a Bachelor of Technology in Information Technology at
    365365               Kwantlen Polytechnic University in 2009 and was awarded the Dean’s Medal for
    366366               Outstanding Achievement.</p>
    367 <p id="idp62848">Nigel is currently researching ways to leverage both the Parabix framework and
     367<p id="idp59520">Nigel is currently researching ways to leverage both the Parabix framework and
    368368               stream-processing models to further accelerate XML parsing within icXML.</p>
    369369</div>
     
    371371</div>
    372372<div class="mast-box">
    373 <p class="title"><a href="javascript:toggle('idp66496')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp66496"></a> <span onclick="javascript:toggle('idp66496');return true">Dan Lin</span></p>
    374 <div class="folder" id="folder-idp66496" style="display:none">
     373<p class="title"><a href="javascript:toggle('idp63168')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp63168"></a> <span onclick="javascript:toggle('idp63168');return true">Dan Lin</span></p>
     374<div class="folder" id="folder-idp63168" style="display:none">
    375375<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:lindanl@sfu.ca">lindanl@sfu.ca</a>&gt;</code></h5>
    376376<div class="affiliation">
     
    378378<p class="orgname">School of Computing Science, Simon Fraser University </p>
    379379</div>
    380 <div class="personblurb"><p id="idp68208">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
     380<div class="personblurb"><p id="idp64880">Dan Lin is a Ph.D student at Simon Fraser University. She earned a Master of Science
    381381             in Computing Science at Simon Fraser University in 2010. Her research focuses on high
    382382             performance algorithms that exploit parallelization strategies on various multicore platforms.
     
    385385</div>
    386386<div class="mast-box">
    387 <p class="title"><a href="javascript:toggle('idp70752')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp70752"></a> <span onclick="javascript:toggle('idp70752');return true">Kenneth Herdy</span></p>
    388 <div class="folder" id="folder-idp70752" style="display:none">
     387<p class="title"><a href="javascript:toggle('idp67424')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp67424"></a> <span onclick="javascript:toggle('idp67424');return true">Kenneth Herdy</span></p>
     388<div class="folder" id="folder-idp67424" style="display:none">
    389389<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ksherdy@sfu.ca">ksherdy@sfu.ca</a>&gt;</code></h5>
    390390<div class="affiliation">
     
    393393</div>
    394394<div class="personblurb">
    395 <p id="idp271952"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
     395<p id="idp270032"> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
    396396               Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor
    397397               of Science in Computing Science with a Certificate in Spatial Information Systems at
    398398               Simon Fraser University in 2005. </p>
    399 <p id="idp272688"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
     399<p id="idp270768"> Ken is currently pursuing PhD studies in Computing Science at Simon Fraser
    400400               University with industrial scholarship support from the Natural Sciences and
    401401               Engineering Research Council of Canada, the Mathematics of Information Technology and
     
    407407</div>
    408408<div class="mast-box">
    409 <p class="title"><a href="javascript:toggle('idp275424')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp275424"></a> <span onclick="javascript:toggle('idp275424');return true">Rob Cameron</span></p>
    410 <div class="folder" id="folder-idp275424" style="display:none">
     409<p class="title"><a href="javascript:toggle('idp273504')" class="linkbox"><img class="toc-icon" src="plus.png" alt="expand" id="icon-idp273504"></a> <span onclick="javascript:toggle('idp273504');return true">Rob Cameron</span></p>
     410<div class="folder" id="folder-idp273504" style="display:none">
    411411<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
    412412<div class="affiliation">
     
    418418<p class="orgname">International Characters, Inc.</p>
    419419</div>
    420 <div class="personblurb"><p id="idp277088">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
     420<div class="personblurb"><p id="idp275168">Dr. Rob Cameron is Professor of Computing Science and Associate Dean of Applied
    421421               Sciences at Simon Fraser University. His research interests include programming
    422422               language and software system technology, with a specific focus on high performance
     
    434434<div id="main">
    435435<div class="article">
    436 <h2 class="article-title" id="idp76432">icXML:  Accelerating a Commercial XML
     436<h2 class="article-title" id="idp73008">icXML:  Accelerating a Commercial XML
    437437     Parser Using SIMD and Multicore Technologies</h2>
    438 <div class="section" id="idp286832">
     438<div class="section" id="idp284912">
    439439<h2 class="title" style="clear: both">Introduction</h2>
    440 <p id="idp287472">   
     440<p id="idp285552">   
    441441        Parallelization and acceleration of XML parsing is a widely
    442442        studied problem that has seen the development of a number
    443         of interesting research prototypes using both SIMD and
    444         multicore parallelism.   Most works have investigated
     443        of interesting research prototypes using both single-instruction
     444           multiple-data (SIMD) and
     445        multi-core parallelism.   Most works have investigated
    445446        data parallel solutions on multicore
    446447        architectures using various strategies to break input
     
    448449        For example, one possibility for data
    449450        parallelization is to add a pre-parsing step to compute
    450         a skeleton tree structure of an  XML document <a class="xref" id="idp288288" href="javascript:showcite('cite-GRID2006','idp288288')">[Lu and Chiu 2006]</a>.
     451        a skeleton tree structure of an  XML document <a class="xref" id="idp286400" href="javascript:showcite('cite-GRID2006','idp286400')">Lu and Chiu 2006</a>.
    451452        The parallelization of the pre-parsing stage itself can be tackled with
    452           state machines <a class="xref" id="idp301312" href="javascript:showcite('cite-E-SCIENCE2007','idp301312')">[Pan and Zhang 2007]</a>, <a class="xref" id="idp302064" href="javascript:showcite('cite-IPDPS2008','idp302064')">[Pan and Zhang 2008b]</a>.
    453         Methods without pre-parsing have used speculation <a class="xref" id="idp302880" href="javascript:showcite('cite-HPCC2011','idp302880')">[You and Wang 2011]</a> or post-processing that
    454         combines the partial results <a class="xref" id="idp303712" href="javascript:showcite('cite-ParaDOM2009','idp303712')">[Shah and Rao 2009]</a>.
     453          state machines <a class="xref" id="idp299376" href="javascript:showcite('cite-E-SCIENCE2007','idp299376')">Pan and Zhang 2007</a>, <a class="xref" id="idp300128" href="javascript:showcite('cite-IPDPS2008','idp300128')">Pan and Zhang 2008b</a>.
     454        Methods without pre-parsing have used speculation <a class="xref" id="idp300944" href="javascript:showcite('cite-HPCC2011','idp300944')">You and Wang 2011</a> or post-processing that
     455        combines the partial results <a class="xref" id="idp301776" href="javascript:showcite('cite-ParaDOM2009','idp301776')">Shah and Rao 2009</a>.
    455456        A hybrid technique that combines data and pipeline parallelism was proposed to
    456         hide the latency of a "job" that has to be done sequentially <a class="xref" id="idp304576" href="javascript:showcite('cite-ICWS2008','idp304576')">[Pan and Zhang 2008a]</a>.
     457        hide the latency of a "job" that has to be done sequentially <a class="xref" id="idp302640" href="javascript:showcite('cite-ICWS2008','idp302640')">Pan and Zhang 2008a</a>.
    457458      </p>
    458 <p id="idp305456">
     459<p id="idp303520">
    459460        Fewer efforts have investigated SIMD parallelism, although this approach
    460461        has the potential advantage of improving single core performance as well
    461         as offering savings in energy consumption <a class="xref" id="idp305920" href="javascript:showcite('cite-HPCA2012','idp305920')">[Lin and Medforth 2012]</a>.
     462        as offering savings in energy consumption <a class="xref" id="idp303984" href="javascript:showcite('cite-HPCA2012','idp303984')">Lin and Medforth 2012</a>.
    462463        Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension
    463         and showed how they can be used to improve the performance of XML parsing <a class="xref" id="idp306864" href="javascript:showcite('cite-XMLSSE42','idp306864')">[Lei 2008]</a>.
     464        and showed how they can be used to improve the performance of XML parsing <a class="xref" id="idp304928" href="javascript:showcite('cite-XMLSSE42','idp304928')">Lei 2008</a>.
    464465        The Parabix framework uses generic SIMD extensions and bit parallel methods to
    465         process hundreds of XML input characters simultaneously <a class="xref" id="idp307776" href="javascript:showcite('cite-Cameron2009','idp307776')">[Balisage 2009]</a> <a class="xref" id="idp308528" href="javascript:showcite('cite-cameron-EuroPar2011','idp308528')">[Parabix2 2011]</a>.
     466        process hundreds of XML input characters simultaneously <a class="xref" id="idp305840" href="javascript:showcite('cite-Cameron2009','idp305840')">Balisage 2009</a> <a class="xref" id="idp306592" href="javascript:showcite('cite-cameron-EuroPar2011','idp306592')">Parabix2 2011</a>.
    466467        Parabix prototypes have also combined SIMD methods with thread-level parallelism to
    467         achieve further acceleration on multicore systems <a class="xref" id="idp309440" href="javascript:showcite('cite-HPCA2012','idp309440')">[Lin and Medforth 2012]</a>.
     468        achieve further acceleration on multicore systems <a class="xref" id="idp307504" href="javascript:showcite('cite-HPCA2012','idp307504')">Lin and Medforth 2012</a>.
    468469      </p>
    469 <p id="idp310208">
     470<p id="idp308272">
    470471        In this paper, we move beyond research prototypes to consider
    471472        the detailed integration of both SIMD and multicore parallelism into the
     
    486487        multiple cores.
    487488      </p>
    488 <p id="idp311648">
     489<p id="idp309712">
    489490        The remainder of this paper is organized as follows.   
    490           <a class="xref" href="#background" title="Background">section “Background”</a> discusses the structure of the Xerces and Parabix XML parsers and the fundamental
     491          <a class="xref" href="#background" title="Background">Section “Background”</a> discusses the structure of the Xerces and Parabix XML parsers and the fundamental
    491492        differences between the two parsing models.   
    492         <a class="xref" href="#architecture" title="Architecture">section “Architecture”</a> then presents the icXML design based on a restructured Xerces architecture to
     493        <a class="xref" href="#architecture" title="Architecture">Section “Architecture”</a> then presents the icXML design based on a restructured Xerces architecture to
    493494        incorporate SIMD parallelism using Parabix methods.   
    494         <a class="xref" href="#multithread" title="Multithreading with Pipeline Parallelism">section “Multithreading with Pipeline Parallelism”</a> moves on to consider the multithreading of the icXML architecture
     495        <a class="xref" href="#multithread" title="Multithreading with Pipeline Parallelism">Section “Multithreading with Pipeline Parallelism”</a> moves on to consider the multithreading of the icXML architecture
    495496        using the pipeline parallelism model. 
    496         <a class="xref" href="#performance" title="Performance">section “Performance”</a> analyzes the performance of both the single-threaded and
     497        <a class="xref" href="#performance" title="Performance">Section “Performance”</a> analyzes the performance of both the single-threaded and
    497498        multi-threaded versions of icXML in comparison to original Xerces,
    498499        demonstrating substantial end-to-end acceleration of
    499500        a GML-to-SVG translation application written against the Xerces API.
    500           <a class="xref" href="#conclusion" title="Conclusion and Future Work">section “Conclusion and Future Work”</a> concludes the paper with a discussion of future work and the potential for
     501          <a class="xref" href="#conclusion" title="Conclusion and Future Work">Section “Conclusion and Future Work”</a> concludes the paper with a discussion of future work and the potential for
    501502        applying the techniques discussed herein in other application domains.
    502503      </p>
     
    506507<div class="section" id="background-xerces">
    507508<h3 class="title" style="clear: both">Xerces C++ Structure</h3>
    508 <p id="idp318976"> The Xerces C++ parser is a widely-used standards-conformant
     509<p id="idp317296"> The Xerces C++ parser is a widely-used standards-conformant
    509510            XML parser produced as open-source software
    510511             by the Apache Software Foundation.
     
    517518            parsing using either pull parsing or SAX/SAX2 push-style parsing as well as a DOM
    518519            tree-based parsing interface. </p>
    519 <p id="idp321104">
     520<p id="idp319424">
    520521            Xerces,
    521522            like all traditional parsers, processes XML documents sequentially a byte-at-a-time from
     
    536537<div class="table-wrapper" id="xerces-profile">
    537538<p class="title">Table I</p>
    538 <div class="caption"><p id="idp13392">Execution Time of Top 10 Xerces Functions</p></div>
     539<div class="caption"><p id="idm848368">Execution Time of Top 10 Xerces Functions</p></div>
    539540<table class="table" xml:id="xerces-profile">
    540541<colgroup span="1">
     
    591592</div>
    592593</div>
    593 <div class="section" id="idp361744">
     594<div class="section" id="idp360272">
    594595<h3 class="title" style="clear: both">The Parabix Framework</h3>
    595 <p id="idp362384"> The Parabix (parallel bit stream) framework is a transformative approach to XML
     596<p id="idp360944"> The Parabix (parallel bit stream) framework is a transformative approach to XML
    596597            parsing (and other forms of text processing). The key idea is to exploit the
    597598            availability of wide SIMD registers (e.g., 128-bit) in commodity processors to represent
     
    622623<div class="table-wrapper" id="xml-bytes">
    623624<p class="title">Table II</p>
    624 <div class="caption"><p id="idp376256">XML Source Data</p></div>
     625<div class="caption"><p id="idp375584">XML Source Data</p></div>
    625626<table class="table" xml:id="xml-bytes">
    626627<colgroup span="1">
     
    651652<div class="table-wrapper" id="xml-bits">
    652653<p class="title">Table III</p>
    653 <div class="caption"><p id="idp392528">8-bit ASCII Basis Bit Streams</p></div>
     654<div class="caption"><p id="idp391856">8-bit ASCII Basis Bit Streams</p></div>
    654655<table class="table" xml:id="xml-bits">
    655656<colgroup span="1">
     
    718719</table>
    719720</div>
    720 <p id="idp432656"> Consider, for example, the XML source data stream shown in the first line of <a class="xref" href="#derived">Table IV</a>.
     721<p id="idp431536"> Consider, for example, the XML source data stream shown in the first line of <a class="xref" href="#derived">Table IV</a>.
    721722The remaining lines of this table show
    722723            several parallel bit streams that are computed in Parabix-style parsing, with each bit
     
    728729            character immediately following the opener (i.e., "<code class="code">/</code>") or
    729730            not. The remaining three lines show streams that can be computed in subsequent parsing
    730             (using the technique of bitstream addition <a class="xref" id="idp435760" href="javascript:showcite('cite-cameron-EuroPar2011','idp435760')">[Parabix2 2011]</a>), namely streams
     731            (using the technique of bitstream addition <a class="xref" id="idp434480" href="javascript:showcite('cite-cameron-EuroPar2011','idp434480')">Parabix2 2011</a>), namely streams
    731732            marking the element names, attribute names and attribute values of tags. </p>
    732733<div class="table-wrapper" id="derived">
    733734<p class="title">Table IV</p>
    734 <div class="caption"><p id="idp437472">XML Source Data and Derived Parallel Bit Streams</p></div>
     735<div class="caption"><p id="idp436032">XML Source Data and Derived Parallel Bit Streams</p></div>
    735736<table class="table" xml:id="derived">
    736737<colgroup span="1">
     
    782783</table>
    783784</div>
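The relationship between source bytes, basis bit streams and a derived character-class stream can be sketched in a few lines of Python. This is an illustration only, not the Parabix implementation: Python integers stand in for SIMD registers, the per-byte loop in `transpose` stands in for the SIMD transposition step, and the names `transpose`, `match_byte` and `render` are hypothetical. Bit `i` of every stream corresponds to source position `i`, and basis stream 0 is assumed to hold the most significant bit of each byte.

```python
def transpose(data: bytes) -> list[int]:
    """Return 8 basis bit streams; bit i of stream k is bit k (MSB first) of data[i]."""
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if byte & (0x80 >> k):
                basis[k] |= 1 << pos
    return basis


def match_byte(basis: list[int], value: int, length: int) -> int:
    """Bit stream with a 1 at every position whose source byte equals `value`.

    Only bitwise logic on whole streams is used, so every position in the
    block is classified at once.
    """
    all_ones = (1 << length) - 1
    result = all_ones
    for k in range(8):
        if value & (0x80 >> k):
            result &= basis[k]
        else:
            result &= ~basis[k] & all_ones
    return result


def render(stream: int, length: int) -> str:
    """Show a stream left to right in source order, '1' for set bits and '.' otherwise."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


text = b"<a><b/></a>"
basis = transpose(text)
tag_openers = match_byte(basis, ord("<"), len(text))
print(text.decode("ascii"))
print(render(tag_openers, len(text)))
```

The second printed line carries a `1` under each `<` of the first, which is the one-to-one correspondence between stream bits and source characters that the derived streams rely on.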
    784 <p id="idp450576"> Two intuitions may help explain how the Parabix approach can lead to improved XML
     785<p id="idp448832"> Two intuitions may help explain how the Parabix approach can lead to improved XML
    785786            parsing performance. The first is that the use of the full register width offers a
    786787            considerable information advantage over sequential byte-at-a-time parsing. That is,
     
    791792            individual decision-bits, an approach that computes many of them in parallel (e.g., 128
    792793            bytes at a time using 128-bit registers) should provide substantial benefit. </p>
    793 <p id="idp452688"> Previous studies have shown that the Parabix approach improves many aspects of XML
    794             processing, including transcoding <a class="xref" id="idp453088" href="javascript:showcite('cite-Cameron2008','idp453088')">[u8u16 2008]</a>, character classification and
     794<p id="idp450944"> Previous studies have shown that the Parabix approach improves many aspects of XML
     795            processing, including transcoding <a class="xref" id="idp451344" href="javascript:showcite('cite-Cameron2008','idp451344')">u8u16 2008</a>, character classification and
    795796            validation, tag parsing and well-formedness checking. The first Parabix parser used
    796797            processor bit scan instructions to considerably accelerate sequential scanning loops for
    797             individual characters <a class="xref" id="idp453984" href="javascript:showcite('cite-CameronHerdyLin2008','idp453984')">[Parabix1 2008]</a>. Recent work has incorporated a method
    798             of parallel scanning using bitstream addition <a class="xref" id="idp454800" href="javascript:showcite('cite-cameron-EuroPar2011','idp454800')">[Parabix2 2011]</a>, as well as
     798            individual characters <a class="xref" id="idp452192" href="javascript:showcite('cite-CameronHerdyLin2008','idp452192')">Parabix1 2008</a>. Recent work has incorporated a method
     799            of parallel scanning using bitstream addition <a class="xref" id="idp452960" href="javascript:showcite('cite-cameron-EuroPar2011','idp452960')">Parabix2 2011</a>, as well as
    799800            combining SIMD methods with 4-stage pipeline parallelism to further improve throughput
    800             <a class="xref" id="idp455584" href="javascript:showcite('cite-HPCA2012','idp455584')">[Lin and Medforth 2012]</a>. Although these research prototypes handled the full syntax of
     801            <a class="xref" id="idp453744" href="javascript:showcite('cite-HPCA2012','idp453744')">Lin and Medforth 2012</a>. Although these research prototypes handled the full syntax of
    801802            schema-less XML documents, they lacked the functionality required by full XML parsers. </p>
    802 <p id="idp456528"> Commercial XML processors support transcoding of multiple character sets and can
     803<p id="idp454688"> Commercial XML processors support transcoding of multiple character sets and can
    803804            parse and validate against multiple document vocabularies. Additionally, they provide
    804805            API facilities beyond those found in research prototypes, including the widely used SAX,
    805806            SAX2 and DOM interfaces. </p>
    806807</div>
    807 <div class="section" id="idp457376">
     808<div class="section" id="idp455536">
    808809<h3 class="title" style="clear: both">Sequential vs. Parallel Paradigm</h3>
    809 <p id="idp458064"> Xerces—like all traditional XML parsers—processes XML documents
     810<p id="idp456224"> Xerces—like all traditional XML parsers—processes XML documents
    810811            sequentially. Each character is examined to distinguish between the XML-specific markup,
    811812            such as a left angle bracket <code class="code">&lt;</code>, and the content held within the
    812813            document. As the parser progresses through the document, it alternates between markup
    813814            scanning, validation and content processing modes. </p>
    814 <p id="idp459600"> In other words, Xerces belongs to an equivalence class of applications termed FSM
     815<p id="idp457760"> In other words, Xerces belongs to an equivalence class of applications termed FSM
    815816           applications.<sup class="fn-label"><a href="#FSM" class="footnoteref" id="FSM-ref">[1]</a></sup> Each state transition indicates the processing context of
    816817            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
    817818            character could induce a state transition. </p>
    818 <p id="idp462080"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
     819<p id="idp460240"> Parabix-style XML parsers utilize a concept of layered processing. A block of source
    819820            text is transformed into a set of lexical bitstreams, which undergo a series of
    820821            operations that can be grouped into logical layers, e.g., transposition, character
    821822            classification, and lexical analysis. Each layer is pipeline parallel and requires
    822             neither speculation nor pre-parsing stages<a class="xref" id="idp462768" href="javascript:showcite('cite-HPCA2012','idp462768')">[Lin and Medforth 2012]</a>. To meet the API requirements
     823            neither speculation nor pre-parsing stages<a class="xref" id="idp460928" href="javascript:showcite('cite-HPCA2012','idp460928')">Lin and Medforth 2012</a>. To meet the API requirements
    823824            of the document-ordered Xerces output, the results of the Parabix processing layers must
    824825            be interleaved to produce the equivalent behaviour. </p>
     
    827828<div class="section" id="architecture">
    828829<h2 class="title" style="clear: both">Architecture</h2>
    829 <div class="section" id="idp465008">
     830<div class="section" id="idp463168">
    830831<h3 class="title" style="clear: both">Overview</h3>
    831 <p id="idp466064"> icXML is more than an optimized version of Xerces. Many components were grouped,
     832<p id="idp464224"> icXML is more than an optimized version of Xerces. Many components were grouped,
    832833            restructured and rearchitected with pipeline parallelism in mind. In this section, we
    833834            highlight the core differences between the two systems. As shown in Figure
     
    855856<p class="title">Figure 1: Xerces Architecture</p>
    856857<div class="figure-contents">
    857 <div class="mediaobject" id="idp474032"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
     858<div class="mediaobject" id="idp472192"><img alt="png image (xerces.png)" src="xerces.png" width="150cm"></div>
    858859<div class="caption"></div>
    859860</div>
    860861</div>
    861 <p id="idp476352"> In icXML functions are grouped into logical components. As shown in
     862<p id="idp474512"> In icXML functions are grouped into logical components. As shown in
    862863             <a class="xref" href="#xerces-arch" title="Xerces Architecture">Figure 1</a>, two major categories exist: (1) the Parabix Subsystem and (2) the
    863                Markup Processor. All tasks in (1) use the Parabix Framework <a class="xref" id="idp477440" href="javascript:showcite('cite-HPCA2012','idp477440')">[Lin and Medforth 2012]</a>, which
     864               Markup Processor. All tasks in (1) use the Parabix Framework <a class="xref" id="idp475600" href="javascript:showcite('cite-HPCA2012','idp475600')">Lin and Medforth 2012</a>, which
    864865            represents data as a set of parallel bitstreams. The <span class="ital">Character Set
    865               Adapter</span>, discussed in <a class="xref" href="#character-set-adapter" title="Character Set Adapters">section “Character Set Adapters”</a>, mirrors
     866              Adapter</span>, discussed in <a class="xref" href="#character-set-adapter" title="Character Set Adapters">Section “Character Set Adapters”</a>, mirrors
    866867            Xerces's Transcoder duties; however instead of producing UTF-16 it produces a set of
    867               lexical bitstreams, similar to those shown in <a class="xref" id="idp479904" href="javascript:showcite('cite-CameronHerdyLin2008','idp479904')">[Parabix1 2008]</a>. These lexical
     868              lexical bitstreams, similar to those shown in <a class="xref" id="idp478000" href="javascript:showcite('cite-CameronHerdyLin2008','idp478000')">Parabix1 2008</a>. These lexical
    868869            bitstreams are later transformed into UTF-16 in the Content Stream Generator, after
    869870            additional processing is performed. The first precursor to producing UTF-16 is the
     
    876877            icXML must provide the Line and Column position of each error. The <span class="ital">Line-Column Tracker</span> uses the lexical information to keep track of the
    877878            document position(s) through the use of an optimized population count algorithm,
    878               described in <a class="xref" href="#errorhandling" title="Error Handling">section “Error Handling”</a>. From here, two data-independent
     879              described in <a class="xref" href="#errorhandling" title="Error Handling">Section “Error Handling”</a>. From here, two data-independent
    879880            branches exist: the Symbol Resolver and Content Preparation Unit. </p>
    880 <p id="idp483888"> A typical XML file contains few unique element and attribute names—but
     881<p id="idp482048"> A typical XML file contains few unique element and attribute names—but
    881882            each of them will occur frequently. icXML stores these as distinct data structures,
    882883            called symbols, each with their own global identifier (GID). Using the symbol marker
     
    884885               Resolver</span> scans through the raw data to produce a sequence of GIDs, called
    885886            the <span class="ital">symbol stream</span>. </p>
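As a rough illustration of the symbol idea, the sketch below interns each distinct name once and emits one GID per occurrence. It is a simplified stand-in under stated assumptions: the class name `SymbolTable` and method `resolve` are hypothetical, and the real Symbol Resolver locates names using the symbol marker bitstream rather than receiving them as a ready-made list.

```python
class SymbolTable:
    """Maps each distinct element or attribute name to a small integer GID."""

    def __init__(self) -> None:
        self._gids: dict[bytes, int] = {}
        self.names: list[bytes] = []

    def resolve(self, name: bytes) -> int:
        """Return the GID for `name`, allocating a new one on first sight."""
        gid = self._gids.get(name)
        if gid is None:
            gid = len(self.names)
            self._gids[name] = gid
            self.names.append(name)
        return gid


table = SymbolTable()
# Name occurrences in document order (assumed already located in the input).
occurrences = [b"document", b"element", b"a1", b"a2", b"element", b"document"]
symbol_stream = [table.resolve(name) for name in occurrences]
print(symbol_stream)
```

Because a typical document reuses a small set of names, most calls take the dictionary hit path, and later stages can work with integers instead of repeated string comparisons.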
    886 <p id="idp486544"> The final components of the Parabix Subsystem are the <span class="ital">Content
     887<p id="idp484704"> The final components of the Parabix Subsystem are the <span class="ital">Content
    887888               Preparation Unit</span> and <span class="ital">Content Stream
    888889            Generator</span>. The former takes the (transposed) basis bitstreams and selectively
    889890            filters them, according to the information provided by the Parallel Markup Parser, and
    890             the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in <a class="xref" href="#contentstream" title="Content Stream">section “Content Stream”</a>. </p>
    891 <p id="idp490144"> Combined, the symbol and content stream form icXML's compressed IR of the XML
     891            the latter transforms the filtered streams into the tagged UTF-16 <span class="ital">content stream</span>, discussed in <a class="xref" href="#contentstream" title="Content Stream">Section “Content Stream”</a>. </p>
     892<p id="idp488304"> Combined, the symbol and content stream form icXML's compressed IR of the XML
    892893            document. The <span class="ital">Markup Processor</span>
    893894            parses the IR to
     
    897898            that produces a series of URI identifiers (URI IDs), the <span class="ital">URI
    898899               stream</span>, which are associated with each symbol occurrence. This is
    899                  discussed in <a class="xref" href="#namespace-handling" title="Namespace Handling">section “Namespace Handling”</a>. Finally, the <span class="ital">Validation</span> layer implements the Xerces's validator. However,
     900                 discussed in <a class="xref" href="#namespace-handling" title="Namespace Handling">Section “Namespace Handling”</a>. Finally, the <span class="ital">Validation</span> layer implements the Xerces's validator. However,
    900901            preprocessing associated with each symbol greatly reduces the work of this stage. </p>
    901902<div class="figure" id="icxml-arch">
    902903<p class="title">Figure 2: icXML Architecture</p>
    903904<div class="figure-contents">
    904 <div class="mediaobject" id="idp496608"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
     905<div class="mediaobject" id="idp494768"><img alt="png image (icxml.png)" src="icxml.png" width="500cm"></div>
    905906<div class="caption"></div>
    906907</div>
     
    909910<div class="section" id="character-set-adapter">
    910911<h3 class="title" style="clear: both">Character Set Adapters</h3>
    911 <p id="idp500096"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
     912<p id="idp498256"> In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of
    912913            Xerces itself and provide the end-consumer with a single encoding format. In the
    913914            important case of UTF-8 to UTF-16 transcoding, the transcoding costs can be significant,
     
    916917            other cases, transcoding may involve table look-up operations for each byte of input. In
    917918            any case, transcoding imposes at least a cost of buffer copying. </p>
    918 <p id="idp501152"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
     919<p id="idp499312"> In icXML, however, the concept of Character Set Adapters (CSAs) is used to minimize
    919920            transcoding costs. Given a specified input encoding, a CSA is responsible for checking
    920921            that input code units represent valid characters, mapping the characters of the encoding
     
    922923            item streams), as well as supporting ultimate transcoding requirements. All of this work
    923924            is performed using the parallel bitstream representation of the source input. </p>
    924 <p id="idp40944"> An important observation is that many character sets are an extension to the legacy
     925<p id="idp37488"> An important observation is that many character sets are an extension to the legacy
    925926            7-bit ASCII character set. This includes the various ISO Latin character sets, UTF-8,
    926927            UTF-16 and many others. Furthermore, all significant characters for parsing XML are
    927928            confined to the ASCII repertoire. Thus, a single common set of lexical item calculations
    928929            serves to compute lexical item streams for all such ASCII-based character sets. </p>
    929 <p id="idp41824"> A second observation is that—regardless of which character set is
     930<p id="idp38368"> A second observation is that—regardless of which character set is
    930931            used—quite often all of the characters in a particular block of input will be
    931932            within the ASCII range. This is a very simple test to perform using the bitstream
     
    934935            be skipped. Transcoding to UTF-16 becomes trivial as the high eight bitstreams of the
    935936            UTF-16 form are each set to zero in this case. </p>
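The ASCII fast path can be sketched as follows. This is a minimal illustration, not icXML code: `high_bit_stream` builds with a loop what the transposed representation already provides as basis stream 0, the function names are hypothetical, little-endian UTF-16 output is assumed, and the slow path simply falls back to Python's codecs on the assumption that a block never splits a multi-byte sequence.

```python
def high_bit_stream(block: bytes) -> int:
    """Bit i is set when byte i of the block has its high bit set (non-ASCII)."""
    stream = 0
    for pos, byte in enumerate(block):
        if byte & 0x80:
            stream |= 1 << pos
    return stream


def transcode_block(block: bytes) -> bytes:
    """Transcode one UTF-8 block to UTF-16 (little-endian, no byte order mark)."""
    if high_bit_stream(block) == 0:
        # All ASCII: the high byte of every UTF-16 code unit is zero,
        # so the low bytes are just the input bytes.
        out = bytearray(2 * len(block))
        out[0::2] = block
        return bytes(out)
    # General case: stand-in for the bit-parallel UTF-8 to UTF-16 calculation.
    return block.decode("utf-8").encode("utf-16-le")


ascii_out = transcode_block(b"<a/>")
mixed_out = transcode_block("é".encode("utf-8"))
```

A single comparison of one stream against zero decides the path for the whole block, which is why the test is described as very simple in the bitstream representation.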
    936 <p id="idp43744"> A third observation is that repeated transcoding of the names of XML elements,
     937<p id="idp40288"> A third observation is that repeated transcoding of the names of XML elements,
    937938            attributes and so on can be avoided by using a look-up mechanism. That is, the first
    938939            occurrence of each symbol is stored in a look-up table mapping the input encoding to a
     
    941942            symbol look up is required to apply various XML validation rules, this achieves the
    942943            effect of transcoding each occurrence without additional cost. </p>
    943 <p id="idp44800"> The cost of individual character transcoding is avoided whenever a block of input is
     944<p id="idp41344"> The cost of individual character transcoding is avoided whenever a block of input is
    944945            confined to the ASCII subset and for all but the first occurrence of any XML element or
    945946            attribute name. Furthermore, when transcoding is required, the parallel bitstream
    946947            representation supports efficient transcoding operations. In the important case of UTF-8
    947948            to UTF-16 transcoding, the corresponding UTF-16 bitstreams can be calculated in bit
    948               parallel fashion based on UTF-8 streams <a class="xref" id="idp45600" href="javascript:showcite('cite-Cameron2008','idp45600')">[u8u16 2008]</a>, and all but the final bytes
     949              parallel fashion based on UTF-8 streams <a class="xref" id="idp42144" href="javascript:showcite('cite-Cameron2008','idp42144')">u8u16 2008</a>, and all but the final bytes
    949950            of multi-byte sequences can be marked for deletion as discussed in the following
    950951            subsection. In other cases, transcoding within a block only need be applied for
     
    954955<div class="section" id="par-filter">
    955956<h3 class="title" style="clear: both">Combined Parallel Filtering</h3>
    956 <p id="idp47952"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
     957<p id="idp44496"> As just mentioned, UTF-8 to UTF-16 transcoding involves marking all but the last
    957958            bytes of multi-byte UTF-8 sequences as positions for deletion. For example, the two
    958959            Chinese characters <code class="code">䜠奜</code> are represented as two
     
    967968            input bytes is the bit sequence <code class="code">110110</code>. Using this approach, transcoding
    968969            may then be completed by applying parallel deletion and inverse transposition of the
    969             UTF-16 bitstreams<a class="xref" id="idp521312" href="javascript:showcite('cite-Cameron2008','idp521312')">[u8u16 2008]</a>. </p>
    970 <p id="idp522112"> Rather than immediately paying the costs of deletion and transposition just for
     970            UTF-16 bitstreams<a class="xref" id="idp519472" href="javascript:showcite('cite-Cameron2008','idp519472')">u8u16 2008</a>. </p>
     971<p id="idp520272"> Rather than immediately paying the costs of deletion and transposition just for
    971972            transcoding, however, icXML defers these steps so that the deletion masks for several
    972973            stages of processing may be combined. In particular, this includes core XML requirements
     
    983984<div class="figure-contents">
    984985<div class="caption">Line Break Normalization Logic</div>
    985 <pre class="programlisting" id="idp525120">
     986<pre class="programlisting" id="idp523376">
    986987# XML 1.0 line-break normalization rules.
    987988if lex.CR:
     
    9991000</div>
    10001001         </p>
    1001 <p id="idp526592"> In essence, the deletion masks for transcoding and for line break normalization each
     1002<p id="idp524720"> In essence, the deletion masks for transcoding and for line break normalization each
    10021003            represent a bitwise filter; these filters can be combined using bitwise-or so that the
    10031004            parallel deletion algorithm need only be applied once. </p>
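The combination of filters can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than the icXML algorithm: the two mask builders use byte loops where icXML would use bitstream logic, `parallel_delete` is a plain list filter rather than a parallel-prefix compress, and all function names are hypothetical. Bit `i` of a mask marks source position `i` for deletion.

```python
def utf8_nonfinal_mask(data: bytes) -> int:
    """Mark every byte that is not the last byte of its UTF-8 sequence."""
    mask = 0
    for i in range(len(data) - 1):
        if (data[i + 1] & 0xC0) == 0x80:  # next byte is a continuation byte
            mask |= 1 << i
    return mask


def cr_before_lf_mask(data: bytes) -> int:
    """Mark every CR that is immediately followed by LF (CR LF becomes LF)."""
    mask = 0
    for i in range(len(data) - 1):
        if data[i] == 0x0D and data[i + 1] == 0x0A:
            mask |= 1 << i
    return mask


def parallel_delete(items, mask: int) -> list:
    """Keep only the items whose position is not marked in `mask`."""
    return [item for i, item in enumerate(items) if not (mask >> i) & 1]


data = "é\r\nx".encode("utf-8")  # bytes C3 A9 0D 0A 78
combined = utf8_nonfinal_mask(data) | cr_before_lf_mask(data)
kept_positions = parallel_delete(range(len(data)), combined)
print(bin(combined), kept_positions)
```

Both filters are expressed against the same original positions, so the bitwise-or is enough to merge them and the deletion step runs a single time over the block.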
    1004 <p id="idp527248"> A further application of combined filtering is the processing of XML character and
     1005<p id="idp525376"> A further application of combined filtering is the processing of XML character and
    10051006           entity references. Consider, for example, the references <code class="code">&amp;amp;</code> or
    10061007             <code class="code">&amp;#x3C;</code> which must be replaced in XML processing with the single
     
    10151016            UTF-16 code unit. In the case that this is not true, it is addressed in
    10161017            post-processing. </p>
    1017 <p id="idp532096"> The final step of combined filtering occurs during the process of reducing markup
     1018<p id="idp530288"> The final step of combined filtering occurs during the process of reducing markup
    10181019            data to tag bytes preceding each significant XML transition as described in
    1019               <a class="xref" href="#contentstream" title="Content Stream">section “Content Stream”</a>. Overall, icXML avoids separate buffer copying
     1020              <a class="xref" href="#contentstream" title="Content Stream">Section “Content Stream”</a>. Overall, icXML avoids separate buffer copying
    10201021            operations for each of these filtering steps, paying the cost of parallel deletion
    10211022            and inverse transposition only once. Currently, icXML employs the parallel-prefix
    1022             compress algorithm of Steele <a class="xref" id="idp533408" href="javascript:showcite('cite-HackersDelight','idp533408')">[Warren 2002]</a>. Performance is independent of the
     1023            compress algorithm of Steele <a class="xref" id="idp531600" href="javascript:showcite('cite-HackersDelight','idp531600')">Warren 2002</a>. Performance is independent of the
    10231024            number of positions deleted. Future versions of icXML are expected to take advantage of
    1024             the parallel extract operation <a class="xref" id="idp534304" href="javascript:showcite('cite-HilewitzLee2006','idp534304')">[Hilewitz and Lee 2006]</a> that Intel is now providing in its
     1025            the parallel extract operation <a class="xref" id="idp532544" href="javascript:showcite('cite-HilewitzLee2006','idp532544')">Hilewitz and Lee 2006</a> that Intel is now providing in its
    10251026            Haswell architecture. </p>
    10261027</div>
    10271028<div class="section" id="contentstream">
    10281029<h3 class="title" style="clear: both">Content Stream</h3>
    1029 <p id="idp536352"> A relatively-unique concept for icXML is the use of a filtered content stream.
     1030<p id="idp534592"> A relatively-unique concept for icXML is the use of a filtered content stream.
    10301031            Rather than parsing an XML document in its original format, the input is transformed
    10311032            into one that is easier for the parser to iterate through and produce the sequential
     
    10331034             <code class="code"> &lt;document&gt;fee&lt;element a1='fie' a2 = 'foe'&gt;&lt;/element&gt;fum&lt;/document&gt;</code>
    10341035             is transformed into <code class="code"><span class="ital">0</span>fee<span class="ital">0</span>=fie<span class="ital">0</span>=foe<span class="ital">0</span>&gt;<span class="ital">0</span>/fum<span class="ital">0</span>/</code>
    1035             through the parallel filtering algorithm, described in <a class="xref" href="#par-filter" title="Combined Parallel Filtering">section “Combined Parallel Filtering”</a>. </p>
     1036            through the parallel filtering algorithm, described in <a class="xref" href="#par-filter" title="Combined Parallel Filtering">Section “Combined Parallel Filtering”</a>. </p>
    10361037<div class="table-wrapper" id="fig-parabix2">
    10371038<p class="title">Table V</p>
     
    10721073</table>
    10731074</div>
    1074 <p id="idp557712"> Combined with the symbol stream, the parser traverses the content stream to
     1075<p id="idp555904"> Combined with the symbol stream, the parser traverses the content stream to
    10751076            effectively reconstruct the input document in its output form. The initial <span class="ital">0</span> indicates an empty content string. The following
    10761077               <code class="code">&gt;</code> indicates that a start tag without any attributes is the first
     
    10841085            null character in the content stream in parallel, which in turn means the parser can
    10851086            directly jump to the end of every string without scanning for it. </p>
    1086 <p id="idp561792"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
     1087<p id="idp559984"> Following <code class="code">'fee'</code> is a <code class="code">=</code>, which marks the
    10871088            existence of an attribute. Because all of the intra-element was performed in the Parabix
    10881089            Subsystem, this must be a legal attribute. Since attributes can only occur within start
     
    11001101<div class="section" id="namespace-handling">
    11011102<h3 class="title" style="clear: both">Namespace Handling</h3>
    1102 <p id="idp567360"> In XML, namespaces prevents naming conflicts when multiple vocabularies are used
     1103<p id="idp565552"> In XML, namespaces prevent naming conflicts when multiple vocabularies are used
    11031104            together. They are especially important when a vocabulary has application-dependent meaning,
    11041105            such as when XML or SVG documents are embedded within XHTML files. Namespaces are bound
     
    11191120<div class="table-wrapper" id="namespace-ex">
    11201121<p class="title">Table VI</p>
    1121 <div class="caption"><p id="idp576096">XML Namespace Example</p></div>
     1122<div class="caption"><p id="idp574288">XML Namespace Example</p></div>
    11221123<table class="table" xml:id="namespace-ex">
    11231124<colgroup span="1">
     
    11531154</table>
    11541155</div>
    1155 <p id="idp585136"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
     1156<p id="idp583232"> In both Xerces and icXML, every URI has a one-to-one mapping to a URI ID. These
    11561157            persist for the lifetime of the application through the use of a global URI pool. Xerces
    11571158            maintains a stack of namespace scopes that is pushed (popped) every time a start tag
     
    11611162            those that declare a set of namespaces upfront and never change them, and (2) those that
    11621163            repeatedly modify the namespaces in predictable patterns. </p>
    1163 <p id="idp586272"> For that reason, icXML contains an independent namespace stack and utilizes bit
     1164<p id="idp584368"> For that reason, icXML contains an independent namespace stack and utilizes bit
    11641165            vectors to cheaply perform
    11651166             When a prefix is
     
    11771178<div class="table-wrapper" id="namespace-binding">
    11781179<p class="title">Table VII</p>
    1179 <div class="caption"><p id="idp592944">Namespace Binding Table Example</p></div>
     1180<div class="caption"><p id="idp590896">Namespace Binding Table Example</p></div>
    11801181<table class="table" xml:id="namespace-binding">
    11811182<colgroup span="1">
     
    12181219</table>
    12191220</div>
    1220 <p id="idp609264">
     1221<p id="idp607456">
    12211222           
    12221223           
     
    12241225           
    12251226         </p>
    1226 <p id="idp611168"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
     1227<p id="idp609360"> To ensure that scoping rules are adhered to, whenever a start tag is encountered,
    12271228            any modification to the currently visible namespaces is calculated and stored within a
    12281229            stack of bit vectors denoting the locally modified namespace bindings. When an end tag
     
    12351236<div class="section" id="errorhandling">
    12361237<h3 class="title" style="clear: both">Error Handling</h3>
    1237 <p id="idp613600">
     1238<p id="idp611792">
    12381239           
    12391240            Xerces outputs error messages in two ways: through the programmer API and as thrown
     
    12441245            <a class="xref" href="#icxml-arch" title="icXML Architecture">Figure 2</a>, icXML is divided into two sections: the Parabix Subsystem and
    12451246            Markup Processor, each with its own system for detecting and producing error messages. </p>
    1246 <p id="idp616160"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
     1247<p id="idp501360"> Within the Parabix Subsystem, all computations are performed in parallel, a block at
    12471248            a time. Errors are derived as artifacts of bitstream calculations, with a 1-bit marking
    12481249            the byte-position of an error within a block, and the type of error is determined by the
     
    12771278            detected, the sum of those skipped positions is subtracted from the distance to
    12781279            determine the actual column number. </p>
    1279 <p id="idp621680"> The Markup Processor is a state-driven machine. As such, error detection within it
     1280<p id="idp507088"> The Markup Processor is a state-driven machine. As such, error detection within it
    12801281            is very similar to Xerces. However, reporting the correct line/column is a much more
    12811282            difficult problem. The Markup Processor parses the content stream, which is a series of
     
    12931294<div class="section" id="multithread">
    12941295<h2 class="title" style="clear: both">Multithreading with Pipeline Parallelism</h2>
    1295 <p id="idp625216"> As discussed in section <a class="xref" href="#background-xerces" title="Xerces C++ Structure">section “Xerces C++ Structure”</a>, Xerces can be considered a FSM
     1296<p id="idp511376"> As discussed in <a class="xref" href="#background-xerces" title="Xerces C++ Structure">Section “Xerces C++ Structure”</a>, Xerces can be considered an FSM
    12961297         application. These are "embarrassingly
    1297          sequential."<a class="xref" id="idp626368" href="javascript:showcite('cite-Asanovic-EECS-2006-183','idp626368')">[Asanovic et al. 2006]</a> and notoriously difficult to
     1298         sequential" <a class="xref" id="idp512480" href="javascript:showcite('cite-Asanovic-EECS-2006-183','idp512480')">Asanovic et al. 2006</a> and notoriously difficult to
    12981299         parallelize. However, icXML is designed to organize processing into logical layers. In
    12991300         particular, layers within the Parabix Subsystem are designed to operate over significant
     
    13011302         well into the general model of pipeline parallelism, in which each thread is in charge of a
    13021303         single module or group of modules. </p>
    1303 <p id="idp627680"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
     1304<p id="idp513840"> The most straightforward division of work in icXML is to separate the Parabix Subsystem
    13041305         and the Markup Processor, as distinct logical layers, into two separate stages. The
    13051306         resultant application, <span class="ital">icXML-p</span>, is a coarse-grained
     
    13221323            <code class="code">T<sub>2</sub></code> to finish reading the shared data before it can
    13231324         reuse the memory space. </p>
    1324 <p id="idp638752">
     1325<p id="idp654144">
    13251326        <div class="figure" id="threads_timeline1">
    13261327<p class="title">Figure 4: Thread Balance in Two-Stage Pipelines: Stage 1 Dominant</p>
    1327 <div class="figure-contents"><div class="mediaobject" id="idp640080"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div></div>
     1328<div class="figure-contents"><div class="mediaobject" id="idp655472"><img alt="png image (threads_timeline1.png)" src="threads_timeline1.png" width="500cm"></div></div>
    13281329</div>
    13291330        <div class="figure" id="threads_timeline2">
    13301331<p class="title">Figure 5: Thread Balance in Two-Stage Pipelines: Stage 2 Dominant</p>
    1331 <div class="figure-contents"><div class="mediaobject" id="idp643088"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div></div>
     1332<div class="figure-contents"><div class="mediaobject" id="idp658480"><img alt="png image (threads_timeline2.png)" src="threads_timeline2.png" width="500cm"></div></div>
    13321333</div>
    13331334      </p>
    1334 <p id="idp645120"> Overall, our design is intended to benefit a range of applications. Conceptually, we
     1335<p id="idp660512"> Overall, our design is intended to benefit a range of applications. Conceptually, we
    13351336         consider two design points. In the first, the parsing performed by the Parabix Subsystem
    13361337         dominates at 67% of the overall cost, with the cost of application processing (including
     
    13381339         scenario, the cost of application processing dominates at 60%, while the cost of XML
    13391340         parsing represents an overhead of 40%. </p>
    1340 <p id="idp646032"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
     1341<p id="idp661424"> Our design is predicated on a goal of using the Parabix framework to achieve a 50% to
    13411342         100% improvement in the parsing engine itself. In a best case scenario, a 100% improvement
    13421343         of the Parabix Subsystem for the design point in which XML parsing dominates at 67% of the
     
    13461347         about 33% of the original work. In this case, Amdahl's law predicts that we could expect up
    13471348         to a 3x speedup at best. </p>
    1348 <p id="idp647152"> At the other extreme of our design range, we consider an application in which core
     1349<p id="idp662544"> At the other extreme of our design range, we consider an application in which core
    13491350         parsing cost is 40%. Assuming the 2x speedup of the Parabix Subsystem over the
    13501351         corresponding Xerces core, single-threaded icXML delivers a 25% speedup. However, the most
     
    13521353         the entire latency of parsing within the serial time required by the application. In this
    13531354         case, we achieve an overall speedup in processing time of 1.67x. </p>
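The figures quoted for the two design points can be checked with a short calculation. The sketch below is an idealized model, not the authors' measurement method: it assumes the original run time is normalized to 1.0, that the Parabix Subsystem is exactly 2x faster than the corresponding Xerces core, and that a two-stage pipeline overlaps parsing and application processing perfectly with no synchronization cost. The function names are illustrative.

```python
def single_thread_speedup(parse_fraction, parse_speedup):
    """Amdahl's law: only the parsing share of the work is accelerated."""
    app_fraction = 1.0 - parse_fraction
    new_time = parse_fraction / parse_speedup + app_fraction
    return 1.0 / new_time


def two_stage_pipeline_speedup(parse_fraction, parse_speedup):
    """Ideal two-stage pipeline: elapsed time is set by the slower stage."""
    app_fraction = 1.0 - parse_fraction
    slowest_stage = max(parse_fraction / parse_speedup, app_fraction)
    return 1.0 / slowest_stage


# Design point 1: parsing is 67% of the original cost.
PARSE_HEAVY_PIPELINED = two_stage_pipeline_speedup(0.67, 2.0)

# Design point 2: parsing is 40% of the original cost.
APP_HEAVY_SINGLE = single_thread_speedup(0.40, 2.0)
APP_HEAVY_PIPELINED = two_stage_pipeline_speedup(0.40, 2.0)

if __name__ == "__main__":
    print(f"67% parsing, pipelined:     {PARSE_HEAVY_PIPELINED:.2f}x")
    print(f"40% parsing, single thread: {APP_HEAVY_SINGLE:.2f}x")
    print(f"40% parsing, pipelined:     {APP_HEAVY_PIPELINED:.2f}x")
```

Under these assumptions the model gives roughly 3x for the parsing-dominated case, and 1.25x (single-threaded) and 1.67x (pipelined) for the application-dominated case, in line with the numbers in the text.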
    1354 <p id="idp648096"> Although the structure of the Parabix Subsystem allows division of the work into
     1355<p id="idp663488"> Although the structure of the Parabix Subsystem allows division of the work into
    13551356         several pipeline stages and has been demonstrated to be effective for four pipeline stages
    1356          in a research prototype <a class="xref" id="idp648576" href="javascript:showcite('cite-HPCA2012','idp648576')">[Lin and Medforth 2012]</a>, our analysis here suggests that the further
     1357         in a research prototype <a class="xref" id="idp663968" href="javascript:showcite('cite-HPCA2012','idp663968')">Lin and Medforth 2012</a>, our analysis here suggests that the further
    13571358         pipelining of work within the Parabix Subsystem is not worthwhile if the cost of
    13581359         application logic is as little as 33% of the end-to-end cost using Xerces. To achieve benefits
     
    13621363<div class="section" id="performance">
    13631364<h2 class="title" style="clear: both">Performance</h2>
    1364 <p id="idp650960"> We evaluate Xerces-C++ 3.1.1, icXML, icXML-p against two benchmarking applications: the
     1365<p id="idp666352"> We evaluate Xerces-C++ 3.1.1, icXML, and icXML-p against two benchmarking applications: the
    13651366         Xerces C++ SAXCount sample application, and a real world GML to SVG transformation
    13661367         application. We investigated XML parser performance using an Intel Core i7 quad-core (Sandy
     
    13681369         L1 cache, 256 kB (per core) L2 cache, 8 MB L3 cache) running the 64-bit version of Ubuntu
    13691370         12.04 (Linux). </p>
    1370 <p id="idp651872"> We analyzed the execution profiles of each XML parser using the performance counters
     1371<p id="idp667264"> We analyzed the execution profiles of each XML parser using the performance counters
    13711372         found in the processor. We chose several key hardware events that provide insight into the
    13721373         profile of each application and indicate if the processor is doing useful work. The set of
    13731374         events included in our study are: processor cycles, branch instructions, branch
    13741375         mispredictions, and cache misses. The Performance Application Programming Interface (PAPI)
    1375          Version 5.5.0 <a class="xref" id="idp652640" href="javascript:showcite('cite-papi','idp652640')">[PAPI]</a> toolkit was installed on the test system to facilitate the
     1376         Version 5.5.0 <a class="xref" id="idp668032" href="javascript:showcite('cite-papi','idp668032')">PAPI</a> toolkit was installed on the test system to facilitate the
    13761377         collection of hardware performance monitoring statistics. In addition, we used the Linux
    1377          perf <a class="xref" id="idp653568" href="javascript:showcite('cite-perf','idp653568')">[perf]</a> utility to collect per core hardware events. </p>
    1378 <div class="section" id="idp654464">
     1378         perf <a class="xref" id="idp668960" href="javascript:showcite('cite-perf','idp668960')">perf</a> utility to collect per core hardware events. </p>
     1379<div class="section" id="idp669920">
    13791380<h3 class="title" style="clear: both">Xerces C++ SAXCount</h3>
    1380 <p id="idp655104"> Xerces comes with sample applications that demonstrate salient features of the
     1381<p id="idp670560"> Xerces comes with sample applications that demonstrate salient features of the
    13811382            parser. SAXCount is the simplest such application: it counts the elements, attributes
    13821383            and characters of a given XML file using the (event based) SAX API and prints out the
    13831384            totals. </p>
    1384 <p id="idp655808"> <a class="xref" href="#XMLdocs">Table VIII</a> shows the document characteristics of the XML input files
     1385<p id="idp671264"> <a class="xref" href="#XMLdocs">Table VIII</a> shows the document characteristics of the XML input files
    13851386            selected for the Xerces C++ SAXCount benchmark. The jaw.xml represents document-oriented
    13861387            XML inputs and contains the three-byte and four-byte UTF-8 sequences required for the
     
    13891390  <div class="table-wrapper" id="XMLdocs">
    13901391<p class="title">Table VIII</p>
    1391 <div class="caption"><p id="idp658256">XML Document Characteristics</p></div>
     1392<div class="caption"><p id="idp673648">XML Document Characteristics</p></div>
    13921393<table class="table" xml:id="XMLdocs">
    13931394<colgroup span="1">
     
    14381439</div>           
    14391440</p>
    1440 <p id="idp673856"> A key predictor of the overall parsing performance of an XML file is markup
    1441            density<sup class="fn-label"><a href="#idp674224" class="footnoteref" id="idp674224-ref">[2]</a></sup>. This metric has substantial influence on the
     1441<p id="idp689296"> A key predictor of the overall parsing performance of an XML file is markup
     1442           density<sup class="fn-label"><a href="#idp689664" class="footnoteref" id="idp689664-ref">[2]</a></sup>. This metric has substantial influence on the
    14421443            performance of traditional recursive descent XML parsers because it directly corresponds
    14431444            to the number of state transitions that occur when parsing a document. We use a mixture
    14441445            of document-oriented and data-oriented XML files to analyze performance over a spectrum
    14451446            of markup densities. </p>
    1446 <p id="idp675392"> <a class="xref" href="#perf_SAX" title="SAXCount Performance Comparison">Figure 6</a> compares the performance of Xerces, icXML and pipelined icXML
     1447<p id="idp690832"> <a class="xref" href="#perf_SAX" title="SAXCount Performance Comparison">Figure 6</a> compares the performance of Xerces, icXML and pipelined icXML
    14471448            in terms of CPU cycles per byte for the SAXCount application. The speedup for icXML over
    14481449            Xerces is 1.3x to 1.8x. With two threads on the multicore machine, icXML-p can achieve
     
    14511452            icXML-p performs better as markup-density increases because the work performed by each
    14521453            stage is well balanced in this application. </p>
    1453 <p id="idp677136">
     1454<p id="idp692688">
    14541455        <div class="figure" id="perf_SAX">
    14551456<p class="title">Figure 6: SAXCount Performance Comparison</p>
    14561457<div class="figure-contents">
    1457 <div class="mediaobject" id="idp678448"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
     1458<div class="mediaobject" id="idp694000"><img alt="png image (perf_SAX.png)" src="perf_SAX.png" width="500cm"></div>
    14581459<div class="caption"></div>
    14591460</div>
     
    14611462         </p>
    14621463</div>
    1463 <div class="section" id="idp680992">
     1464<div class="section" id="idp696544">
    14641465<h3 class="title" style="clear: both">GML2SVG</h3>
    1465 <p id="idp681664">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
     1466<p id="idp697216">       As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
    14661467was chosen.   This application transforms geospatially encoded data represented using
    1467 an XML representation in the form of Geography Markup Language (GML) <a class="xref" id="idp682192" href="javascript:showcite('cite-lake2004geography','idp682192')">[Lake and Burggraf 2004]</a>
     1468an XML representation in the form of Geography Markup Language (GML) <a class="xref" id="idp697744" href="javascript:showcite('cite-lake2004geography','idp697744')">Lake and Burggraf 2004</a>
    14681469into a different XML format  suitable for displayable maps:
    1469 Scalable Vector Graphics (SVG) format <a class="xref" id="idp683088" href="javascript:showcite('cite-lu2007advances','idp683088')">[Lu and Dos Santos 2007]</a>. In the GML2SVG benchmark, GML feature elements
     1470Scalable Vector Graphics (SVG) format <a class="xref" id="idp698592" href="javascript:showcite('cite-lu2007advances','idp698592')">Lu and Dos Santos 2007</a>. In the GML2SVG benchmark, GML feature elements
    14701471and GML geometry element tags are matched. GML coordinate data are then extracted
    14711472and transformed to the corresponding SVG path data encodings.
     
    14751476a known XML format for the purpose of analysis and restructuring to meet
    14761477the requirements of an alternative format.</p>
    1477 <p id="idp684464">Our GML to SVG data translations are executed on GML source data
     1478<p id="idp700080">Our GML to SVG data translations are executed on GML source data
    14781479modelling the city of Vancouver, British Columbia, Canada.
    14791480The GML source document set
     
    14851486<p class="title">Figure 7: Performance Comparison for GML2SVG</p>
    14861487<div class="figure-contents">
    1487 <div class="mediaobject" id="idp686464"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
     1488<div class="mediaobject" id="idp702128"><img alt="png image (Throughput.png)" src="Throughput.png" width="500cm"></div>
    14881489<div class="caption"></div>
    14891490</div>
    14901491</div>
    1491 <p id="idp688752"><a class="xref" href="#perf_GML2SVG" title="Performance Comparison for GML2SVG">Figure 7</a> compares the performance of the GML2SVG application linked against
     1492<p id="idp704416"><a class="xref" href="#perf_GML2SVG" title="Performance Comparison for GML2SVG">Figure 7</a> compares the performance of the GML2SVG application linked against
    14921493Xerces, icXML and icXML-p.
    14931494On the GML workload with this application, single-thread icXML
     
    14961497Using icXML-p, a further throughput increase to 111 MB/sec was recorded,
    14971498approximately a 2X speedup.</p>
    1498 <p id="idp690160">An important aspect of icXML is the replacement of much branch-laden
     1499<p id="idp705824">An important aspect of icXML is the replacement of much branch-laden
    14991500sequential code inside Xerces with straight-line SIMD code using far
    15001501fewer branches.  <a class="xref" href="#branchmiss_GML2SVG" title="Comparative Branch Misprediction Rate">Figure 8</a> shows the corresponding
     
    15071508<p class="title">Figure 8: Comparative Branch Misprediction Rate</p>
    15081509<div class="figure-contents">
    1509 <div class="mediaobject" id="idp692896"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
     1510<div class="mediaobject" id="idp708560"><img alt="png image (BM.png)" src="BM.png" width="500cm"></div>
    15101511<div class="caption"></div>
    15111512</div>
    15121513</div>
    1513 <p id="idp695184">The behaviour of the three versions with respect to L1 cache misses per kB is shown
     1514<p id="idp710848">The behaviour of the three versions with respect to L1 cache misses per kB is shown
    15141515in <a class="xref" href="#cachemiss_GML2SVG" title="Comparative Cache Miss Rate">Figure 9</a>.   Improvements are shown in both instruction-
    15151516and data-cache performance with the improvements in instruction-cache
     
    15231524<p class="title">Figure 9: Comparative Cache Miss Rate</p>
    15241525<div class="figure-contents">
    1525 <div class="mediaobject" id="idp697984"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
     1526<div class="mediaobject" id="idp713648"><img alt="png image (CM.png)" src="CM.png" width="500cm"></div>
    15261527<div class="caption"></div>
    15271528</div>
    15281529</div>
    1529 <p id="idp700272">One caveat with this study is that the GML2SVG application did not exhibit
     1530<p id="idp715936">One caveat with this study is that the GML2SVG application did not exhibit
    15301531a relative balance of processing between application code and Xerces library
    15311532code reaching the 33% figure.  This suggests that for this application and
     
    15371538<div class="section" id="conclusion">
    15381539<h2 class="title" style="clear: both">Conclusion and Future Work</h2>
    1539 <p id="idp702432"> This paper is the first case study documenting the significant performance benefits
     1540<p id="idp718496"> This paper is the first case study documenting the significant performance benefits
    15401541         that may be realized through the integration of parallel bitstream technology into existing
    15411542         widely-used software libraries. In the case of the Xerces-C++ XML parser, the combined
     
    15471548         technologies, this is an important case study demonstrating the general feasibility of
    15481549         these techniques. </p>
    1549 <p id="idp703712"> The further development of icXML to move beyond 2-stage pipeline parallelism is
     1550<p id="idp719776"> The further development of icXML to move beyond 2-stage pipeline parallelism is
    15501551         ongoing, with realistic prospects for four reasonably balanced stages within the library.
    15511552         For applications such as GML2SVG which are dominated by time spent on XML parsing, such a
    15521553         multistage pipelined parsing library should offer substantial benefits. </p>
    1553 <p id="idp704480"> The example of XML parsing may be considered prototypical of finite-state machines
     1554<p id="idp720544"> The example of XML parsing may be considered prototypical of finite-state machine
    15541555         applications which have sometimes been considered "embarrassingly
    15551556         sequential" and so difficult to parallelize that "nothing
     
    15571558         point in making the case that parallelization can indeed be helpful across a broad array of
    15581559         application types. </p>
    1559 <p id="idp705856"> To overcome the software engineering challenges in applying parallel bitstream
     1560<p id="idp721920"> To overcome the software engineering challenges in applying parallel bitstream
    15601561         technology to existing software systems, it is clear that better library and tool support
    15611562         is needed. The techniques used in the implementation of icXML and documented in this paper
     
    15641565      </p>
    15651566</div>
    1566 <div class="bibliography" id="idp706864">
     1567<div class="bibliography" id="idp722928">
    15671568<h2 class="title" style="clear:both">Bibliography</h2>
    1568 <p class="bibliomixed" id="CameronHerdyLin2008"><a href="#idp453984">[[Parabix1 2008]] </a>Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. 2008 New York, NY, USA</p>
    1569 <p class="bibliomixed" id="papi"><a href="#idp652640">[[PAPI]] </a>Innovative Computing Laboratory, University of Texas. Performance Application Programming Interface.<a href="http://icl.cs.utk.edu/papi/" class="link" target="_new">http://icl.cs.utk.edu/papi/</a></p>
    1570 <p class="bibliomixed" id="perf"><a href="#idp653568">[[perf]] </a>Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf.<a href="https://perf.wiki.kernel.org/index.php/Tutorial" class="link" target="_new">https://perf.wiki.kernel.org/index.php/Tutorial</a></p>
    1571 <p class="bibliomixed" id="Cameron2008"><a href="#idp453088">[[u8u16 2008]] </a>Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2008 New York, NY, USA</p>
    1572 <p class="bibliomixed" id="ParaDOM2009"><a href="#idp303712">[[Shah and Rao 2009]] </a>Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009</p>
    1573 <p class="bibliomixed" id="XMLSSE42"><a href="#idp306864">[[Lei 2008]] </a>Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). 2008<a href="Intel%20Software%20Network" class="link" target="_new">Intel Software Network</a></p>
    1574 <p class="bibliomixed" id="Cameron2009"><a href="#idp307776">[[Balisage 2009]] </a>Cameron, Rob, Herdy, Ken and Amiri, Ehsan Amiri. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. 2009</p>
    1575 <p class="bibliomixed" id="HilewitzLee2006"><a href="#idp534304">[[Hilewitz and Lee 2006]] </a>Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. 2006 Washington, DC, USA</p>
    1576 <p class="bibliomixed" id="Asanovic-EECS-2006-183"><a href="#idp626368">[[Asanovic et al. 2006]] </a>Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. 2006</p>
    1577 <p class="bibliomixed" id="GRID2006"><a href="#idp288288">[[Lu and Chiu 2006]] </a>Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. 2006 Washington, DC, USA</p>
    1578 <p class="bibliomixed" id="cameron-EuroPar2011"><a href="#idp308528">[[Parabix2 2011]] </a>Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II. 2011 Berlin, Heidelberg</p>
    1579 <p class="bibliomixed" id="HPCA2012"><a href="#idp305920">[[Lin and Medforth 2012]] </a>Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. 2012 Los Alamitos, CA, USA</p>
    1580 <p class="bibliomixed" id="HPCC2011"><a href="#idp302880">[[You and Wang 2011]] </a>You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. 2011 Los Alamitos, CA, USA</p>
    1581 <p class="bibliomixed" id="E-SCIENCE2007"><a href="#idp301312">[[Pan and Zhang 2007]] </a>Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing. 2007 Los Alamitos, CA, USA</p>
    1582 <p class="bibliomixed" id="ICWS2008"><a href="#idp304576">[[Pan and Zhang 2008a]] </a>Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. 2008 Los Alamitos, CA, USA</p>
    1583 <p class="bibliomixed" id="IPDPS2008"><a href="#idp302064">[[Pan and Zhang 2008b]] </a>Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. 2008 Los Alamitos, CA, USA</p>
    1584 <p class="bibliomixed" id="HackersDelight"><a href="#idp533408">[[Warren 2002]] </a>Warren, Henry S.. Hacker's Delight. 2002</p>
    1585 <p class="bibliomixed" id="lu2007advances"><a href="#idp683088">[[Lu and Dos Santos 2007]] </a>Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. 2007</p>
    1586 <p class="bibliomixed" id="lake2004geography"><a href="#idp682192">[[Lake and Burggraf 2004]] </a>Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. 2004</p>
     1569<p class="bibliomixed" id="CameronHerdyLin2008"><a href="#idp452192">[Parabix1 2008] </a>Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. Richmond Hill, Ontario, Canada. 2008.</p>
     1570<p class="bibliomixed" id="papi"><a href="#idp668032">[PAPI] </a>Innovative Computing Laboratory, University of Tennessee. Performance Application Programming Interface. <a href="http://icl.cs.utk.edu/papi/" class="link" target="_new">http://icl.cs.utk.edu/papi/</a></p>
     1571<p class="bibliomixed" id="perf"><a href="#idp668960">[perf] </a>Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf. <a href="https://perf.wiki.kernel.org/index.php/Tutorial" class="link" target="_new">https://perf.wiki.kernel.org/index.php/Tutorial</a></p>
     1572<p class="bibliomixed" id="Cameron2008"><a href="#idp451344">[u8u16 2008] </a>Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Salt Lake City, USA. 2008.</p>
     1573<p class="bibliomixed" id="ParaDOM2009"><a href="#idp301776">[Shah and Rao 2009] </a>Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009.</p>
     1574<p class="bibliomixed" id="XMLSSE42"><a href="#idp304928">[Lei 2008] </a>Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). <a href="Intel%20Software%20Network" class="link" target="_new">Intel Software Network</a>.  2008.</p>
     1575<p class="bibliomixed" id="Cameron2009"><a href="#idp305840">[Balisage 2009] </a>Cameron, Rob, Herdy, Ken and Amiri, Ehsan. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Montreal, Quebec, Canada.  2009.</p>
     1576<p class="bibliomixed" id="HilewitzLee2006"><a href="#idp532544">[Hilewitz and Lee 2006] </a>Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. Steamboat Springs, Colorado, USA.  2006.</p>
     1577<p class="bibliomixed" id="Asanovic-EECS-2006-183"><a href="#idp512480">[Asanovic et al. 2006] </a>Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department, University of California, Berkeley.  2006.</p>
     1578<p class="bibliomixed" id="GRID2006"><a href="#idp286400">[Lu and Chiu 2006] </a>Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. Barcelona, Spain.  2006.</p>
     1579<p class="bibliomixed" id="cameron-EuroPar2011"><a href="#idp306592">[Parabix2 2011] </a>Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II.  Bordeaux, France. 2011.</p>
     1580<p class="bibliomixed" id="HPCA2012"><a href="#idp303984">[Lin and Medforth 2012] </a>Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. New Orleans, LA. 2012.</p>
     1581<p class="bibliomixed" id="HPCC2011"><a href="#idp300944">[You and Wang 2011] </a>You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. Banff, Alberta, Canada. 2011.</p>
     1582<p class="bibliomixed" id="E-SCIENCE2007"><a href="#idp299376">[Pan and Zhang 2007] </a>Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing.   Bangalore, India.  2007.</p>
     1583<p class="bibliomixed" id="ICWS2008"><a href="#idp302640">[Pan and Zhang 2008a] </a>Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. Beijing, China.  2008.</p>
     1584<p class="bibliomixed" id="IPDPS2008"><a href="#idp300128">[Pan and Zhang 2008b] </a>Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. Miami, Florida, USA.  2008.</p>
     1585<p class="bibliomixed" id="HackersDelight"><a href="#idp531600">[Warren 2002] </a>Warren, Henry S.. Hacker's Delight. Addison-Wesley Professional. 2003.</p>
     1586<p class="bibliomixed" id="lu2007advances"><a href="#idp698592">[Lu and Dos Santos 2007] </a>Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. Geoinformatica 11:131-157.  2007.</p>
     1587<p class="bibliomixed" id="lake2004geography"><a href="#idp697744">[Lake and Burggraf 2004] </a>Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. Wiley.  Chichester.  2004.</p>
    15871588</div>
    15881589<div class="footnotes">
     
    15911592            behaviour is defined by the inputs, current state and the events associated with
    15921593              transitions of states.</p></div>
    1593 <div id="idp674224" class="footnote"><p><sup class="fn-label"><a href="#idp674224-ref" class="footnoteref">[2]</a></sup> Markup Density: the ratio of markup bytes used to define the structure
     1594<div id="idp689664" class="footnote"><p><sup class="fn-label"><a href="#idp689664-ref" class="footnoteref">[2]</a></sup> Markup Density: the ratio of markup bytes used to define the structure
    15941595             of the document vs. its file size.</p></div>
    15951596</div>
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3395 r3397  
    146146        Parallelization and acceleration of XML parsing is a widely
    147147        studied problem that has seen the development of a number
    148         of interesting research prototypes using both SIMD and
    149         multicore parallelism.   Most works have investigated
     148        of interesting research prototypes using both single-instruction
     149           multiple-data (SIMD) and
     150        multi-core parallelism.   Most works have investigated
    150151        data parallel solutions on multicore
    151152        architectures using various strategies to break input
     
    193194      <para>
    194195        The remainder of this paper is organized as follows.   
    195           <xref linkend="background"/> discusses the structure of the Xerces and Parabix XML parsers and the fundamental
     196          <xref linkend="background" endterm="sec.title.link"/> discusses the structure of the Xerces and Parabix XML parsers and the fundamental
    196197        differences between the two parsing models.   
    197198        <xref linkend="architecture"/> then presents the icXML design based on a restructured Xerces architecture to
     
    11121113<bibliography>
    11131114  <title>Bibliography</title>
    1114   <bibliomixed xml:id="CameronHerdyLin2008" xreflabel="[Parabix1 2008]">Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. 2008 New York, NY, USA</bibliomixed>
    1115   <bibliomixed xml:id="papi" xreflabel="[PAPI]">Innovative Computing Laboratory, University of Texas. Performance Application Programming Interface.<link>http://icl.cs.utk.edu/papi/</link></bibliomixed>
    1116   <bibliomixed xml:id="perf" xreflabel="[perf]">Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf.<link>https://perf.wiki.kernel.org/index.php/Tutorial</link></bibliomixed>
    1117   <bibliomixed xml:id="Cameron2008" xreflabel="[u8u16 2008]">Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2008 New York, NY, USA</bibliomixed>
    1118   <bibliomixed xml:id="ParaDOM2009" xreflabel="[Shah and Rao 2009]">Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009</bibliomixed>
    1119   <bibliomixed xml:id="XMLSSE42" xreflabel="[Lei 2008]">Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). 2008<link>Intel Software Network</link></bibliomixed>
    1120   <bibliomixed xml:id="Cameron2009" xreflabel="[Balisage 2009]">Cameron, Rob, Herdy, Ken and Amiri, Ehsan Amiri. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. 2009</bibliomixed>
    1121   <bibliomixed xml:id="HilewitzLee2006" xreflabel="[Hilewitz and Lee 2006]">Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. 2006 Washington, DC, USA</bibliomixed>
    1122   <bibliomixed xml:id="Asanovic-EECS-2006-183" xreflabel="[Asanovic et al. 2006]">Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. 2006</bibliomixed>
    1123   <bibliomixed xml:id="GRID2006" xreflabel="[Lu and Chiu 2006]">Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. 2006 Washington, DC, USA</bibliomixed>
    1124   <bibliomixed xml:id="cameron-EuroPar2011" xreflabel="[Parabix2 2011]">Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II. 2011 Berlin, Heidelberg</bibliomixed>
    1125   <bibliomixed xml:id="HPCA2012" xreflabel="[Lin and Medforth 2012]">Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. 2012 Los Alamitos, CA, USA</bibliomixed>
    1126   <bibliomixed xml:id="HPCC2011" xreflabel="[You and Wang 2011]">You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. 2011 Los Alamitos, CA, USA</bibliomixed>
    1127   <bibliomixed xml:id="E-SCIENCE2007" xreflabel="[Pan and Zhang 2007]">Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing. 2007 Los Alamitos, CA, USA</bibliomixed>
    1128   <bibliomixed xml:id="ICWS2008" xreflabel="[Pan and Zhang 2008a]">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. 2008 Los Alamitos, CA, USA</bibliomixed>
    1129   <bibliomixed xml:id="IPDPS2008" xreflabel="[Pan and Zhang 2008b]">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. 2008 Los Alamitos, CA, USA</bibliomixed>
    1130   <bibliomixed xml:id="HackersDelight" xreflabel="[Warren 2002]">Warren, Henry S.. Hacker's Delight. 2002</bibliomixed>
    1131   <bibliomixed xml:id="lu2007advances" xreflabel="[Lu and Dos Santos 2007]">Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. 2007</bibliomixed>
    1132   <bibliomixed xml:id="lake2004geography" xreflabel="[Lake and Burggraf 2004]">Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. 2004</bibliomixed>
     1115  <bibliomixed xml:id="CameronHerdyLin2008" xreflabel="Parabix1 2008">Cameron, Robert D., Herdy, Kenneth S. and Lin, Dan. High performance XML parsing using parallel bit stream technology. CASCON'08: Proc. 2008 conference of the center for advanced studies on collaborative research. Richmond Hill, Ontario, Canada. 2008.</bibliomixed>
     1116  <bibliomixed xml:id="papi" xreflabel="PAPI">Innovative Computing Laboratory, University of Texas. Performance Application Programming Interface.<link>http://icl.cs.utk.edu/papi/</link></bibliomixed>
     1117  <bibliomixed xml:id="perf" xreflabel="perf">Eranian, Stephane, Gouriou, Eric, Moseley, Tipp and Bruijn, Willem de. Linux kernel profiling with perf. <link>https://perf.wiki.kernel.org/index.php/Tutorial</link></bibliomixed>
     1118  <bibliomixed xml:id="Cameron2008" xreflabel="u8u16 2008">Cameron, Robert D.. A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding. Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Salt Lake City, USA. 2008.</bibliomixed>
     1119  <bibliomixed xml:id="ParaDOM2009" xreflabel="Shah and Rao 2009">Shah, Bhavik, Rao, Praveen, Moon, Bongki and Rajagopalan, Mohan. A Data Parallel Algorithm for XML DOM Parsing. Database and XML Technologies. 2009.</bibliomixed>
     1120  <bibliomixed xml:id="XMLSSE42" xreflabel="Lei 2008">Lei, Zhai. XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4). <link>Intel Software Network</link>.  2008.</bibliomixed>
     1121  <bibliomixed xml:id="Cameron2009" xreflabel="Balisage 2009">Cameron, Rob, Herdy, Ken and Amiri, Ehsan Amiri. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Int'l Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Montreal, Quebec, Canada.  2009.</bibliomixed>
     1122  <bibliomixed xml:id="HilewitzLee2006" xreflabel="Hilewitz and Lee 2006">Hilewitz, Yedidya and Lee, Ruby B.. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions. ASAP '06: Proc. IEEE 17th Int'l Conference on Application-specific Systems, Architectures and Processors. Steamboat Springs, Colorado, USA.  2006.</bibliomixed>
     1123  <bibliomixed xml:id="Asanovic-EECS-2006-183" xreflabel="Asanovic et al. 2006">Asanovic, Krste and others. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department, University of California, Berkeley.  2006.</bibliomixed>
     1124  <bibliomixed xml:id="GRID2006" xreflabel="Lu and Chiu 2006">Lu, Wei, Chiu, Kenneth and Pan, Yinfei. A Parallel Approach to XML Parsing. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing. Barcelona, Spain.  2006.</bibliomixed>
     1125  <bibliomixed xml:id="cameron-EuroPar2011" xreflabel="Parabix2 2011">Cameron, Robert D., Amiri, Ehsan, Herdy, Kenneth S., Lin, Dan, Shermer, Thomas C. and Popowich, Fred P.. Parallel Scanning with Bitstream Addition: An XML Case Study. Euro-Par 2011, LNCS 6853, Part II.  Bordeaux, France. 2011.</bibliomixed>
     1126  <bibliomixed xml:id="HPCA2012" xreflabel="Lin and Medforth 2012">Lin, Dan, Medforth, Nigel, Herdy, Kenneth S., Shriraman, Arrvindh and Cameron, Rob. Parabix: Boosting the efficiency of text processing on commodity processors. International Symposium on High-Performance Computer Architecture. New Orleans, LA. 2012.</bibliomixed>
     1127  <bibliomixed xml:id="HPCC2011" xreflabel="You and Wang 2011">You, Cheng-Han and Wang, Sheng-De. A Data Parallel Approach to XML Parsing and Query. 10th IEEE International Conference on High Performance Computing and Communications. Banff, Alberta, Canada. 2011.</bibliomixed>
     1128  <bibliomixed xml:id="E-SCIENCE2007" xreflabel="Pan and Zhang 2007">Pan, Yinfei, Zhang, Ying, Chiu, Kenneth and Lu, Wei. Parallel XML Parsing Using Meta-DFAs. International Conference on e-Science and Grid Computing.   Bangalore, India.  2007.</bibliomixed>
     1129  <bibliomixed xml:id="ICWS2008" xreflabel="Pan and Zhang 2008a">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Hybrid Parallelism for XML SAX Parsing. IEEE International Conference on Web Services. Beijing, China.  2008.</bibliomixed>
     1130  <bibliomixed xml:id="IPDPS2008" xreflabel="Pan and Zhang 2008b">Pan, Yinfei, Zhang, Ying and Chiu, Kenneth. Simultaneous transducers for data-parallel XML parsing. International Parallel and Distributed Processing Symposium. Miami, Florida, USA.  2008.</bibliomixed>
     1131  <bibliomixed xml:id="HackersDelight" xreflabel="Warren 2002">Warren, Henry S.. Hacker's Delight. Addison-Wesley Professional. 2003.</bibliomixed>
     1132  <bibliomixed xml:id="lu2007advances" xreflabel="Lu and Dos Santos 2007">Lu, C.T., Dos Santos, R.F., Sripada, L.N. and Kou, Y.. Advances in GML for geospatial applications. Geoinformatica 11:131-157.  2007.</bibliomixed>
     1133  <bibliomixed xml:id="lake2004geography" xreflabel="Lake and Burggraf 2004">Lake, R., Burggraf, D.S., Trninic, M. and Rae, L.. Geography mark-up language (GML) [foundation for the geo-web]. Wiley.  Chichester.  2004.</bibliomixed>
    11331134</bibliography>
    11341135
  • docs/Balisage13/balisage-1-3-xsl/balisage-html.xsl

    r3040 r3397  
    11641164
    11651165  <xsl:template match="d:section" mode="label-text">
    1166     <xsl:text>section</xsl:text>
     1166    <xsl:text>Section</xsl:text>
    11671167    <xsl:for-each select="d:title">
    11681168      <xsl:text> &#x201c;</xsl:text>