Changeset 3038


Ignore:
Timestamp:
Apr 18, 2013, 2:20:02 PM (6 years ago)
Author:
ksherdy
Message:

Moved over Bal 09 document to use as a template. Document validates agains 1-3.dtd.

Location:
docs/Balisage13/Bal2013came0601
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.html

    r3037 r3038  
    22<head>
    33<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    4 <title>Raleigh's Discoveries in the New World</title>
     4<title>Parallel Bit Stream Technology as a Foundation for XML Parsing Performance</title>
    55<link rel="stylesheet" href="balisage-plain.css" type="text/css">
    6 <meta name="keywords" content="New World discoveries, American colonial history, English 16th century history">
     6<meta name="keywords" content=", , ">
    77</head>
    88<body>
     
    1212<div lang="en" class="article">
    1313<div class="titlepage">
    14 <h2 class="article-title" id="idp282872">Raleigh's Discoveries in the New World</h2>
    15 <h3 class="subtitle"><i>The lure of vegetable culture</i></h3>
     14<h2 class="article-title" id="idp346280">Parallel Bit Stream Technology as a Foundation for XML Parsing Performance</h2>
    1615<div class="author">
    17 <h3 class="author">Tonia Renae Gaillard</h3>
     16<h3 class="author">Rob Cameron</h3>
    1817<div class="affiliation">
    19 <p class="jobtitle">Department Chair</p>
    20 <p class="orgname">Quetrane University</p>
    21 </div>
    22 <h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:tgaillard@quetrane.hist.edu">tgaillard@quetrane.hist.edu</a>&gt;</code></h5>
    23 </div>
     18<p class="jobtitle">Professor of Computing Science</p>
     19<p class="orgname">Simon Fraser University</p>
     20</div>
     21<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:cameron@cs.sfu.ca">cameron@cs.sfu.ca</a>&gt;</code></h5>
     22</div>
     23<div class="author">
     24<h3 class="author">Ken Herdy</h3>
     25<div class="affiliation">
     26<p class="jobtitle">Graduate Student, School of Computing Science</p>
     27<p class="orgname">Simon Fraser University </p>
     28</div>
     29<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:ksherdy@cs.sfu.ca">ksherdy@cs.sfu.ca</a>&gt;</code></h5>
     30</div>
     31<div class="author">
     32<h3 class="author">Ehsan Amiri</h3>
     33<div class="affiliation">
     34<p class="jobtitle">Graduate Student, School of Computing Science</p>
     35<p class="orgname">Simon Fraser University</p>
     36</div>
     37<h5 class="author-email"><code class="email">&lt;<a class="email" href="mailto:eamiri@cs.sfu.ca">eamiri@cs.sfu.ca</a>&gt;</code></h5>
     38</div>
     39<div class="legalnotice" id="idp498304"><p id="idp498432">Copyright © 2009 Robert D. Cameron, Kenneth S. Herdy and Ehsan Amiri.
     40            This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative
     41            Works 2.5 Canada License.</p></div>
    2442<div class="abstract">
    2543<p class="title"><b>Abstract</b></p>
    26 <p id="idp283648">In 1584, Queen Elizabeth I charged Sir Walter Raleigh to discover and colonize lands in
    27         the New World on behalf of England. This article discusses the efforts and expeditions
    28         organized under Raleigh's patronage to found a colony in the current-day North Carolina.
    29         Going beyond the chronological history of the doomed Roanoke colony, the author provides
    30         insights into relations between the colonists and their native counterparts, discusses the
    31         botanical riches of the region, and argues that Raleigh's efforts did not end in failure, as
    32         commonly might be thought.</p>
     44<p id="idp514976">By first transforming the octets (bytes) of XML texts into eight parallel bit
     45            streams, the SIMD features of commodity processors can be exploited for parallel
     46            processing of blocks of 128 input bytes at a time. Established transcoding and parsing
     47            techniques are reviewed followed by new techniques including parsing with bitstream
     48            addition. Further opportunities are discussed in light of expected advances in CPU
     49            architecture and compiler technology. Implications for various APIs and information
     50            models are presented as well opportunities for collaborative open-source
     51         development.</p>
    3352</div>
    3453<hr>
     
    3756<p><b>Table of Contents</b></p>
    3857<dl>
    39 <dt><span class="section"><a href="#idp264184" class="toc">Introduction</a></span></dt>
    40 <dt><span class="section"><a href="#idp221368" class="toc">Securing a Permanent Colony in the Claimed Lands</a></span></dt>
     58<dt><span class="section"><a href="#idp499912" class="toc">Introduction</a></span></dt>
     59<dt><span class="section"><a href="#idp644272" class="toc">A Catalog of Parallel Bit Streams for XML</a></span></dt>
    4160<dd><dl>
    42 <dt><span class="section"><a href="#idp278696" class="toc">Native Plants and Wildlife</a></span></dt>
     61<dt><span class="section"><a href="#idp644640" class="toc">Introduction</a></span></dt>
     62<dt><span class="section"><a href="#idp646032" class="toc">Basis Bit Streams</a></span></dt>
     63<dt><span class="section"><a href="#idp696728" class="toc">General Streams</a></span></dt>
    4364<dd><dl>
    44 <dt><span class="section"><a href="#idp424432" class="toc">Gourds</a></span></dt>
    45 <dt><span class="section"><a href="#idp429752" class="toc">Tomatoes</a></span></dt>
    46 <dt><span class="section"><a href="#idp433784" class="toc">Potatoes</a></span></dt>
    47 <dt><span class="section"><a href="#idp434984" class="toc">Fruits and Nuts</a></span></dt>
     65<dt><span class="section"><a href="#idp697328" class="toc">Deletion Mask Streams</a></span></dt>
     66<dt><span class="section"><a href="#idp707448" class="toc">Error Flag Streams </a></span></dt>
     67</dl></dd>
     68<dt><span class="section"><a href="#idp716704" class="toc">Lexical Item Streams</a></span></dt>
     69<dt><span class="section"><a href="#idp755744" class="toc">UTF-8 Byte Classification, Scope and Validation Streams</a></span></dt>
     70<dd><dl>
     71<dt><span class="section"><a href="#idp757104" class="toc">UTF-8 Byte Classification Streams</a></span></dt>
     72<dt><span class="section"><a href="#idp758312" class="toc">UTF-8 Scope Streams</a></span></dt>
     73<dt><span class="section"><a href="#idp778672" class="toc">UTF-8 Validation Streams</a></span></dt>
     74<dt><span class="section"><a href="#idp781056" class="toc">XML Character Validation Streams</a></span></dt>
     75<dt><span class="section"><a href="#idp782512" class="toc">UTF-8 to UTF-16 Transcoding</a></span></dt>
     76<dt><span class="section"><a href="#idp785384" class="toc">UTF-8 Indexed UTF-16 Streams</a></span></dt>
     77</dl></dd>
     78<dt><span class="section"><a href="#idp786760" class="toc">Control Character Streams</a></span></dt>
     79<dd><dl>
     80<dt><span class="section"><a href="#idp787768" class="toc">XML Character Validation</a></span></dt>
     81<dt><span class="section"><a href="#idp789136" class="toc">XML 1.0 End-of-line Handling</a></span></dt>
     82</dl></dd>
     83<dt><span class="section"><a href="#idp797864" class="toc">Call Out Streams</a></span></dt>
     84<dd><dl>
     85<dt><span class="section"><a href="#idp798776" class="toc">Comment, Processing Instruction and CDATA Section Call Out Streams</a></span></dt>
     86<dt><span class="section"><a href="#idp807344" class="toc">Reference Call Out Streams</a></span></dt>
     87<dt><span class="section"><a href="#idp809320" class="toc">Tag Call Out Streams</a></span></dt>
    4888</dl></dd>
    4989</dl></dd>
    50 <dt><span class="appendix"><a href="#mul83" class="toc">Appendix A. The First Expeditionary Force’s Colony, 1585-1586
    51     </a></span></dt>
    52 <dt><span class="appendix"><a href="#mul88" class="toc">Appendix B. The Roanoke Colony, 1587
    53     </a></span></dt>
     90<dt><span class="section"><a href="#idp822632" class="toc">SIMD Beyond Bitstreams: Names and Numbers</a></span></dt>
     91<dd><dl>
     92<dt><span class="section"><a href="#idp823448" class="toc">Name Lookup</a></span></dt>
     93<dt><span class="section"><a href="#idp830280" class="toc">Numeric Processing</a></span></dt>
     94</dl></dd>
     95<dt><span class="section"><a href="#idp833080" class="toc">APIs and Parallel Bit Streams</a></span></dt>
     96<dd><dl>
     97<dt><span class="section"><a href="#idp833440" class="toc">The ILAX Streaming API</a></span></dt>
     98<dt><span class="section"><a href="#idp836736" class="toc">Efficient XML in Java Using Array Set Models</a></span></dt>
     99<dd><dl><dt><span class="section"><a href="#idp845360" class="toc">Saxon-B TinyTree Example</a></span></dt></dl></dd>
     100</dl></dd>
     101<dt><span class="section"><a href="#idp853000" class="toc">Compiler Technology</a></span></dt>
     102<dd><dl>
     103<dt><span class="section"><a href="#idp855592" class="toc">Character Class Compiler</a></span></dt>
     104<dt><span class="section"><a href="#idp857544" class="toc">Regular Expression Compilation</a></span></dt>
     105<dt><span class="section"><a href="#idp858856" class="toc">Unbounded Bit Stream Compilation</a></span></dt>
     106</dl></dd>
     107<dt><span class="section"><a href="#idp860384" class="toc">Conclusion</a></span></dt>
     108<dt><span class="section"><a href="#idp864448" class="toc">Acknowledgments</a></span></dt>
    54109</dl>
    55110</div>
    56 <div class="section" id="idp264184">
     111<div class="section" id="idp499912">
    57112<h2 class="title" style="clear: both">Introduction</h2>
    58 <p id="idp264528">On March 25, 1584, Queen Elizabeth I of England charged Sir Walter Raleigh to<div class="blockquote"><blockquote class="blockquote">
    59 <p id="idp264872">discover, search, find out, and view such remote, heathen and barbarous lands,
    60           countries, and territories, not actually possessed of any Christian Prince, nor inhabited
    61           by Christian People</p>
    62 <p class="attribution">— <a class="xref" href="#thorpe97" id="idp265344">Thorpe 1997</a></p>
    63 </blockquote></div> That same year Raleigh sent two captains, Philip Amades and Arthur Barlowe, from
    64       England to Hispaniola and the Canary Islands; from there, the captains were instructed to
    65       scout the lands northeast of those already claimed by Spain, to wit, Florida. This land — now
    66       encompassing the Carolinas and Virginia — was claimed on behalf of England and named
    67         <q>Virginia,</q> in honor of the <q>Virgin Queen.</q></p>
    68 <p id="idp273064">Information processing, especially text markup, was primitive in the colony. For example,
    69       most text stores were in XML! Documents may have looked like this:
    70       <pre class="programlisting" id="idp273368">&lt;teiHeader&gt;
    71   &lt;fileDesc&gt;
    72     &lt;titleStmt&gt;
    73       &lt;title&gt;Farming in the New World&lt;/title&gt;
    74       ...
    75     &lt;/titleStmt&gt;
    76   &lt;/fileDesc&gt;
    77 &lt;/teiHeader&gt;</pre>
    78       Notice the paired Tags: <code class="code">&lt;title&gt;</code> and <code class="code">&lt;/title&gt;</code> and the
    79       primitive use of indenting. Unusual features of the colonists data processing practices included:<div class="variablelist" id="idp274544"><table border="0" class="variablelist">
    80 <col valign="top" align="left">
     113<p id="idp500256"> While particular XML applications may benefit from special-purpose hardware such as XML
     114         chips [<a class="xref" href="#XMLChip09" id="idp500496">Leventhal and Lemoine 2009</a>] or appliances [<a class="xref" href="#Datapower09" id="idp506976">Salz, Achilles and Maze 2009</a>], the bulk
     115         of the world's XML processing workload will continue to be handled by XML software stacks
     116         on commodity processors. Exploiting the SIMD capabilities of such processors such as the
     117         SSE instructions of x86 chips, parallel bit stream technology offers the potential of
     118         dramatic improvement over byte-at-a-time processing for a variety of XML processing tasks.
     119         Character set issues such as Unicode validation and transcoding [<a class="xref" href="#PPoPP08" id="idp507840">Cameron 2007</a>], normalization of line breaks and white space and XML character validation can be
     120         handled fully in parallel using this representation. Lexical item streams, such as the bit
     121         stream marking the positions of opening angle brackets, can also be formed in parallel.
     122         Bit-scan instructions of commodity processors may then be used on lexical item streams to
     123         implement rapid single-instruction scanning across variable-length multi-byte text blocks
     124         as in the Parabix XML parser [<a class="xref" href="#CASCON08" id="idp508744">Cameron, Herdy and Lin 2008</a>]. Overall, these techniques may be
     125         combined to yield end-to-end performance that may be 1.5X to 15X faster than alternatives
     126            [<a class="xref" href="#SVGOpen08" id="idp509256">Herdy, Burggraf and Cameron 2008</a>].</p>
     127<p id="idp509720">Continued research in parallel bit stream techniques as well as more conventional
     128         application of SIMD techniques in XML processing offers further prospects for improvement
     129         of core XML components as well as for tackling performance-critical tasks further up the
     130         stack. A newly prototyped technique for parallel tag parsing using bitstream addition is
     131         expected to improve parsing performance even beyond that achieved using sequential bit
     132         scans. Several techniques for improved symbol table performance are being investigated,
     133         including parallel hash value calculation and length-based sorting using the cheap length
     134         determination afforded by bit scans. To deliver the benefits of parallel bit stream
     135         technology to the Java world, we are developing Array Set Model (ASM) representations of
     136         XML Infoset and other XML information models for efficient transmission across the JNI
     137         boundary.</p>
     138<p id="idp510920">Amplifying these software advances, continuing hardware advances in commodity processors
     139         increase the relative advantage of parallel bit stream techniques over traditional
     140         byte-at-a-time processors. For example, the Intel Core architecture improved SSE processing
     141         to give superscalar execution of bitwise logic operations (3 instructions per cycle vs. 1
     142         in Pentium 4). Upcoming 256-bit AVX technology extends the register set and replaces
     143         destructive two-operand instructions with a nondestructive three-operand form. General
     144         purpose programming on graphic processing units (GPGPU) such as the upcoming 512-bit
     145         Larrabee processor may also be useful for XML applications using parallel bit streams. New
     146         instruction set architectures may also offer dramatic improvements in core algorithms.
     147         Using the relatively simple extensions to support the principle of inductive doubling, a 3X
     148         improvement in several core parallel bit stream algorithms may be achieved [<a class="xref" href="#ASPLOS09" id="idp638256">Cameron and Lin 2009</a>]. Other possibilities include direct implementation of parallel
     149         extract and parallel deposit (pex/pdep) instructions [<a class="xref" href="#Pex06" id="idp638640">Hilewitz and Lee 2006</a>], and
     150         bit-level interleave operations as in Larrabee, each of which would have important
     151         application to parallel bit stream processing.</p>
     152<p id="idp639128">Further prospects for XML performance improvement arise from leveraging the
     153         intraregister parallelism of parallel bit stream technology to exploit the interchip
     154         parallelism of multicore computing. Parallel bit stream techniques can support multicore
     155         parallelism in both data partitioning and task partitioning models. For example, the
     156         datasection partitioning approach of Wu, Zhang, Yu and Li may be used to partition blocks
     157         for speculative parallel parsing on separate cores followed by a postprocessing step to
     158         join partial S-trees [<a class="xref" href="#Wu08" id="idp639856">Wu et al. 2008</a>].</p>
     159<p id="idp640296">In our view, the established and expected performance advantages of parallel bit stream
     160         technology over traditional byte-at-a-time processing are so compelling that parallel bit
     161         stream technology should ultimately form the foundation of every high-performance XML
     162         software stack. We envision a common high-performance XML kernel that may be customized to
     163         a variety of processor architectures and that supports a wide range of existing and new XML
     164         APIs. Widespread deployment of this technology should greatly benefit the XML community in
     165         addressing both the deserved and undeserved criticism of XML on performance grounds. A
     166         further benefit of improved performance is a substantial greening of XML technologies.</p>
     167<p id="idp641272">To complement our research program investigating fundamental algorithms and issues in
     168         high-performance XML processing, our work also involves development of open source software
     169         implementing these algorithms, with a goal of full conformance to relevant specifications.
     170         From the research perspective, this approach is valuable in ensuring that the full
     171         complexity of required XML processing is addressed in reporting and assessing processing
     172         results. However, our goal is also to use this open source software as a basis of
     173         technology transfer. A Simon Fraser University spin-off company, called International
     174         Characters, Inc., has been created to commercialize the results of this work using a
     175         patent-based open source model.</p>
     176<p id="idp642272">To date, we have not yet been successful in establishing a broader community of
     177         participation with our open source code base. Within open-source communities, there is
     178         often a general antipathy towards software patents; this may limit engagement with our
     179         technology, even though it has been dedicated for free use in open source. </p>
     180<p id="idp642832">A further complication is the inherent difficulty of SIMD programming in general, and
     181         parallel bit stream programming in particular. Considerable work is required with each new
     182         algorithmic technique being investigated as well as in retargetting our techniques for each
     183         new development in SIMD and multicore processor technologies. To address these concerns, we
     184         have increasingly shifted the emphasis of our research program towards compiler technology
     185         capable of generating parallel bit stream code from higher-level specifications.</p>
     186</div>
     187<div class="section" id="idp644272">
     188<h2 class="title" style="clear: both">A Catalog of Parallel Bit Streams for XML</h2>
     189<div class="section" id="idp644640">
     190<h3 class="title" style="clear: both">Introduction</h3>
     191<p id="idp644984">In this section, we introduce the fundamental concepts of parallel bit stream
     192            technology and present a comprehensive catalog of parallel bit streams for use in XML
     193            processing. In presenting this catalog, the focus is on the specification of the bit
     194            streams as data streams in one-to-one correspondence with the character code units of an
     195            input XML stream. The goal is to define these bit streams in the abstract without
     196            initially considering memory layouts, register widths or other issues related to
     197            particular target architectures. In cataloging these techniques, we also hope to convey
     198            a sense of the breadth of applications of parallel bit stream technology to XML
     199            processing tasks. </p>
     200</div>
     201<div class="section" id="idp646032">
     202<h3 class="title" style="clear: both">Basis Bit Streams</h3>
     203<p id="idp646376">Given a byte-oriented text stream represented in UTF-8, for example, we define a
     204            transform representation of this text consisting of a set of eight parallel bit streams
     205            for the individual bits of each byte. Thus, the <code class="code">Bit0</code> stream is the stream
     206            of bits consisting of bit 0 of each byte in the input byte stream, <code class="code">Bit1</code> is
     207            the bit stream consisting of bit 1 of each byte in the input stream and so on. The set
     208            of streams <code class="code">Bit0</code> through <code class="code">Bit7</code> are known as the <span class="ital">basis
     209               streams</span> of the parallel bit stream representation. The following table
     210            shows an example XML character stream together with its representation as a set of 8
     211            basis streams. <div class="table-wrapper" id="idp648288">
     212<p class="title">Table I</p>
     213<div class="caption"><p id="idp451472">XML Character Stream Transposition.</p></div>
     214<table class="table">
     215<colgroup span="1">
     216<col align="left" valign="top" span="1">
     217<col align="left" valign="top" span="1">
     218<col align="left" valign="top" span="1">
     219<col align="left" valign="top" span="1">
     220<col align="left" valign="top" span="1">
     221<col align="left" valign="top" span="1">
     222</colgroup>
    81223<tbody>
    82 <tr id="idp274672">
    83 <td class="varlist-term"><p><span class="term">Tags</span></p></td>
    84 <td class="varlist-item"><p id="idp219936">Meaningful descriptions of the information enclosed by the markers</p></td>
    85 </tr>
    86 <tr id="idp220328">
    87 <td class="varlist-term"><p><span class="term">Balance</span></p></td>
    88 <td class="varlist-item"><p id="idp220792">All markup is both opened and closed (or explicitly empty)</p></td>
    89 </tr>
    90 </tbody>
    91 </table></div>
    92     </p>
    93 </div>
    94 <div class="section" id="idp221368">
    95 <h2 class="title" style="clear: both">Securing a Permanent Colony in the Claimed Lands</h2>
    96 <p id="idp221744">With land claimed in the New World, an expedition was mounted to establish a settlement.
    97       The first expedition failed. Led by Sir Richard Grenville in April 1585, it encompassed 600
    98       men of which 105 remained in the colony while Grenville returned to England for additional
    99       provisions. (<span class="ital">See</span>
    100       <a class="xref" href="#mul83" title="The First Expeditionary Force’s Colony, 1585-1586 This Appendix contains an actual list of the individuals who sailed to the Roanoke
    101           Colony in 1585. (See
    102           Durant, David N., Raleigh’s Lost Colony, Appendix I, Atheneum,
    103             NY: 1981.)
    104     ">Appendix A</a>.) However, when almost a year passed without Grenville’s return, the
    105       remainder of the expeditionary force took advantage of Sir Francis Drake’s arrival to seek
    106       return passage to England. <sup class="fn-label"><a href="#mul74" class="footnoteref" id="mul74-ref">[1]</a></sup>
    107     </p>
    108 <p id="idp277352">The second expedition, organized by John White in 1587, fared better. It sailed with seven
    109       ships filled with Devon families intent upon establishing a colony in that part of Virginia
    110       called Roanoke, a word deriving from the speech of native peoples. [<a class="xref" href="#mul88" title="The Roanoke Colony, 1587 This Appendix contains an actual list of the individuals who sailed to the Roanoke
    111           Colony in 1587, as well as the names of children born at the colony. (See
    112           Durant, David N., Raleigh’s Lost Colony, Appendix I, Atheneum,
    113             NY: 1981.)
    114     ">Appendix B</a>]
    115       Two years after founding the <q>Cittie of Raleigh,</q> houses had been built for
    116       almost all families residing in the colony, and the colony had celebrated the birth of its
    117       first children born in the New World. The first child, grandchild of John White and child of
    118       Ananias and Eleanor Dare, was been named Virginia in honor of the sovereign.</p>
    119 <div class="section" id="idp278696">
    120 <h3 class="title" style="clear: both">Native Plants and Wildlife</h3>
    121 <p id="idp279016">The European settlers found the New World abundant with commodities <q>known to
    122           yield victual and sustenance of man’s life</q>. The first expeditionary force noted
    123         that a great variety of berries grew wildly, including raspberries, blueberries, and
    124         strawberries. Along with maize, native grain, which could be made into bread, grew in the
    125         area. Two other plants — more properly called roots — which could be saved for winter
    126         consumption were <q>cassida</q> and <q>chyna.</q> The settlers discovered
    127         that while some roots could be eaten much in appearance as they were dug, others had to be
    128         boiled before use as a foodstuff. As more fully described below, other plants included
    129         beans, and several crops previously unknown to the Europeans: <div class="itemizedlist" id="idp421712"><ul>
    130 <li id="idp421840"><p id="idp421968"><q>macocqwer</q> (gourds),</p></li>
    131 <li id="idp422352"><p id="idp422480"><q>melden</q> (an herb),</p></li>
    132 <li id="idp422864"><p id="idp422992"><q>planta solis</q> (sunflower — used in a type of bread, as well as for
    133               broth),</p></li>
    134 <li id="idp423536"><p id="idp423664">peas (powdered in a mortar), and</p></li>
    135 <li id="idp423920"><p id="idp424048">potatoes.</p></li>
    136 </ul></div>
    137       </p>
    138 <div class="section" id="idp424432">
    139 <h4 class="title" style="clear: both">Gourds</h4>
    140 <p id="idp424752">The native people grew a variety of large broad-leafed, ground-covering vines which
    141           produced what they called <q>macocqwer</q> or gourds. (<span class="ital">See</span>
    142           <a class="xref" href="#gourds" title="Gourds">Figure 1</a>.) Varying in color among shades of green, yellow, and orange,
    143           these gourds served a number of functions, not chief of which was as a food source. There
    144           were two distinct types, soft-shelled and hard-shelled. Of particular interest to the
    145           settlers were pumpkins; grown throughout the summer, this gourd remained in the fields
    146           until late autumn’s frost. Following harvest, the gourd could be stored throughout the
    147           winter and its flesh made into stews.</p>
    148 <div class="figure" id="gourds">
    149 <p class="title">Figure 1: Gourds</p>
    150 <div class="figure-contents">
    151 <div class="mediaobject" id="idp427408"><img alt="jpg image (19450212-2.jpg)" src="19450212-2.jpg" width="50%"></div>
    152 <div class="caption"><p id="idp428408">While gourds, pumpkins and squashes were new to the English, they were soon
    153               discovered to be very useful for warding off starvation.</p></div>
    154 </div>
    155 </div>
    156 <p id="idp428880">However, far more important was the hard-skinned gourd. The value of this gourd lay
    157           not in its potential as a food source, but rather as a container and serving vessel. Once
    158           dried, these gourds were cut and hollowed for use as storage containers, as well as for
    159           bowls, ladles, cups, and other types of serving utensils. Indeed, since gourds grew in a
    160           variety of shapes and sizes, particular gourds could be selected for their resemblance to
    161           the items sought. For the adventurous, the durable objects could be carved and decorated
    162           with plant dyes.</p>
    163 </div>
    164 <div class="section" id="idp429752">
    165 <h4 class="title" style="clear: both">Tomatoes</h4>
    166 <p id="idp430072">Also new to the colonists was the tomato. (<span class="ital">See</span>
    167           <a class="xref" href="#Tomatoes" title="Tomatoes">Figure 2</a>.) Tomatoes were described as thin-skinned succulent fruits with
    168           pulpy interiors. Almost <q>Bristol red</q> in color, the fruit was somewhat round
    169           in form and the size of a chicken’s egg — not anything approaching the size of modern,
    170           cultivated varieties. Of particular importance was the plant’s fecundity. The flowering,
    171           bush-like plant bore fruit over a period of months. Thus, a plant could produce as many as
    172           15 tomatoes at a given time.</p>
    173 <div class="figure" id="Tomatoes">
    174 <p class="title">Figure 2: Tomatoes</p>
    175 <div class="figure-contents">
    176 <div class="mediaobject" id="idp432328"><img alt="jpg image (19450212-1.jpg)" src="19450212-1.jpg" width="50%"></div>
    177 <div class="caption"><p id="idp433328">Tomatoes were among the new fruits discovered in the New World.</p></div>
    178 </div>
    179 </div>
    180 </div>
    181 <div class="section" id="idp433784">
    182 <h4 class="title" style="clear: both">Potatoes</h4>
    183 <p id="idp434104">Another root that proved beneficial was the potato. Similarly, to the
    184             <q>cassida</q> mentioned earlier, potatoes were considered roots rather than
    185           plants, as the edible portion of the plant lay underground. Given the curious nature of
    186           this legume, several examples were brought back from the New World. In fact, Raleigh
    187           attempted to cultivate the potato at his estate, Youghall, in Ireland.</p>
    188 </div>
    189 <div class="section" id="idp434984">
    190 <h4 class="title" style="clear: both">Fruits and Nuts</h4>
    191 <p id="idp435304">The colony abounded with a wealth of fruits and nuts, some not previously known to the
    192           Europeans. In addition to mulberries, strawberries, and blueberries already mentioned, one
    193           of the more curious fruits found was called <q>medlar</q> by the natives. Medlar
    194           was a fruit not unlike cherries in size and color, but with a much sweeter taste. An
    195           equally unusual fruit was <q>metaqvesvnnaqvk</q>; notwithstanding its red fruit,
    196           that plant’s more important feature lay in the cochinile insects which fed upon its
    197           prickly thick leaves <sup class="fn-label"><a href="#mul90" class="footnoteref" id="mul90-ref">[2]</a></sup>.</p>
    198 <p id="idp437256">Equally abundant in variety and size were nuts. In addition to chestnuts and walnuts,
    199           the natives <q>harvested</q> no less than five types of <q>acorns</q>.
    200           These somewhat small nuts were either dried in a manner similar to that used in England
    201           for malt, or were boiled for broth.</p>
    202 <div class="table-wrapper" id="idp438144">
    203 <p class="title">Table I</p>
    204 <div class="caption"><p id="idp438400">North American Native Plants</p></div>
    205 <table class="table">
    206 <col align="right" valign="top" span="1">
    207 <col valign="top" span="1">
    208 <col align="center" valign="top" span="1">
    209 <thead><tr valign="top">
    210 <th>Use</th>
    211 <th>Plant Part</th>
    212 <th>Examples</th>
    213 </tr></thead>
    214 <tbody>
    215 <tr valign="top"><td rowspan="8"><span class="bold">Vegetables</span></td></tr>
    216 <tr valign="top">
    217 <td rowspan="2">Seeds</td>
    218 <td>Corn</td>
    219 </tr>
    220 <tr valign="top"><td>Beans</td></tr>
    221 <tr valign="top">
    222 <td rowspan="3">Fruits</td>
    223 <td>Squash</td>
    224 </tr>
    225 <tr valign="top"><td>Peppers</td></tr>
    226 <tr valign="top"><td>Tomatoes</td></tr>
    227 <tr valign="top">
    228 <td rowspan="2">Roots</td>
    229 <td>Potatoes</td>
    230 </tr>
    231 <tr valign="top"><td>Sweet Potatoes</td></tr>
    232 <tr valign="top"><td rowspan="4"><span class="bold">Teas</span></td></tr>
    233 <tr valign="top">
    234 <td rowspan="2">Leaves</td>
    235 <td>Mountain Mint (Namewuskons)</td>
    236 </tr>
    237 <tr valign="top"><td>Dawn Mint (Wabinowusk)</td></tr>
    238 <tr valign="top">
    239 <td>Berries</td>
    240 <td>Wintergreen</td>
     224<tr valign="top">
     225<td>Input Data</td>
     226<td>
     227                        <code class="code">&lt;</code>
     228                     </td>
     229<td>
     230                        <code class="code">t</code>
     231                     </td>
     232<td>
     233                        <code class="code">a</code>
     234                     </td>
     235<td>
     236                        <code class="code">g</code>
     237                     </td>
     238<td>
     239                        <code class="code">/</code>
     240                     </td>
     241<td>
     242                        <code class="code">&gt;</code>
     243                     </td>
     244</tr>
     245<tr valign="top">
     246<td>ASCII</td>
     247<td>
     248                        <code class="code">00111100</code>
     249                     </td>
     250<td>
     251                        <code class="code">01110100</code>
     252                     </td>
     253<td>
     254                        <code class="code">01100001</code>
     255                     </td>
     256<td>
     257                        <code class="code">01100111</code>
     258                     </td>
     259<td>
     260                        <code class="code">00101111</code>
     261                     </td>
     262<td>
     263                        <code class="code">00111110</code>
     264                     </td>
     265</tr>
     266<tr valign="top">
     267<td>Bit0</td>
     268<td>
     269                        <code class="code">0</code>
     270                     </td>
     271<td>
     272                        <code class="code">0</code>
     273                     </td>
     274<td>
     275                        <code class="code">0</code>
     276                     </td>
     277<td>
     278                        <code class="code">0</code>
     279                     </td>
     280<td>
     281                        <code class="code">0</code>
     282                     </td>
     283<td>
     284                        <code class="code">0</code>
     285                     </td>
     286</tr>
     287<tr valign="top">
     288<td>Bit1</td>
     289<td>
     290                        <code class="code">0</code>
     291                     </td>
     292<td>
     293                        <code class="code">1</code>
     294                     </td>
     295<td>
     296                        <code class="code">1</code>
     297                     </td>
     298<td>
     299                        <code class="code">1</code>
     300                     </td>
     301<td>
     302                        <code class="code">0</code>
     303                     </td>
     304<td>
     305                        <code class="code">0</code>
     306                     </td>
     307</tr>
     308<tr valign="top">
     309<td>Bit2</td>
     310<td>
     311                        <code class="code">1</code>
     312                     </td>
     313<td>
     314                        <code class="code">1</code>
     315                     </td>
     316<td>
     317                        <code class="code">1</code>
     318                     </td>
     319<td>
     320                        <code class="code">1</code>
     321                     </td>
     322<td>
     323                        <code class="code">1</code>
     324                     </td>
     325<td>
     326                        <code class="code">1</code>
     327                     </td>
     328</tr>
     329<tr valign="top">
     330<td>Bit3</td>
     331<td>
     332                        <code class="code">1</code>
     333                     </td>
     334<td>
     335                        <code class="code">1</code>
     336                     </td>
     337<td>
     338                        <code class="code">0</code>
     339                     </td>
     340<td>
     341                        <code class="code">0</code>
     342                     </td>
     343<td>
     344                        <code class="code">0</code>
     345                     </td>
     346<td>
     347                        <code class="code">1</code>
     348                     </td>
     349</tr>
     350<tr valign="top">
     351<td>Bit4</td>
     352<td>
     353                        <code class="code">1</code>
     354                     </td>
     355<td>
     356                        <code class="code">0</code>
     357                     </td>
     358<td>
     359                        <code class="code">0</code>
     360                     </td>
     361<td>
     362                        <code class="code">0</code>
     363                     </td>
     364<td>
     365                        <code class="code">1</code>
     366                     </td>
     367<td>
     368                        <code class="code">1</code>
     369                     </td>
     370</tr>
     371<tr valign="top">
     372<td>Bit5</td>
     373<td>
     374                        <code class="code">1</code>
     375                     </td>
     376<td>
     377                        <code class="code">1</code>
     378                     </td>
     379<td>
     380                        <code class="code">0</code>
     381                     </td>
     382<td>
     383                        <code class="code">1</code>
     384                     </td>
     385<td>
     386                        <code class="code">1</code>
     387                     </td>
     388<td>
     389                        <code class="code">1</code>
     390                     </td>
     391</tr>
     392<tr valign="top">
     393<td>Bit6</td>
     394<td>
     395                        <code class="code">0</code>
     396                     </td>
     397<td>
     398                        <code class="code">0</code>
     399                     </td>
     400<td>
     401                        <code class="code">0</code>
     402                     </td>
     403<td>
     404                        <code class="code">1</code>
     405                     </td>
     406<td>
     407                        <code class="code">1</code>
     408                     </td>
     409<td>
     410                        <code class="code">1</code>
     411                     </td>
     412</tr>
     413<tr valign="top">
     414<td>Bit7</td>
     415<td>
     416                        <code class="code">0</code>
     417                     </td>
     418<td>
     419                        <code class="code">0</code>
     420                     </td>
     421<td>
     422                        <code class="code">1</code>
     423                     </td>
     424<td>
     425                        <code class="code">1</code>
     426                     </td>
     427<td>
     428                        <code class="code">1</code>
     429                     </td>
     430<td>
     431                        <code class="code">0</code>
     432                     </td>
    241433</tr>
    242434</tbody>
    243435</table>
    244436</div>
    245 </div>
    246 </div>
    247 </div>
    248 <div class="appendix" id="mul83">
    249 <h2 class="title" style="clear: both">Appendix A. The First Expeditionary Force’s Colony, 1585-1586 <sup class="fn-label"><a href="#mul5" class="footnoteref" id="mul5-ref">[3]</a></sup>
    250     </h2>
    251 <div class="itemizedlist" id="idp452152"><ul>
    252 <li id="idp452280"><p id="idp452408">Master Philip Amades, Admirall of the countrie</p></li>
    253 <li id="idp452720"><p id="idp452848">Master Hariot</p></li>
    254 <li id="idp453128"><p id="idp453256">Master Acton</p></li>
    255 <li id="idp453536"><p id="idp453664">Master Edward Stafford</p></li>
    256 <li id="idp453952"><p id="idp454080">Thomas Luddington</p></li>
    257 <li id="idp454360"><p id="idp454488">Master Marvyn</p></li>
    258 <li id="idp454768"><p id="idp454896">Master Gardyner</p></li>
    259 <li id="idp455176"><p id="idp455304">Captaine Vaughan</p></li>
    260 <li id="idp455584"><p id="idp455712">Master Kendall</p></li>
    261 <li id="idp455992"><p id="idp456120">Master Prideox</p></li>
    262 <li id="idp456400"><p id="idp456528">Robert Holecroft</p></li>
    263 <li id="idp456808"><p id="idp456936">Rise Courtney</p></li>
    264 <li id="idp457216"><p id="idp457344">Master Hugh Rogers</p></li>
    265 <li id="idp457624"><p id="idp457752">Thomas Foxe</p></li>
    266 <li id="idp458024"><p id="idp458152">Edward Nugent</p></li>
    267 <li id="idp458432"><p id="idp458560">Darby Glande</p></li>
    268 <li id="idp458840"><p id="idp458968">Edward Kelle</p></li>
    269 <li id="idp459248"><p id="idp459376">John Gostigo</p></li>
    270 <li id="idp459656"><p id="idp459784">Erasmus Clefs</p></li>
    271 <li id="idp460064"><p id="idp460192">Edward Ketcheman</p></li>
    272 <li id="idp460472"><p id="idp460600">John Linsey</p></li>
    273 <li id="idp460872"><p id="idp461000">Thomas Rottenbury</p></li>
    274 <li id="idp461280"><p id="idp461408">Roger Deane</p></li>
    275 <li id="idp461680"><p id="idp461808">John Harris</p></li>
    276 <li id="idp462080"><p id="idp462208">Frauncis Norris</p></li>
    277 <li id="idp462488"><p id="idp462616">Matthewe Lyne</p></li>
    278 <li id="idp462896"><p id="idp463024">Edward Kettell</p></li>
    279 <li id="idp463304"><p id="idp463432">Thomas Wisse</p></li>
    280 <li id="idp463712"><p id="idp463840">Master Thomas Harvie</p></li>
    281 <li id="idp464128"><p id="idp464256">Master Snelling</p></li>
    282 <li id="idp464536"><p id="idp464664">Master Anthony Russe</p></li>
    283 <li id="idp464952"><p id="idp465080">Master Allyne</p></li>
    284 <li id="idp465360"><p id="idp465488">Master Michel Polyson</p></li>
    285 <li id="idp465776"><p id="idp465904">John Cage</p></li>
    286 <li id="idp466176"><p id="idp466304">Thomas Parre</p></li>
    287 <li id="idp466584"><p id="idp466712">William Randes</p></li>
    288 <li id="idp466992"><p id="idp467120">Geffrey Churchman</p></li>
    289 <li id="idp467400"><p id="idp467528">William Farthowe</p></li>
    290 <li id="idp467808"><p id="idp467936">John Taylor</p></li>
    291 <li id="idp468208"><p id="idp468336">Philppe Robyns</p></li>
    292 <li id="idp468616"><p id="idp468744">Thomas Phillipes</p></li>
    293 <li id="idp469024"><p id="idp469152">Valentine Beale</p></li>
    294 <li id="idp469432"><p id="idp469560">James Skinner</p></li>
    295 <li id="idp469840"><p id="idp469968">George Eseven</p></li>
    296 <li id="idp470248"><p id="idp470376">John Chaundeler</p></li>
    297 <li id="idp470656"><p id="idp470784">Philip Blunt</p></li>
    298 <li id="idp471064"><p id="idp471192">Richard Poore</p></li>
    299 <li id="idp471472"><p id="idp471600">Robert Yong</p></li>
    300 <li id="idp471872"><p id="idp472000">Marmaduke Constable</p></li>
    301 <li id="idp472280"><p id="idp472408">Thomas Hesket</p></li>
    302 <li id="idp472688"><p id="idp472816">William Wasse</p></li>
    303 <li id="idp473096"><p id="idp473224">John Fever</p></li>
    304 <li id="idp473496"><p id="idp473624">Daniel</p></li>
    305 <li id="idp473896"><p id="idp474024">Thomas Taylor</p></li>
    306 <li id="idp474304"><p id="idp474432">Richard Humfrey</p></li>
    307 <li id="idp474712"><p id="idp474840">John Wright</p></li>
    308 <li id="idp475112"><p id="idp475240">Gabriell North</p></li>
    309 <li id="idp475520"><p id="idp475648">Robert Biscombe</p></li>
    310 <li id="idp475928"><p id="idp476056">William Backhouse</p></li>
    311 <li id="idp476336"><p id="idp476464">William White</p></li>
    312 <li id="idp476744"><p id="idp476872">Henry Potkin</p></li>
    313 <li id="idp477152"><p id="idp477280">Dennis Barnes</p></li>
    314 <li id="idp477560"><p id="idp477688">Joseph Borges</p></li>
    315 <li id="idp477968"><p id="idp478096">Doughan Gannes</p></li>
    316 <li id="idp478376"><p id="idp478504">William Tenche</p></li>
    317 <li id="idp478784"><p id="idp478912">Randall Latham</p></li>
    318 <li id="idp479192"><p id="idp479320">Thomas Hulme</p></li>
    319 <li id="idp479600"><p id="idp479728">Walter Myll</p></li>
    320 <li id="idp480000"><p id="idp480128">Richard Gilbert</p></li>
    321 <li id="idp480408"><p id="idp480536">Steven Pomarie</p></li>
    322 <li id="idp480816"><p id="idp480944">John Brocke</p></li>
    323 <li id="idp481216"><p id="idp481344">Bennet Harrye</p></li>
    324 <li id="idp481624"><p id="idp481752">James Stevenson</p></li>
    325 <li id="idp482032"><p id="idp482160">Charles Stevenson</p></li>
    326 <li id="idp482440"><p id="idp482568">Christopher Lowde</p></li>
    327 <li id="idp482848"><p id="idp482976">Jeremie Man</p></li>
    328 <li id="idp483248"><p id="idp483376">James Mason</p></li>
    329 <li id="idp483648"><p id="idp483776">David Salter</p></li>
    330 <li id="idp484056"><p id="idp484184">Richard Ireland</p></li>
    331 <li id="idp484464"><p id="idp484592">Thomas Bookener</p></li>
    332 <li id="idp484872"><p id="idp485000">William Philippes</p></li>
    333 <li id="idp485280"><p id="idp485408">Randall Mayne</p></li>
    334 <li id="idp485688"><p id="idp485816">Bennet Chappell</p></li>
    335 <li id="idp486096"><p id="idp486224">Richard Sare</p></li>
    336 <li id="idp486504"><p id="idp486632">James Lasie</p></li>
    337 <li id="idp486904"><p id="idp487032">Smolkin</p></li>
    338 <li id="idp487304"><p id="idp487432">Thomas Smart</p></li>
    339 <li id="idp487712"><p id="idp487840">Robert</p></li>
    340 <li id="idp488112"><p id="idp488240">John Evans</p></li>
    341 <li id="idp488512"><p id="idp488640">Roger Large</p></li>
    342 <li id="idp488912"><p id="idp489040">Humfrey Garden</p></li>
    343 <li id="idp489320"><p id="idp489448">Frauncis Whitton</p></li>
    344 <li id="idp489728"><p id="idp489856">Rowland Griffyn</p></li>
    345 <li id="idp490136"><p id="idp490264">William Millard</p></li>
    346 <li id="idp490544"><p id="idp490672">John Twyt</p></li>
    347 <li id="idp490944"><p id="idp491072">Edwarde Seklemore</p></li>
    348 <li id="idp491352"><p id="idp491480">John Anwike</p></li>
    349 <li id="idp491752"><p id="idp491880">Christopher Marshall</p></li>
    350 <li id="idp492168"><p id="idp492296">David Williams</p></li>
    351 <li id="idp492576"><p id="idp492704">Nicholas Swabber</p></li>
    352 <li id="idp492984"><p id="idp493112">Edward Chipping</p></li>
    353 <li id="idp493392"><p id="idp493520">Sylvester Beching</p></li>
    354 <li id="idp493800"><p id="idp493928">Vincent Cheyne</p></li>
    355 <li id="idp494208"><p id="idp494336">Haunce Walters</p></li>
    356 <li id="idp494616"><p id="idp494744">Edward Barecombe</p></li>
    357 <li id="idp495024"><p id="idp495152">Thomas Skevelabs</p></li>
    358 <li id="idp495432"><p id="idp495560">William Walters</p></li>
     437         </p>
     438<p id="idp694048"> Depending on the features of a particular processor architecture, there are a number
     439            of algorithms for transposition to parallel bit stream form. Several of these algorithms
     440            employ a three-stage structure. In the first stage, the input byte stream is divided
     441            into a pair of half-length streams consisting of four bits for each byte, for example,
     442            one stream for the high nybble of each byte and another for the low nybble of each byte.
     443            In the second stage, these streams of four bits per byte are each divided into streams
     444            consisting of two bits per original byte, for example streams for the
     445            <code class="code">Bit0/Bit1</code>, <code class="code">Bit2/Bit3</code>, <code class="code">Bit4/Bit5</code>, and
     446               <code class="code">Bit6/Bit7</code> pairs. In the final stage, the streams are further subdivided
     447            in the individual bit streams. </p>
     448<p id="idp695744"> Using SIMD capabilities, this process is quite efficient, with an amortized cost of
     449            1.1 CPU cycles per input byte on Intel Core 2 with SSE, or 0.6 CPU cycles per input byte
     450            on Power PC G4 with Altivec. With future advances in processor technology, this
     451            transposition overhead is expected to reduce, possibly taking advantage of upcoming
     452            parallel extract (pex) instructions on Intel technology. In the ideal, only 24
     453            instructions are needed to transform a block of 128 input bytes using 128-bit SSE
     454            registers using the inductive doubling instruction set architecture, representing an
     455            overhead of less than 0.2 instructions per input byte. </p>
     456</div>
     457<div class="section" id="idp696728">
     458<h3 class="title" style="clear: both">General Streams</h3>
     459<p id="idp697048">This section describes bit streams which support basic processing operations.</p>
     460<div class="section" id="idp697328">
     461<h4 class="title" style="clear: both">Deletion Mask Streams</h4>
     462<p id="idp697648">DelMask (deletion mask) streams marks character code unit positions for deletion.
     463               Since the deletion operation is dependency free across many stages of XML processing,
     464               it is possible to simply mark and record deletion positions as deletion mask streams for future processing. A single
     465               invocation of a SIMD based parallel deletion algorithm can then perform the deletion of
     466               positions accumulated across a number of stages through a bitwise ORing of deletion
     467               masks. For example, deletion arises in the replacement of predefined entities with a
     468               single character, such as in the replacement of the &amp;amp; entity, with the
     469               &amp; character. Deletion also arises in XML
     470               end-of-line handling, and CDATA section delimeter processing. Several algorithms to
     471               delete bits at positions marked by DelMask are possible [<a class="xref" href="#u8u16" id="idp699152">Cameron 2008</a>]. </p>
     472<p id="idp699464">The following table provides an example of generating a DelMask in the context of
     473               bit stream based parsing of well-formed character references and predefined entities.
     474               The result is the generation of a DelMask stream. <div class="table-wrapper" id="idp699848">
     475<p class="title">Table II</p>
     476<div class="caption"><p id="idp700104">DelMask Stream Generation</p></div>
     477<table class="table">
     478<colgroup span="1">
     479<col align="left" valign="top" span="1">
     480<col align="left" valign="top" span="1">
     481</colgroup>
     482<tbody>
     483<tr valign="top">
     484<td>Input Data</td>
     485<td>
     486                           <code class="code">&amp;gt; &amp;#13; &amp;#x0a;</code>
     487                        </td>
     488</tr>
     489<tr valign="top">
     490<td>GenRefs</td>
     491<td>
     492                           <code class="code">_11______________</code>
     493                        </td>
     494</tr>
     495<tr valign="top">
     496<td>DecRefs</td>
     497<td>
     498                           <code class="code">_______11________</code>
     499                        </td>
     500</tr>
     501<tr valign="top">
     502<td>HexRefs</td>
     503<td>
     504                           <code class="code">______________11_</code>
     505                        </td>
     506</tr>
     507<tr valign="top">
     508<td>DelMask</td>
     509<td>
     510                           <code class="code">111__1111__11111_</code>
     511                        </td>
     512</tr>
     513<tr valign="top">
     514<td>ErrorFlag</td>
     515<td>
     516                           <code class="code">_________________</code>
     517                        </td>
     518</tr>
     519</tbody>
     520</table>
     521</div>
     522            </p>
     523</div>
     524<div class="section" id="idp707448">
     525<h4 class="title" style="clear: both">Error Flag Streams </h4>
     526<p id="idp707768">Error flag streams indicates the character code unit positions of syntactical
     527               errors. XML processing examples which benefit from the marking of error positions
     528               include UTF-8 character sequence validation and XML parsing [<a class="xref" href="#u8u16" id="idp708152">Cameron 2008</a>].</p>
     529<p id="idp708464">The following table provides an example of using bit streams to parse character
     530               references and predefined entities which fail to meet the XML 1.0 well-formedness
     531               constraints. The result is the generation of an error flag stream that marks the
     532               positions of mal-formed decimal and hexical character references respectively. <div class="table-wrapper" id="idp708968">
     533<p class="title">Table III</p>
     534<div class="caption"><p id="idp709224">Error Flag Stream Generation</p></div>
     535<table class="table">
     536<colgroup span="1">
     537<col align="left" valign="top" span="1">
     538<col align="left" valign="top" span="1">
     539</colgroup>
     540<tbody>
     541<tr valign="top">
     542<td>Input Data</td>
     543<td>
     544                           <code class="code">&amp;gt; &amp;#, &amp;#x; </code>
     545                        </td>
     546</tr>
     547<tr valign="top">
     548<td>GenRefs</td>
     549<td>
     550                           <code class="code">_11___________</code>
     551                        </td>
     552</tr>
     553<tr valign="top">
     554<td>DecRefs</td>
     555<td>
     556                           <code class="code">______________</code>
     557                        </td>
     558</tr>
     559<tr valign="top">
     560<td>HexRefs</td>
     561<td>
     562                           <code class="code">______________</code>
     563                        </td>
     564</tr>
     565<tr valign="top">
     566<td>DelMask</td>
     567<td>
     568                           <code class="code">111__11__111__</code>
     569                        </td>
     570</tr>
     571<tr valign="top">
     572<td>ErrorFlag</td>
     573<td>
     574                           <code class="code">_______1____1_</code>
     575                        </td>
     576</tr>
     577</tbody>
     578</table>
     579</div>
     580            </p>
     581</div>
     582</div>
     583<div class="section" id="idp716704">
     584<h3 class="title" style="clear: both">Lexical Item Streams</h3>
     585<p id="idp717056">Lexical item streams differ from traditional streams of tokens in that they are bit
     586            streams that mark the positions of tokens, whitespace or delimiters. Additional bit
     587            streams, such as the reference streams and callout streams, are subsequently constructed
     588            based on the information held within the set of lexical items streams. Differentiation
     589            between the actual tokens that may occur at a particular point (e.g., the different XML
     590            tokens that begin “&lt;”) may be performed using multicharacter recognizers on the
     591            bytestream representation [<a class="xref" href="#CASCON08" id="idp718216">Cameron, Herdy and Lin 2008</a>].</p>
     592<p id="idp718568">A key role of lexical item streams in XML parsing is to facilitate fast scanning
     593            operations. For example, a left angle bracket lexical item stream may be formed to
     594            identify those character code unit positions at which a “&lt;” character occurs.
     595            Hardware register bit scan operations may then be used by the XML parser on the left
     596            angle bracket stream to efficiently identify the position of the next “&lt;”. Based
     597            on the capabilities of current commodity processors, a single register bit scan
     598            operation may effectively scan up to 64 byte positions with a single instruction.</p>
     599<p id="idp720152">Overall, the construction of the full set of lexical item stream computations
     600            requires approximately 1.0 CPU cycles per byte when implemented for 128 positions at a
     601            time using 128-bit SSE registers on Intel Core2 processors [<a class="xref" href="#CASCON08" id="idp720536">Cameron, Herdy and Lin 2008</a>].
     602            The following table defines the core lexical item streams defined by the Parabix XML
     603            parser.</p>
     604<p id="idp721048">
     605            <div class="table-wrapper" id="idp721176">
     606<p class="title">Table IV</p>
     607<div class="caption"><p id="idp721432">Lexical item stream descriptions.</p></div>
     608<table class="table"><tbody>
     609<tr>
     610<td align="left"> LAngle </td>
     611<td align="left"> Marks the position of any left angle bracket character.</td>
     612</tr>
     613<tr>
     614<td align="left"> RAngle </td>
     615<td align="left"> Marks the position of any right angle bracket character.</td>
     616</tr>
     617<tr>
     618<td align="left"> LBracket </td>
     619<td align="left"> Marks the position of any left square bracker character.</td>
     620</tr>
     621<tr>
     622<td align="left"> RBracket </td>
     623<td align="left"> Marks the position of any right square bracket
     624                     character.</td>
     625</tr>
     626<tr>
     627<td align="left"> Exclam </td>
     628<td align="left"> Marks the position of any exclamation mark character.</td>
     629</tr>
     630<tr>
     631<td align="left"> QMark </td>
     632<td align="left"> Marks the position of any question mark character.</td>
     633</tr>
     634<tr>
     635<td align="left"> Hyphen </td>
     636<td align="left"> Marks the position of any hyphen character.</td>
     637</tr>
     638<tr>
     639<td align="left"> Equals </td>
     640<td align="left"> Marks the position of any equal sign character.</td>
     641</tr>
     642<tr>
     643<td align="left"> SQuote </td>
     644<td align="left"> Marks the position of any single quote character.</td>
     645</tr>
     646<tr>
     647<td align="left"> DQuote </td>
     648<td align="left"> Marks the position of any double quote character.</td>
     649</tr>
     650<tr>
     651<td align="left"> Slash </td>
     652<td align="left"> Marks the position of any forward slash character</td>
     653</tr>
     654<tr>
     655<td align="left"> NameScan </td>
     656<td align="left"> Marks the position of any XML name character.</td>
     657</tr>
     658<tr>
     659<td align="left"> WS </td>
     660<td align="left"> Marks the position of any XML 1.0 whitespace character.</td>
     661</tr>
     662<tr>
     663<td align="left"> PI_start </td>
     664<td align="left"> Marks the position of the start of any processing instruction
     665                        at the '?' character position.</td>
     666</tr>
     667<tr>
     668<td align="left"> PI_end </td>
     669<td align="left"> Marks the position of any end of any processing instruction
     670                        at the '&gt;' character position.</td>
     671</tr>
     672<tr>
     673<td align="left"> CtCD_start </td>
     674<td align="left"> Marks the position of the start of any comment or CDATA
     675                        section at the '!' character position.</td>
     676</tr>
     677<tr>
     678<td align="left"> EndTag_start </td>
     679<td align="left"> Marks the position of any end tag at the '/' character
     680                        position.</td>
     681</tr>
     682<tr>
     683<td align="left"> CD_end </td>
     684<td align="left"> Marks the position of the end of any CDATA section at the '&gt;'
     685                        character position. </td>
     686</tr>
     687<tr>
     688<td align="left"> DoubleHyphen </td>
     689<td align="left"> Marks the position of any double hyphen character.</td>
     690</tr>
     691<tr>
     692<td align="left"> RefStart </td>
     693<td align="left"> Marks the position of any ampersand character.</td>
     694</tr>
     695<tr>
     696<td align="left"> Hash </td>
     697<td align="left"> Marks the position of any hash character.</td>
     698</tr>
     699<tr>
     700<td align="left"> x </td>
     701<td align="left"> Marks the position of any 'x' character.</td>
     702</tr>
     703<tr>
     704<td align="left"> Digit </td>
     705<td align="left"> Marks the position of any digit.</td>
     706</tr>
     707<tr>
     708<td align="left"> Hex </td>
     709<td align="left"> Marks the position of any hexidecimal character.</td>
     710</tr>
     711<tr>
     712<td align="left"> Semicolon </td>
     713<td align="left"> Marks the position of any semicolon character.</td>
     714</tr>
     715</tbody></table>
     716</div>
     717         </p>
     718<p id="idp745408"> The following illustrates a number of the lexical item streams. </p>
     719<p id="idp745672">
     720            <div class="table-wrapper" id="idp745800">
     721<p class="title">Table V</p>
     722<div class="caption"><p id="idp746056">Lexical Item Streams</p></div>
     723<table class="table">
     724<colgroup span="1">
     725<col align="left" valign="top" span="1">
     726<col align="left" valign="top" span="1">
     727</colgroup>
     728<tbody>
     729<tr valign="top">
     730<td>Input Data</td>
     731<td>
     732                        <code class="code">&lt;tag&gt;&lt;tag&gt; text &amp;lt;
     733                           &amp;#x3e; &lt;/tag&gt;&lt;/tag&gt;</code>
     734                     </td>
     735</tr>
     736<tr valign="top">
     737<td>LAngle</td>
     738<td>
     739                        <code class="code">1____1______________________1_____1_____</code>
     740                     </td>
     741</tr>
     742<tr valign="top">
     743<td>RAngle</td>
     744<td>
     745                        <code class="code">____1____1_______________________1_____1</code>
     746                     </td>
     747</tr>
     748<tr valign="top">
     749<td>WS</td>
     750<td>
     751                        <code class="code">__________1____1____1______1____________</code>
     752                     </td>
     753</tr>
     754<tr valign="top">
     755<td>RefStart</td>
     756<td>
     757                        <code class="code">________________1____1__________________</code>
     758                     </td>
     759</tr>
     760<tr valign="top">
     761<td>Hex</td>
     762<td>
     763                        <code class="code">__1____1____1___________11_____1_____1__</code>
     764                     </td>
     765</tr>
     766<tr valign="top">
     767<td>Semicolon</td>
     768<td>
     769                        <code class="code">___________________1______1_____________</code>
     770                     </td>
     771</tr>
     772<tr valign="top">
     773<td>Slash</td>
     774<td>
     775                        <code class="code">_____________________________1_____1____</code>
     776                     </td>
     777</tr>
     778</tbody>
     779</table>
     780</div>
     781         </p>
     782</div>
     783<div class="section" id="idp755744">
     784<h3 class="title" style="clear: both">UTF-8 Byte Classification, Scope and Validation Streams</h3>
     785<p id="idp756128"> An XML parser must accept the UTF-8 encoding of Unicode [<a class="xref" href="#XML10" id="idp756320">XML 1.0</a>].
     786            It is a fatal error if an XML document determined to be in UTF-8 contains byte sequences
     787            that are not legal in that encoding. UTF-8 byte classification, scope, XML character
     788            validation and error flag bit streams are defined to validate UTF-8 byte sequences and
     789            support transcoding to UTF-16.</p>
     790<div class="section" id="idp757104">
     791<h4 class="title" style="clear: both">UTF-8 Byte Classification Streams</h4>
     792<p id="idp757464">UTF-8 byte classification bit streams classify UTF-8 bytes based on their role in
     793               forming single and multibyte sequences. The u8Prefix and u8Suffix bit streams
     794               identify bytes that represent, respectively, prefix or suffix bytes of multibyte
     795               UTF-8 sequences. The u8UniByte bit stream identifies those bytes that may be
     796               considered single-byte sequences. The u8Prefix2, u8Prefix3, and u8Prefix4 refine the
     797               u8Prefix respectively indicating prefixes of two, three or four byte
     798            sequences respectively.</p>
     799</div>
     800<div class="section" id="idp758312">
     801<h4 class="title" style="clear: both">UTF-8 Scope Streams</h4>
     802<p id="idp758656"> Scope streams represent expectations established by UTF-8 prefix bytes. For
     803               example, the u8Scope22 bit stream represents the positions at which the second byte of a
     804               two-byte sequence is expected based on the occurrence of a two-byte prefix in the
     805               immediately preceding position. The u8scope32, u8Scope33, u8Scope42, u8scope43, and
     806               u8Scope44 complete the set of UTF-8 scope streams.</p>
     807<p id="idp477976"> The following example demonstrates the UTF-8 character encoding validation
     808               process using parallel bit stream techniques. The result of this validation process
     809               is an error flag stream identifying the positions at which errors occur.</p>
     810<p id="idp478440">
     811               <div class="table-wrapper" id="idp478568">
     812<p class="title">Table VI</p>
     813<div class="caption"><p id="idp478824">UTF-8 Scope Streams</p></div>
     814<table class="table">
     815<colgroup span="1">
     816<col align="left" valign="top" span="1">
     817<col align="left" valign="top" span="1">
     818</colgroup>
     819<tbody>
     820<tr valign="top">
     821<td>Input Data</td>
     822<td><code class="code">A Text in Farsi: ى ك  Ù
     823 ت ن  ف ا ر س ى</code></td>
     824</tr>
     825<tr valign="top">
     826<td>High Nybbles</td>
     827<td>
     828                           <code class="code">42567726624677632D8DBDBDAD82D8DAD82D8D8</code>
     829                        </td>
     830</tr>
     831<tr valign="top">
     832<td>Low Nybbles</td>
     833<td>
     834                           <code class="code">10458409E061239A099838187910968A9509399</code>
     835                        </td>
     836</tr>
     837<tr valign="top">
     838<td>u8Unibyte</td>
     839<td>
     840                           <code class="code">11111111111111111__________1______1____</code>
     841                           
     842                        </td>
     843</tr>
     844<tr valign="top">
     845<td>u8Prefix</td>
     846<td>
     847                           <code class="code">_________________1_1_1_1_1__1_1_1__1_1_</code>
     848                        </td>
     849</tr>
     850<tr valign="top">
     851<td>u8Suffix</td>
     852<td>
     853                           <code class="code">__________________1_1_1_1_1__1_1_1__1_1</code>
     854                        </td>
     855</tr>
     856<tr valign="top">
     857<td>u8Prefix2</td>
     858<td>
     859                           <code class="code">_________________1_1_1_1_1__1_1_1__1_1_</code>
     860                        </td>
     861</tr>
     862<tr valign="top">
     863<td>u8Scope22</td>
     864<td>
     865                           <code class="code">__________________1_1_1_1_1__1_1_1__1_1</code>
     866                           
     867                        </td>
     868</tr>
     869<tr valign="top">
     870<td>ErrorFlag</td>
     871<td>
     872                           <code class="code">_______________________________________</code>
     873                        </td>
     874</tr>
     875</tbody>
     876</table>
     877</div>
     878               
     879
     880            </p>
     881</div>
     882<div class="section" id="idp778672">
     883<h4 class="title" style="clear: both">UTF-8 Validation Streams</h4>
     884<p id="idp779024"> Proper formation of UTF-8 byte sequences requires that the correct number of
     885               suffix bytes always follow a UTF-8 prefix byte, and that certain illegal byte
     886               combinations are ruled out. For example, sequences beginning with the prefix bytes
     887               0xF5 through 0xFF are illegal as they would represent code point values above 10FFFF.
     888               In addition, there are constraints on the first suffix byte following certain special
     889               prefixes, namely that a suffix following the prefix 0xE0 must fall in the range
     890               0xA0–0xBF, a suffix following the prefix 0xED must fall in the range 0x80–0x9F, a
     891               suffix following the prefix 0xF0 must fall in the range 0x90–0xBF and a suffix
     892               following the prefix 0xF4 must fall in the range 0x80–0x8F. The task of ensuring that
     893               each of these constraints hold is known as UTF-8 validation. The bit streams xE0,
     894               xED, xF0, xF4, xA0_xBF, x80_x9F, x90_xBF, and x80_x8F are constructed to flag the
     895               aforementioned UTF-8 validation errors. The result of UTF-8 validation is a UTF-8
     896               error flag bit stream contructed as the ORing of a series of UTF-8 validation tests.
     897            </p>
     898</div>
     899<div class="section" id="idp781056">
     900<h4 class="title" style="clear: both">XML Character Validation Streams</h4>
     901<p id="idp781416">The UTF-8 character sequences <span class="ital">0xEF 0xBF 0xBF</span> and
     902                  <span class="ital">0xEF 0xBF 0xBE</span> correspond to the Unicode code points 0xFFFE
     903               and 0xFFFF respectively. In XML 1.0, 0xFFFE and 0xFFFF represent characters outside
     904               the legal XML character ranges. As such, bit streams which mark 0xEF, 0xBF, and 0xBE
     905               character are constructed to flag illegal UTF-8 character sequences. </p>
     906</div>
     907<div class="section" id="idp782512">
     908<h4 class="title" style="clear: both">UTF-8 to UTF-16 Transcoding</h4>
     909<p id="idp782864">UTF-8 is often preferred for storage and data exchange, it is suitable for
     910               processing, but it is significantly more complex to process than UTF-16 [<a class="xref" href="#Unicode" id="idp783160">Unicode</a>]. As such, XML documents are typically encoded in UTF-8 for
     911               serialization and transport, and subsequently transcoded to UTF-16 for processing
     912               with programming languages such as Java and C#. Following the parallel bit stream
     913               methods developed for the u8u16 transcoder, a high-performance standalone UTF-8 to
     914               UTF-16 transcoder [<a class="xref" href="#u8u16" id="idp783920">Cameron 2008</a>], transcoding to UTF-16 may be achieved by
     915               computing a series of 16 bit streams. One stream for each of the individual bits of a
     916               UTF-16 code unit. </p>
     917<p id="idp784488">The bit streams for UTF-16 are conveniently divided into groups: the eight streams
     918               u16Hi0, u16Hi1, ..., u16Hi7 for the high byte of each UTF-16 code unit and the eight
     919               streams u16Lo1, ..., u16Lo7 for the low byte. Upon conversion of the parallel bit
     920               stream data back to byte streams, eight sequential byte streams U16h0, U16h1, ...,
     921               U16Hi7 are used for the high byte of each UTF-16 code unit, while U16Lo0, U16Lo1,...,
     922               U16Lo7 are used for the corresponding low byte. Interleaving these streams then
     923               produces the full UTF-16 doublebyte stream.</p>
     924</div>
     925<div class="section" id="idp785384">
     926<h4 class="title" style="clear: both">UTF-8 Indexed UTF-16 Streams</h4>
     927<p id="idp785744">UTF-16 bit streams are initially defined in UTF-8 indexed form. That is, with sets
     928               of bits in one-to-one correspondence with UTF-8 bytes. However, only one set of
     929               UTF-16 bits is required for encoding two or three-byte UTF-8 sequences and only two
     930               sets are required for surrogate pairs corresponding to four-byte UTF-8 sequences. The
     931               u8LastByte (u8UniByte , u8Scope22 , u8Scope33 , and u8Scope44 ) and u8Scope42 streams
     932               mark the positions at which the correct UTF-16 bits are computed. The bit sets at
     933               other positions must be deleted to compress the streams to the UTF-16 indexed form.
     934            </p>
     935</div>
     936</div>
     937<div class="section" id="idp786760">
     938<h3 class="title" style="clear: both">Control Character Streams</h3>
     939<p id="idp787112">The control character bit streams marks ASCII control characters in the range
     940            0x00-0x1F. Additional control character bit streams mark the tab, carriage return, line
     941            feed, and space character. In addition, a bit stream to mark carriage return line
     942            combinations is also constructed. Presently, control character bit streams support the
     943            operations of XML 1.0 character validation and XML end-of-line handling.</p>
     944<div class="section" id="idp787768">
     945<h4 class="title" style="clear: both">XML Character Validation</h4>
     946<p id="idp788120">Legal characters in XML are the tab, carriage return, and line feed characters,
     947               together with all Unicode characters and excluding the surrogate blocks, as well as hexadecimal OxFFFE and
     948               OxFFFF [<a class="xref" href="#XML10" id="idp788480">XML 1.0</a>]. The x00_x1F bit stream is constructed and used in
     949               combination with the additional control character bit streams to flags the positions
     950               of illegal control characters.</p>
     951</div>
     952<div class="section" id="idp789136">
     953<h4 class="title" style="clear: both">XML 1.0 End-of-line Handling</h4>
     954<p id="idp789496">In XML 1.0 the two-character sequence CR LF (carriage return, line feed) as well as
     955               any CR character not followed by a LF character must be converted to a single LF
     956               character [<a class="xref" href="#XML10" id="idp789840">XML 1.0</a>].</p>
     957<p id="idp790224">By defining carriage return, line feed, and carriage return line feed bit streams,
     958               dentoted CR, LF and CRLF respectively, end-of-line normalization processing can be
     959               performed in parallel using only a small number of logical and shift operations.</p>
     960<p id="idp790704"></p>
     961<p id="idp790832">The following example demonstrates the generation of the CRLF deletion mask. In
     962               this example, the position of all CR characters followed by LF characters are marked
     963               for deletion. Isolated carriage returns are then replaced with LF characters.
     964               Completion of this process satisfies the XML 1.0 end-of-line handling requirements.
     965               For clarity, this example encodes input data carriage returns as
     966               <span class="ital">C</span> characters, whereas line feed characters are shown as
     967                  <span class="ital">L</span> characters.</p>
     968<p id="idp792008">
     969               <div class="table-wrapper" id="idp792136">
     970<p class="title">Table VII</p>
     971<div class="caption"><p id="idp792392">XML 1.0 End-of-line Handling</p></div>
     972<table class="table">
     973<colgroup span="1"><col align="left" valign="top" span="1"></colgroup>
     974<tbody>
     975<tr valign="top">
     976<td>Input Data</td>
     977<td>
     978                           <code class="code">first line C second line CL third line L one more C nothing
     979                           left</code>
     980                        </td>
     981</tr>
     982<tr valign="top">
     983<td>CR</td>
     984<td>
     985                           <code class="code">-----------1-------------1------------------------1-------------</code>
     986                        </td>
     987</tr>
     988<tr valign="top">
     989<td>LF</td>
     990<td>
     991                           <code class="code">--------------------------1------------1------------------------</code>
     992                        </td>
     993</tr>
     994<tr valign="top">
     995<td>DelMask</td>
     996<td>
     997                           <code class="code">--------------------------1-------------------------------------</code>
     998                        </td>
     999</tr>
     1000</tbody>
     1001</table>
     1002</div>
     1003
     1004            </p>
     1005</div>
     1006</div>
     1007<div class="section" id="idp797864">
     1008<h3 class="title" style="clear: both">Call Out Streams</h3>
     1009<p id="idp798208"> Call out bit streams mark the extents of XML markup structures such as comments,
     1010            processing instruction and CDATA sections as well as physical structures such as character and
     1011            entity references and general references.  Call out streams are also formed for logical markup structures such
     1012            start tags, end tags and empty element tags. </p>
     1013<div class="section" id="idp798776">
     1014<h4 class="title" style="clear: both">Comment, Processing Instruction and CDATA Section Call Out Streams</h4>
     1015<p id="idp799168">Comments, processing instructions and CDATA sections call out streams, Ct_Span,
     1016               PI_Span and CD_Span respectively, define sections of an XML document which
     1017               contain markup that is not interpreted by an XML processor. As such, the union of
     1018               Ct_Span, PI_Span and CD_Span streams defines the regions of non-interpreteable markup.
     1019               The stream formed by this union is termed the CtCDPI_Mask.</p>
     1020<p id="idp799816">The following tables provides an example of constructing the CtCDPI_Mask. </p>
     1021<div class="table-wrapper" id="idp800088">
     1022<p class="title">Table VIII</p>
     1023<div class="caption"><p id="idp800344">CtCDPI Mask Generation</p></div>
     1024<table class="table">
     1025<colgroup span="1">
     1026<col align="left" valign="top" span="1">
     1027<col align="left" valign="top" span="1">
     1028</colgroup>
     1029<tbody>
     1030<tr valign="top">
     1031<td>Input Data</td>
     1032<td><code class="code">&lt;?php?&gt; &lt;!-- example --&gt; &lt;![CDATA[ shift: a&lt;&lt;1 ]]&gt;</code></td>
     1033</tr>
     1034<tr valign="top">
     1035<td>CD_Span</td>
     1036<td><code class="code">___________________________1111111111111111111111_</code></td>
     1037</tr>
     1038<tr valign="top">
     1039<td>Ct_Span</td>
     1040<td><code class="code">___________111111111111___________________________</code></td>
     1041</tr>
     1042<tr valign="top">
     1043<td>PI_Span</td>
     1044<td><code class="code">_11111____________________________________________</code></td>
     1045</tr>
     1046<tr valign="top">
     1047<td>CtCDPI_Mask</td>
     1048<td><code class="code">_111111__111111111111111__111111111111111111111111</code></td>
     1049</tr>
     1050<tr valign="top">
     1051<td>ErrorFlag</td>
     1052<td><code class="code">__________________________________________________</code></td>
     1053</tr>
     1054</tbody>
     1055</table>
     1056</div>
     1057<p id="idp805984"> With the removal of all non-interpreteable markup, several phases of parallel bit
     1058               stream based SIMD operations may follow operating on up to 128 byte positions on
     1059               current commondity processors and assured of XML markup relevancy. For
     1060               example, with the extents identification of comments, processing instructions and
     1061               CDATA secions, XML names may be identified and length sorted for efficient symbol
     1062               table construction. </p>
     1063<p id="idp806680"> As an aside, comments and CDATA sections must first be validated to ensure
     1064               that comments do not contain "--" sequences and that CDATA sections do not contain illegal
     1065               "]]&gt;" sequences prior to ignorable markup stream generation.</p>
     1066</div>
     1067<div class="section" id="idp807344">
     1068<h4 class="title" style="clear: both">Reference Call Out Streams</h4>
     1069<p id="idp807696">The reference call out streams are the GenRefs, DecRefs, and HexRefs streams. This
     1070               subset of the call out streams marks the extents of all but the closing semicolon of
     1071               general and character references.</p>
     1072<p id="idp808128">Predefined character
     1073               (&amp;lt;,&amp;gt;,&amp;amp;,&amp;apos;,&amp;quot;) and numeric character
     1074               references (&amp;#nnnn;, &amp;#xhhhh;) must be replaced by a single character
     1075                  [<a class="xref" href="#XML10" id="idp808304">XML 1.0</a>]. As previously shown, this subset of call out streams enables the construction of a DelMask for
     1076               references.</p>
     1077</div>
     1078<div class="section" id="idp809320">
     1079<h4 class="title" style="clear: both">Tag Call Out Streams</h4>
     1080<p id="idp809672">Whereas sequential bit scans over lexical item streams form the basis of XML
     1081               parsing, in the current Parabix parser a new method of parallel parsing has been
     1082               developed and prototyped using the concept of bitstream addition. Fundamental to this
     1083               method is the concept of a <span class="ital">cursor</span> stream, a bit stream marking
     1084               the positions of multiple parallel parses currently in process. </p>
     1085<p id="idp810520">The results of parallel parsing using the bit stream addition technique produces a
     1086               set of tag call out bit streams. These streams mark the extents of each start tag,
     1087               end tag and empty element tag. Within tags, additional streams mark start
     1088               and end positions for tag names, as well as attribute names and values. An error flag
     1089               stream marks the positions of any syntactic errors encountered during parsing.</p>
     1090<p id="idp811648"> The set of tag call out streams consists of the ElemNames, AttNames, AttVals, Tags,
     1091               EmptyTagEnds and EndTags bit streams. The following example demonstrates the bit
     1092               stream output produced which from parallel parsing using bit stream addition. </p>
     1093<div class="table-wrapper" id="idp812120">
     1094<p class="title">Table IX</p>
     1095<div class="caption"><p id="idp812376">Tag Call Out Streams</p></div>
     1096<table class="table">
     1097<colgroup span="1">
     1098<col align="left" valign="top" span="1">
     1099<col align="left" valign="top" span="1">
     1100</colgroup>
     1101<tbody>
     1102<tr valign="top">
     1103<td>Input Data</td>
     1104<td>
     1105                        <code class="code">&lt;root&gt;&lt;t1&gt;text&lt;/t1&gt;&lt;t2
     1106                           a1='foo' a2 =
     1107                           'fie'&gt;more&lt;/t2&gt;&lt;tag3
     1108                           att3='b'/&gt;&lt;/root&gt;</code>
     1109                     </td>
     1110</tr>
     1111<tr valign="top">
     1112<td>ElemNames</td>
     1113<td>
     1114                        <code class="code">_1111__11___________11_______________________________1111__________________</code>
     1115                     </td>
     1116</tr>
     1117<tr valign="top">
     1118<td>AttNames</td>
     1119<td>
     1120                        <code class="code">_______________________11_______11________________________1111_____________</code>
     1121                     </td>
     1122</tr>
     1123<tr valign="top">
     1124<td>AttrVals</td>
     1125<td>
     1126                        <code class="code">__________________________11111______11111_____________________111_________</code>
     1127                     </td>
     1128</tr>
     1129<tr valign="top">
     1130<td>EmptyTagEnds</td>
     1131<td>
     1132                        <code class="code">___________________________________________________________________1_______</code>
     1133                     </td>
     1134</tr>
     1135<tr valign="top">
     1136<td>EndTags</td>
     1137<td>
     1138                        <code class="code">_______________111______________________________111__________________11111_</code>
     1139                     </td>
     1140</tr>
     1141<tr valign="top">
     1142<td>Start/EmptyTags</td>
     1143<td>
     1144                        <code class="code">_1111__11___________1111111111111111111111___________11111111111111________</code>
     1145                     </td>
     1146</tr>
     1147<tr valign="top">
     1148<td>ErrorFlag</td>
     1149<td>
     1150                        <code class="code">___________________________________________________________________________</code>
     1151                     </td>
     1152</tr>
     1153</tbody>
     1154</table>
     1155</div>
     1156</div>
     1157</div>
     1158</div>
     1159<div class="section" id="idp822632">
     1160<h2 class="title" style="clear: both">SIMD Beyond Bitstreams: Names and Numbers</h2>
     1161<p id="idp822952">Whereas the fundamental innovation of our work is the use of SIMD technology in
     1162         implementing parallel bit streams for XML, there are also important ways in which more
     1163         traditional byte-oriented SIMD operations can be useful in accelerating other aspects of
     1164         XML processing.</p>
     1165<div class="section" id="idp823448">
     1166<h3 class="title" style="clear: both">Name Lookup</h3>
     1167<p id="idp823784">Efficient symbol table mechanisms for looking up element and attribute names is
     1168            important for almost all XML processing applications. It is also an important technique
     1169            merely for assessing well-formedness of an XML document; rather than validating the
     1170            character-by-character composition of each occurrence of an XML name as it is
     1171            encountered, it is more efficient to validate all but the first occurrence by first
     1172            determining whether the name already exists in a table of prevalidated names.</p>
     1173<p id="idp825088">The first symbol table mechanism deployed in the Parabix parser simply used the
     1174            hashmaps of the C++ standard template library, without deploying any SIMD technology.
     1175            However, with the overhead of character validation, transcoding and parsing dramatically
     1176            reduced by parallel bit stream technology, we found that symbol lookups then accounted
     1177            for about half of the remaining execution time in a statistics gathering application
     1178               [<a class="xref" href="#CASCON08" id="idp825712">Cameron, Herdy and Lin 2008</a>]. Thus, symbol table processing was identified as a major
     1179            target for further performance improvement. </p>
     1180<p id="idp826232"> Our first effort to improve symbol table performance was to employ the splash tables
     1181            with cuckoo hashing as described by Ross [<a class="xref" href="#Ross06" id="idp826512">Ross 2006</a>], using SIMD
     1182            technology for parallel bucket processing. Although this technique did turn out to have
     1183            the advantage of virtually constant-time performance even for very large vocabularies,
     1184            it was not particularly helpful for the relatively small vocabularies typically found in
     1185            XML document processing. </p>
     1186<p id="idp827288"> However, a second approach has been found to be quite useful, taking advantage of
     1187            parallel bit streams for cheap determination of symbol length. In essence, the length of
     1188            a name can be determined very cheaply using a single bit scan operation. This then makes
     1189            it possible to use length-sorted symbol table processing, as follows. First, the
     1190            occurrences of all names are stored in arrays indexed by length. Then the length-sorted
     1191            arrays may each be inserted into the symbol table in turn. The advantage of this is that
     1192            a separate loop may be written for each length. Length sorting makes for very efficient
     1193            name processing. For example hash value computations and name comparisons can be made by
     1194            loading multibyte values and performing appropriate shifting and masking operations,
     1195            without the need for a byte-at-a-time loop. In initial experiments, this length-sorting
     1196            approach was found to reduce symbol lookup cost by a factor of two. </p>
     1197<p id="idp828544"> Current research includes the application of SIMD technology to further enhance the
     1198            performance of length-sorted lookup. We have identified a promising technique for
     1199            parallel processing of multiple name occurrences using a parallel trie lookup technique.
     1200            Given an array of occurrences of names of a particular length, the first one, two or
     1201            four bytes of each name are gathered and stored in a linear array. SIMD techniques are
     1202            then used to compare these prefixes with the possible prefixes for the current position
     1203            within the trie. In general, a very small number of possibilities exist for each trie
     1204            node, allowing for fast linear search through all possibilities. Typically, the
     1205            parallelism is expected to exceed the number of possibilities to search through at each
     1206            node. With length-sorting to separate the top-level trie into many small subtries, we
     1207            expect only a single step of symbol lookup to be needed in most practical instances. </p>
     1208<p id="idp829808">The gather step of this algorithm is actually a common technique in SIMD processing.
     1209            Instruction set support for gather operations is a likely future direction for SIMD
     1210            technology.</p>
     1211</div>
     1212<div class="section" id="idp830280">
     1213<h3 class="title" style="clear: both">Numeric Processing</h3>
     1214<p id="idp830624"> Many XML applications involve numeric data fields as attribute values or element
     1215            content. Although most current XML APIs uniformly return information to applications in
     1216            the form of character strings, it is reasonable to consider direct API support for
     1217            numeric conversions within a high-performance XML engine. With string to numeric
     1218            conversion such a common need, why leave it to application programmers? </p>
     1219<p id="idp831280"> High-performance string to numeric conversion using SIMD operations also can
     1220            considerably outperform the byte-at-a-time loops that most application programmers or
     1221            libraries might employ. A first step is reduction of ASCII bytes to corresponding
     1222            decimal nybbles using a SIMD packing operation. Then an inductive doubling algorithm
     1223            using SIMD operations may be employed. First, 16 sets of adjacent nybble values in the
     1224            range 0-9 can be combined in just a few SIMD operations to 16 byte values in the range
     1225            0-99. Then 8 sets of byte values may similarly be combined with further SIMD processing
     1226            to produce doublebyte values in the range 0-9999. Further combination of doublebyte
     1227            values into 32-bit integers and so on can also be performed using SIMD operations. </p>
     1228<p id="idp832336"> Using appropriate gather operations to bring numeric strings into appropriate array
     1229            structures, an XML engine could offer high-performance numeric conversion services to
     1230            XML application programmers. We expect this to be an important direction for our future
     1231            work, particularly in support of APIs that focus on direct conversion of XML data into
     1232            business objects. </p>
     1233</div>
     1234</div>
     1235<div class="section" id="idp833080">
     1236<h2 class="title" style="clear: both">APIs and Parallel Bit Streams</h2>
     1237<div class="section" id="idp833440">
     1238<h3 class="title" style="clear: both">The ILAX Streaming API</h3>
     1239<p id="idp833792">The In-Line API for XML (ILAX) is the base API provided with the Parabix parser. It
     1240            is intended for low-level extensions compiled right into the engine, with minimum
     1241            possible overhead. It is similar to streaming event-based APIs such as SAX, but
     1242            implemented by inline substitution rather than using callbacks. In essence, an extension
     1243            programmer provides method bodies for event-processing methods declared internal to the
     1244            Parabix parsing engine, compiling the event processing code directly with the core code
     1245            of the engine. </p>
     1246<p id="idp834592"> Although ILAX can be used directly for application programming, its primary use is
     1247            for implementing engine extensions that support higher-level APIs. For example, the
     1248            implementation of C or C++ based streaming APIs based on the Expat [<a class="xref" href="#Expat" id="idp834992">Expat</a>] or general SAX models can be quite directly implemented. C/C++ DOM
     1249            or other tree-based APIs can also be fairly directly implemented. However, delivering
     1250            Parabix performance to Java-based XML applications is challenging due to the
     1251            considerable overhead of crossing the Java Native Interface (JNI) boundary. This issue
     1252            is addressed with the Array Set Model (ASM) concept discussed in the following section. </p>
     1253<p id="idp835872"> With the recent development of parallel parsing using bitstream addition, it is
     1254            likely that the underlying ILAX interface of Parabix will change. In essence, ILAX
     1255            suffers the drawback of all event-based interfaces: they are fundamentally sequential in
     1256            number. As research continues, we expect efficient parallel methods building on parallel
     1257            bit stream foundations to move up the stack of XML processing requirements. Artificially
     1258            imposing sequential processing is thus expected to constrain further advances in XML
     1259            performance. </p>
     1260</div>
     1261<div class="section" id="idp836736">
     1262<h3 class="title" style="clear: both">Efficient XML in Java Using Array Set Models</h3>
     1263<p id="idp837112"> In our GML-to-SVG case study, we identified the lack of high-performance XML
     1264            processing solutions for Java to be of particular interest. Java byte code does not
     1265            provide access to the SIMD capabilities of the underlying machine architecture. Java
     1266            just-in-time compilers might be capable of using some SIMD facilities, but there is no
     1267            real prospect of conventional compiler technology translating byte-at-a-time algorithms
     1268            into parallel bit stream code. So the primary vehicle for delivering high-performance
     1269            XML processing is to call native parallel bit stream code written in C through JNI
     1270            capabilities. </p>
     1271<p id="idp838000">However, each JNI call is expensive, so it is desirable to minimize the number of
     1272            calls and get as much work done during each call as possible. This mitigates against
     1273            direct implementation of streaming APIs in Java through one-to-one mappings to an
     1274            underlying streaming API in C. Instead, we have concentrated on gathering information on
     1275            the C side into data structures that can then be passed to the Java side. However, using
     1276            either C pointer-based structures or C++ objects is problematic because these are
     1277            difficult to interpret on the Java side and are not amenable to Java's automatic storage
     1278            management system. Similarly, Java objects cannot be conveniently created on the C side.
     1279            However, it is possible to transfer arrays of simple data values (bytes or integers)
     1280            between C and Java, so that makes a reasonable focus for bulk data communication between
     1281            C and Java. </p>
     1282<p id="idp839192"><span class="ital">Array Set Models</span> are array-based representations of information
     1283            representing an XML document in accord with XML InfoSet [<a class="xref" href="#InfoSet" id="idp839600">XML Infoset</a>] or
     1284            other XML data models relevant to particular APIs. As well as providing a mechanism for
     1285            efficient bulk data communication across the JNI boundary, ASMs potentially have a
     1286            number of other benefits in high-performance XML processing. <div class="itemizedlist" id="idp840256"><ul>
     1287<li id="idp840384"><p id="idp840512">Prefetching. Commodity processors commonly support hardware and/or software
     1288                     prefetching to ensure that data is available in a processor cache when it is
     1289                     needed. In general, prefetching is most effective in conjunction with the
     1290                     continuous sequential memory access patterns associated with array
     1291                  processing.</p></li>
     1292<li id="idp841160"><p id="idp841288">DMA. Some processing environments provide Direct Memory Access (DMA)
     1293                     controllers for block data movement in parallel with computation. For example,
     1294                     the Cell Broadband Engine uses DMA controllers to move the data to and from the
     1295                     local stores of the synergistic processing units. Arrays of contiguous data
     1296                     elements are well suited to bulk data movement using DMA.</p></li>
     1297<li id="idp842000"><p id="idp842128">SIMD. Single Instruction Multiple Data (SIMD) capabilities of modern
     1298                     processor instruction sets allow simultaneous application of particular
     1299                     instructions to sets of elements from parallel arrays. For effective use of
     1300                     SIMD capabilities, an SoA (Structure of Arrays) model is preferrable to an AoS
     1301                     (Array of Structures) model. </p></li>
     1302<li id="idp842800"><p id="idp842928">Multicore processors. Array-oriented processing can enable the effective
     1303                     distribution of work to the individual cores of a multicore system in two
     1304                     distinct ways. First, provided that sequential dependencies can be minimized or
     1305                     eliminated, large arrays can be divided into separate segments to be processed
     1306                     in parallel on each core. Second, pipeline parallelism can be used to implement
     1307                     efficient multipass processing with each pass consisting of a processing kernel
     1308                     with array-based input and array-based output. </p></li>
     1309<li id="idp844472"><p id="idp844600">Streaming buffers for large XML documents. In the event that an XML document
     1310                     is larger than can be reasonably represented entirely within processor memory,
     1311                     a buffer-based streaming model can be applied to work through a document using
     1312                     sliding windows over arrays of elements stored in document order. </p></li>
    3591313</ul></div>
    360 </div>
    361 <div class="appendix" id="mul88">
    362 <h2 class="title" style="clear: both">Appendix B. The Roanoke Colony, 1587 <sup class="fn-label"><a href="#mul4" class="footnoteref" id="mul4-ref">[4]</a></sup>
    363     </h2>
    364 <div class="itemizedlist" id="idp498112"><ul>
    365 <li id="idp498240">
    366 <p id="idp498368">Men</p>
    367 <div class="itemizedlist" id="idp498560"><ul>
    368 <li id="idp498688"><p id="idp498816">John White [Governor]</p></li>
    369 <li id="idp499104"><p id="idp499232">Roger Bailie [Assistant]</p></li>
    370 <li id="idp499520"><p id="idp499648">Ananias Dare [Assistant]</p></li>
    371 <li id="idp499936"><p id="idp500064">Christopher Cooper [Assistant]</p></li>
    372 <li id="idp500360"><p id="idp500488">Thomas Stevens [Assistant]</p></li>
    373 <li id="idp500776"><p id="idp500904">John Sampson [Assistant]</p></li>
    374 <li id="idp501192"><p id="idp501320">Dyonis Harvie [Assistant]</p></li>
    375 <li id="idp501608"><p id="idp501736">Roger Prat [Assistant]</p></li>
    376 <li id="idp502024"><p id="idp502152">George Howe [Assistant]</p></li>
    377 <li id="idp502440"><p id="idp502568">Simon Fernando [Assistant]</p></li>
    378 <li id="idp502856"><p id="idp502984">William Willes</p></li>
    379 <li id="idp503264"><p id="idp503392">John Brooke</p></li>
    380 <li id="idp503664"><p id="idp503792">Cutbert White</p></li>
    381 <li id="idp504072"><p id="idp504200">John Bright</p></li>
    382 <li id="idp504472"><p id="idp504600">Clement Tayler</p></li>
    383 <li id="idp504880"><p id="idp505008">William Sole</p></li>
    384 <li id="idp505288"><p id="idp505416">John Cotsmur</p></li>
    385 <li id="idp505696"><p id="idp505824">Humfrey Newton</p></li>
    386 <li id="idp506104"><p id="idp506232">Thomas Colman</p></li>
    387 <li id="idp506512"><p id="idp506640">Thomas Gramme</p></li>
    388 <li id="idp506920"><p id="idp507048">Nicholas Johnson</p></li>
    389 <li id="idp507328"><p id="idp507456">Thomas Warner</p></li>
    390 <li id="idp507736"><p id="idp507864">Anthony Cage</p></li>
    391 <li id="idp508144"><p id="idp508272">John Jones</p></li>
    392 <li id="idp508544"><p id="idp508672">John Tydway</p></li>
    393 <li id="idp508944"><p id="idp509072">Ambrose Viccars</p></li>
    394 <li id="idp509352"><p id="idp509480">Edmond English</p></li>
    395 <li id="idp509760"><p id="idp509888">Thomas Topan</p></li>
    396 <li id="idp510168"><p id="idp510296">Henry Berrye</p></li>
    397 <li id="idp510576"><p id="idp510704">Richard Berrye</p></li>
    398 <li id="idp510984"><p id="idp511112">John Spendlove</p></li>
    399 <li id="idp511392"><p id="idp511520">John Hemmington</p></li>
    400 <li id="idp511800"><p id="idp511928">Thomas Butler</p></li>
    401 <li id="idp512208"><p id="idp512336">Edward Powell</p></li>
    402 <li id="idp512616"><p id="idp512744">John Burden</p></li>
    403 <li id="idp513016"><p id="idp513144">James Hynde</p></li>
    404 <li id="idp513416"><p id="idp513544">Thomas Ellis</p></li>
    405 <li id="idp513824"><p id="idp513952">William Browne</p></li>
    406 <li id="idp514232"><p id="idp514360">Michael Myllet</p></li>
    407 <li id="idp514640"><p id="idp514768">Thomas Smith</p></li>
    408 <li id="idp515048"><p id="idp515176">Richard Kemme</p></li>
    409 <li id="idp515456"><p id="idp515584">Thomas Harris</p></li>
    410 <li id="idp515864"><p id="idp515992">Richard Taverner</p></li>
    411 <li id="idp516272"><p id="idp516400">John Earnest</p></li>
    412 <li id="idp516680"><p id="idp516808">Henry Johnson</p></li>
    413 <li id="idp517088"><p id="idp517216">John Starte</p></li>
    414 <li id="idp517488"><p id="idp517616">Richard Darige</p></li>
    415 <li id="idp517896"><p id="idp518024">William Lucas</p></li>
    416 <li id="idp518304"><p id="idp518432">Arnold Archard</p></li>
    417 <li id="idp518712"><p id="idp518840">John Wright</p></li>
    418 <li id="idp519112"><p id="idp519240">William Dutton</p></li>
    419 <li id="idp519520"><p id="idp519648">Morris Allen</p></li>
    420 <li id="idp519928"><p id="idp520056">William Waters</p></li>
    421 <li id="idp520336"><p id="idp520464">Richard Arthur</p></li>
    422 <li id="idp520744"><p id="idp520872">John Chapman</p></li>
    423 <li id="idp521152"><p id="idp521280">William Clement</p></li>
    424 <li id="idp521560"><p id="idp521688">Robert Little</p></li>
    425 <li id="idp521968"><p id="idp522096">Hugh Tayler</p></li>
    426 <li id="idp522368"><p id="idp522496">Richard Wildye</p></li>
    427 <li id="idp522776"><p id="idp522904">Lewes Wotton</p></li>
    428 <li id="idp523184"><p id="idp523312">Michael Bishop</p></li>
    429 <li id="idp523592"><p id="idp523720">Henry Browne</p></li>
    430 <li id="idp524000"><p id="idp524128">Marke Bennet</p></li>
    431 <li id="idp524408"><p id="idp524536">John Gibbes</p></li>
    432 <li id="idp524808"><p id="idp524936">John Stilman</p></li>
    433 <li id="idp525216"><p id="idp525344">Robert Wilkinson</p></li>
    434 <li id="idp525624"><p id="idp525752">Peter Little</p></li>
    435 <li id="idp526032"><p id="idp526160">John Wyles</p></li>
    436 <li id="idp526432"><p id="idp526560">Brian Wyles</p></li>
    437 <li id="idp526832"><p id="idp526960">George Martyn</p></li>
    438 <li id="idp527240"><p id="idp527368">Hugh Pattenson</p></li>
    439 <li id="idp527648"><p id="idp527776">Martyn Sutton</p></li>
    440 <li id="idp528056"><p id="idp528184">John Farre</p></li>
    441 <li id="idp528456"><p id="idp528584">John Bridger</p></li>
    442 <li id="idp528864"><p id="idp528992">Griffen Jones</p></li>
    443 <li id="idp529272"><p id="idp529400">Richard Shaberdge</p></li>
    444 <li id="idp529680"><p id="idp529808">James Lasie</p></li>
    445 <li id="idp530080"><p id="idp530208">John Cheven</p></li>
    446 <li id="idp530480"><p id="idp530608">Thomas Hewet</p></li>
    447 <li id="idp530888"><p id="idp531016">William Berde</p></li>
    448 <li id="idp531296"><p id="idp531424">Henry Rufoote</p></li>
    449 <li id="idp531704"><p id="idp531832">Richard Tomkins</p></li>
    450 <li id="idp532112"><p id="idp532240">Henry Dorrell</p></li>
    451 <li id="idp532520"><p id="idp532648">Charles Florrie</p></li>
    452 <li id="idp532928"><p id="idp533056">Henry Mylton</p></li>
    453 <li id="idp533336"><p id="idp533464">Henry Payne</p></li>
    454 <li id="idp533736"><p id="idp533864">Thomas Harris</p></li>
    455 <li id="idp534144"><p id="idp534272">William Nicholes</p></li>
    456 <li id="idp534552"><p id="idp534680">Thomas Phevens</p></li>
    457 <li id="idp534960"><p id="idp535088">John Borden</p></li>
    458 <li id="idp535360"><p id="idp535488">Thomas Scot</p></li>
    459 </ul></div>
    460 </li>
    461 <li id="idp535888">
    462 <p id="idp536016">Women</p>
    463 <div class="itemizedlist" id="idp536224"><ul>
    464 <li id="idp536352"><p id="idp536480">Elyoner Dare</p></li>
    465 <li id="idp536760"><p id="idp536888">Margery Harvie</p></li>
    466 <li id="idp537168"><p id="idp537296">Agnes Wood</p></li>
    467 <li id="idp537568"><p id="idp537696">Wenefrid Powell</p></li>
    468 <li id="idp537976"><p id="idp538104">Joyce Archard</p></li>
    469 <li id="idp538384"><p id="idp538512">Jane Jones</p></li>
    470 <li id="idp538784"><p id="idp538912">Elizabeth Glane</p></li>
    471 <li id="idp539192"><p id="idp539320">Jane Pierce</p></li>
    472 <li id="idp539592"><p id="idp539720">Audry Tappan</p></li>
    473 <li id="idp540000"><p id="idp540128">Alis Chapman</p></li>
    474 <li id="idp540408"><p id="idp540536">Emme Merrimoth</p></li>
    475 <li id="idp540816"><p id="idp540944">Colman</p></li>
    476 <li id="idp541216"><p id="idp541344">Margaret Lawrence</p></li>
    477 <li id="idp541624"><p id="idp541752">Joan Warren</p></li>
    478 <li id="idp542024"><p id="idp542152">Jane Mannering</p></li>
    479 <li id="idp542432"><p id="idp542560">Rose Payne</p></li>
    480 <li id="idp542832"><p id="idp542960">Elizabeth Viccars</p></li>
    481 </ul></div>
    482 </li>
    483 <li id="idp543368">
    484 <p id="idp543496">Children</p>
    485 <div class="itemizedlist" id="idp543704"><ul>
    486 <li id="idp543832"><p id="idp543960">John Sampson</p></li>
    487 <li id="idp544240"><p id="idp544368">Robert Ellis</p></li>
    488 <li id="idp544648"><p id="idp544776">Ambrose Viccars</p></li>
    489 <li id="idp545056"><p id="idp545184">Thomas Archard</p></li>
    490 <li id="idp545464"><p id="idp545592">Thomas Humfrey</p></li>
    491 <li id="idp545872"><p id="idp546000">Tomas Smart</p></li>
    492 <li id="idp546272"><p id="idp546400">George Howe</p></li>
    493 <li id="idp546672"><p id="idp546800">John Prat</p></li>
    494 <li id="idp547072"><p id="idp547200">William Wythers</p></li>
    495 </ul></div>
    496 </li>
    497 <li id="idp547608">
    498 <p id="idp547736">Children Born at the Colony</p>
    499 <div class="itemizedlist" id="idp547960"><ul>
    500 <li id="idp548088"><p id="idp548216">Virginia Dare</p></li>
    501 <li id="idp548496"><p id="idp548624">Harvye</p></li>
    502 </ul></div>
    503 </li>
    504 <li id="idp549024">
    505 <p id="idp549152">Native Peoples (who having been in England returned to the colony)</p>
    506 <div class="itemizedlist" id="idp549416"><ul>
    507 <li id="idp549544"><p id="idp549672">Manteo</p></li>
    508 <li id="idp549944"><p id="idp550072">Towaye</p></li>
    509 </ul></div>
    510 </li>
    511 </ul></div>
    512 </div>
    513 <div class="bibliography" id="idp550600">
     1314         </p>
     1315<div class="section" id="idp845360">
     1316<h4 class="title" style="clear: both">Saxon-B TinyTree Example</h4>
     1317<p id="idp845712">As a first example of the ASM concept, current work includes a proof-of-concept to
     1318               deliver a high-performance replacement for building the TinyTree data structure used
     1319               in Saxon-B 6.5.5, an open-source XSLT 2.0 processor written in Java [<a class="xref" href="#Saxon" id="idp846112">Saxon</a>]. Although XSLT stylesheets may be cached for performance, the
     1320               caching of source XML documents is typically not possible. A new TinyTree object to
     1321               represent the XML source document is thus commonly constructed with each new query so
     1322               that the overall performance of simple queries on large source XML documents is
     1323               highly dependent on TinyTree build time. Indeed, in a study of Saxon-SA, the
     1324               commercial version of Saxon, query time was shown to be dominated by TinyTree build
     1325               time [<a class="xref" href="#Kay08" id="idp847040">Kay 2008</a>]. Similar performance results are demonstrable for the
     1326               Saxon-B XSLT processor as well. </p>
     1327<p id="idp847568"> The Saxon-B processor studied is a pure Java solution, converting a SAX (Simple
     1328               API for XML) event stream into the TinyTree Java object using the efficient Aelfred
     1329               XML parser [<a class="xref" href="#AElfred" id="idp847912">Ælfred</a>]. The TinyTree structure is itself an
     1330               array-based structure mapping well suited to the ASM concept. It consists of six
     1331               parallel arrays of integers indexed on node number and containing one entry for each
     1332               node in the source document, with the exception of attribute and namespace nodes
     1333                  [<a class="xref" href="#Saxon" id="idp848624">Saxon</a>]. Four of the arrays respectively provide node kind, name
     1334               code, depth, and next sibling information for each node, while the two others are
     1335               overloaded for different purposes based on node kind value. For example, in the
     1336               context of a text node , one of the overloaded arrays holds the text buffer offset
     1337               value whereas the other holds the text buffer length value. Attributes and namespaces
     1338               are represented using similiar parallel array of values. The stored TinyTree values
     1339               are primarily primitive Java types, however, object types such as Java Strings and
     1340               Java StringBuffers are also used to hold attribute values and comment values
     1341               respectively. </p>
     1342<p id="idp849784"> In addition to the TinyTree object, Saxon-B maintains a NamePool object which
     1343               represents a collection of XML name triplets. Each triplet is composed of a Namespace
     1344               URI, a Namespace prefix and a local name and encoded as an integer value known as a
     1345               namecode. Namecodes permit efficient name search and look-up using integer
     1346               comparison. Namecodes may also be subsequently decoded to recover namespace and local
     1347               name information. </p>
     1348<p id="idp850488"> Using the Parabix ILAX interface, a high-performance reimplementation of TinyTree
     1349               and NamePool data structures was built to compare with the Saxon-B implementation. In
     1350               fact, two functionally equivalent versions of the ASM java class were constructed. An
     1351               initial version was constructed based on a set of primitive Java arrays constructed
     1352               and allocated in the Java heap space via JNI New&lt;PrimitiveType&gt;Array
     1353               method call. In this version, the JVM garbage collector is aware of all memory
     1354               allocated in the native code. However, in this approach, large array copy operations
     1355               limited overall performance to approximately a 2X gain over the Saxon-B build time. </p>
     1356<p id="idp851584">To further address the performance penalty imposed by copying large array values,
     1357               a second version of the ASM Java object was constructed based on natively backed
     1358               Direct Memory Byte Buffers [<a class="xref" href="#JNI" id="idp851944">Hitchens 2002</a>]. In this version the JVM garbage
     1359               collector is unaware any native memory resources backing the Direct Memory Byte
     1360               Buffers. Large JNI-based copy operations are avoided; however, system memory must be
     1361               explicitly deallocated via a Java native method call. Using this approach, our
     1362               preliminary results show an approximate total 2.5X gain over Saxon-B build time.
     1363            </p>
     1364</div>
     1365</div>
     1366</div>
     1367<div class="section" id="idp853000">
     1368<h2 class="title" style="clear: both">Compiler Technology</h2>
     1369<p id="idp853344"> An important focus of our recent work is on the development of compiler technology to
     1370         automatically generate the low-level SIMD code necessary to implement bit stream processing
     1371         given suitable high-level specifications. This has several potential benefits. First, it
     1372         can eliminate the tedious and error-prone programming of bit stream operations in terms of
     1373         register-at-a-time SIMD operations. Second, compilation technology can automatically employ
     1374         a variety of performance improvement techniques that are difficult to apply manually. These
     1375         include algorithms for instruction scheduling and register allocation as well as
     1376         optimization techniques for common subexpression expression elimination and register
     1377         rematerialization among others. Third, compiler technology makes it easier to make changes
     1378         to the low-level code for reasons of perfective or adaptive maintenance.</p>
     1379<p id="idp854496">Beyond these reasons, compiler technology also offers the opportunity for retargetting
     1380         the generation of code to accommodate different processor architectures and API
     1381         requirements. Strategies for efficient parallel bit stream code can vary considerably
     1382         depending on processor resources such as the number of registers available, the particular
     1383         instruction set architecture supported, the size of L1 and L2 data caches, the number of
     1384         available cores and so on. Separate implementation of custom code for each processor
     1385         architecture would thus be likely to be prohibitively expensive, prone to errors and
     1386         inconsistencies and difficult to maintain. Using compilation technology, however, the idea
     1387         would be to implement a variety of processor-specific back-ends all using a common front
     1388         end based on parallel bit streams. </p>
     1389<div class="section" id="idp855592">
     1390<h3 class="title" style="clear: both">Character Class Compiler</h3>
     1391<p id="idp855944">The first compiler component that we have implemented is a character class compiler,
     1392            capable of generation all the bit stream logic necessary to produce a set of lexical
     1393            item streams each corresponding to some particular set of characters to be recognized.
     1394            By taking advantage of common patterns between characters within classes, and special
     1395            optimization logic for recognizing character-class ranges, our existing compiler is able
     1396            to generate well-optimized code for complex sets of character classes involving numbers
     1397            of special characters as well as characters within specific sets of ranges. </p>
     1398</div>
     1399<div class="section" id="idp857544">
     1400<h3 class="title" style="clear: both">Regular Expression Compilation</h3>
     1401<p id="idp857904">Based on the character class compiler, we are currently investigating the
     1402            construction of a regular expression compiler that can implement bit-stream based
     1403            parallel regular-expression matching similar to that describe previously for parallel
     1404            parsing by bistream addition. This compiler works with the assumption that bitstream
     1405            regular-expression definitions are deterministic; no backtracking is permitted with the
     1406            parallel bit stream representation. In XML applications, this compiler is primarily
     1407            intended to enforce regular-expression constraints on string datatype specifications
     1408            found in XML schema. </p>
     1409</div>
     1410<div class="section" id="idp858856">
     1411<h3 class="title" style="clear: both">Unbounded Bit Stream Compilation</h3>
     1412<p id="idp859216">The Catalog of XML Bit Streams presented earlier consist of a set of abstract,
     1413            unbounded bit streams, each in one-to-one correspondence with input bytes of a text
     1414            file. Determining how these bit streams are implemented using fixed-width SIMD
     1415            registers, and possibly processed in fixed-length buffers that represent some multiple
     1416            of the register width is a source of considerable programming complexity. The general
     1417            goal of our compilation strategy in this case is to allow operations to be programmed in
     1418            terms of unbounded bit streams and then automatically reduced to efficient low-level
     1419            code with the application of a systematic code generation strategy for handling block
     1420            and buffer boundary crossing. This is work currently in progress. </p>
     1421</div>
     1422</div>
     1423<div class="section" id="idp860384">
     1424<h2 class="title" style="clear: both">Conclusion</h2>
     1425<p id="idp860704">Parallel bit stream technology offers the opportunity to dramatically speed up the core
     1426         XML processing components used to implement virtually any XML API. Character validation and
     1427         transcoding, whitespace processing, and parsing up to including the full validation of tag
     1428         syntax can be handled fully in parallel using bit stream methods. Bit streams to mark the
     1429         positions of all element names, attribute names and attribute values can also be produced,
     1430         followed by fast bit scan operations to generate position and length values. Beyond bit
     1431         streams, byte-oriented SIMD processing of names and numerals can also accelerate
     1432         performance beyond sequential byte-at-a-time methods. </p>
     1433<p id="idp861640">Advances in processor architecture are likely to further amplify the performance of
     1434         parallel bit stream technology over traditional byte-at-a-time processing over the next
     1435         decade. Improvements to SIMD register width, register complement and operation format can
     1436         all result in further gains. New SIMD instruction set features such as inductive doubling
     1437         support, parallel extract and deposit instructions, bit interleaving and scatter/gather
     1438         capabilities should also result in significant speed-ups. Leveraging the intraregister
     1439         parallelism of parallel bit stream technology within SIMD registers to take of intrachip
     1440         parallelism on multicore processors should accelerate processing further. </p>
     1441<p id="idp862592">Technology transfer using a patent-based open-source business model is a further goal of
     1442         our work with a view to widespread deployment of parallel bit stream technology in XML
     1443         processing stacks implementing a variety of APIs. The feasibility of substantial
     1444         performance improvement in replacement of technology implementing existing APIs has been
     1445         demonstrated even in complex software architectures involving delivery of performance
     1446         benefits across the JNI boundary. We are seeking to accelerate these deployment efforts
     1447         both through the development of compiler technology to reliably apply these methods to a
     1448         variety of architectures as well as to identify interested collaborators using open-source
     1449         or commercial models. </p>
     1450</div>
     1451<div class="section" id="idp864448">
     1452<h2 class="title" style="clear: both">Acknowledgments</h2>
     1453<p id="idp864792">This work is supported in part by research grants and scholarships from the Natural
     1454         Sciences and Engineering Research Council of Canada, the Mathematics of Information
     1455         Technology and Complex Systems Network and the British Columbia Innovation Council. </p>
     1456<p id="idp865264">We thank our colleague Dan Lin (Linda) for her work in high-performance symbol table
     1457         processing. </p>
     1458</div>
     1459<div class="bibliography" id="idp865632">
    5141460<h2 class="title" style="clear:both">Bibliography</h2>
    515 <p class="bibliomixed" id="thorpe97"><a href="#idp265344">[Thorpe 1997] </a>Thorpe, Francis Newton, ed.<span class="ital">Charter to Sir Walter Raleigh: 1584,</span> in <q>The Federal and States
    516         Constitutions, Colonial Charters, and Other Organic Laws of the States, Territories, and
    517         Colonies Now or Heretofore Forming the United States of America</q> (compiled under Act
    518       of Congress of June 30, 1906). [online]. Washington, D.C.: Government Printing Office, 1909.
    519       The Avalon Project. Co-Dir. William C. Fray &amp; Lisa A. Span. © 1997 [cited 8 Apr
    520         1998].<a href="http://www.yale.edu/lawweb/avalon/raleigh.htm" class="link" target="_new">http://www.yale.edu/lawweb/avalon/raleigh.htm</a>.</p>
    521 <p class="bibliomixed" id="mul94">[DeFord 1997] DeFord, Susan. <span class="ital">Tobacco: The Noxious Weed That Built a Nation,</span> Special to <q>The Washington
    522         Post</q> (May 14, 1997) at H01. [online]. © 1997 [cited 29 May 1998].
    523         <a href="http://www.washingtonpost.com/wp~srv/intereact/longterm/horizon/051497/tobacco.htm" class="link" target="_new">http://www.washingtonpost.com/wp~srv/intereact/longterm/horizon/051497/tobacco.htm</a>.</p>
    524 <p class="bibliomixed" id="mul95">[Tobacco] <span class="ital">Part 2. The Story of the
    525         Arrival and First Uses of Tobacco in Europe,</span> in <q>The History of
    526         Tobacco</q>. [online]. [cited 14 Apr
    527         1998].<a href="http://www.tobacco.co.uk/library/history.html#part2" class="link" target="_new">http://www.tobacco.co.uk/library/history.html#part2</a>.</p>
    528 <p class="bibliomixed" id="mul96">[Lane] Lane, Ralph. <q>The Colony at Roanoake —
    529         1586</q>. [online]. [cited 13 Apr
    530         1998].<a href="http://www.nationalcenter.inter.net/ColonyofRoanoke.html" class="link" target="_new">http://www.nationalcenter.inter.net/ColonyofRoanoke.html</a>.</p>
    531 <p class="bibliomixed" id="mul97">[Prindle 1996a] Prindle, Tara.<span class="ital">Indian
    532         Corn,</span> excerpted from <q>Teaching about Thanksgiving</q>. [online]. Fourth
    533       World Documentation Project. Native American Technology and Art. © 1996 [cited 14 Apr
    534         1998].<a href="http://www.lib.uconn.edu/NativeTech/cornhusk/cornfwdp.html" class="link" target="_new">http://www.lib.uconn.edu/NativeTech/cornhusk/cornfwdp.html</a>.</p>
    535 <p class="bibliomixed" id="mul98">[Prindle 1996b] Prindle, Tara. <q>Native American
    536         History of Corn</q>. [online]. Native American Technology and Art. © 1996 [cited 14 Apr
    537         1998].<a href="http://www.lib.uconn.edu/NativeTech/cornhusk/cornhusk.html" class="link" target="_new">http://www.lib.uconn.edu/NativeTech/cornhusk/cornhusk.html</a>.</p>
    538 <p class="bibliomixed" id="mul99">[Dickerson 1997] Dickerson, George. <q>Beyond Turkey:
    539         Corn Shaped History, Cuisines</q>. [online]. Ed. D’Lyn Ford. New Mexico State University
    540       College of Agriculture &amp; Home Economics, 11 Nov 1997 [cited 14 Apr
    541         1998].<a href="http://elroy.nmsu.edu/CAHE/press/corn_history1997.html" class="link" target="_new">http://elroy.nmsu.edu/CAHE/press/corn_history1997.html</a>.</p>
    542 <p class="bibliomixed" id="mul100">[Borio 1954] Borio, Gene.<span class="ital">Smoking in
    543         England — Elizabethan,</span> excerpted from Alfred H. Dunhill’s <q>The Gentle Art
    544         of Smoking</q>. [online]. Library of Congress catalog card #54-10495 (1954). The Tobacco
    545       BBS [cited 20 May
    546       1998].<a href="http://www.tobacco.org/History/Elizabethan_smoking.html" class="link" target="_new">http://www.tobacco.org/History/Elizabethan_smoking.html</a>.</p>
    547 <p class="bibliomixed" id="mul101">[Sellement] <q>First English Settlement in the New
    548         World</q>. [online]. [cited 13 Apr
    549         1998].<a href="http://hal.dcr.state.nc.us/nc/ncsites/English1.htm#First" class="link" target="_new">http://hal.dcr.state.nc.us/nc/ncsites/English1.htm#First</a>.</p>
    550 <p class="bibliomixed" id="mul-102">[ICW-Net 1998] <span class="ital">Virginia
    551         Dare,</span> in <q>Tales from the Coast!</q> [online]. ICW-NET © 1998 [cited 13
    552       Apr 1998].<a href="http://www.outerbanks-nc.com/manteo/history/vadare.htm" class="link" target="_new">http://www.outerbanks-nc.com/manteo/history/vadare.htm</a>.</p>
    553 </div>
    554 <div class="footnotes">
    555 <br><hr width="100" align="left">
    556 <div id="mul74" class="footnote"><p><sup class="fn-label"><a href="#mul74-ref" class="footnoteref">[1]</a></sup> It has been argued that the first expedition was not a failure. Richard Grenville did
    557           return to the colony with additional provisions not long after Drake’s departure, and he
    558           ordered 15 men, supposedly supplied for two years, to remain in the colony while he
    559           returned for new settlers. However, it is unknown whether these men were present to greet
    560           the subsequent expedition.</p></div>
    561 <div id="mul90" class="footnote"><p><sup class="fn-label"><a href="#mul90-ref" class="footnoteref">[2]</a></sup> In the 16th century, such insects were prized in the making of a vibrant red
    562               dye.</p></div>
    563 <div id="mul5" class="footnote"><p><sup class="fn-label"><a href="#mul5-ref" class="footnoteref">[3]</a></sup> This Appendix contains an actual list of the individuals who sailed to the Roanoke
    564           Colony in 1585. (<span class="ital">See</span>
    565           Durant, David N., <q>Raleigh’s Lost Colony</q>, Appendix I, Atheneum,
    566             NY: 1981.)</p></div>
    567 <div id="mul4" class="footnote"><p><sup class="fn-label"><a href="#mul4-ref" class="footnoteref">[4]</a></sup> This Appendix contains an actual list of the individuals who sailed to the Roanoke
    568           Colony in 1587, as well as the names of children born at the colony. (<span class="ital">See</span>
    569           Durant, David N., <q>Raleigh’s Lost Colony</q>, Appendix I, Atheneum,
    570             NY: 1981.)</p></div>
     1461<p class="bibliomixed" id="XMLChip09"><a href="#idp500496">[Leventhal and Lemoine 2009] </a>Leventhal, Michael and
     1462         Eric Lemoine 2009. The XML chip at 6 years. Proceedings of International Symposium on
     1463         Processing XML Efficiently 2009, Montréal.</p>
     1464<p class="bibliomixed" id="Datapower09"><a href="#idp506976">[Salz, Achilles and Maze 2009] </a>Salz, Richard,
     1465         Heather Achilles, and David Maze. 2009. Hardware and software trade-offs in the IBM
     1466         DataPower XML XG4 processor card. Proceedings of International Symposium on Processing XML
     1467         Efficiently 2009, Montréal.</p>
     1468<p class="bibliomixed" id="PPoPP08"><a href="#idp507840">[Cameron 2007] </a>Cameron, Robert D. 2007. A Case Study
     1469         in SIMD Text Processing with Parallel Bit Streams UTF-8 to UTF-16 Transcoding. Proceedings
     1470         of 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008, Salt
     1471         Lake City, Utah. On the Web at <a href="http://research.ihost.com/ppopp08/" class="link" target="_new">http://research.ihost.com/ppopp08/</a>.</p>
     1472<p class="bibliomixed" id="CASCON08"><a href="#idp508744">[Cameron, Herdy and Lin 2008] </a>Cameron, Robert D.,
     1473         Kenneth S Herdy, and Dan Lin. 2008. High Performance XML Parsing Using Parallel Bit Stream
     1474         Technology. Proceedings of CASCON 2008. 13th ACM SIGPLAN Symposium on Principles and
     1475         Practice of Parallel Programming 2008, Toronto.</p>
     1476<p class="bibliomixed" id="SVGOpen08"><a href="#idp509256">[Herdy, Burggraf and Cameron 2008] </a>Herdy, Kenneth
     1477         S., Robert D. Cameron and David S. Burggraf. 2008. High Performance GML to SVG
     1478         Transformation for the Visual Presentation of Geographic Data in Web-Based Mapping Systems.
     1479         Proceedings of SVG Open 6th International Conference on Scalable Vector Graphics,
     1480         Nuremburg. On the Web at
     1481            <a href="http://www.svgopen.org/2008/papers/74-HighPerformance_GML_to_SVG_Transformation_for_the_Visual_Presentation_of_Geographic_Data_in_WebBased_Mapping_Systems/" class="link" target="_new">http://www.svgopen.org/2008/papers/74-HighPerformance_GML_to_SVG_Transformation_for_the_Visual_Presentation_of_Geographic_Data_in_WebBased_Mapping_Systems/</a>.</p>
     1482<p class="bibliomixed" id="Ross06"><a href="#idp826512">[Ross 2006] </a>Ross, Kenneth A. 2006. Efficient hash
     1483         probes on modern processors. Proceedings of ICDE, 2006. ICDE 2006, Atlanta. On the Web at
     1484            <a href="www.cs.columbia.edu/~kar/pubsk/icde2007.pdf" class="link" target="_new">www.cs.columbia.edu/~kar/pubsk/icde2007.pdf</a>.</p>
     1485<p class="bibliomixed" id="ASPLOS09"><a href="#idp638256">[Cameron and Lin 2009] </a>Cameron, Robert D. and Dan
     1486         Lin. 2009. Architectural Support for SWAR Text Processing with Parallel Bit Streams: The
     1487         Inductive Doubling Principle. Proceedings of ASPLOS 2009, Washington, DC.</p>
     1488<p class="bibliomixed" id="Wu08"><a href="#idp639856">[Wu et al. 2008] </a>Wu, Yu, Qi Zhang, Zhiqiang Yu and
     1489         Jianhui Li. 2008. A Hybrid Parallel Processing for XML Parsing and Schema Validation.
     1490         Proceedings of Balisage 2008, Montréal. On the Web at
     1491            <a href="http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html" class="link" target="_new">http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html</a>.</p>
     1492<p class="bibliomixed" id="u8u16"><a href="#idp699152">[Cameron 2008] </a>u8u16 - A High-Speed UTF-8 to UTF-16
     1493         Transcoder Using Parallel Bit Streams Technical Report 2007-18. 2007. School of Computing
     1494         Science Simon Fraser University, June 21 2007.</p>
     1495<p class="bibliomixed" id="XML10"><a href="#idp756320">[XML 1.0] </a>Extensible Markup Language (XML) 1.0 (Fifth
     1496         Edition) W3C Recommendation 26 November 2008. On the Web at
     1497            <a href="http://www.w3.org/TR/REC-xml/" class="link" target="_new">http://www.w3.org/TR/REC-xml/</a>.</p>
     1498<p class="bibliomixed" id="Unicode"><a href="#idp783160">[Unicode] </a>The Unicode Consortium. 2009. On the Web at
     1499            <a href="http://unicode.org/" class="link" target="_new">http://unicode.org/</a>.</p>
     1500<p class="bibliomixed" id="Pex06"><a href="#idp638640">[Hilewitz and Lee 2006] </a> Hilewitz, Y. and Ruby B. Lee.
     1501         2006. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit
     1502         Instructions. Proceedings of the IEEE 17th International Conference on Application-Specific
     1503         Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006.</p>
     1504<p class="bibliomixed" id="InfoSet"><a href="#idp839600">[XML Infoset] </a>XML Information Set (Second Edition) W3C
     1505         Recommendation 4 February 2004. On the Web at
     1506         <a href="http://www.w3.org/TR/xml-infoset/" class="link" target="_new">http://www.w3.org/TR/xml-infoset/</a>.</p>
     1507<p class="bibliomixed" id="Saxon"><a href="#idp846112">[Saxon] </a>SAXON The XSLT and XQuery Processor. On the Web
     1508         at <a href="http://saxon.sourceforge.net/" class="link" target="_new">http://saxon.sourceforge.net/</a>.</p>
     1509<p class="bibliomixed" id="Kay08"><a href="#idp847040">[Kay 2008] </a> Kay, Michael Y. 2008. Ten Reasons Why Saxon
     1510         XQuery is Fast, IEEE Data Engineering Bulletin, December 2008.</p>
     1511<p class="bibliomixed" id="AElfred"><a href="#idp847912">[Ælfred] </a> The Ælfred XML Parser. On the Web at
     1512            <a href="http://saxon.sourceforge.net/aelfred.html" class="link" target="_new">http://saxon.sourceforge.net/aelfred.html</a>.</p>
     1513<p class="bibliomixed" id="JNI"><a href="#idp851944">[Hitchens 2002] </a>Hitchens, Ron. Java NIO. O'Reilly, 2002.</p>
     1514<p class="bibliomixed" id="Expat"><a href="#idp834992">[Expat] </a>The Expat XML Parser.
     1515            <a href="http://expat.sourceforge.net/" class="link" target="_new">http://expat.sourceforge.net/</a>.</p>
    5711516</div>
    5721517</div>
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3037 r3038  
    22<!-- MODIFIED DTD LOCATION -->
    33<!DOCTYPE article SYSTEM "balisage-1-3.dtd">
    4 <?xml-stylesheet type="text/xsl" href="balisage-proceedings-html.xsl"?>
    5 <?xml-stylesheet type="text/css" href="balisage-author.css" title="Authoring" alternate="no"?>
    6 <article xmlns="http://docbook.org/ns/docbook" version="5.0-subset Balisage-1.2"
     4<article xmlns="http://docbook.org/ns/docbook" version="5.0-subset Balisage-1.3"
    75  xml:id="HR-23632987-8973">
    8   <title>Raleigh's Discoveries in the New World</title>
    9   <subtitle>The lure of vegetable culture</subtitle>
    10   <info>
    11     <abstract>
    12       <para>In 1584, Queen Elizabeth I charged Sir Walter Raleigh to discover and colonize lands in
    13         the New World on behalf of England. This article discusses the efforts and expeditions
    14         organized under Raleigh's patronage to found a colony in the current-day North Carolina.
    15         Going beyond the chronological history of the doomed Roanoke colony, the author provides
    16         insights into relations between the colonists and their native counterparts, discusses the
    17         botanical riches of the region, and argues that Raleigh's efforts did not end in failure, as
    18         commonly might be thought.</para>
    19     </abstract>
    20     <author>
    21       <personname>
    22         <firstname>Tonia</firstname>
    23         <othername>Renae</othername>
    24         <surname>Gaillard</surname>
    25       </personname>
    26       <personblurb>
    27         <para>A native of Eleajora, Dr. Gaillard completed her post-graduate studies in Colonial
    28           American History in 2033. Following these studies, she became an Associate Professor at
    29           DeBare State University before joining Quetrane University’s faculty. She has chaired the
    30           university’s Department of History since 2048.</para>
    31         <para>Dr. Gaillard’s extensive writings include <quote>The Politics Underlying the Salem
    32             Witch Trials</quote>, <quote>Williamsburg’s Role in the Revolution</quote>,<emphasis
    33             role="ital">Benjamin Franklin: Scientist, Inventor, and Politician</emphasis>, and
    34             <emphasis role="ital">Native American Culture and the "True Faith."</emphasis> She is
    35           currently working on a biography of Simon Kenton, whose role in pioneering Kentucky has
    36           been overshadowed by fellow backwoodsman Daniel Boone.</para>
    37       </personblurb>
    38       <affiliation>
    39         <jobtitle>Department Chair</jobtitle>
    40         <orgname>Quetrane University</orgname>
    41       </affiliation>
    42       <email>tgaillard@quetrane.hist.edu</email>
    43     </author>
    44     <keywordset role="author">
    45       <keyword>New World discoveries</keyword>
    46       <keyword>American colonial history</keyword>
    47       <keyword>English 16th century history</keyword>
    48     </keywordset>
    49   </info>
    50   <section>
    51     <title>Introduction</title>
    52     <para>On March 25, 1584, Queen Elizabeth I of England charged Sir Walter Raleigh to<blockquote>
    53         <para>discover, search, find out, and view such remote, heathen and barbarous lands,
    54           countries, and territories, not actually possessed of any Christian Prince, nor inhabited
    55           by Christian People</para>
    56       <attribution><xref linkend="thorpe97"/></attribution>
    57       </blockquote> That same year Raleigh sent two captains, Philip Amades and Arthur Barlowe, from
    58       England to Hispaniola and the Canary Islands; from there, the captains were instructed to
    59       scout the lands northeast of those already claimed by Spain, to wit, Florida. This land — now
    60       encompassing the Carolinas and Virginia — was claimed on behalf of England and named
    61         <quote>Virginia,</quote> in honor of the <quote>Virgin Queen.</quote></para>
    62     <para>Information processing, especially text markup, was primitive in the colony. For example,
    63       most text stores were in XML! Documents may have looked like this:
    64       <programlisting>&lt;teiHeader&gt;
    65   &lt;fileDesc&gt;
    66     &lt;titleStmt&gt;
    67       &lt;title&gt;Farming in the New World&lt;/title&gt;
    68       ...
    69     &lt;/titleStmt&gt;
    70   &lt;/fileDesc&gt;
    71 &lt;/teiHeader&gt;</programlisting>
    72       Notice the paired Tags: <code>&lt;title&gt;</code> and <code>&lt;/title&gt;</code> and the
    73       primitive use of indenting. Unusual features of the colonists data processing practices included:<variablelist>
    74         <varlistentry>
    75           <term>Tags</term>
    76           <listitem>
    77             <para>Meaningful descriptions of the information enclosed by the markers</para>
    78           </listitem>
    79         </varlistentry>
    80         <varlistentry>
    81           <term>Balance</term>
    82           <listitem>
    83             <para>All markup is both opened and closed (or explicitly empty)</para>
    84           </listitem>
    85         </varlistentry>
    86       </variablelist>
    87     </para>
    88   </section>
    89   <section>
    90     <title>Securing a Permanent Colony in the Claimed Lands</title>
    91     <para>With land claimed in the New World, an expedition was mounted to establish a settlement.
    92       The first expedition failed. Led by Sir Richard Grenville in April 1585, it encompassed 600
    93       men of which 105 remained in the colony while Grenville returned to England for additional
    94       provisions. (<emphasis role="ital">See</emphasis>
    95       <xref linkend="mul83"/>.) However, when almost a year passed without Grenville’s return, the
    96       remainder of the expeditionary force took advantage of Sir Francis Drake’s arrival to seek
    97       return passage to England. <footnote xml:id="mul74">
    98         <para>It has been argued that the first expedition was not a failure. Richard Grenville did
    99           return to the colony with additional provisions not long after Drake’s departure, and he
    100           ordered 15 men, supposedly supplied for two years, to remain in the colony while he
    101           returned for new settlers. However, it is unknown whether these men were present to greet
    102           the subsequent expedition.</para>
    103       </footnote>
    104     </para>
    105     <para>The second expedition, organized by John White in 1587, fared better. It sailed with seven
    106       ships filled with Devon families intent upon establishing a colony in that part of Virginia
    107       called Roanoke, a word deriving from the speech of native peoples. [<xref linkend="mul88"/>]
    108       Two years after founding the <quote>Cittie of Raleigh,</quote> houses had been built for
    109       almost all families residing in the colony, and the colony had celebrated the birth of its
    110       first children born in the New World. The first child, grandchild of John White and child of
    111       Ananias and Eleanor Dare, was been named Virginia in honor of the sovereign.</para>
    112     <section>
    113       <title>Native Plants and Wildlife</title>
    114       <para>The European settlers found the New World abundant with commodities <quote>known to
    115           yield victual and sustenance of man’s life</quote>. The first expeditionary force noted
    116         that a great variety of berries grew wildly, including raspberries, blueberries, and
    117         strawberries. Along with maize, native grain, which could be made into bread, grew in the
    118         area. Two other plants — more properly called roots — which could be saved for winter
    119         consumption were <quote>cassida</quote> and <quote>chyna.</quote> The settlers discovered
    120         that while some roots could be eaten much in appearance as they were dug, others had to be
    121         boiled before use as a foodstuff. As more fully described below, other plants included
    122         beans, and several crops previously unknown to the Europeans: <itemizedlist>
    123           <listitem>
    124             <para><quote>macocqwer</quote> (gourds),</para>
    125           </listitem>
    126           <listitem>
    127             <para><quote>melden</quote> (an herb),</para>
    128           </listitem>
    129           <listitem>
    130             <para><quote>planta solis</quote> (sunflower — used in a type of bread, as well as for
    131               broth),</para>
    132           </listitem>
    133           <listitem>
    134             <para>peas (powdered in a mortar), and</para>
    135           </listitem>
    136           <listitem>
    137             <para>potatoes.</para>
    138           </listitem>
    139         </itemizedlist>
    140       </para>
     6   <title>Parallel Bit Stream Technology as a Foundation for XML Parsing Performance</title>
     7   <info>
     8<!--
     9      <confgroup>
     10         <conftitle>International Symposium on Processing XML Efficiently: Overcoming Limits on
     11            Space, Time, or Bandwidth</conftitle>
     12         <confdates>August 10 2009</confdates>
     13      </confgroup>
     14-->
     15      <abstract>
     16         <para>By first transforming the octets (bytes) of XML texts into eight parallel bit
     17            streams, the SIMD features of commodity processors can be exploited for parallel
     18            processing of blocks of 128 input bytes at a time. Established transcoding and parsing
     19            techniques are reviewed followed by new techniques including parsing with bitstream
     20            addition. Further opportunities are discussed in light of expected advances in CPU
     21            architecture and compiler technology. Implications for various APIs and information
     22            models are presented as well opportunities for collaborative open-source
     23         development.</para>
     24      </abstract>
     25      <author>
     26         <personname>
     27            <firstname>Rob</firstname>
     28            <surname>Cameron</surname>
     29         </personname>
     30         <personblurb>
     31            <para>Dr. Rob Cameron is Professor and Director of Computing Science at Simon Fraser
     32               University. With a broad spectrum of research interests related to programming
     33               languages, software engineering and sociotechnical design of public computing
     34               infrastructure, he has recently been focusing on high performance text processing
     35               using parallel bit stream technology and its applications to XML. He is also a
     36               patentleft evangelist, advocating university-based technology transfer models
     37               dedicated to free use in open source. </para>
     38
     39         </personblurb>
     40         <affiliation>
     41            <jobtitle>Professor of Computing Science</jobtitle>
     42            <orgname>Simon Fraser University</orgname>
     43         </affiliation>
     44         <email>cameron@cs.sfu.ca</email>
     45      </author>
     46      <author>
     47         <personname>
     48            <firstname>Ken</firstname>
     49            <surname>Herdy</surname>
     50         </personname>
     51         <personblurb>
     52            <para> Ken Herdy completed an Advanced Diploma of Technology in Geographical Information
     53               Systems at the British Columbia Institute of Technology in 2003 and earned a Bachelor
     54               of Science in Computing Science with a Certificate in Spatial Information Systems at
     55               Simon Fraser University in 2005. </para>
     56            <para> Ken is currently pursuing graduate studies in Computing Science at Simon Fraser
     57               University with industrial scholarship support from the Natural Sciences and
     58               Engineering Research Council of Canada, the Mathematics of Information Technology and
     59               Complex Systems NCE, and the BC Innovation Council. His research focus is an analysis
     60               of the principal techniques that may be used to improve XML processing performance in
     61               the context of the Geography Markup Language (GML). </para>
     62
     63         </personblurb>
     64         <affiliation>
     65            <jobtitle>Graduate Student, School of Computing Science</jobtitle>
     66            <orgname>Simon Fraser University </orgname>
     67         </affiliation>
     68         <email>ksherdy@cs.sfu.ca</email>
     69      </author>
     70      <author>
     71         <personname>
     72            <firstname>Ehsan</firstname>
     73            <surname>Amiri</surname>
     74         </personname>
     75         <personblurb>
     76            <para>Ehsan Amiri is a PhD student of Computer Science at Simon Fraser University.
     77               Before that he studied at Sharif University of Technology, Tehran, Iran. While his
     78               graduate research has been focused on theoretical problems like fingerprinting, Ehsan
     79               has worked on some software projects like development of a multi-node firewall as
     80               well. More recently he has been developing compiler technology for automatic
     81               generation of bit stream processing code. </para>
     82
     83         </personblurb>
     84         <affiliation>
     85            <jobtitle>Graduate Student, School of Computing Science</jobtitle>
     86            <orgname>Simon Fraser University</orgname>
     87         </affiliation>
     88         <email>eamiri@cs.sfu.ca</email>
     89      </author>
     90<!--
     91      <legalnotice>
     92         <para>Copyright &#x000A9; 2009 Robert D. Cameron, Kenneth S. Herdy and Ehsan Amiri.
     93            This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative
     94            Works 2.5 Canada License.</para>
     95      </legalnotice>
     96-->
     97      <keywordset role="author">
     98         <keyword/>
     99         <keyword/>
     100         <keyword/>
     101      </keywordset>
     102   </info>
     103   <section>
     104      <title>Introduction</title>
     105      <para> While particular XML applications may benefit from special-purpose hardware such as XML
     106         chips [<xref linkend="XMLChip09"/>] or appliances [<xref linkend="Datapower09"/>], the bulk
     107         of the world's XML processing workload will continue to be handled by XML software stacks
     108         on commodity processors. Exploiting the SIMD capabilities of such processors such as the
     109         SSE instructions of x86 chips, parallel bit stream technology offers the potential of
     110         dramatic improvement over byte-at-a-time processing for a variety of XML processing tasks.
     111         Character set issues such as Unicode validation and transcoding [<xref linkend="PPoPP08"
     112         />], normalization of line breaks and white space and XML character validation can be
     113         handled fully in parallel using this representation. Lexical item streams, such as the bit
     114         stream marking the positions of opening angle brackets, can also be formed in parallel.
     115         Bit-scan instructions of commodity processors may then be used on lexical item streams to
     116         implement rapid single-instruction scanning across variable-length multi-byte text blocks
     117         as in the Parabix XML parser [<xref linkend="CASCON08"/>]. Overall, these techniques may be
     118         combined to yield end-to-end performance that may be 1.5X to 15X faster than alternatives
     119            [<xref linkend="SVGOpen08"/>].</para>
     120      <para>Continued research in parallel bit stream techniques as well as more conventional
     121         application of SIMD techniques in XML processing offers further prospects for improvement
     122         of core XML components as well as for tackling performance-critical tasks further up the
     123         stack. A newly prototyped technique for parallel tag parsing using bitstream addition is
     124         expected to improve parsing performance even beyond that achieved using sequential bit
     125         scans. Several techniques for improved symbol table performance are being investigated,
     126         including parallel hash value calculation and length-based sorting using the cheap length
     127         determination afforded by bit scans. To deliver the benefits of parallel bit stream
     128         technology to the Java world, we are developing Array Set Model (ASM) representations of
     129         XML Infoset and other XML information models for efficient transmission across the JNI
     130         boundary.</para>
     131
     132      <para>Amplifying these software advances, continuing hardware advances in commodity processors
     133         increase the relative advantage of parallel bit stream techniques over traditional
     134         byte-at-a-time processors. For example, the Intel Core architecture improved SSE processing
     135         to give superscalar execution of bitwise logic operations (3 instructions per cycle vs. 1
     136         in Pentium 4). Upcoming 256-bit AVX technology extends the register set and replaces
     137         destructive two-operand instructions with a nondestructive three-operand form. General
     138         purpose programming on graphic processing units (GPGPU) such as the upcoming 512-bit
     139         Larrabee processor may also be useful for XML applications using parallel bit streams. New
     140         instruction set architectures may also offer dramatic improvements in core algorithms.
     141         Using the relatively simple extensions to support the principle of inductive doubling, a 3X
     142         improvement in several core parallel bit stream algorithms may be achieved [<xref
     143            linkend="ASPLOS09"/>]. Other possibilities include direct implementation of parallel
     144         extract and parallel deposit (pex/pdep) instructions [<xref linkend="Pex06"/>], and
     145         bit-level interleave operations as in Larrabee, each of which would have important
     146         application to parallel bit stream processing.</para>
     147
     148      <para>Further prospects for XML performance improvement arise from leveraging the
     149         intraregister parallelism of parallel bit stream technology to exploit the interchip
     150         parallelism of multicore computing. Parallel bit stream techniques can support multicore
     151         parallelism in both data partitioning and task partitioning models. For example, the
     152         datasection partitioning approach of Wu, Zhang, Yu and Li may be used to partition blocks
     153         for speculative parallel parsing on separate cores followed by a postprocessing step to
     154         join partial S-trees [<xref linkend="Wu08"/>].</para>
     155
     156      <para>In our view, the established and expected performance advantages of parallel bit stream
     157         technology over traditional byte-at-a-time processing are so compelling that parallel bit
     158         stream technology should ultimately form the foundation of every high-performance XML
     159         software stack. We envision a common high-performance XML kernel that may be customized to
     160         a variety of processor architectures and that supports a wide range of existing and new XML
     161         APIs. Widespread deployment of this technology should greatly benefit the XML community in
     162         addressing both the deserved and undeserved criticism of XML on performance grounds. A
     163         further benefit of improved performance is a substantial greening of XML technologies.</para>
     164
     165      <para>To complement our research program investigating fundamental algorithms and issues in
     166         high-performance XML processing, our work also involves development of open source software
     167         implementing these algorithms, with a goal of full conformance to relevant specifications.
     168         From the research perspective, this approach is valuable in ensuring that the full
     169         complexity of required XML processing is addressed in reporting and assessing processing
     170         results. However, our goal is also to use this open source software as a basis of
     171         technology transfer. A Simon Fraser University spin-off company, called International
     172         Characters, Inc., has been created to commercialize the results of this work using a
     173         patent-based open source model.</para>
     174
     175      <para>To date, we have not yet been successful in establishing a broader community of
     176         participation with our open source code base. Within open-source communities, there is
     177         often a general antipathy towards software patents; this may limit engagement with our
     178         technology, even though it has been dedicated for free use in open source. </para>
     179
     180      <para>A further complication is the inherent difficulty of SIMD programming in general, and
     181         parallel bit stream programming in particular. Considerable work is required with each new
     182         algorithmic technique being investigated as well as in retargetting our techniques for each
     183         new development in SIMD and multicore processor technologies. To address these concerns, we
     184         have increasingly shifted the emphasis of our research program towards compiler technology
     185         capable of generating parallel bit stream code from higher-level specifications.</para>
     186   </section>
     187
     188   <section>
     189      <title>A Catalog of Parallel Bit Streams for XML</title>
    141190      <section>
    142         <title>Gourds</title>
    143         <para>The native people grew a variety of large broad-leafed, ground-covering vines which
    144           produced what they called <quote>macocqwer</quote> or gourds. (<emphasis role="ital"
    145             >See</emphasis>
    146           <xref linkend="gourds"/>.) Varying in color among shades of green, yellow, and orange,
    147           these gourds served a number of functions, not chief of which was as a food source. There
    148           were two distinct types, soft-shelled and hard-shelled. Of particular interest to the
    149           settlers were pumpkins; grown throughout the summer, this gourd remained in the fields
    150           until late autumn’s frost. Following harvest, the gourd could be stored throughout the
    151           winter and its flesh made into stews.</para>
    152         <figure xml:id="gourds">
    153           <title>Gourds</title>
    154           <mediaobject>
    155             <imageobject>
    156               <imagedata format="jpg" fileref="19450212-2.jpg" width="50%"/>
    157             </imageobject>
    158           </mediaobject>
    159           <caption>
    160             <para>While gourds, pumpkins and squashes were new to the English, they were soon
    161               discovered to be very useful for warding off starvation.</para>
    162           </caption>
    163         </figure>
    164         <para>However, far more important was the hard-skinned gourd. The value of this gourd lay
    165           not in its potential as a food source, but rather as a container and serving vessel. Once
    166           dried, these gourds were cut and hollowed for use as storage containers, as well as for
    167           bowls, ladles, cups, and other types of serving utensils. Indeed, since gourds grew in a
    168           variety of shapes and sizes, particular gourds could be selected for their resemblance to
    169           the items sought. For the adventurous, the durable objects could be carved and decorated
    170           with plant dyes.</para>
     191         <title>Introduction</title>
     192         <para>In this section, we introduce the fundamental concepts of parallel bit stream
     193            technology and present a comprehensive catalog of parallel bit streams for use in XML
     194            processing. In presenting this catalog, the focus is on the specification of the bit
     195            streams as data streams in one-to-one correspondence with the character code units of an
     196            input XML stream. The goal is to define these bit streams in the abstract without
     197            initially considering memory layouts, register widths or other issues related to
     198            particular target architectures. In cataloging these techniques, we also hope to convey
     199            a sense of the breadth of applications of parallel bit stream technology to XML
     200            processing tasks. </para>
     201      </section>
     202
     203      <section>
     204         <title>Basis Bit Streams</title>
     205         <para>Given a byte-oriented text stream represented in UTF-8, for example, we define a
     206            transform representation of this text consisting of a set of eight parallel bit streams
     207            for the individual bits of each byte. Thus, the <code>Bit0</code> stream is the stream
     208            of bits consisting of bit 0 of each byte in the input byte stream, <code>Bit1</code> is
     209            the bit stream consisting of bit 1 of each byte in the input stream and so on. The set
     210            of streams <code>Bit0</code> through <code>Bit7</code> are known as the <emphasis>basis
     211               streams</emphasis> of the parallel bit stream representation. The following table
     212            shows an example XML character stream together with its representation as a set of 8
     213            basis streams. <table>
     214               <caption>
     215                  <para>XML Character Stream Transposition.</para>
     216               </caption>
     217               <colgroup>
     218                  <col align="left" valign="top"/>
     219                  <col align="left" valign="top"/>
     220                  <col align="left" valign="top"/>
     221                  <col align="left" valign="top"/>
     222                  <col align="left" valign="top"/>
     223                  <col align="left" valign="top"/>
     224               </colgroup>
     225               <tbody>
     226                  <tr valign="top">
     227                     <td>Input Data</td>
     228                     <td>
     229                        <code>&lt;</code>
     230                     </td>
     231                     <td>
     232                        <code>t</code>
     233                     </td>
     234                     <td>
     235                        <code>a</code>
     236                     </td>
     237                     <td>
     238                        <code>g</code>
     239                     </td>
     240                     <td>
     241                        <code>/</code>
     242                     </td>
     243                     <td>
     244                        <code>&gt;</code>
     245                     </td>
     246                  </tr>
     247                  <tr valign="top">
     248                     <td>ASCII</td>
     249                     <td>
     250                        <code>00111100</code>
     251                     </td>
     252                     <td>
     253                        <code>01110100</code>
     254                     </td>
     255                     <td>
     256                        <code>01100001</code>
     257                     </td>
     258                     <td>
     259                        <code>01100111</code>
     260                     </td>
     261                     <td>
     262                        <code>00101111</code>
     263                     </td>
     264                     <td>
     265                        <code>00111110</code>
     266                     </td>
     267                  </tr>
     268                  <tr valign="top">
     269                     <td>Bit0</td>
     270                     <td>
     271                        <code>0</code>
     272                     </td>
     273                     <td>
     274                        <code>0</code>
     275                     </td>
     276                     <td>
     277                        <code>0</code>
     278                     </td>
     279                     <td>
     280                        <code>0</code>
     281                     </td>
     282                     <td>
     283                        <code>0</code>
     284                     </td>
     285                     <td>
     286                        <code>0</code>
     287                     </td>
     288                  </tr>
     289                  <tr valign="top">
     290                     <td>Bit1</td>
     291                     <td>
     292                        <code>0</code>
     293                     </td>
     294                     <td>
     295                        <code>1</code>
     296                     </td>
     297                     <td>
     298                        <code>1</code>
     299                     </td>
     300                     <td>
     301                        <code>1</code>
     302                     </td>
     303                     <td>
     304                        <code>0</code>
     305                     </td>
     306                     <td>
     307                        <code>0</code>
     308                     </td>
     309                  </tr>
     310                  <tr valign="top">
     311                     <td>Bit2</td>
     312                     <td>
     313                        <code>1</code>
     314                     </td>
     315                     <td>
     316                        <code>1</code>
     317                     </td>
     318                     <td>
     319                        <code>1</code>
     320                     </td>
     321                     <td>
     322                        <code>1</code>
     323                     </td>
     324                     <td>
     325                        <code>1</code>
     326                     </td>
     327                     <td>
     328                        <code>1</code>
     329                     </td>
     330                  </tr>
     331                  <tr valign="top">
     332                     <td>Bit3</td>
     333                     <td>
     334                        <code>1</code>
     335                     </td>
     336                     <td>
     337                        <code>1</code>
     338                     </td>
     339                     <td>
     340                        <code>0</code>
     341                     </td>
     342                     <td>
     343                        <code>0</code>
     344                     </td>
     345                     <td>
     346                        <code>0</code>
     347                     </td>
     348                     <td>
     349                        <code>1</code>
     350                     </td>
     351                  </tr>
     352                  <tr valign="top">
     353                     <td>Bit4</td>
     354                     <td>
     355                        <code>1</code>
     356                     </td>
     357                     <td>
     358                        <code>0</code>
     359                     </td>
     360                     <td>
     361                        <code>0</code>
     362                     </td>
     363                     <td>
     364                        <code>0</code>
     365                     </td>
     366                     <td>
     367                        <code>1</code>
     368                     </td>
     369                     <td>
     370                        <code>1</code>
     371                     </td>
     372                  </tr>
     373                  <tr valign="top">
     374                     <td>Bit5</td>
     375                     <td>
     376                        <code>1</code>
     377                     </td>
     378                     <td>
     379                        <code>1</code>
     380                     </td>
     381                     <td>
     382                        <code>0</code>
     383                     </td>
     384                     <td>
     385                        <code>1</code>
     386                     </td>
     387                     <td>
     388                        <code>1</code>
     389                     </td>
     390                     <td>
     391                        <code>1</code>
     392                     </td>
     393                  </tr>
     394                  <tr valign="top">
     395                     <td>Bit6</td>
     396                     <td>
     397                        <code>0</code>
     398                     </td>
     399                     <td>
     400                        <code>0</code>
     401                     </td>
     402                     <td>
     403                        <code>0</code>
     404                     </td>
     405                     <td>
     406                        <code>1</code>
     407                     </td>
     408                     <td>
     409                        <code>1</code>
     410                     </td>
     411                     <td>
     412                        <code>1</code>
     413                     </td>
     414                  </tr>
     415                  <tr valign="top">
     416                     <td>Bit7</td>
     417                     <td>
     418                        <code>0</code>
     419                     </td>
     420                     <td>
     421                        <code>0</code>
     422                     </td>
     423                     <td>
     424                        <code>1</code>
     425                     </td>
     426                     <td>
     427                        <code>1</code>
     428                     </td>
     429                     <td>
     430                        <code>1</code>
     431                     </td>
     432                     <td>
     433                        <code>0</code>
     434                     </td>
     435                  </tr>
     436               </tbody>
     437            </table>
     438         </para>
     439         <para> Depending on the features of a particular processor architecture, there are a number
     440            of algorithms for transposition to parallel bit stream form. Several of these algorithms
     441            employ a three-stage structure. In the first stage, the input byte stream is divided
     442            into a pair of half-length streams consisting of four bits for each byte, for example,
     443            one stream for the high nybble of each byte and another for the low nybble of each byte.
     444            In the second stage, these streams of four bits per byte are each divided into streams
     445            consisting of two bits per original byte, for example streams for the
     446            <code>Bit0/Bit1</code>, <code>Bit2/Bit3</code>, <code>Bit4/Bit5</code>, and
     447               <code>Bit6/Bit7</code> pairs. In the final stage, the streams are further subdivided
     448            in the individual bit streams. </para>
     449         <para> Using SIMD capabilities, this process is quite efficient, with an amortized cost of
     450            1.1 CPU cycles per input byte on Intel Core 2 with SSE, or 0.6 CPU cycles per input byte
     451            on Power PC G4 with Altivec. With future advances in processor technology, this
     452            transposition overhead is expected to reduce, possibly taking advantage of upcoming
     453            parallel extract (pex) instructions on Intel technology. In the ideal, only 24
     454            instructions are needed to transform a block of 128 input bytes using 128-bit SSE
     455            registers using the inductive doubling instruction set architecture, representing an
     456            overhead of less than 0.2 instructions per input byte. </para>
     457      </section>
     458
     459      <section>
     460         <title>General Streams</title>
     461         <para>This section describes bit streams which support basic processing operations.</para>
     462
     463         <section>
     464            <title>Deletion Mask Streams</title>
     465            <para>DelMask (deletion mask) streams marks character code unit positions for deletion.
     466               Since the deletion operation is dependency free across many stages of XML processing,
     467               it is possible to simply mark and record deletion positions as deletion mask streams for future processing. A single
     468               invocation of a SIMD based parallel deletion algorithm can then perform the deletion of
     469               positions accumulated across a number of stages through a bitwise ORing of deletion
     470               masks. For example, deletion arises in the replacement of predefined entities with a
     471               single character, such as in the replacement of the &amp;amp; entity, with the
     472               &amp; character. Deletion also arises in XML
     473               end-of-line handling, and CDATA section delimeter processing. Several algorithms to
     474               delete bits at positions marked by DelMask are possible [<xref linkend="u8u16"/>]. </para>
     475            <para>The following table provides an example of generating a DelMask in the context of
     476               bit stream based parsing of well-formed character references and predefined entities.
     477               The result is the generation of a DelMask stream. <table>
     478                  <caption>
     479                     <para>DelMask Stream Generation</para>
     480                  </caption>
     481                  <colgroup>
     482                     <col align="left" valign="top"/>
     483                     <col align="left" valign="top"/>
     484                  </colgroup>
     485                  <tbody>
     486                     <tr valign="top">
     487                        <td>Input Data</td>
     488                        <td>
     489                           <code>&amp;gt; &amp;#13; &amp;#x0a;</code>
     490                        </td>
     491                     </tr>
     492                     <tr valign="top">
     493                        <td>GenRefs</td>
     494                        <td>
     495                           <code>_11______________</code>
     496                        </td>
     497                     </tr>
     498
     499                     <tr valign="top">
     500                        <td>DecRefs</td>
     501                        <td>
     502                           <code>_______11________</code>
     503                        </td>
     504                     </tr>
     505                     <tr valign="top">
     506                        <td>HexRefs</td>
     507                        <td>
     508                           <code>______________11_</code>
     509                        </td>
     510                     </tr>
     511                     <tr valign="top">
     512                        <td>DelMask</td>
     513                        <td>
     514                           <code>111__1111__11111_</code>
     515                        </td>
     516                     </tr>
     517                     <tr valign="top">
     518                        <td>ErrorFlag</td>
     519                        <td>
     520                           <code>_________________</code>
     521                        </td>
     522                     </tr>
     523                  </tbody>
     524               </table>
     525            </para>
     526         </section>
     527
     528         <section>
     529            <title>Error Flag Streams </title>
     530            <para>Error flag streams indicates the character code unit positions of syntactical
     531               errors. XML processing examples which benefit from the marking of error positions
     532               include UTF-8 character sequence validation and XML parsing [<xref linkend="u8u16"
     533               />].</para>
     534            <para>The following table provides an example of using bit streams to parse character
     535               references and predefined entities which fail to meet the XML 1.0 well-formedness
     536               constraints. The result is the generation of an error flag stream that marks the
     537               positions of mal-formed decimal and hexical character references respectively. <table>
     538                  <caption>
     539                     <para>Error Flag Stream Generation</para>
     540                  </caption>
     541                  <colgroup>
     542                     <col align="left" valign="top"/>
     543                     <col align="left" valign="top"/>
     544                  </colgroup>
     545                  <tbody>
     546                     <tr valign="top">
     547                        <td>Input Data</td>
     548                        <td>
     549                           <code>&amp;gt; &amp;#, &amp;#x; </code>
     550                        </td>
     551                     </tr>
     552                     <tr valign="top">
     553                        <td>GenRefs</td>
     554                        <td>
     555                           <code>_11___________</code>
     556                        </td>
     557                     </tr>
     558                     <tr valign="top">
     559                        <td>DecRefs</td>
     560                        <td>
     561                           <code>______________</code>
     562                        </td>
     563                     </tr>
     564                     <tr valign="top">
     565                        <td>HexRefs</td>
     566                        <td>
     567                           <code>______________</code>
     568                        </td>
     569                     </tr>
     570                     <tr valign="top">
     571                        <td>DelMask</td>
     572                        <td>
     573                           <code>111__11__111__</code>
     574                        </td>
     575                     </tr>
     576                     <tr valign="top">
     577                        <td>ErrorFlag</td>
     578                        <td>
     579                           <code>_______1____1_</code>
     580                        </td>
     581                     </tr>
     582                  </tbody>
     583               </table>
     584            </para>
     585         </section>
     586
     587      </section>
     588
     589      <section>
     590         <title>Lexical Item Streams</title>
     591         <para>Lexical item streams differ from traditional streams of tokens in that they are bit
     592            streams that mark the positions of tokens, whitespace or delimiters. Additional bit
     593            streams, such as the reference streams and callout streams, are subsequently constructed
     594            based on the information held within the set of lexical items streams. Differentiation
     595            between the actual tokens that may occur at a particular point (e.g., the different XML
     596            tokens that begin “&lt;”) may be performed using multicharacter recognizers on the
     597            bytestream representation [<xref linkend="CASCON08"/>].</para>
     598         <para>A key role of lexical item streams in XML parsing is to facilitate fast scanning
     599            operations. For example, a left angle bracket lexical item stream may be formed to
     600            identify those character code unit positions at which a “&lt;” character occurs.
     601            Hardware register bit scan operations may then be used by the XML parser on the left
     602            angle bracket stream to efficiently identify the position of the next “&lt;”. Based
     603            on the capabilities of current commodity processors, a single register bit scan
     604            operation may effectively scan up to 64 byte positions with a single instruction.</para>
     605         <para>Overall, the construction of the full set of lexical item stream computations
     606            requires approximately 1.0 CPU cycles per byte when implemented for 128 positions at a
     607            time using 128-bit SSE registers on Intel Core2 processors [<xref linkend="CASCON08"/>].
     608            The following table defines the core lexical item streams defined by the Parabix XML
     609            parser.</para>
     610         <para>
     611            <table>
     612               <caption>
     613                  <para>Lexical item stream descriptions.</para>
     614               </caption>
     615               <tbody>
     616                  <tr>
     617                     <td align="left"> LAngle </td>
     618                     <td align="left"> Marks the position of any left angle bracket character.</td>
     619                  </tr>
     620                  <tr>
     621                     <td align="left"> RAngle </td>
     622                     <td align="left"> Marks the position of any right angle bracket character.</td>
     623                  </tr>
     624                  <tr>
     625                     <td align="left"> LBracket </td>
     626                     <td align="left"> Marks the position of any left square bracker character.</td>
     627                  </tr>
     628                  <tr>
     629                     <td align="left"> RBracket </td>
     630                     <td align="left"> Marks the position of any right square bracket
     631                     character.</td>
     632                  </tr>
     633                  <tr>
     634                     <td align="left"> Exclam </td>
     635                     <td align="left"> Marks the position of any exclamation mark character.</td>
     636                  </tr>
     637                  <tr>
     638                     <td align="left"> QMark </td>
     639                     <td align="left"> Marks the position of any question mark character.</td>
     640                  </tr>
     641                  <tr>
     642                     <td align="left"> Hyphen </td>
     643                     <td align="left"> Marks the position of any hyphen character.</td>
     644                  </tr>
     645                  <tr>
     646                     <td align="left"> Equals </td>
     647                     <td align="left"> Marks the position of any equal sign character.</td>
     648                  </tr>
     649                  <tr>
     650                     <td align="left"> SQuote </td>
     651                     <td align="left"> Marks the position of any single quote character.</td>
     652                  </tr>
     653                  <tr>
     654                     <td align="left"> DQuote </td>
     655                     <td align="left"> Marks the position of any double quote character.</td>
     656                  </tr>
     657                  <tr>
     658                     <td align="left"> Slash </td>
     659                     <td align="left"> Marks the position of any forward slash character</td>
     660                  </tr>
     661                  <tr>
     662                     <td align="left"> NameScan </td>
     663                     <td align="left"> Marks the position of any XML name character.</td>
     664                  </tr>
     665                  <tr>
     666                     <td align="left"> WS </td>
     667                     <td align="left"> Marks the position of any XML 1.0 whitespace character.</td>
     668                  </tr>
     669                  <tr>
     670                     <td align="left"> PI_start </td>
     671                     <td align="left"> Marks the position of the start of any processing instruction
     672                        at the '?' character position.</td>
     673                  </tr>
     674                  <tr>
     675                     <td align="left"> PI_end </td>
     676                     <td align="left"> Marks the position of any end of any processing instruction
     677                        at the '>' character position.</td>
     678                  </tr>
     679                  <tr>
     680                     <td align="left"> CtCD_start </td>
     681                     <td align="left"> Marks the position of the start of any comment or CDATA
     682                        section at the '!' character position.</td>
     683                  </tr>
     684                  <tr>
     685                     <td align="left"> EndTag_start </td>
     686                     <td align="left"> Marks the position of any end tag at the '/' character
     687                        position.</td>
     688                  </tr>
     689                  <tr>
     690                     <td align="left"> CD_end </td>
     691                     <td align="left"> Marks the position of the end of any CDATA section at the '>'
     692                        character position. </td>
     693                  </tr>
     694                  <tr>
     695                     <td align="left"> DoubleHyphen </td>
     696                     <td align="left"> Marks the position of any double hyphen character.</td>
     697                  </tr>
     698                  <tr>
     699                     <td align="left"> RefStart </td>
     700                     <td align="left"> Marks the position of any ampersand character.</td>
     701                  </tr>
     702                  <tr>
     703                     <td align="left"> Hash </td>
     704                     <td align="left"> Marks the position of any hash character.</td>
     705                  </tr>
     706                  <tr>
     707                     <td align="left"> x </td>
     708                     <td align="left"> Marks the position of any 'x' character.</td>
     709                  </tr>
     710                  <tr>
     711                     <td align="left"> Digit </td>
     712                     <td align="left"> Marks the position of any digit.</td>
     713                  </tr>
     714                  <tr>
     715                     <td align="left"> Hex </td>
     716                     <td align="left"> Marks the position of any hexidecimal character.</td>
     717                  </tr>
     718                  <tr>
     719                     <td align="left"> Semicolon </td>
     720                     <td align="left"> Marks the position of any semicolon character.</td>
     721                  </tr>
     722               </tbody>
     723            </table>
     724         </para>
     725         <para> The following illustrates a number of the lexical item streams. </para>
     726         <para>
     727            <table>
     728               <caption>
     729                  <para>Lexical Item Streams</para>
     730               </caption>
     731               <colgroup>
     732                  <col align="left" valign="top"/>
     733                  <col align="left" valign="top"/>
     734               </colgroup>
     735               <tbody>
     736                  <tr valign="top">
     737                     <td>Input Data</td>
     738                     <td>
     739                        <code>&lt;tag&gt;&lt;tag&gt; text &amp;lt;
     740                           &amp;#x3e; &lt;/tag&gt;&lt;/tag&gt;</code>
     741                     </td>
     742                  </tr>
     743
     744                  <tr valign="top">
     745                     <td>LAngle</td>
     746                     <td>
     747                        <code>1____1______________________1_____1_____</code>
     748                     </td>
     749                  </tr>
     750                  <tr valign="top">
     751                     <td>RAngle</td>
     752                     <td>
     753                        <code>____1____1_______________________1_____1</code>
     754                     </td>
     755                  </tr>
     756                  <tr valign="top">
     757                     <td>WS</td>
     758                     <td>
     759                        <code>__________1____1____1______1____________</code>
     760                     </td>
     761                  </tr>
     762                  <tr valign="top">
     763                     <td>RefStart</td>
     764                     <td>
     765                        <code>________________1____1__________________</code>
     766                     </td>
     767                  </tr>
     768                  <tr valign="top">
     769                     <td>Hex</td>
     770                     <td>
     771                        <code>__1____1____1___________11_____1_____1__</code>
     772                     </td>
     773                  </tr>
     774
     775                  <tr valign="top">
     776                     <td>Semicolon</td>
     777                     <td>
     778                        <code>___________________1______1_____________</code>
     779                     </td>
     780                  </tr>
     781                  <tr valign="top">
     782                     <td>Slash</td>
     783                     <td>
     784                        <code>_____________________________1_____1____</code>
     785                     </td>
     786                  </tr>
     787               </tbody>
     788            </table>
     789         </para>
     790
     791      </section>
     792
     793      <section>
     794         <title>UTF-8 Byte Classification, Scope and Validation Streams</title>
     795         <para> An XML parser must accept the UTF-8 encoding of Unicode [<xref linkend="XML10"/>].
     796            It is a fatal error if an XML document determined to be in UTF-8 contains byte sequences
     797            that are not legal in that encoding. UTF-8 byte classification, scope, XML character
     798            validation and error flag bit streams are defined to validate UTF-8 byte sequences and
     799            support transcoding to UTF-16.</para>
     800
     801         <section>
     802            <title>UTF-8 Byte Classification Streams</title>
     803            <para>UTF-8 byte classification bit streams classify UTF-8 bytes based on their role in
     804               forming single and multibyte sequences. The u8Prefix and u8Suffix bit streams
     805               identify bytes that represent, respectively, prefix or suffix bytes of multibyte
     806               UTF-8 sequences. The u8UniByte bit stream identifies those bytes that may be
     807               considered single-byte sequences. The u8Prefix2, u8Prefix3, and u8Prefix4 refine the
     808               u8Prefix respectively indicating prefixes of two, three or four byte
     809            sequences respectively.</para>
     810         </section>
     811
     812         <section>
     813            <title>UTF-8 Scope Streams</title>
     814            <para> Scope streams represent expectations established by UTF-8 prefix bytes. For
     815               example, the u8Scope22 bit stream represents the positions at which the second byte of a
     816               two-byte sequence is expected based on the occurrence of a two-byte prefix in the
     817               immediately preceding position. The u8scope32, u8Scope33, u8Scope42, u8scope43, and
     818               u8Scope44 complete the set of UTF-8 scope streams.</para>
     819            <para> The following example demonstrates the UTF-8 character encoding validation
     820               process using parallel bit stream techniques. The result of this validation process
     821               is an error flag stream identifying the positions at which errors occur.</para>
     822            <para>
     823               <table>
     824                  <caption>
     825                     <para>UTF-8 Scope Streams</para>
     826                  </caption>
     827                  <colgroup>
     828                     <col align="left" valign="top"/>
     829                     <col align="left" valign="top"/>
     830                  </colgroup>
     831                  <tbody> <tr valign="top"><td>Input Data</td><td><code>A Text in Farsi:&#160;ى&#160;ك&#160;&#160;Ù
     832&#160;ت&#160;ن&#160;&#160;ف&#160;ا&#160;ر&#160;س&#160;ى</code></td></tr>
     833                     
     834                     <tr valign="top">
     835                        <td>High Nybbles</td>
     836                        <td>
     837                           <code>42567726624677632D8DBDBDAD82D8DAD82D8D8</code>
     838                        </td>
     839                     </tr>
     840                     <tr valign="top">
     841                        <td>Low Nybbles</td>
     842                       
     843                        <td>
     844                           <code>10458409E061239A099838187910968A9509399</code>
     845                        </td>
     846                     </tr>
     847                     <tr valign="top">
     848                        <td>u8Unibyte</td>
     849                        <td>
     850                           <code>11111111111111111__________1______1____</code>
     851                           
     852                        </td>
     853                     </tr>
     854                     <tr valign="top">
     855                        <td>u8Prefix</td>
     856                        <td>
     857                           <code>_________________1_1_1_1_1__1_1_1__1_1_</code>
     858                        </td>
     859                     </tr>
     860                     
     861                     <tr valign="top">
     862                        <td>u8Suffix</td>
     863                        <td>
     864                           <code>__________________1_1_1_1_1__1_1_1__1_1</code>
     865                        </td>
     866                     </tr>
     867                     <tr valign="top">
     868                        <td>u8Prefix2</td>
     869                       
     870                        <td>
     871                           <code>_________________1_1_1_1_1__1_1_1__1_1_</code>
     872                        </td>
     873                     </tr>
     874                     <tr valign="top">
     875                        <td>u8Scope22</td>
     876                        <td>
     877                           <code>__________________1_1_1_1_1__1_1_1__1_1</code>
     878                           
     879                        </td>
     880                     </tr>
     881                     <tr valign="top">
     882                        <td>ErrorFlag</td>
     883                        <td>
     884                           <code>_______________________________________</code>
     885                        </td>
     886                     </tr>
     887                     
     888                  </tbody>
     889               </table>
     890               
     891
     892            </para>
     893         </section>
     894
     895         <section>
     896            <title>UTF-8 Validation Streams</title>
     897            <para> Proper formation of UTF-8 byte sequences requires that the correct number of
     898               suffix bytes always follow a UTF-8 prefix byte, and that certain illegal byte
     899               combinations are ruled out. For example, sequences beginning with the prefix bytes
     900               0xF5 through 0xFF are illegal as they would represent code point values above 10FFFF.
     901               In addition, there are constraints on the first suffix byte following certain special
     902               prefixes, namely that a suffix following the prefix 0xE0 must fall in the range
     903               0xA0–0xBF, a suffix following the prefix 0xED must fall in the range 0x80–0x9F, a
     904               suffix following the prefix 0xF0 must fall in the range 0x90–0xBF and a suffix
     905               following the prefix 0xF4 must fall in the range 0x80–0x8F. The task of ensuring that
     906               each of these constraints hold is known as UTF-8 validation. The bit streams xE0,
     907               xED, xF0, xF4, xA0_xBF, x80_x9F, x90_xBF, and x80_x8F are constructed to flag the
     908               aforementioned UTF-8 validation errors. The result of UTF-8 validation is a UTF-8
     909               error flag bit stream contructed as the ORing of a series of UTF-8 validation tests.
     910            </para>
     911         </section>
     912
     913         <section>
     914            <title>XML Character Validation Streams</title>
     915            <para>The UTF-8 character sequences <emphasis>0xEF 0xBF 0xBF</emphasis> and
     916                  <emphasis>0xEF 0xBF 0xBE</emphasis> correspond to the Unicode code points 0xFFFE
     917               and 0xFFFF respectively. In XML 1.0, 0xFFFE and 0xFFFF represent characters outside
     918               the legal XML character ranges. As such, bit streams which mark 0xEF, 0xBF, and 0xBE
     919               character are constructed to flag illegal UTF-8 character sequences. </para>
     920         </section>
     921
     922         <section>
     923            <title>UTF-8 to UTF-16 Transcoding</title>
     924            <para>UTF-8 is often preferred for storage and data exchange, it is suitable for
     925               processing, but it is significantly more complex to process than UTF-16 [<xref
     926                  linkend="Unicode"/>]. As such, XML documents are typically encoded in UTF-8 for
     927               serialization and transport, and subsequently transcoded to UTF-16 for processing
     928               with programming languages such as Java and C#. Following the parallel bit stream
     929               methods developed for the u8u16 transcoder, a high-performance standalone UTF-8 to
     930               UTF-16 transcoder [<xref linkend="u8u16"/>], transcoding to UTF-16 may be achieved by
     931               computing a series of 16 bit streams. One stream for each of the individual bits of a
     932               UTF-16 code unit. </para>
     933            <para>The bit streams for UTF-16 are conveniently divided into groups: the eight streams
     934               u16Hi0, u16Hi1, ..., u16Hi7 for the high byte of each UTF-16 code unit and the eight
     935               streams u16Lo1, ..., u16Lo7 for the low byte. Upon conversion of the parallel bit
     936               stream data back to byte streams, eight sequential byte streams U16h0, U16h1, ...,
     937               U16Hi7 are used for the high byte of each UTF-16 code unit, while U16Lo0, U16Lo1,...,
     938               U16Lo7 are used for the corresponding low byte. Interleaving these streams then
     939               produces the full UTF-16 doublebyte stream.</para>
     940         </section>
     941
     942         <section>
     943            <title>UTF-8 Indexed UTF-16 Streams</title>
     944            <para>UTF-16 bit streams are initially defined in UTF-8 indexed form. That is, with sets
     945               of bits in one-to-one correspondence with UTF-8 bytes. However, only one set of
     946               UTF-16 bits is required for encoding two or three-byte UTF-8 sequences and only two
     947               sets are required for surrogate pairs corresponding to four-byte UTF-8 sequences. The
     948               u8LastByte (u8UniByte , u8Scope22 , u8Scope33 , and u8Scope44 ) and u8Scope42 streams
     949               mark the positions at which the correct UTF-16 bits are computed. The bit sets at
     950               other positions must be deleted to compress the streams to the UTF-16 indexed form.
     951            </para>
     952         </section>
     953      </section>
     954
     955      <section>
     956         <title>Control Character Streams</title>
     957         <para>The control character bit streams marks ASCII control characters in the range
     958            0x00-0x1F. Additional control character bit streams mark the tab, carriage return, line
     959            feed, and space character. In addition, a bit stream to mark carriage return line
     960            combinations is also constructed. Presently, control character bit streams support the
     961            operations of XML 1.0 character validation and XML end-of-line handling.</para>
     962
     963         <section>
     964            <title>XML Character Validation</title>
     965            <para>Legal characters in XML are the tab, carriage return, and line feed characters,
     966               together with all Unicode characters and excluding the surrogate blocks, as well as hexadecimal OxFFFE and
     967               OxFFFF [<xref linkend="XML10"/>]. The x00_x1F bit stream is constructed and used in
     968               combination with the additional control character bit streams to flags the positions
     969               of illegal control characters.</para>
     970         </section>
     971
     972         <section>
     973            <title>XML 1.0 End-of-line Handling</title>
     974            <para>In XML 1.0 the two-character sequence CR LF (carriage return, line feed) as well as
     975               any CR character not followed by a LF character must be converted to a single LF
     976               character [<xref linkend="XML10"/>].</para>
     977            <para>By defining carriage return, line feed, and carriage return line feed bit streams,
     978               dentoted CR, LF and CRLF respectively, end-of-line normalization processing can be
     979               performed in parallel using only a small number of logical and shift operations.</para>
     980            <para/>
     981            <para>The following example demonstrates the generation of the CRLF deletion mask. In
     982               this example, the position of all CR characters followed by LF characters are marked
     983               for deletion. Isolated carriage returns are then replaced with LF characters.
     984               Completion of this process satisfies the XML 1.0 end-of-line handling requirements.
     985               For clarity, this example encodes input data carriage returns as
     986               <emphasis>C</emphasis> characters, whereas line feed characters are shown as
     987                  <emphasis>L</emphasis> characters.</para>
     988            <para>
     989               <table>
     990                  <caption>
     991                     <para>XML 1.0 End-of-line Handling</para>
     992                  </caption>
     993                  <colgroup>
     994                     <col align="left" valign="top"/>
     995                  </colgroup>
     996                  <tbody>
     997                     <tr valign="top">
     998                        <td>Input Data</td>
     999                        <td>
     1000                           <code>first line C second line CL third line L one more C nothing
     1001                           left</code>
     1002                        </td>
     1003                     </tr>
     1004                     <tr valign="top">
     1005                        <td>CR</td>
     1006                        <td>
     1007                           <code>-----------1-------------1------------------------1-------------</code>
     1008                        </td>
     1009                     </tr>
     1010                     <tr valign="top">
     1011                        <td>LF</td>
     1012                        <td>
     1013                           <code>--------------------------1------------1------------------------</code>
     1014                        </td>
     1015                     </tr>
     1016                     <tr valign="top">
     1017                        <td>DelMask</td>
     1018                        <td>
     1019                           <code>--------------------------1-------------------------------------</code>
     1020                        </td>
     1021                     </tr>
     1022                  </tbody>
     1023               </table>
     1024
     1025            </para>
     1026         </section>
     1027
     1028      </section>
     1029
     1030      <section>
     1031         <title>Call Out Streams</title>
     1032         <para> Call out bit streams mark the extents of XML markup structures such as comments,
     1033            processing instruction and CDATA sections as well as physical structures such as character and
     1034            entity references and general references.  Call out streams are also formed for logical markup structures such
     1035            start tags, end tags and empty element tags. </para>
     1036         <section>
     1037            <title>Comment, Processing Instruction and CDATA Section Call Out Streams</title>
     1038            <para>Comments, processing instructions and CDATA sections call out streams, Ct_Span,
     1039               PI_Span and CD_Span respectively, define sections of an XML document which
     1040               contain markup that is not interpreted by an XML processor. As such, the union of
     1041               Ct_Span, PI_Span and CD_Span streams defines the regions of non-interpreteable markup.
     1042               The stream formed by this union is termed the CtCDPI_Mask.</para>
     1043            <para>The following tables provides an example of constructing the CtCDPI_Mask. </para>
     1044            <table>
     1045               <caption>
     1046                  <para>CtCDPI Mask Generation</para>
     1047               </caption>
     1048               <colgroup><col align="left" valign="top" /><col align="left" valign="top" /></colgroup>
     1049               <tbody> <tr valign="top"><td>Input Data</td><td><code>&lt;?php?&gt; &lt;!-- example --&gt; &lt;![CDATA[ shift: a&lt;&lt;1 ]]&gt;</code></td></tr>
     1050                 
     1051                  <tr valign="top"><td>CD_Span</td><td><code>___________________________1111111111111111111111_</code></td></tr>
     1052                  <tr valign="top"><td>Ct_Span</td><td><code>___________111111111111___________________________</code></td></tr>
     1053                  <tr valign="top"><td>PI_Span</td><td><code>_11111____________________________________________</code></td></tr>
     1054                  <tr valign="top"><td>CtCDPI_Mask</td><td><code>_111111__111111111111111__111111111111111111111111</code></td></tr>
     1055                  <tr valign="top"><td>ErrorFlag</td><td><code>__________________________________________________</code></td></tr>
     1056                 
     1057               </tbody>
     1058            </table>
     1059           
     1060           
     1061           
     1062           
     1063            <para> With the removal of all non-interpreteable markup, several phases of parallel bit
     1064               stream based SIMD operations may follow operating on up to 128 byte positions on
     1065               current commondity processors and assured of XML markup relevancy. For
     1066               example, with the extents identification of comments, processing instructions and
     1067               CDATA secions, XML names may be identified and length sorted for efficient symbol
     1068               table construction. </para>
     1069            <para> As an aside, comments and CDATA sections must first be validated to ensure
     1070               that comments do not contain "--" sequences and that CDATA sections do not contain illegal
     1071               "]]&gt;" sequences prior to ignorable markup stream generation.</para>
     1072         </section>
     1073
     1074         <section>
     1075            <title>Reference Call Out Streams</title>
     1076            <para>The reference call out streams are the GenRefs, DecRefs, and HexRefs streams. This
     1077               subset of the call out streams marks the extents of all but the closing semicolon of
     1078               general and character references.</para>
     1079            <para>Predefined character
     1080               (<![CDATA[&lt;,&gt;,&amp;,&apos;,&quot;]]>) and numeric character
     1081               references (&amp;#nnnn;, &amp;#xhhhh;) must be replaced by a single character
     1082                  [<xref linkend="XML10"/>]. As previously shown, this subset of call out streams enables the construction of a DelMask for
     1083               references.</para>
     1084         </section>
     1085
     1086         <section>
     1087            <title>Tag Call Out Streams</title>
     1088            <para>Whereas sequential bit scans over lexical item streams form the basis of XML
     1089               parsing, in the current Parabix parser a new method of parallel parsing has been
     1090               developed and prototyped using the concept of bitstream addition. Fundamental to this
     1091               method is the concept of a <emphasis>cursor</emphasis> stream, a bit stream marking
     1092               the positions of multiple parallel parses currently in process. </para>
     1093            <para>The results of parallel parsing using the bit stream addition technique produces a
     1094               set of tag call out bit streams. These streams mark the extents of each start tag,
     1095               end tag and empty element tag. Within tags, additional streams mark start
     1096               and end positions for tag names, as well as attribute names and values. An error flag
     1097               stream marks the positions of any syntactic errors encountered during parsing.</para>
     1098            <para> The set of tag call out streams consists of the ElemNames, AttNames, AttVals, Tags,
     1099               EmptyTagEnds and EndTags bit streams. The following example demonstrates the bit
     1100               stream output produced which from parallel parsing using bit stream addition. </para>
     1101            <table>
     1102               <caption>
     1103                  <para>Tag Call Out Streams</para>
     1104               </caption>
     1105               <colgroup>
     1106                  <col align="left" valign="top"/>
     1107                  <col align="left" valign="top"/>
     1108               </colgroup>
     1109               <tbody>
     1110                  <tr valign="top">
     1111                     <td>Input Data</td>
     1112                     <td>
     1113                        <code>&lt;root&gt;&lt;t1&gt;text&lt;/t1&gt;&lt;t2
     1114                           a1=&apos;foo&apos; a2 =
     1115                           &apos;fie&apos;&gt;more&lt;/t2&gt;&lt;tag3
     1116                           att3=&apos;b&apos;/&gt;&lt;/root&gt;</code>
     1117                     </td>
     1118                  </tr>
     1119
     1120                  <tr valign="top">
     1121                     <td>ElemNames</td>
     1122                     <td>
     1123                        <code>_1111__11___________11_______________________________1111__________________</code>
     1124                     </td>
     1125                  </tr>
     1126                  <tr valign="top">
     1127                     <td>AttNames</td>
     1128                     <td>
     1129                        <code>_______________________11_______11________________________1111_____________</code>
     1130                     </td>
     1131                  </tr>
     1132                  <tr valign="top">
     1133                     <td>AttrVals</td>
     1134                     <td>
     1135                        <code>__________________________11111______11111_____________________111_________</code>
     1136                     </td>
     1137                  </tr>
     1138                  <tr valign="top">
     1139                     <td>EmptyTagEnds</td>
     1140                     <td>
     1141                        <code>___________________________________________________________________1_______</code>
     1142                     </td>
     1143                  </tr>
     1144                  <tr valign="top">
     1145                     <td>EndTags</td>
     1146                     <td>
     1147                        <code>_______________111______________________________111__________________11111_</code>
     1148                     </td>
     1149                  </tr>
     1150
     1151                  <tr valign="top">
     1152                     <td>Start/EmptyTags</td>
     1153                     <td>
     1154                        <code>_1111__11___________1111111111111111111111___________11111111111111________</code>
     1155                     </td>
     1156                  </tr>
     1157                  <tr valign="top">
     1158                     <td>ErrorFlag</td>
     1159                     <td>
     1160                        <code>___________________________________________________________________________</code>
     1161                     </td>
     1162                  </tr>
     1163               </tbody>
     1164            </table>
     1165
     1166         </section>
     1167      </section>
     1168   </section>
     1169   <section>
     1170      <title>SIMD Beyond Bitstreams: Names and Numbers</title>
     1171
     1172      <para>Whereas the fundamental innovation of our work is the use of SIMD technology in
     1173         implementing parallel bit streams for XML, there are also important ways in which more
     1174         traditional byte-oriented SIMD operations can be useful in accelerating other aspects of
     1175         XML processing.</para>
     1176
     1177      <section>
     1178         <title>Name Lookup</title>
     1179         <para>Efficient symbol table mechanisms for looking up element and attribute names is
     1180            important for almost all XML processing applications. It is also an important technique
     1181            merely for assessing well-formedness of an XML document; rather than validating the
     1182            character-by-character composition of each occurrence of an XML name as it is
     1183            encountered, it is more efficient to validate all but the first occurrence by first
     1184            determining whether the name already exists in a table of prevalidated names.</para>
     1185
     1186         <para>The first symbol table mechanism deployed in the Parabix parser simply used the
     1187            hashmaps of the C++ standard template library, without deploying any SIMD technology.
     1188            However, with the overhead of character validation, transcoding and parsing dramatically
     1189            reduced by parallel bit stream technology, we found that symbol lookups then accounted
     1190            for about half of the remaining execution time in a statistics gathering application
     1191               [<xref linkend="CASCON08"/>]. Thus, symbol table processing was identified as a major
     1192            target for further performance improvement. </para>
     1193         <para> Our first effort to improve symbol table performance was to employ the splash tables
     1194            with cuckoo hashing as described by Ross [<xref linkend="Ross06"/>], using SIMD
     1195            technology for parallel bucket processing. Although this technique did turn out to have
     1196            the advantage of virtually constant-time performance even for very large vocabularies,
     1197            it was not particularly helpful for the relatively small vocabularies typically found in
     1198            XML document processing. </para>
     1199         <para> However, a second approach has been found to be quite useful, taking advantage of
     1200            parallel bit streams for cheap determination of symbol length. In essence, the length of
     1201            a name can be determined very cheaply using a single bit scan operation. This then makes
     1202            it possible to use length-sorted symbol table processing, as follows. First, the
     1203            occurrences of all names are stored in arrays indexed by length. Then the length-sorted
     1204            arrays may each be inserted into the symbol table in turn. The advantage of this is that
     1205            a separate loop may be written for each length. Length sorting makes for very efficient
     1206            name processing. For example hash value computations and name comparisons can be made by
     1207            loading multibyte values and performing appropriate shifting and masking operations,
     1208            without the need for a byte-at-a-time loop. In initial experiments, this length-sorting
     1209            approach was found to reduce symbol lookup cost by a factor of two. </para>
     1210         <para> Current research includes the application of SIMD technology to further enhance the
     1211            performance of length-sorted lookup. We have identified a promising technique for
     1212            parallel processing of multiple name occurrences using a parallel trie lookup technique.
     1213            Given an array of occurrences of names of a particular length, the first one, two or
     1214            four bytes of each name are gathered and stored in a linear array. SIMD techniques are
     1215            then used to compare these prefixes with the possible prefixes for the current position
     1216            within the trie. In general, a very small number of possibilities exist for each trie
     1217            node, allowing for fast linear search through all possibilities. Typically, the
     1218            parallelism is expected to exceed the number of possibilities to search through at each
     1219            node. With length-sorting to separate the top-level trie into many small subtries, we
     1220            expect only a single step of symbol lookup to be needed in most practical instances. </para>
     1221
     1222         <para>The gather step of this algorithm is actually a common technique in SIMD processing.
     1223            Instruction set support for gather operations is a likely future direction for SIMD
     1224            technology.</para>
     1225      </section>
     1226
     1227      <section>
     1228         <title>Numeric Processing</title>
     1229         <para> Many XML applications involve numeric data fields as attribute values or element
     1230            content. Although most current XML APIs uniformly return information to applications in
     1231            the form of character strings, it is reasonable to consider direct API support for
     1232            numeric conversions within a high-performance XML engine. With string to numeric
     1233            conversion such a common need, why leave it to application programmers? </para>
     1234         <para> High-performance string to numeric conversion using SIMD operations also can
     1235            considerably outperform the byte-at-a-time loops that most application programmers or
     1236            libraries might employ. A first step is reduction of ASCII bytes to corresponding
     1237            decimal nybbles using a SIMD packing operation. Then an inductive doubling algorithm
     1238            using SIMD operations may be employed. First, 16 sets of adjacent nybble values in the
     1239            range 0-9 can be combined in just a few SIMD operations to 16 byte values in the range
     1240            0-99. Then 8 sets of byte values may similarly be combined with further SIMD processing
     1241            to produce doublebyte values in the range 0-9999. Further combination of doublebyte
     1242            values into 32-bit integers and so on can also be performed using SIMD operations. </para>
     1243         <para> Using appropriate gather operations to bring numeric strings into appropriate array
     1244            structures, an XML engine could offer high-performance numeric conversion services to
     1245            XML application programmers. We expect this to be an important direction for our future
     1246            work, particularly in support of APIs that focus on direct conversion of XML data into
     1247            business objects. </para>
     1248
     1249      </section>
     1250   </section>
     1251
     1252   <section>
     1253      <title>APIs and Parallel Bit Streams</title>
     1254
     1255      <section>
     1256         <title>The ILAX Streaming API</title>
     1257         <para>The In-Line API for XML (ILAX) is the base API provided with the Parabix parser. It
     1258            is intended for low-level extensions compiled right into the engine, with minimum
     1259            possible overhead. It is similar to streaming event-based APIs such as SAX, but
     1260            implemented by inline substitution rather than using callbacks. In essence, an extension
     1261            programmer provides method bodies for event-processing methods declared internal to the
     1262            Parabix parsing engine, compiling the event processing code directly with the core code
     1263            of the engine. </para>
     1264         <para> Although ILAX can be used directly for application programming, its primary use is
     1265            for implementing engine extensions that support higher-level APIs. For example, the
     1266            implementation of C or C++ based streaming APIs based on the Expat [<xref
     1267               linkend="Expat"/>] or general SAX models can be quite directly implemented. C/C++ DOM
     1268            or other tree-based APIs can also be fairly directly implemented. However, delivering
     1269            Parabix performance to Java-based XML applications is challenging due to the
     1270            considerable overhead of crossing the Java Native Interface (JNI) boundary. This issue
     1271            is addressed with the Array Set Model (ASM) concept discussed in the following section. </para>
     1272         <para> With the recent development of parallel parsing using bitstream addition, it is
     1273            likely that the underlying ILAX interface of Parabix will change. In essence, ILAX
     1274            suffers the drawback of all event-based interfaces: they are fundamentally sequential in
     1275            number. As research continues, we expect efficient parallel methods building on parallel
     1276            bit stream foundations to move up the stack of XML processing requirements. Artificially
     1277            imposing sequential processing is thus expected to constrain further advances in XML
     1278            performance. </para>
     1279      </section>
     1280
     1281      <section>
     1282         <title>Efficient XML in Java Using Array Set Models</title>
     1283         <para> In our GML-to-SVG case study, we identified the lack of high-performance XML
     1284            processing solutions for Java to be of particular interest. Java byte code does not
     1285            provide access to the SIMD capabilities of the underlying machine architecture. Java
     1286            just-in-time compilers might be capable of using some SIMD facilities, but there is no
     1287            real prospect of conventional compiler technology translating byte-at-a-time algorithms
     1288            into parallel bit stream code. So the primary vehicle for delivering high-performance
     1289            XML processing is to call native parallel bit stream code written in C through JNI
     1290            capabilities. </para>
     1291         <para>However, each JNI call is expensive, so it is desirable to minimize the number of
     1292            calls and get as much work done during each call as possible. This mitigates against
     1293            direct implementation of streaming APIs in Java through one-to-one mappings to an
     1294            underlying streaming API in C. Instead, we have concentrated on gathering information on
     1295            the C side into data structures that can then be passed to the Java side. However, using
     1296            either C pointer-based structures or C++ objects is problematic because these are
     1297            difficult to interpret on the Java side and are not amenable to Java's automatic storage
     1298            management system. Similarly, Java objects cannot be conveniently created on the C side.
     1299            However, it is possible to transfer arrays of simple data values (bytes or integers)
     1300            between C and Java, so that makes a reasonable focus for bulk data communication between
     1301            C and Java. </para>
     1302         <para><emphasis>Array Set Models</emphasis> are array-based representations of information
     1303            representing an XML document in accord with XML InfoSet [<xref linkend="InfoSet"/>] or
     1304            other XML data models relevant to particular APIs. As well as providing a mechanism for
     1305            efficient bulk data communication across the JNI boundary, ASMs potentially have a
     1306            number of other benefits in high-performance XML processing. <itemizedlist>
     1307               <listitem>
     1308                  <para>Prefetching. Commodity processors commonly support hardware and/or software
     1309                     prefetching to ensure that data is available in a processor cache when it is
     1310                     needed. In general, prefetching is most effective in conjunction with the
     1311                     continuous sequential memory access patterns associated with array
     1312                  processing.</para>
     1313               </listitem>
     1314               <listitem>
     1315                  <para>DMA. Some processing environments provide Direct Memory Access (DMA)
     1316                     controllers for block data movement in parallel with computation. For example,
     1317                     the Cell Broadband Engine uses DMA controllers to move the data to and from the
     1318                     local stores of the synergistic processing units. Arrays of contiguous data
     1319                     elements are well suited to bulk data movement using DMA.</para>
     1320               </listitem>
     1321               <listitem>
     1322                  <para>SIMD. Single Instruction Multiple Data (SIMD) capabilities of modern
     1323                     processor instruction sets allow simultaneous application of particular
     1324                     instructions to sets of elements from parallel arrays. For effective use of
     1325                     SIMD capabilities, an SoA (Structure of Arrays) model is preferrable to an AoS
     1326                     (Array of Structures) model. </para>
     1327               </listitem>
     1328               <listitem>
     1329                  <para>Multicore processors. Array-oriented processing can enable the effective
     1330                     distribution of work to the individual cores of a multicore system in two
     1331                     distinct ways. First, provided that sequential dependencies can be minimized or
     1332                     eliminated, large arrays can be divided into separate segments to be processed
     1333                     in parallel on each core. Second, pipeline parallelism can be used to implement
     1334                     efficient multipass processing with each pass consisting of a processing kernel
     1335                     with array-based input and array-based output. </para>
     1336               </listitem>
     1337               <listitem>
     1338                  <para>Streaming buffers for large XML documents. In the event that an XML document
     1339                     is larger than can be reasonably represented entirely within processor memory,
     1340                     a buffer-based streaming model can be applied to work through a document using
     1341                     sliding windows over arrays of elements stored in document order. </para>
     1342               </listitem>
     1343
     1344            </itemizedlist>
     1345         </para>
     1346
     1347         <section>
     1348            <title>Saxon-B TinyTree Example</title>
     1349            <para>As a first example of the ASM concept, current work includes a proof-of-concept to
     1350               deliver a high-performance replacement for building the TinyTree data structure used
     1351               in Saxon-B 6.5.5, an open-source XSLT 2.0 processor written in Java [<xref
     1352                  linkend="Saxon"/>]. Although XSLT stylesheets may be cached for performance, the
     1353               caching of source XML documents is typically not possible. A new TinyTree object to
     1354               represent the XML source document is thus commonly constructed with each new query so
     1355               that the overall performance of simple queries on large source XML documents is
     1356               highly dependent on TinyTree build time. Indeed, in a study of Saxon-SA, the
     1357               commercial version of Saxon, query time was shown to be dominated by TinyTree build
     1358               time [<xref linkend="Kay08"/>]. Similar performance results are demonstrable for the
     1359               Saxon-B XSLT processor as well. </para>
     1360            <para> The Saxon-B processor studied is a pure Java solution, converting a SAX (Simple
     1361               API for XML) event stream into the TinyTree Java object using the efficient Aelfred
     1362               XML parser [<xref linkend="AElfred"/>]. The TinyTree structure is itself an
     1363               array-based structure mapping well suited to the ASM concept. It consists of six
     1364               parallel arrays of integers indexed on node number and containing one entry for each
     1365               node in the source document, with the exception of attribute and namespace nodes
     1366                  [<xref linkend="Saxon"/>]. Four of the arrays respectively provide node kind, name
     1367               code, depth, and next sibling information for each node, while the two others are
     1368               overloaded for different purposes based on node kind value. For example, in the
     1369               context of a text node , one of the overloaded arrays holds the text buffer offset
     1370               value whereas the other holds the text buffer length value. Attributes and namespaces
     1371               are represented using similiar parallel array of values. The stored TinyTree values
     1372               are primarily primitive Java types, however, object types such as Java Strings and
     1373               Java StringBuffers are also used to hold attribute values and comment values
     1374               respectively. </para>
     1375            <para> In addition to the TinyTree object, Saxon-B maintains a NamePool object which
     1376               represents a collection of XML name triplets. Each triplet is composed of a Namespace
     1377               URI, a Namespace prefix and a local name and encoded as an integer value known as a
     1378               namecode. Namecodes permit efficient name search and look-up using integer
     1379               comparison. Namecodes may also be subsequently decoded to recover namespace and local
     1380               name information. </para>
     1381            <para> Using the Parabix ILAX interface, a high-performance reimplementation of TinyTree
     1382               and NamePool data structures was built to compare with the Saxon-B implementation. In
     1383               fact, two functionally equivalent versions of the ASM java class were constructed. An
     1384               initial version was constructed based on a set of primitive Java arrays constructed
     1385               and allocated in the Java heap space via JNI New&lt;PrimitiveType&gt;Array
     1386               method call. In this version, the JVM garbage collector is aware of all memory
     1387               allocated in the native code. However, in this approach, large array copy operations
     1388               limited overall performance to approximately a 2X gain over the Saxon-B build time. </para>
     1389            <para>To further address the performance penalty imposed by copying large array values,
     1390               a second version of the ASM Java object was constructed based on natively backed
     1391               Direct Memory Byte Buffers [<xref linkend="JNI"/>]. In this version the JVM garbage
     1392               collector is unaware any native memory resources backing the Direct Memory Byte
     1393               Buffers. Large JNI-based copy operations are avoided; however, system memory must be
     1394               explicitly deallocated via a Java native method call. Using this approach, our
     1395               preliminary results show an approximate total 2.5X gain over Saxon-B build time.
     1396            </para>
     1397         </section>
     1398      </section>
     1399   </section>
     1400
     1401
     1402   <section>
     1403      <title>Compiler Technology</title>
     1404
     1405      <para> An important focus of our recent work is on the development of compiler technology to
     1406         automatically generate the low-level SIMD code necessary to implement bit stream processing
     1407         given suitable high-level specifications. This has several potential benefits. First, it
     1408         can eliminate the tedious and error-prone programming of bit stream operations in terms of
     1409         register-at-a-time SIMD operations. Second, compilation technology can automatically employ
     1410         a variety of performance improvement techniques that are difficult to apply manually. These
     1411         include algorithms for instruction scheduling and register allocation as well as
     1412         optimization techniques for common subexpression expression elimination and register
     1413         rematerialization among others. Third, compiler technology makes it easier to make changes
     1414         to the low-level code for reasons of perfective or adaptive maintenance.</para>
     1415
     1416      <para>Beyond these reasons, compiler technology also offers the opportunity for retargetting
     1417         the generation of code to accommodate different processor architectures and API
     1418         requirements. Strategies for efficient parallel bit stream code can vary considerably
     1419         depending on processor resources such as the number of registers available, the particular
     1420         instruction set architecture supported, the size of L1 and L2 data caches, the number of
     1421         available cores and so on. Separate implementation of custom code for each processor
     1422         architecture would thus be likely to be prohibitively expensive, prone to errors and
     1423         inconsistencies and difficult to maintain. Using compilation technology, however, the idea
     1424         would be to implement a variety of processor-specific back-ends all using a common front
     1425         end based on parallel bit streams. </para>
     1426
     1427      <section>
     1428         <title>Character Class Compiler</title>
     1429
     1430         <para>The first compiler component that we have implemented is a character class compiler,
     1431            capable of generation all the bit stream logic necessary to produce a set of lexical
     1432            item streams each corresponding to some particular set of characters to be recognized.
     1433            By taking advantage of common patterns between characters within classes, and special
     1434            optimization logic for recognizing character-class ranges, our existing compiler is able
     1435            to generate well-optimized code for complex sets of character classes involving numbers
     1436            of special characters as well as characters within specific sets of ranges. </para>
     1437
    1711438      </section>
    1721439      <section>
    173         <title>Tomatoes</title>
    174         <para>Also new to the colonists was the tomato. (<emphasis role="ital">See</emphasis>
    175           <xref linkend="Tomatoes"/>.) Tomatoes were described as thin-skinned succulent fruits with
    176           pulpy interiors. Almost <quote>Bristol red</quote> in color, the fruit was somewhat round
    177           in form and the size of a chicken’s egg — not anything approaching the size of modern,
    178           cultivated varieties. Of particular importance was the plant’s fecundity. The flowering,
    179           bush-like plant bore fruit over a period of months. Thus, a plant could produce as many as
    180           15 tomatoes at a given time.</para>
    181         <figure xml:id="Tomatoes">
    182           <title>Tomatoes</title>
    183           <mediaobject>
    184             <imageobject>
    185               <imagedata format="jpg" fileref="19450212-1.jpg" width="50%"/>
    186             </imageobject>
    187           </mediaobject>
    188           <caption>
    189             <para>Tomatoes were among the new fruits discovered in the New World.</para>
    190           </caption>
    191         </figure>
     1440         <title>Regular Expression Compilation</title>
     1441
     1442         <para>Based on the character class compiler, we are currently investigating the
     1443            construction of a regular expression compiler that can implement bit-stream based
     1444            parallel regular-expression matching similar to that describe previously for parallel
     1445            parsing by bistream addition. This compiler works with the assumption that bitstream
     1446            regular-expression definitions are deterministic; no backtracking is permitted with the
     1447            parallel bit stream representation. In XML applications, this compiler is primarily
     1448            intended to enforce regular-expression constraints on string datatype specifications
     1449            found in XML schema. </para>
     1450
    1921451      </section>
     1452
    1931453      <section>
    194         <title>Potatoes</title>
    195         <para>Another root that proved beneficial was the potato. Similarly, to the
    196             <quote>cassida</quote> mentioned earlier, potatoes were considered roots rather than
    197           plants, as the edible portion of the plant lay underground. Given the curious nature of
    198           this legume, several examples were brought back from the New World. In fact, Raleigh
    199           attempted to cultivate the potato at his estate, Youghall, in Ireland.</para>
     1454         <title>Unbounded Bit Stream Compilation</title>
     1455
     1456         <para>The Catalog of XML Bit Streams presented earlier consist of a set of abstract,
     1457            unbounded bit streams, each in one-to-one correspondence with input bytes of a text
     1458            file. Determining how these bit streams are implemented using fixed-width SIMD
     1459            registers, and possibly processed in fixed-length buffers that represent some multiple
     1460            of the register width is a source of considerable programming complexity. The general
     1461            goal of our compilation strategy in this case is to allow operations to be programmed in
     1462            terms of unbounded bit streams and then automatically reduced to efficient low-level
     1463            code with the application of a systematic code generation strategy for handling block
     1464            and buffer boundary crossing. This is work currently in progress. </para>
     1465
    2001466      </section>
    201       <section>
    202         <title>Fruits and Nuts</title>
    203         <para>The colony abounded with a wealth of fruits and nuts, some not previously known to the
    204           Europeans. In addition to mulberries, strawberries, and blueberries already mentioned, one
    205           of the more curious fruits found was called <quote>medlar</quote> by the natives. Medlar
    206           was a fruit not unlike cherries in size and color, but with a much sweeter taste. An
    207           equally unusual fruit was <quote>metaqvesvnnaqvk</quote>; notwithstanding its red fruit,
    208           that plant’s more important feature lay in the cochinile insects which fed upon its
    209           prickly thick leaves <footnote xml:id="mul90">
    210             <para>In the 16th century, such insects were prized in the making of a vibrant red
    211               dye.</para>
    212           </footnote>.</para>
    213         <para>Equally abundant in variety and size were nuts. In addition to chestnuts and walnuts,
    214           the natives <quote>harvested</quote> no less than five types of <quote>acorns</quote>.
    215           These somewhat small nuts were either dried in a manner similar to that used in England
    216           for malt, or were boiled for broth.</para>
    217         <table>
    218           <caption>
    219             <para>North American Native Plants</para>
    220           </caption>
    221           <col align="right" valign="top"/>
    222           <col valign="top"/>
    223           <col align="center" valign="top"/>
    224           <thead>
    225             <tr valign="top">
    226               <th>Use</th>
    227               <th>Plant Part</th>
    228               <th>Examples</th>
    229             </tr>
    230           </thead>
    231           <tbody>
    232             <tr valign="top">
    233               <td rowspan="8"><emphasis role="bold">Vegetables</emphasis></td>
    234             </tr>
    235             <tr valign="top">
    236               <td rowspan="2">Seeds</td>
    237               <td>Corn</td>
    238             </tr>
    239             <tr valign="top">
    240               <td>Beans</td>
    241             </tr>
    242             <tr valign="top">
    243               <td rowspan="3">Fruits</td>
    244               <td>Squash</td>
    245             </tr>
    246             <tr valign="top">
    247               <td>Peppers</td>
    248             </tr>
    249             <tr valign="top">
    250               <td>Tomatoes</td>
    251             </tr>
    252             <tr valign="top">
    253               <td rowspan="2">Roots</td>
    254               <td>Potatoes</td>
    255             </tr>
    256             <tr valign="top">
    257               <td>Sweet Potatoes</td>
    258             </tr>
    259             <tr valign="top">
    260               <td rowspan="4"><emphasis role="bold">Teas</emphasis></td>
    261             </tr>
    262             <tr valign="top">
    263               <td rowspan="2">Leaves</td>
    264               <td>Mountain Mint (Namewuskons)</td>
    265             </tr>
    266             <tr valign="top">
    267               <td>Dawn Mint (Wabinowusk)</td>
    268             </tr>
    269             <tr valign="top">
    270               <td>Berries</td>
    271               <td>Wintergreen</td>
    272             </tr>
    273           </tbody>
    274         </table>
    275       </section>
    276     </section>
    277   </section>
    278   <appendix xml:id="mul83">
    279     <title>The First Expeditionary Force’s Colony, 1585-1586 <footnote xml:id="mul5">
    280         <para>This Appendix contains an actual list of the individuals who sailed to the Roanoke
    281           Colony in 1585. (<emphasis role="ital">See</emphasis>
    282           <citation>Durant, David N., <quote>Raleigh’s Lost Colony</quote>, Appendix I, Atheneum,
    283             NY: 1981.</citation>)</para>
    284       </footnote>
    285     </title>
    286     <itemizedlist>
    287       <listitem>
    288         <para>Master Philip Amades, Admirall of the countrie</para>
    289       </listitem>
    290       <listitem>
    291         <para>Master Hariot</para>
    292       </listitem>
    293       <listitem>
    294         <para>Master Acton</para>
    295       </listitem>
    296       <listitem>
    297         <para>Master Edward Stafford</para>
    298       </listitem>
    299       <listitem>
    300         <para>Thomas Luddington</para>
    301       </listitem>
    302       <listitem>
    303         <para>Master Marvyn</para>
    304       </listitem>
    305       <listitem>
    306         <para>Master Gardyner</para>
    307       </listitem>
    308       <listitem>
    309         <para>Captaine Vaughan</para>
    310       </listitem>
    311       <listitem>
    312         <para>Master Kendall</para>
    313       </listitem>
    314       <listitem>
    315         <para>Master Prideox</para>
    316       </listitem>
    317       <listitem>
    318         <para>Robert Holecroft</para>
    319       </listitem>
    320       <listitem>
    321         <para>Rise Courtney</para>
    322       </listitem>
    323       <listitem>
    324         <para>Master Hugh Rogers</para>
    325       </listitem>
    326       <listitem>
    327         <para>Thomas Foxe</para>
    328       </listitem>
    329       <listitem>
    330         <para>Edward Nugent</para>
    331       </listitem>
    332       <listitem>
    333         <para>Darby Glande</para>
    334       </listitem>
    335       <listitem>
    336         <para>Edward Kelle</para>
    337       </listitem>
    338       <listitem>
    339         <para>John Gostigo</para>
    340       </listitem>
    341       <listitem>
    342         <para>Erasmus Clefs</para>
    343       </listitem>
    344       <listitem>
    345         <para>Edward Ketcheman</para>
    346       </listitem>
    347       <listitem>
    348         <para>John Linsey</para>
    349       </listitem>
    350       <listitem>
    351         <para>Thomas Rottenbury</para>
    352       </listitem>
    353       <listitem>
    354         <para>Roger Deane</para>
    355       </listitem>
    356       <listitem>
    357         <para>John Harris</para>
    358       </listitem>
    359       <listitem>
    360         <para>Frauncis Norris</para>
    361       </listitem>
    362       <listitem>
    363         <para>Matthewe Lyne</para>
    364       </listitem>
    365       <listitem>
    366         <para>Edward Kettell</para>
    367       </listitem>
    368       <listitem>
    369         <para>Thomas Wisse</para>
    370       </listitem>
    371       <listitem>
    372         <para>Master Thomas Harvie</para>
    373       </listitem>
    374       <listitem>
    375         <para>Master Snelling</para>
    376       </listitem>
    377       <listitem>
    378         <para>Master Anthony Russe</para>
    379       </listitem>
    380       <listitem>
    381         <para>Master Allyne</para>
    382       </listitem>
    383       <listitem>
    384         <para>Master Michel Polyson</para>
    385       </listitem>
    386       <listitem>
    387         <para>John Cage</para>
    388       </listitem>
    389       <listitem>
    390         <para>Thomas Parre</para>
    391       </listitem>
    392       <listitem>
    393         <para>William Randes</para>
    394       </listitem>
    395       <listitem>
    396         <para>Geffrey Churchman</para>
    397       </listitem>
    398       <listitem>
    399         <para>William Farthowe</para>
    400       </listitem>
    401       <listitem>
    402         <para>John Taylor</para>
    403       </listitem>
    404       <listitem>
    405         <para>Philppe Robyns</para>
    406       </listitem>
    407       <listitem>
    408         <para>Thomas Phillipes</para>
    409       </listitem>
    410       <listitem>
    411         <para>Valentine Beale</para>
    412       </listitem>
    413       <listitem>
    414         <para>James Skinner</para>
    415       </listitem>
    416       <listitem>
    417         <para>George Eseven</para>
    418       </listitem>
    419       <listitem>
    420         <para>John Chaundeler</para>
    421       </listitem>
    422       <listitem>
    423         <para>Philip Blunt</para>
    424       </listitem>
    425       <listitem>
    426         <para>Richard Poore</para>
    427       </listitem>
    428       <listitem>
    429         <para>Robert Yong</para>
    430       </listitem>
    431       <listitem>
    432         <para>Marmaduke Constable</para>
    433       </listitem>
    434       <listitem>
    435         <para>Thomas Hesket</para>
    436       </listitem>
    437       <listitem>
    438         <para>William Wasse</para>
    439       </listitem>
    440       <listitem>
    441         <para>John Fever</para>
    442       </listitem>
    443       <listitem>
    444         <para>Daniel</para>
    445       </listitem>
    446       <listitem>
    447         <para>Thomas Taylor</para>
    448       </listitem>
    449       <listitem>
    450         <para>Richard Humfrey</para>
    451       </listitem>
    452       <listitem>
    453         <para>John Wright</para>
    454       </listitem>
    455       <listitem>
    456         <para>Gabriell North</para>
    457       </listitem>
    458       <listitem>
    459         <para>Robert Biscombe</para>
    460       </listitem>
    461       <listitem>
    462         <para>William Backhouse</para>
    463       </listitem>
    464       <listitem>
    465         <para>William White</para>
    466       </listitem>
    467       <listitem>
    468         <para>Henry Potkin</para>
    469       </listitem>
    470       <listitem>
    471         <para>Dennis Barnes</para>
    472       </listitem>
    473       <listitem>
    474         <para>Joseph Borges</para>
    475       </listitem>
    476       <listitem>
    477         <para>Doughan Gannes</para>
    478       </listitem>
    479       <listitem>
    480         <para>William Tenche</para>
    481       </listitem>
    482       <listitem>
    483         <para>Randall Latham</para>
    484       </listitem>
    485       <listitem>
    486         <para>Thomas Hulme</para>
    487       </listitem>
    488       <listitem>
    489         <para>Walter Myll</para>
    490       </listitem>
    491       <listitem>
    492         <para>Richard Gilbert</para>
    493       </listitem>
    494       <listitem>
    495         <para>Steven Pomarie</para>
    496       </listitem>
    497       <listitem>
    498         <para>John Brocke</para>
    499       </listitem>
    500       <listitem>
    501         <para>Bennet Harrye</para>
    502       </listitem>
    503       <listitem>
    504         <para>James Stevenson</para>
    505       </listitem>
    506       <listitem>
    507         <para>Charles Stevenson</para>
    508       </listitem>
    509       <listitem>
    510         <para>Christopher Lowde</para>
    511       </listitem>
    512       <listitem>
    513         <para>Jeremie Man</para>
    514       </listitem>
    515       <listitem>
    516         <para>James Mason</para>
    517       </listitem>
    518       <listitem>
    519         <para>David Salter</para>
    520       </listitem>
    521       <listitem>
    522         <para>Richard Ireland</para>
    523       </listitem>
    524       <listitem>
    525         <para>Thomas Bookener</para>
    526       </listitem>
    527       <listitem>
    528         <para>William Philippes</para>
    529       </listitem>
    530       <listitem>
    531         <para>Randall Mayne</para>
    532       </listitem>
    533       <listitem>
    534         <para>Bennet Chappell</para>
    535       </listitem>
    536       <listitem>
    537         <para>Richard Sare</para>
    538       </listitem>
    539       <listitem>
    540         <para>James Lasie</para>
    541       </listitem>
    542       <listitem>
    543         <para>Smolkin</para>
    544       </listitem>
    545       <listitem>
    546         <para>Thomas Smart</para>
    547       </listitem>
    548       <listitem>
    549         <para>Robert</para>
    550       </listitem>
    551       <listitem>
    552         <para>John Evans</para>
    553       </listitem>
    554       <listitem>
    555         <para>Roger Large</para>
    556       </listitem>
    557       <listitem>
    558         <para>Humfrey Garden</para>
    559       </listitem>
    560       <listitem>
    561         <para>Frauncis Whitton</para>
    562       </listitem>
    563       <listitem>
    564         <para>Rowland Griffyn</para>
    565       </listitem>
    566       <listitem>
    567         <para>William Millard</para>
    568       </listitem>
    569       <listitem>
    570         <para>John Twyt</para>
    571       </listitem>
    572       <listitem>
    573         <para>Edwarde Seklemore</para>
    574       </listitem>
    575       <listitem>
    576         <para>John Anwike</para>
    577       </listitem>
    578       <listitem>
    579         <para>Christopher Marshall</para>
    580       </listitem>
    581       <listitem>
    582         <para>David Williams</para>
    583       </listitem>
    584       <listitem>
    585         <para>Nicholas Swabber</para>
    586       </listitem>
    587       <listitem>
    588         <para>Edward Chipping</para>
    589       </listitem>
    590       <listitem>
    591         <para>Sylvester Beching</para>
    592       </listitem>
    593       <listitem>
    594         <para>Vincent Cheyne</para>
    595       </listitem>
    596       <listitem>
    597         <para>Haunce Walters</para>
    598       </listitem>
    599       <listitem>
    600         <para>Edward Barecombe</para>
    601       </listitem>
    602       <listitem>
    603         <para>Thomas Skevelabs</para>
    604       </listitem>
    605       <listitem>
    606         <para>William Walters</para>
    607       </listitem>
    608     </itemizedlist>
    609   </appendix>
    610   <appendix xml:id="mul88">
    611     <title>The Roanoke Colony, 1587 <footnote xml:id="mul4">
    612         <para>This Appendix contains an actual list of the individuals who sailed to the Roanoke
    613           Colony in 1587, as well as the names of children born at the colony. (<emphasis
    614             role="ital">See</emphasis>
    615           <citation>Durant, David N., <quote>Raleigh’s Lost Colony</quote>, Appendix I, Atheneum,
    616             NY: 1981.</citation>)</para>
    617       </footnote>
    618     </title>
    619     <itemizedlist>
    620       <listitem>
    621         <para>Men</para>
    622         <itemizedlist>
    623           <listitem>
    624             <para>John White [Governor]</para>
    625           </listitem>
    626           <listitem>
    627             <para>Roger Bailie [Assistant]</para>
    628           </listitem>
    629           <listitem>
    630             <para>Ananias Dare [Assistant]</para>
    631           </listitem>
    632           <listitem>
    633             <para>Christopher Cooper [Assistant]</para>
    634           </listitem>
    635           <listitem>
    636             <para>Thomas Stevens [Assistant]</para>
    637           </listitem>
    638           <listitem>
    639             <para>John Sampson [Assistant]</para>
    640           </listitem>
    641           <listitem>
    642             <para>Dyonis Harvie [Assistant]</para>
    643           </listitem>
    644           <listitem>
    645             <para>Roger Prat [Assistant]</para>
    646           </listitem>
    647           <listitem>
    648             <para>George Howe [Assistant]</para>
    649           </listitem>
    650           <listitem>
    651             <para>Simon Fernando [Assistant]</para>
    652           </listitem>
    653           <listitem>
    654             <para>William Willes</para>
    655           </listitem>
    656           <listitem>
    657             <para>John Brooke</para>
    658           </listitem>
    659           <listitem>
    660             <para>Cutbert White</para>
    661           </listitem>
    662           <listitem>
    663             <para>John Bright</para>
    664           </listitem>
    665           <listitem>
    666             <para>Clement Tayler</para>
    667           </listitem>
    668           <listitem>
    669             <para>William Sole</para>
    670           </listitem>
    671           <listitem>
    672             <para>John Cotsmur</para>
    673           </listitem>
    674           <listitem>
    675             <para>Humfrey Newton</para>
    676           </listitem>
    677           <listitem>
    678             <para>Thomas Colman</para>
    679           </listitem>
    680           <listitem>
    681             <para>Thomas Gramme</para>
    682           </listitem>
    683           <listitem>
    684             <para>Nicholas Johnson</para>
    685           </listitem>
    686           <listitem>
    687             <para>Thomas Warner</para>
    688           </listitem>
    689           <listitem>
    690             <para>Anthony Cage</para>
    691           </listitem>
    692           <listitem>
    693             <para>John Jones</para>
    694           </listitem>
    695           <listitem>
    696             <para>John Tydway</para>
    697           </listitem>
    698           <listitem>
    699             <para>Ambrose Viccars</para>
    700           </listitem>
    701           <listitem>
    702             <para>Edmond English</para>
    703           </listitem>
    704           <listitem>
    705             <para>Thomas Topan</para>
    706           </listitem>
    707           <listitem>
    708             <para>Henry Berrye</para>
    709           </listitem>
    710           <listitem>
    711             <para>Richard Berrye</para>
    712           </listitem>
    713           <listitem>
    714             <para>John Spendlove</para>
    715           </listitem>
    716           <listitem>
    717             <para>John Hemmington</para>
    718           </listitem>
    719           <listitem>
    720             <para>Thomas Butler</para>
    721           </listitem>
    722           <listitem>
    723             <para>Edward Powell</para>
    724           </listitem>
    725           <listitem>
    726             <para>John Burden</para>
    727           </listitem>
    728           <listitem>
    729             <para>James Hynde</para>
    730           </listitem>
    731           <listitem>
    732             <para>Thomas Ellis</para>
    733           </listitem>
    734           <listitem>
    735             <para>William Browne</para>
    736           </listitem>
    737           <listitem>
    738             <para>Michael Myllet</para>
    739           </listitem>
    740           <listitem>
    741             <para>Thomas Smith</para>
    742           </listitem>
    743           <listitem>
    744             <para>Richard Kemme</para>
    745           </listitem>
    746           <listitem>
    747             <para>Thomas Harris</para>
    748           </listitem>
    749           <listitem>
    750             <para>Richard Taverner</para>
    751           </listitem>
    752           <listitem>
    753             <para>John Earnest</para>
    754           </listitem>
    755           <listitem>
    756             <para>Henry Johnson</para>
    757           </listitem>
    758           <listitem>
    759             <para>John Starte</para>
    760           </listitem>
    761           <listitem>
    762             <para>Richard Darige</para>
    763           </listitem>
    764           <listitem>
    765             <para>William Lucas</para>
    766           </listitem>
    767           <listitem>
    768             <para>Arnold Archard</para>
    769           </listitem>
    770           <listitem>
    771             <para>John Wright</para>
    772           </listitem>
    773           <listitem>
    774             <para>William Dutton</para>
    775           </listitem>
    776           <listitem>
    777             <para>Morris Allen</para>
    778           </listitem>
    779           <listitem>
    780             <para>William Waters</para>
    781           </listitem>
    782           <listitem>
    783             <para>Richard Arthur</para>
    784           </listitem>
    785           <listitem>
    786             <para>John Chapman</para>
    787           </listitem>
    788           <listitem>
    789             <para>William Clement</para>
    790           </listitem>
    791           <listitem>
    792             <para>Robert Little</para>
    793           </listitem>
    794           <listitem>
    795             <para>Hugh Tayler</para>
    796           </listitem>
    797           <listitem>
    798             <para>Richard Wildye</para>
    799           </listitem>
    800           <listitem>
    801             <para>Lewes Wotton</para>
    802           </listitem>
    803           <listitem>
    804             <para>Michael Bishop</para>
    805           </listitem>
    806           <listitem>
    807             <para>Henry Browne</para>
    808           </listitem>
    809           <listitem>
    810             <para>Marke Bennet</para>
    811           </listitem>
    812           <listitem>
    813             <para>John Gibbes</para>
    814           </listitem>
    815           <listitem>
    816             <para>John Stilman</para>
    817           </listitem>
    818           <listitem>
    819             <para>Robert Wilkinson</para>
    820           </listitem>
    821           <listitem>
    822             <para>Peter Little</para>
    823           </listitem>
    824           <listitem>
    825             <para>John Wyles</para>
    826           </listitem>
    827           <listitem>
    828             <para>Brian Wyles</para>
    829           </listitem>
    830           <listitem>
    831             <para>George Martyn</para>
    832           </listitem>
    833           <listitem>
    834             <para>Hugh Pattenson</para>
    835           </listitem>
    836           <listitem>
    837             <para>Martyn Sutton</para>
    838           </listitem>
    839           <listitem>
    840             <para>John Farre</para>
    841           </listitem>
    842           <listitem>
    843             <para>John Bridger</para>
    844           </listitem>
    845           <listitem>
    846             <para>Griffen Jones</para>
    847           </listitem>
    848           <listitem>
    849             <para>Richard Shaberdge</para>
    850           </listitem>
    851           <listitem>
    852             <para>James Lasie</para>
    853           </listitem>
    854           <listitem>
    855             <para>John Cheven</para>
    856           </listitem>
    857           <listitem>
    858             <para>Thomas Hewet</para>
    859           </listitem>
    860           <listitem>
    861             <para>William Berde</para>
    862           </listitem>
    863           <listitem>
    864             <para>Henry Rufoote</para>
    865           </listitem>
    866           <listitem>
    867             <para>Richard Tomkins</para>
    868           </listitem>
    869           <listitem>
    870             <para>Henry Dorrell</para>
    871           </listitem>
    872           <listitem>
    873             <para>Charles Florrie</para>
    874           </listitem>
    875           <listitem>
    876             <para>Henry Mylton</para>
    877           </listitem>
    878           <listitem>
    879             <para>Henry Payne</para>
    880           </listitem>
    881           <listitem>
    882             <para>Thomas Harris</para>
    883           </listitem>
    884           <listitem>
    885             <para>William Nicholes</para>
    886           </listitem>
    887           <listitem>
    888             <para>Thomas Phevens</para>
    889           </listitem>
    890           <listitem>
    891             <para>John Borden</para>
    892           </listitem>
    893           <listitem>
    894             <para>Thomas Scot</para>
    895           </listitem>
    896         </itemizedlist>
    897       </listitem>
    898       <listitem>
    899         <para>Women</para>
    900         <itemizedlist>
    901           <listitem>
    902             <para>Elyoner Dare</para>
    903           </listitem>
    904           <listitem>
    905             <para>Margery Harvie</para>
    906           </listitem>
    907           <listitem>
    908             <para>Agnes Wood</para>
    909           </listitem>
    910           <listitem>
    911             <para>Wenefrid Powell</para>
    912           </listitem>
    913           <listitem>
    914             <para>Joyce Archard</para>
    915           </listitem>
    916           <listitem>
    917             <para>Jane Jones</para>
    918           </listitem>
    919           <listitem>
    920             <para>Elizabeth Glane</para>
    921           </listitem>
    922           <listitem>
    923             <para>Jane Pierce</para>
    924           </listitem>
    925           <listitem>
    926             <para>Audry Tappan</para>
    927           </listitem>
    928           <listitem>
    929             <para>Alis Chapman</para>
    930           </listitem>
    931           <listitem>
    932             <para>Emme Merrimoth</para>
    933           </listitem>
    934           <listitem>
    935             <para>Colman</para>
    936           </listitem>
    937           <listitem>
    938             <para>Margaret Lawrence</para>
    939           </listitem>
    940           <listitem>
    941             <para>Joan Warren</para>
    942           </listitem>
    943           <listitem>
    944             <para>Jane Mannering</para>
    945           </listitem>
    946           <listitem>
    947             <para>Rose Payne</para>
    948           </listitem>
    949           <listitem>
    950             <para>Elizabeth Viccars</para>
    951           </listitem>
    952         </itemizedlist>
    953       </listitem>
    954       <listitem>
    955         <para>Children</para>
    956         <itemizedlist>
    957           <listitem>
    958             <para>John Sampson</para>
    959           </listitem>
    960           <listitem>
    961             <para>Robert Ellis</para>
    962           </listitem>
    963           <listitem>
    964             <para>Ambrose Viccars</para>
    965           </listitem>
    966           <listitem>
    967             <para>Thomas Archard</para>
    968           </listitem>
    969           <listitem>
    970             <para>Thomas Humfrey</para>
    971           </listitem>
    972           <listitem>
    973             <para>Tomas Smart</para>
    974           </listitem>
    975           <listitem>
    976             <para>George Howe</para>
    977           </listitem>
    978           <listitem>
    979             <para>John Prat</para>
    980           </listitem>
    981           <listitem>
    982             <para>William Wythers</para>
    983           </listitem>
    984         </itemizedlist>
    985       </listitem>
    986       <listitem>
    987         <para>Children Born at the Colony</para>
    988         <itemizedlist>
    989           <listitem>
    990             <para>Virginia Dare</para>
    991           </listitem>
    992           <listitem>
    993             <para>Harvye</para>
    994           </listitem>
    995         </itemizedlist>
    996       </listitem>
    997       <listitem>
    998         <para>Native Peoples (who having been in England returned to the colony)</para>
    999         <itemizedlist>
    1000           <listitem>
    1001             <para>Manteo</para>
    1002           </listitem>
    1003           <listitem>
    1004             <para>Towaye</para>
    1005           </listitem>
    1006         </itemizedlist>
    1007       </listitem>
    1008     </itemizedlist>
    1009   </appendix>
    1010   <bibliography>
    1011     <title>Bibliography</title>
    1012     <bibliomixed xml:id="thorpe97" xreflabel="Thorpe 1997">Thorpe, Francis Newton, ed.<emphasis
    1013         role="ital">Charter to Sir Walter Raleigh: 1584,</emphasis> in <quote>The Federal and States
    1014         Constitutions, Colonial Charters, and Other Organic Laws of the States, Territories, and
    1015         Colonies Now or Heretofore Forming the United States of America</quote> (compiled under Act
    1016       of Congress of June 30, 1906). [online]. Washington, D.C.: Government Printing Office, 1909.
    1017       The Avalon Project. Co-Dir. William C. Fray &amp; Lisa A. Span. © 1997 [cited 8 Apr
    1018         1998].<link>http://www.yale.edu/lawweb/avalon/raleigh.htm</link>.</bibliomixed>
    1019     <bibliomixed xml:id="mul94" xreflabel="DeFord 1997">DeFord, Susan. <emphasis role="ital"
    1020         >Tobacco: The Noxious Weed That Built a Nation,</emphasis> Special to <quote>The Washington
    1021         Post</quote> (May 14, 1997) at H01. [online]. © 1997 [cited 29 May 1998].
    1022         <link>http://www.washingtonpost.com/wp~srv/intereact/longterm/horizon/051497/tobacco.htm</link>.</bibliomixed>
    1023     <bibliomixed xml:id="mul95" xreflabel="Tobacco"><emphasis role="ital">Part 2. The Story of the
    1024         Arrival and First Uses of Tobacco in Europe,</emphasis> in <quote>The History of
    1025         Tobacco</quote>. [online]. [cited 14 Apr
    1026         1998].<link>http://www.tobacco.co.uk/library/history.html#part2</link>.</bibliomixed>
    1027     <bibliomixed xml:id="mul96" xreflabel="Lane">Lane, Ralph. <quote>The Colony at Roanoake —
    1028         1586</quote>. [online]. [cited 13 Apr
    1029         1998].<link>http://www.nationalcenter.inter.net/ColonyofRoanoke.html</link>.</bibliomixed>
    1030     <bibliomixed xml:id="mul97" xreflabel="Prindle 1996a">Prindle, Tara.<emphasis role="ital">Indian
    1031         Corn,</emphasis> excerpted from <quote>Teaching about Thanksgiving</quote>. [online]. Fourth
    1032       World Documentation Project. Native American Technology and Art. © 1996 [cited 14 Apr
    1033         1998].<link>http://www.lib.uconn.edu/NativeTech/cornhusk/cornfwdp.html</link>.</bibliomixed>
    1034     <bibliomixed xml:id="mul98" xreflabel="Prindle 1996b">Prindle, Tara. <quote>Native American
    1035         History of Corn</quote>. [online]. Native American Technology and Art. © 1996 [cited 14 Apr
    1036         1998].<link>http://www.lib.uconn.edu/NativeTech/cornhusk/cornhusk.html</link>.</bibliomixed>
    1037     <bibliomixed xml:id="mul99" xreflabel="Dickerson 1997">Dickerson, George. <quote>Beyond Turkey:
    1038         Corn Shaped History, Cuisines</quote>. [online]. Ed. D’Lyn Ford. New Mexico State University
    1039       College of Agriculture &amp; Home Economics, 11 Nov 1997 [cited 14 Apr
    1040         1998].<link>http://elroy.nmsu.edu/CAHE/press/corn_history1997.html</link>.</bibliomixed>
    1041     <bibliomixed xml:id="mul100" xreflabel="Borio 1954">Borio, Gene.<emphasis role="ital">Smoking in
    1042         England — Elizabethan,</emphasis> excerpted from Alfred H. Dunhill’s <quote>The Gentle Art
    1043         of Smoking</quote>. [online]. Library of Congress catalog card #54-10495 (1954). The Tobacco
    1044       BBS [cited 20 May
    1045       1998].<link>http://www.tobacco.org/History/Elizabethan_smoking.html</link>.</bibliomixed>
    1046     <bibliomixed xml:id="mul101" xreflabel="Sellement"><quote>First English Settlement in the New
    1047         World</quote>. [online]. [cited 13 Apr
    1048         1998].<link>http://hal.dcr.state.nc.us/nc/ncsites/English1.htm#First</link>.</bibliomixed>
    1049     <bibliomixed xml:id="mul-102" xreflabel="ICW-Net 1998"><emphasis role="ital">Virginia
    1050         Dare,</emphasis> in <quote>Tales from the Coast!</quote> [online]. ICW-NET © 1998 [cited 13
    1051       Apr 1998].<link>http://www.outerbanks-nc.com/manteo/history/vadare.htm</link>.</bibliomixed>
    1052   </bibliography>
     1467   </section>
     1468
     1469   <section>
     1470      <title>Conclusion</title>
     1471      <para>Parallel bit stream technology offers the opportunity to dramatically speed up the core
     1472         XML processing components used to implement virtually any XML API. Character validation and
     1473         transcoding, whitespace processing, and parsing up to including the full validation of tag
     1474         syntax can be handled fully in parallel using bit stream methods. Bit streams to mark the
     1475         positions of all element names, attribute names and attribute values can also be produced,
     1476         followed by fast bit scan operations to generate position and length values. Beyond bit
     1477         streams, byte-oriented SIMD processing of names and numerals can also accelerate
     1478         performance beyond sequential byte-at-a-time methods. </para>
     1479      <para>Advances in processor architecture are likely to further amplify the performance of
     1480         parallel bit stream technology over traditional byte-at-a-time processing over the next
     1481         decade. Improvements to SIMD register width, register complement and operation format can
     1482         all result in further gains. New SIMD instruction set features such as inductive doubling
     1483         support, parallel extract and deposit instructions, bit interleaving and scatter/gather
     1484         capabilities should also result in significant speed-ups. Leveraging the intraregister
     1485         parallelism of parallel bit stream technology within SIMD registers to take of intrachip
     1486         parallelism on multicore processors should accelerate processing further. </para>
     1487      <para>Technology transfer using a patent-based open-source business model is a further goal of
     1488         our work with a view to widespread deployment of parallel bit stream technology in XML
     1489         processing stacks implementing a variety of APIs. The feasibility of substantial
     1490         performance improvement in replacement of technology implementing existing APIs has been
     1491         demonstrated even in complex software architectures involving delivery of performance
     1492         benefits across the JNI boundary. We are seeking to accelerate these deployment efforts
     1493         both through the development of compiler technology to reliably apply these methods to a
     1494         variety of architectures as well as to identify interested collaborators using open-source
     1495         or commercial models. </para>
     1496   </section>
     1497
     1498   <section>
     1499      <title>Acknowledgments</title>
     1500      <para>This work is supported in part by research grants and scholarships from the Natural
     1501         Sciences and Engineering Research Council of Canada, the Mathematics of Information
     1502         Technology and Complex Systems Network and the British Columbia Innovation Council. </para>
     1503      <para>We thank our colleague Dan Lin (Linda) for her work in high-performance symbol table
     1504         processing. </para>
     1505   </section>
     1506
     1507   <bibliography>
     1508      <title>Bibliography</title>
     1509      <bibliomixed xml:id="XMLChip09" xreflabel="Leventhal and Lemoine 2009">Leventhal, Michael and
     1510         Eric Lemoine 2009. The XML chip at 6 years. Proceedings of International Symposium on
     1511         Processing XML Efficiently 2009, Montréal.</bibliomixed>
     1512      <bibliomixed xml:id="Datapower09" xreflabel="Salz, Achilles and Maze 2009">Salz, Richard,
     1513         Heather Achilles, and David Maze. 2009. Hardware and software trade-offs in the IBM
     1514         DataPower XML XG4 processor card. Proceedings of International Symposium on Processing XML
     1515         Efficiently 2009, Montréal.</bibliomixed>
     1516      <bibliomixed xml:id="PPoPP08" xreflabel="Cameron 2007">Cameron, Robert D. 2007. A Case Study
     1517         in SIMD Text Processing with Parallel Bit Streams UTF-8 to UTF-16 Transcoding. Proceedings
     1518         of 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008, Salt
     1519         Lake City, Utah. On the Web at <link>http://research.ihost.com/ppopp08/</link>.</bibliomixed>
     1520      <bibliomixed xml:id="CASCON08" xreflabel="Cameron, Herdy and Lin 2008">Cameron, Robert D.,
     1521         Kenneth S Herdy, and Dan Lin. 2008. High Performance XML Parsing Using Parallel Bit Stream
     1522         Technology. Proceedings of CASCON 2008. 13th ACM SIGPLAN Symposium on Principles and
     1523         Practice of Parallel Programming 2008, Toronto.</bibliomixed>
     1524      <bibliomixed xml:id="SVGOpen08" xreflabel="Herdy, Burggraf and Cameron 2008">Herdy, Kenneth
     1525         S., Robert D. Cameron and David S. Burggraf. 2008. High Performance GML to SVG
     1526         Transformation for the Visual Presentation of Geographic Data in Web-Based Mapping Systems.
     1527         Proceedings of SVG Open 6th International Conference on Scalable Vector Graphics,
     1528         Nuremburg. On the Web at
     1529            <link>http://www.svgopen.org/2008/papers/74-HighPerformance_GML_to_SVG_Transformation_for_the_Visual_Presentation_of_Geographic_Data_in_WebBased_Mapping_Systems/</link>.</bibliomixed>
     1530      <bibliomixed xml:id="Ross06" xreflabel="Ross 2006">Ross, Kenneth A. 2006. Efficient hash
     1531         probes on modern processors. Proceedings of ICDE, 2006. ICDE 2006, Atlanta. On the Web at
     1532            <link>www.cs.columbia.edu/~kar/pubsk/icde2007.pdf</link>.</bibliomixed>
     1533      <bibliomixed xml:id="ASPLOS09" xreflabel="Cameron and Lin 2009">Cameron, Robert D. and Dan
     1534         Lin. 2009. Architectural Support for SWAR Text Processing with Parallel Bit Streams: The
     1535         Inductive Doubling Principle. Proceedings of ASPLOS 2009, Washington, DC.</bibliomixed>
     1536      <bibliomixed xml:id="Wu08" xreflabel="Wu et al. 2008">Wu, Yu, Qi Zhang, Zhiqiang Yu and
     1537         Jianhui Li. 2008. A Hybrid Parallel Processing for XML Parsing and Schema Validation.
     1538         Proceedings of Balisage 2008, Montréal. On the Web at
     1539            <link>http://www.balisage.net/Proceedings/vol1/html/Wu01/BalisageVol1-Wu01.html</link>.</bibliomixed>
     1540      <bibliomixed xml:id="u8u16" xreflabel="Cameron 2008">u8u16 - A High-Speed UTF-8 to UTF-16
     1541         Transcoder Using Parallel Bit Streams Technical Report 2007-18. 2007. School of Computing
     1542         Science Simon Fraser University, June 21 2007.</bibliomixed>
     1543      <bibliomixed xml:id="XML10" xreflabel="XML 1.0">Extensible Markup Language (XML) 1.0 (Fifth
     1544         Edition) W3C Recommendation 26 November 2008. On the Web at
     1545            <link>http://www.w3.org/TR/REC-xml/</link>.</bibliomixed>
     1546      <bibliomixed xml:id="Unicode" xreflabel="Unicode">The Unicode Consortium. 2009. On the Web at
     1547            <link>http://unicode.org/</link>.</bibliomixed>
     1548      <bibliomixed xml:id="Pex06" xreflabel="Hilewitz and Lee 2006"> Hilewitz, Y. and Ruby B. Lee.
     1549         2006. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit
     1550         Instructions. Proceedings of the IEEE 17th International Conference on Application-Specific
     1551         Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006.</bibliomixed>
     1552      <bibliomixed xml:id="InfoSet" xreflabel="XML Infoset">XML Information Set (Second Edition) W3C
     1553         Recommendation 4 February 2004. On the Web at
     1554         <link>http://www.w3.org/TR/xml-infoset/</link>.</bibliomixed>
     1555      <bibliomixed xml:id="Saxon" xreflabel="Saxon">SAXON The XSLT and XQuery Processor. On the Web
     1556         at <link>http://saxon.sourceforge.net/</link>.</bibliomixed>
     1557      <bibliomixed xml:id="Kay08" xreflabel="Kay 2008"> Kay, Michael Y. 2008. Ten Reasons Why Saxon
     1558         XQuery is Fast, IEEE Data Engineering Bulletin, December 2008.</bibliomixed>
     1559      <bibliomixed xml:id="AElfred" xreflabel="Ælfred"> The Ælfred XML Parser. On the Web at
     1560            <link>http://saxon.sourceforge.net/aelfred.html</link>.</bibliomixed>
     1561      <bibliomixed xml:id="JNI" xreflabel="Hitchens 2002">Hitchens, Ron. Java NIO. O'Reilly, 2002.</bibliomixed>
     1562      <bibliomixed xml:id="Expat" xreflabel="Expat">The Expat XML Parser.
     1563            <link>http://expat.sourceforge.net/</link>.</bibliomixed>
     1564   </bibliography>
     1565
    10531566</article>
Note: See TracChangeset for help on using the changeset viewer.