Changeset 3058


Ignore:
Timestamp:
Apr 19, 2013, 5:10:57 PM (6 years ago)
Author:
cameron
Message:

Linking

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/Balisage13/Bal2013came0601/Bal2013came0601.xml

    r3057 r3058  
    149149        For example, one possibility for data
    150150        parallelization is to add a pre-parsing step to compute
    151         a skeleton tree structure of an  XML document \cite{GRID2006}.
     151        a skeleton tree structure of an  XML document <citation linkend="GRID2006"/>.
    152152        The parallelization of the pre-parsing stage itself can be tackled with
    153         state machines \cite{E-SCIENCE2007, IPDPS2008}.
    154         Methods without pre-parsing have used speculation \cite{HPCC2011} or post-processing that
    155         combines the partial results \cite{ParaDOM2009}.
     153          state machines <citation linkend="E-SCIENCE2007"/>, <citation linkend="IPDPS2008"/>.
     154        Methods without pre-parsing have used speculation <citation linkend="HPCC2011"/> or post-processing that
     155        combines the partial results <citation linkend="ParaDOM2009"/>.
    156156        A hybrid technique that combines data and pipeline parallelism was proposed to
    157         hide the latency of a "job" that has to be done sequentially \cite{ICWS2008}.
     157        hide the latency of a "job" that has to be done sequentially <citation linkend="ICWS2008"/>.
    158158      </para>
    159159      <para>
    160160        Fewer efforts have investigated SIMD parallelism, although this approach
    161161        has the potential advantage of improving single core performance as well
    162         as offering savings in energy consumption \cite{HPCA2012}.
     162        as offering savings in energy consumption <citation linkend="HPCA2012"/>.
    163163        Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension
    164         and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
     164        and showed how they can be used to improve the performance of XML parsing <citation linkend="XMLSSE42"/>.
    165165        The Parabix framework uses generic SIMD extensions and bit parallel methods to
    166         process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
     166        process hundreds of XML input characters simultaneously <citation linkend="Cameron2009"/> <citation linkend="cameron-EuroPar2011"/>.
    167167        Parabix prototypes have also combined SIMD methods with thread-level parallelism to
    168         achieve further acceleration on multicore systems \cite{HPCA2012}.
     168        achieve further acceleration on multicore systems <citation linkend="HPCA2012"/>.
    169169      </para>
    170170      <para>
     
    270270-->
    271271            <!-- The bits used to construct $\tt <subscript>7</subscript>$ have been highlighted in this example. -->
    272             Boolean-logic operations\footnote{&#8743;, \&#8744; and &#172; denote the
    273             boolean AND, OR and NOT operators.} are used to classify the input bits into a set of
     272              Boolean-logic operations<footnote>&#8743;, \&#8744; and &#172; denote the
     273              boolean AND, OR and NOT operators.</footnote> are used to classify the input bits into a set of
    274274               <emphasis role="ital">character-class bit streams</emphasis>, which identify key
    275275            characters (or groups of characters) with a <code>1</code>. For example, one of the
     
    347347            character immediately following the opener (i.e., &quot;<code>/</code>&quot;) or
    348348            not. The remaining three lines show streams that can be computed in subsequent parsing
    349             (using the technique of bitstream addition \cite{cameron-EuroPar2011}), namely streams
     349            (using the technique of bitstream addition <citation linkend="cameron-EuroPar2011"/>), namely streams
    350350            marking the element names, attribute names and attribute values of tags. </para>
    351351            <table xml:id="derived">
     
    379379            bytes at a time using 128-bit registers) should provide substantial benefit. </para>
    380380         <para> Previous studies have shown that the Parabix approach improves many aspects of XML
    381             processing, including transcoding \cite{Cameron2008}, character classification and
     381            processing, including transcoding <citation linkend="Cameron2008"/>, character classification and
    382382            validation, tag parsing and well-formedness checking. The first Parabix parser used
    383383            processor bit scan instructions to considerably accelerate sequential scanning loops for
    384             individual characters \cite{CameronHerdyLin2008}. Recent work has incorporated a method
    385             of parallel scanning using bitstream addition \cite{cameron-EuroPar2011}, as well as
     384            individual characters <citation linkend="CameronHerdyLin2008"/>. Recent work has incorporated a method
     385            of parallel scanning using bitstream addition <citation linkend="cameron-EuroPar2011"/>, as well as
    386386            combining SIMD methods with 4-stage pipeline parallelism to further improve throughput
    387             \cite{HPCA2012}. Although these research prototypes handled the full syntax of
     387            <citation linkend="HPCA2012"/>. Although these research prototypes handled the full syntax of
    388388            schema-less XML documents, they lacked the functionality required by full XML parsers. </para>
    389389         <para> Commercial XML processors support transcoding of multiple character sets and can
     
    400400            scanning, validation and content processing modes. </para>
    401401         <para> In other words, Xerces belongs to an equivalent class applications termed FSM
    402             applications\footnote{ Herein FSM applications are considered software systems whose
     402           applications<footnote>Herein FSM applications are considered software systems whose
    403403            behaviour is defined by the inputs, current state and the events associated with
    404             transitions of states.}. Each state transition indicates the processing context of
     404              transitions of states.</footnote>. Each state transition indicates the processing context of
    405405            subsequent characters. Unfortunately, textual data tends to be unpredictable and any
    406406            character could induce a state transition. </para>
     
    409409            operations that can be grouped into logical layers, e.g., transposition, character
    410410            classification, and lexical analysis. Each layer is pipeline parallel and require
    411             neither speculation nor pre-parsing stages\cite{HPCA2012}. To meet the API requirements
     411            neither speculation nor pre-parsing stages<citation linkend="HPCA2012"/>. To meet the API requirements
    412412            of the document-ordered Xerces output, the results of the Parabix processing layers must
    413413            be interleaved to produce the equivalent behaviour. </para>
     
    455455         <para> In icXML functions are grouped into logical components. As shown in
    456456             <xref linkend="xerces-arch"/>, two major categories exist: (1) the Parabix Subsystem and (2) the
    457             Markup Processor. All tasks in (1) use the Parabix Framework \cite{HPCA2012}, which
     457               Markup Processor. All tasks in (1) use the Parabix Framework <citation linkend="HPCA2012">, which
    458458            represents data as a set of parallel bitstreams. The <emphasis role="ital">Character Set
    459459              Adapter</emphasis>, discussed in <xref linkend="character-set-adapter"/>, mirrors
     
    546546            representation supports efficient transcoding operations. In the important case of UTF-8
    547547            to UTF-16 transcoding, the corresponding UTF-16 bitstreams can be calculated in bit
    548             parallel fashion based on UTF-8 streams \cite{Cameron2008}, and all but the final bytes
     548              parallel fashion based on UTF-8 streams <citation linkend="Cameron2008"/>, and all but the final bytes
    549549            of multi-byte sequences can be marked for deletion as discussed in the following
    550550            subsection. In other cases, transcoding within a block only need be applied for
     
    567567            input bytes is the bit sequence <code>110110</code>. Using this approach, transcoding
    568568            may then be completed by applying parallel deletion and inverse transposition of the
    569             UTF-16 bitstreams\cite{Cameron2008}. </para>
     569            UTF-16 bitstreams<citation linkend="Cameron2008"/>. </para>
    570570         <para> Rather than immediately paying the costs of deletion and transposition just for
    571571            transcoding, however, icXML defers these steps so that the deletion masks for several
     
    617617            operations for each of the these filtering steps, paying the cost of parallel deletion
    618618            and inverse transposition only once. Currently, icXML employs the parallel-prefix
    619             compress algorithm of Steele~\cite{HackersDelight} Performance is independent of the
     619            compress algorithm of Steele~<citation linkend="HackersDelight"/> Performance is independent of the
    620620            number of positions deleted. Future versions of icXML are expected to take advantage of
    621             the parallel extract operation~\cite{HilewitzLee2006} that Intel is now providing in its
     621            the parallel extract operation~<citation linkend="HilewitzLee2006"/> that Intel is now providing in its
    622622            Haswell architecture. </para>
    623623      </section>
     
    819819      <para> As discussed in section <xref linkend="background-xerces"/>, Xerces can be considered a FSM
    820820         application. These are &quot;embarrassingly
    821          sequential.&quot;\cite{Asanovic:EECS-2006-183} and notoriously difficult to
     821         sequential.&quot;<citation linkend="Asanovic-EECS-2006-183"/> and notoriously difficult to
    822822         parallelize. However, icXML is designed to organize processing into logical layers. In
    823823         particular, layers within the Parabix Subsystem are designed to operate over significant
     
    886886      <para> Although the structure of the Parabix Subsystem allows division of the work into
    887887         several pipeline stages and has been demonstrated to be effective for four pipeline stages
    888          in a research prototype \cite{HPCA2012}, our analysis here suggests that the further
     888         in a research prototype <citation linkend="HPCA2012"/>, our analysis here suggests that the further
    889889         pipelining of work within the Parabix Subsystem is not worthwhile if the cost of
    890890         application logic is little as 33% of the end-to-end cost using Xerces. To achieve benefits
     
    906906         events included in our study are: processor cycles, branch instructions, branch
    907907         mispredictions, and cache misses. The Performance Application Programming Interface (PAPI)
    908          Version 5.5.0 \cite{papi} toolkit was installed on the test system to facilitate the
     908         Version 5.5.0 <citation linkend="papi"/> toolkit was installed on the test system to facilitate the
    909909         collection of hardware performance monitoring statistics. In addition, we used the Linux
    910          perf \cite{perf} utility to collect per core hardware events. </para>
     910         perf <citation linkend="perf"/> utility to collect per core hardware events. </para>
    911911      <section>
    912912         <title>Xerces C++ SAXCount</title>
     
    942942</para>     
    943943         <para> A key predictor of the overall parsing performance of an XML file is markup
    944             density\footnote{ Markup Density: the ratio of markup bytes used to define the structure
    945             of the document vs. its file size.}. This metric has substantial influence on the
     944           density<footnote>Markup Density: the ratio of markup bytes used to define the structure
     945             of the document vs. its file size.</footnote>. This metric has substantial influence on the
    946946            performance of traditional recursive descent XML parsers because it directly corresponds
    947947            to the number of state transitions that occur when parsing a document. We use a mixture
     
    972972<para>   As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
    973973was chosen.   This application transforms geospatially encoded data represented using
    974 an XML representation in the form of Geography Markup Language (GML) \cite{lake2004geography}
     974an XML representation in the form of Geography Markup Language (GML) <citation linkend="lake2004geography"/>
    975975into a different XML format  suitable for displayable maps:
    976 Scalable Vector Graphics (SVG) format\cite{lu2007advances}. In the GML2SVG benchmark, GML feature elements
     976Scalable Vector Graphics (SVG) format<citation linkend="lu2007advances"/>. In the GML2SVG benchmark, GML feature elements
    977977and GML geometry elements tags are matched. GML coordinate data are then extracted
    978978and transformed to the corresponding SVG path data encodings.
Note: See TracChangeset for help on using the changeset viewer.