Changeset 2871 for docs/Working


Ignore:
Timestamp:
Jan 30, 2013, 5:29:40 PM (6 years ago)
Author:
nmedfort
Message:

more edits

Location:
docs/Working/icXML
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icXML/arch-overview.tex

    r2866 r2871  
    11\subsection{Overview}
    22
    3 \def \CSG{Stream Generator}
     3\def \CSG{Content Stream Generator}
    44
    55\icXML{} is more than an optimized version of Xerces. Many components were grouped, restructured and
  • docs/Working/icXML/icxml-main.tex

    r2866 r2871  
    9696resolution, bit parallel methods in namespace processing,
    9797as well as staged processing using pipeline parallelism to take advantage of
    98 multiple cores.   
     98multiple cores.
     99
     100\begin{figure*}[th]
     101\begin{center}
     102\begin{tabular}{rr}\\
     103Source Data & \verb`<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>`\\
     104Tag Openers & \verb`1____________1____________________________1____________1__________`\\
     105Start Tag Marks & \verb`_1____________1___________________________________________________`\\
     106End Tag Marks & \verb`___________________________________________1____________1_________`\\
     107Empty Tag Marks & \verb`__________________________________________________________________`\\
     108Element Names & \verb`_11111111_____1111111_____________________________________________`\\
     109Attribute Names & \verb`______________________11_______11_________________________________`\\
     110Attribute Values & \verb`__________________________111________111__________________________`\\
     111% String Ends & \verb`1____________1_______________1__________1_1____________1__________`\\
     112% Markup Identifiers & \verb`_________1______________1_________1______1_1____________1_________`\\
     113% Deletion Mask & \verb`_11111111_____1111111111_1____1111_11_______11111111_____111111111`\\
     114% Undeleted Data & \verb``{\tt\it 0}\verb`________>fee`{\tt\it 0}\verb`__________=_fie`{\tt\it 0}\verb`____=__foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/________fum`{\tt\it 0}\verb`/_________`
     115\end{tabular}
     116\end{center}
     117\caption{XML Source Data and Derived Parallel Bit Streams}
     118\label{fig:parabix1}
     119\end{figure*}
    99120
    100121The remainder of this paper is organized as follows.   
     
    118139\input{background-xerces}
    119140
    120 \begin{figure*}[th]
    121 \begin{center}
    122 \begin{tabular}{rr}\\
    123 Source Data & \verb`<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>`\\
    124 Tag Openers & \verb`1____________1____________________________1____________1__________`\\
    125 Start Tag Marks & \verb`_1____________1___________________________________________________`\\
    126 End Tag Marks & \verb`___________________________________________1____________1_________`\\
    127 Empty Tag Marks & \verb`__________________________________________________________________`\\
    128 Element Names & \verb`_11111111_____1111111_____________________________________________`\\
    129 Attribute Names & \verb`______________________11_______11_________________________________`\\
    130 Attribute Values & \verb`__________________________111________111__________________________`\\
    131 % String Ends & \verb`1____________1_______________1__________1_1____________1__________`\\
    132 % Markup Identifiers & \verb`_________1______________1_________1______1_1____________1_________`\\
    133 % Deletion Mask & \verb`_11111111_____1111111111_1____1111_11_______11111111_____111111111`\\
    134 % Undeleted Data & \verb``{\tt\it 0}\verb`________>fee`{\tt\it 0}\verb`__________=_fie`{\tt\it 0}\verb`____=__foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/________fum`{\tt\it 0}\verb`/_________`
    135 \end{tabular}
    136 \end{center}
    137 \caption{XML Source Data and Derived Parallel Bit Streams}
    138 \label{fig:parabix1}
    139 \end{figure*}
    140141
    141142\input{background-parabix}
  • docs/Working/icXML/multithread.tex

    r2525 r2871  
    77% which naturally enables pipeline parallel processing.
    88
    9 As discussed in section \ref{background:xerces}, Xerces can be considered a complex finite-state machine.
    10 As an application class, finite-state machines are considered very difficult to parallelize
    11 and have been termed ``embarassingly sequential.'' \cite{Asanovic:EECS-2006-183}.
    12 However, \icXML{} is designed to organize processing into logical layers that
    13 are separable.   In particular, layers within the \PS{} are designed to operate
     9As discussed in section \ref{background:xerces}, Xerces can be considered a FSM application.
     10These are ``embarassingly sequential.''\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize.
     11However, \icXML{} is designed to organize processing into logical layers.   
     12In particular, layers within the \PS{} are designed to operate
    1413over significant segments of input data before passing their outputs on for
    1514subsequent processing.  This fits well into the general model of pipeline
     
    1817
    1918The most straightforward division of work in \icXML{} is to separate
    20 the \PS{} and the \MP{} into distinct logical layers in a two-stage pipeline.
     19the \PS{} and the \MP{} into distinct logical layers into two seperate stages.
     20The resultant application, {\it\icXMLp{}}, is a course-grained software-pipeline application.
    2121In this case, the \PS{} thread $T_1$ reads 16k of XML input $I$ at a time and produces the
    2222content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$.
     
    4141\subfigure[]{
    4242\includegraphics[width=0.48\textwidth]{plots/threads_timeline2.pdf}
     43\label{threads_timeline2}
    4344}
    4445\caption{Thread Balance in Two-Stage Pipelines}
    45 \label{threads_timeline2}
    4646
    4747\end{figure}
     
    6565% and the first thread has to wait for the second thread finishing reading the shared data before it can reuse the memory space.
    6666
    67 Overall, our design assumption is that an accelerated Xerces parser will be
    68 most significant for applications that themselves perform substantial
    69 processing on the parsed XML data delivered.  Our design is intended for
    70 a range of applications ranging between two design points.   The first
    71 design point is one in which XML parsing cost handled by the
    72 \PS{} dominates at 67\% of the overall
    73 cost, with the cost of application processing (including the driver logic
    74 withinn the \MP{}) still being quite significant
    75 at 33\%.   The second is almost the reverse scenario, the cost of application processing
    76 dominates at 60\% of the overall cost, while the overall cost of parsing represents
    77 an overhead of 40\%.
     67Overall, our design is intended to benefit a range of applications.
     68Conceptually, we consider two design points.
     69The first, the parsing performed by the \PS{} dominates at 67\% of the overall cost,
     70with the cost of application processing (including the driver logic within the \MP{}) at 33\%.   
     71The second is almost the opposite scenario, the cost of application processing dominates at 60\%,
     72while the cost of XML parsing represents an overhead of 40\%.
    7873
    79 Our design is also predicated on a goal of using the Parabix
    80 framework to achieve achieving a 50\% to 100\% improvement
    81 in the parsing engine itself.   Our best case scenario is
     74Our design is predicated on a goal of using the Parabix
     75framework to achieve a 50\% to 100\% improvement in the parsing engine itself.   
     76In a best case scenario,
    8277a 100\% improvement of the \PS{} for the design point in which
    8378XML parsing dominates at 67\% of the total application cost.
    84 In this case, single-threaded \icXML{} should achieve a 50\% speedup
     79In this case, the single-threaded \icXML{} should achieve a 1.5x speedup over Xerces
    8580so that the total application cost reduces to 67\% of the original. 
    86 However, with our two-stage pipeline model, our ideal scenario
    87 gives us two well-balanced threads each performing about 33\% of the
    88 original work.   In this case, Amdahl's law predicts that
    89 we could expect up to a 3X speedup, at best.
     81However, in \icXMLp{}, our ideal scenario gives us two well-balanced threads
     82each performing about 33\% of the original work.   
     83In this case, Amdahl's law predicts that we could expect up to a 3x speedup at best.
    9084
    9185At the other extreme of our design range, we consider an application
    92 in which core parsing cost is 40\%.   Assuming the 2X speedup of
     86in which core parsing cost is 40\%.   Assuming the 2x speedup of
    9387the \PS{} over the corresponding Xerces core, single-threaded
    9488\icXML{} delivers a 25\% speedup.   However, the most significant
  • docs/Working/icXML/performance.tex

    r2869 r2871  
    1 We evaluate \xerces, \icXML, \icXMLp against two benchmarking applications.
    2 First, against the Xerces C++ SAXCount
    3 sample application, and secondly against a real world
    4 GML to SVG transformation application.
    5 
    6 We investigated XML parser performance
    7 using an Intel Core i7 quad-core
    8 "Sandy Bridge" processor (3.40GHz, 4 physical cores, 8 threads (2 per core),
     1We evaluate \xerces{}, \icXML{}, \icXMLp{} against two benchmarking applications:
     2the Xerces C++ SAXCount sample application,
     3and a real world GML to SVG transformation application.
     4We investigated XML parser performance using an Intel Core i7 quad-core
     5(Sandy Bridge) processor (3.40GHz, 4 physical cores, 8 threads (2 per core),
    9632+32 Kb (per core) L1 cache,
    107256 Kb (per core) L2 cache,
     
    2522
    2623\subsection{Xerces C++ SAXCount}
    27 Xerces-C++ comes with sample applications that demonstrate salient features of the parser.
    28 SAXCount is the simplest sample application. The SAXCount application counts the
    29 elements, attributes and characters of a given XML file using the (event based) SAX API.
    30 and prints out the counts.
     24Xerces comes with sample applications that demonstrate salient features of the parser.
     25SAXCount is the simplest such application:
     26it counts the elements, attributes and characters of a given XML file using the (event based) SAX API
     27and prints out the totals.
    3128
    3229\begin{table}
     
    5451XML documents and consist entirely of single byte encoded ASCII characters.
    5552
    56 A key predictor of the overall parsing performance
    57 of an XML file is Markup density (i.e., the ratio of markup
    58 vs. the total XML document size.) This metric has substantial
    59 influence on the performance of traditional recursive descent
    60 XML parsers. We use a mixture of document-oriented and
     53A key predictor of the overall parsing performance of an XML file is markup density\footnote{
     54  Markup Density: the ratio of markup bytes used to define the structure of the document vs. its file size.}.
     55This metric has substantial influence on the performance of traditional recursive descent XML parsers
     56because it directly corresponds to the number of state transitions that occur when parsing a document.
     57We use a mixture of document-oriented and
    6158data-oriented XML files to analyze performance over a spectrum
    6259of markup densities.
    6360
    64 Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of CPU cycles per byte for the SAXCount application.
     61Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of
     62CPU cycles per byte for the SAXCount application.
    6563The speedup for \icXML{} over Xerces is 1.3x to 1.8x.
    6664With two threads on the multicore machine, our pipelined version can achieve speedup up to 2.7x.
    6765Xerces is substantially slowed by dense markup
    68 but \icXML{} is relatively less affected as a result of the parallel processing technique.
    69 The pipelined \icXML{} performs even better on higher markup density files
    70 because the dense markup files are well balanced in this application.
     66but \icXML{} is less affected through a reduction in branches and the use of parallel-processing techniques.
     67\icXMLp{} performs better as markup-density increases because the work performed by each stage is
     68well balanced in this application.
    7169
    7270\begin{figure}
     
    9593The GML source document set
    9694consists of 46 distinct GML feature layers ranging in size from approximately 9 KB to 125.2 MB
    97 and with an average document size of 18.6 MB. Markup density ranges from approximately 0.0447 to 0.719
     95and with an average document size of 18.6 MB. Markup density ranges from approximately 0.045 to 0.719
    9896and with an average markup density of 0.519. In this performance study,
    9997213.4 MB of source GML data generates 91.9 MB of target SVG data.
     
    107105
    108106Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
    109 the Xerces, \icXML{} and pipelined \icXML{}.   On the GML workload with this application,
    110 single-thread \icXML{}
     107the Xerces, \icXML{} and \icXMLp{}. 
     108On the GML workload with this application, single-thread \icXML{}
    111109achieved about a 50\% acceleration over Xerces,
    112 increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.   Using pipelined  \icXML{}, a
    113 further throughput increase to 111 MB/sec was recorded, approximately a 2X speedup.
     110increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.   
     111Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,
     112approximately a 2X speedup.
    114113
    115114An important aspect of \icXML{} is the replacement of much branch-laden
     
    128127\end{figure}
    129128
    130 
    131129The behaviour of the three versions with respect to L1 cache misses per kB is shown
    132130in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
    133 and data-cache performance with the improveements in instruction-cache
     131and data-cache performance with the improvements in instruction-cache
    134132behaviour the most dramatic.   Single-threaded \icXML{} shows substantially improved
    135133performance over Xerces on both measures.   The pipelined version shows a slight
Note: See TracChangeset for help on using the changeset viewer.