Changeset 2869


Timestamp: Jan 30, 2013, 5:15:11 PM
Author: cameron
Message: Abstract and conclusion
Location: docs/Working/icXML
Files: 3 edited

  • docs/Working/icXML/abstract.tex

    r2516 r2869  
       10    10    while maintaining the existing API for application programmers.
       11    11    Using SIMD techniques alone, an increase in parsing speed
       12        - of 50\% to 100\% was observed in a range of applications.
             12  + of at least 50\% was observed in a range of applications.
       13    13    When coupled with pipeline parallelism on dual core processors,
       14        - improvements approaching 3X were realized.
             14  + improvements of 2x and beyond were realized.
       15    15
  • docs/Working/icXML/conclusion.tex

    r2852 r2869  
        1        - This paper presented the \icXML{} parser and discussed the key architectural differences between it and Xerces.
        2        - \icXML{} was architected to utilize SIMD parallelism, facilitated by the Parabix framework. \icXMLp{} extended on
        3        - this by incorporating pipeline parallelism through the concept of layers and streaming content models.
              1  + This paper is the first case study documenting the significant
              2  + performance benefits that may be realized through the integration
              3  + of parallel bit stream technology into existing widely-used software libraries.
              4  + In the case of the Xerces-C++ XML parser, the
              5  + combined integration of SIMD and multicore parallelism was
              6  + shown capable of producing dramatic increases in
              7  + throughput and reductions in branch mispredictions and cache misses.
              8  + The modified parser, going under the name \icXML{}, is designed
              9  + to provide the full functionality of the original Xerces library
             10  + with complete compatibility of APIs.  Although substantial
             11  + reengineering was required to realize the
             12  + performance potential of parallel technologies, this
             13  + is an important case study demonstrating the general
             14  + feasibility of these techniques.
        4    15
        5        - Two applications were selected for the performance evaluation: SAXCount and GML2SVG. The former to assess the
        6        - the speed up of \icXML{} over Xerces itself, and the latter to test it within a reasonably complex application.
        7        - {\bf something about the final speed up rates in both SAXCount and GML2SVG}
             16  + The further development of \icXML{} to move beyond 2-stage
             17  + pipeline parallelism is ongoing, with realistic prospects for
             18  + four reasonably balanced stages within the library.  For
             19  + applications such as GML2SVG which are dominated by time
             20  + spent on XML parsing, such a multistage pipelined parsing
             21  + library should offer substantial benefits.
        8    22
        9        - Although only a two-thread version was explored, more is possible---but the value of using more is dependent on
       10        - the application utilizing the \icXML{} library.
       11        - As the application becomes more complex there are diminishing returns \wrt{} additional thread-level parallelism.
       12        - A more interesting use of additional threads could be in the inclusion of an XPath and XQuery modules that could
       13        - eliminate unneeded data prior to the \MP{} stage.
       14        - Finally, the concepts used within \icXML{} and \icXMLp{} are not restricted to XML processing: \icXML{} should be considered a
       15        - proof-of-concept work that shows it is possible to parallelize some finite-state machines by restructuring the application
       16        - (and therefore the problem domain) to one that is more in line with current processor technology.
             23  + The example of XML parsing may be considered prototypical
             24  + of finite-state machine applications which have sometimes
             25  + been considered ``embarrassingly sequential'' and so
             26  + difficult to parallelize that ``nothing works.''  So the
             27  + case study presented here should be considered an important
             28  + data point in making the case that parallelization can
             29  + indeed be helpful across a broad array of application types.
             30  +
             31  + To overcome the software engineering challenges in applying
             32  + parallel bit stream technology to existing software systems,
             33  + it is clear that better library and tool support is needed.
             34  + The techniques used in the implementation of \icXML{} and
             35  + documented in this paper could well be generalized for
             36  + applications in other contexts and automated through
             37  + the creation of compiler technology specifically supporting
             38  + parallel bit stream programming.
             39  +
  • docs/Working/icXML/performance.tex

    r2865 r2869  
      110   110    single-thread \icXML{}
      111   111    achieved about a 50\% acceleration over Xerces,
      112        - increasing throughput on our test machine from ???  MB/sec to ??? MB/sec.   Using pipelined  \icXML{}, a
      113        - further throughput increase to ???  MB/sec was recorded.
            112  + increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.   Using pipelined  \icXML{}, a
            113  + further throughput increase to 111 MB/sec was recorded, approximately a 2X speedup.
      114   114
      115   115    An important aspect of \icXML{} is the replacement of much branch-laden
      116   116    sequential code inside Xerces with straight-line SIMD code using far
      117   117    fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
      118        - improvement in branching behaviour, with a dramatic reduction in branch misses per KB.
            118  + improvement in branching behaviour, with a dramatic reduction in branch misses per kB.
      119   119    It is also interesting to note that pipelined \icXML{} goes even
      120   120    further.   In essence, in using pipeline parallelism to split the instruction

      129   129
      130   130
      131        - The behaviour of the three versions with respect to L1 cache misses per MB is shown
            131  + The behaviour of the three versions with respect to L1 cache misses per kB is shown
      132   132    in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
      133   133    and data-cache performance with the improvements in instruction-cache
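
As a rough check on the throughput figures added to performance.tex in this changeset, the speedup ratios can be computed directly from the three reported numbers (58.3, 87.9 and 111 MB/sec). This is only a sketch of the arithmetic using the values quoted in the diff; it is not part of the committed text.

```latex
\[
\frac{87.9}{58.3} \approx 1.51, \qquad
\frac{111}{58.3} \approx 1.90, \qquad
\frac{111}{87.9} \approx 1.26
\]
```

On these numbers, single-thread \icXML{} is about 51\% faster than Xerces, which agrees with the stated ``about a 50\% acceleration''. Pipelined \icXML{} reaches roughly 1.9 times the Xerces baseline, consistent with ``approximately a 2X speedup'', and the pipelining step alone contributes about a further 26\% over the single-thread version.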