We evaluate \xerces{}, \icXML{}, and \icXMLp{} using two benchmark applications:
first, the Xerces C++ SAXCount sample application, and
second, a real-world GML-to-SVG transformation application.

We investigated XML parser performance
using an Intel Core i7 quad-core
``Sandy Bridge'' processor (3.40 GHz, 4 physical cores, 8 threads (2 per core),
32+32 KB (per core) L1 cache,
256 KB (per core) L2 cache,
8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).

We analyzed the execution profile of each XML parser
using the hardware performance counters of the processor.
We chose several key hardware events that provide insight into the profile of each
application and indicate whether the processor is doing useful work.
The events included in our study are
processor cycles, branch instructions, branch mispredictions,
and cache misses. The Performance Application Programming Interface
(PAPI) Version 5.5.0 \cite{papi} toolkit
was installed on the test system to facilitate the
collection of hardware performance monitoring
statistics. In addition, we used the Linux perf \cite{perf} utility
to collect per-core hardware events.

\subsection{Xerces C++ SAXCount}
Xerces-C++ comes with sample applications that demonstrate salient features of the parser.
SAXCount is the simplest of these. It counts the
elements, attributes and characters of a given XML file using the (event-based) SAX API
and prints out the counts.

\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|c|}
\hline
File Name               & jaw.xml               & road.gml      & po.xml        & soap.xml \\ \hline
File Type               & document              & data          & data          & data   \\ \hline
File Size (kB)          & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Table \ref{XMLDocChars} shows the document characteristics of the XML input
files selected for the Xerces C++ SAXCount benchmark. The file jaw.xml
represents document-oriented XML inputs and contains the three-byte and four-byte UTF-8 sequences
required for the UTF-8 encoding of Japanese characters. The remaining files are data-oriented
XML documents and consist entirely of single-byte encoded ASCII characters.

A key predictor of the overall parsing performance
of an XML file is markup density (i.e., the ratio of markup
to the total XML document size). This metric has substantial
influence on the performance of traditional recursive descent
XML parsers. We use a mixture of document-oriented and
data-oriented XML files to analyze performance over a spectrum
of markup densities.
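
As a sketch of this metric, and assuming that both quantities are measured in bytes
(the symbols below are introduced here purely for illustration), markup density can be written as
\[
\rho = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}}, \qquad 0 \le \rho \le 1,
\]
where $B_{\mathrm{markup}}$ is the number of bytes occupied by markup and
$B_{\mathrm{total}}$ is the total size of the document. Under this reading, a density
of 0.87 would mean that roughly 87\% of the file consists of markup.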

Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of CPU cycles per byte for the SAXCount application.
The speedup of \icXML{} over Xerces ranges from 1.3x to 1.8x.
With two threads on the multicore machine, the pipelined version achieves a speedup of up to 2.7x.
Xerces is substantially slowed by dense markup,
but \icXML{} is less affected as a result of its parallel processing technique.
Pipelined \icXML{} performs even better on files with higher markup density
because the processing of dense markup files is well balanced in this application.
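
One way to read these figures, assuming that speedup is computed from the
cycles-per-byte measurements on the same input (the notation is introduced here for
illustration only), is
\[
S = \frac{c_{\mathrm{Xerces}}}{c_{\mathrm{icXML}}},
\]
where $c$ denotes CPU cycles per byte. Under this assumption, a speedup of $S = 1.8$
corresponds to \icXML{} spending about $1/1.8 \approx 0.56$ of the cycles per byte that Xerces spends.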

\begin{figure}
\caption{SAXCount Performance Comparison}
\label{perf_SAX}
\end{figure}

As a more substantial application of XML processing, the GML-to-SVG (GML2SVG) application
was chosen. This application transforms geospatial data encoded in
Geography Markup Language (GML) \cite{lake2004geography}
into a different XML format suitable for displayable maps:
Scalable Vector Graphics (SVG) \cite{lu2007advances}. In the GML2SVG benchmark, GML feature element
and GML geometry element tags are matched. GML coordinate data are then extracted
and transformed to the corresponding SVG path data encodings.
Equivalent SVG path elements are generated and output to the destination
SVG document. The GML2SVG application is thus considered typical of a broad
class of XML applications that parse and extract information from
a known XML format for the purpose of analysis and restructuring to meet
the requirements of an alternative format.

Our GML-to-SVG data translations are executed on GML source data
modelling the city of Vancouver, British Columbia, Canada.
The GML source document set
consists of 46 distinct GML feature layers ranging in size from approximately 9 KB to 125.2 MB,
with an average document size of 18.6 MB. Markup density ranges from approximately 0.0447 to 0.719,
with an average markup density of 0.519. In this performance study,
213.4 MB of source GML data generates 91.9 MB of target SVG data.

\begin{figure}
\caption{Performance Comparison for GML2SVG}
\label{perf_GML2SVG}
\end{figure}

Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
Xerces, \icXML{} and pipelined \icXML{}. On the GML workload with this application,
single-threaded \icXML{}
achieved about a 50\% acceleration over Xerces,
increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec. Using pipelined \icXML{}, a
further throughput increase to 111 MB/sec was recorded, approximately a 2x speedup over Xerces.
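
The stated speedups can be checked directly from the throughput figures above
(all in MB/sec):
\[
\frac{87.9}{58.3} \approx 1.51, \qquad
\frac{111}{58.3} \approx 1.90, \qquad
\frac{111}{87.9} \approx 1.26 .
\]
That is, single-threaded \icXML{} is about 51\% faster than Xerces, pipelined \icXML{}
is about 1.9 times as fast as Xerces, and pipelining contributes roughly a further 26\%
over single-threaded \icXML{}.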

An important aspect of \icXML{} is the replacement of much branch-laden
sequential code inside Xerces with straight-line SIMD code using far
fewer branches. Figure \ref{branchmiss_GML2SVG} shows the corresponding
improvement in branching behaviour, with a dramatic reduction in branch misses per kB.
It is also interesting to note that pipelined \icXML{} goes even
further. In essence, by using pipeline parallelism to split the instruction
stream across separate cores, the branch target buffers on each core are
less overloaded and achieve a higher rate of successful branch prediction.
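
The per-kB measure can be read as an event count normalized by input size. As a
sketch, assuming 1 kB = 1000 bytes and writing $N_{\mathrm{miss}}$ for the number of
branch mispredictions counted while processing an input of $B$ bytes (both symbols are
introduced here for illustration only),
\[
m = \frac{1000\, N_{\mathrm{miss}}}{B},
\]
where $m$ is the number of misses per kB. Normalizing in this way allows inputs of
different sizes to be compared on the same scale.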

\begin{figure}
\caption{Comparative Branch Misprediction Rate}
\label{branchmiss_GML2SVG}
\end{figure}

The behaviour of the three versions with respect to L1 cache misses per kB is shown
in Figure \ref{cachemiss_GML2SVG}. Improvements are shown in both instruction-
and data-cache performance, with the improvements in instruction-cache
behaviour the most dramatic. Single-threaded \icXML{} shows substantially improved
performance over Xerces on both measures. The pipelined version shows a slight
worsening in data-cache performance, more than offset by a further dramatic
reduction in instruction-cache miss rate. Again, partitioning the instruction
stream through the pipeline parallelism model has significant benefit.

\begin{figure}
\caption{Comparative Cache Miss Rate}
\label{cachemiss_GML2SVG}
\end{figure}

One caveat with this study is that the GML2SVG application did not exhibit
a relative balance of processing between application code and Xerces library
code reaching the 33\% figure. This suggests that for this application, and
possibly others, further separating the logical layers of the
\icXML{} engine into different pipeline stages could well offer significant benefit.
This remains an area of ongoing work.