% Source: docs/Working/icXML/performance.tex @ 2871 (last changed in revision 2871, checked in by nmedfort)

We evaluate \xerces{}, \icXML{}, and \icXMLp{} using two benchmark applications:
the Xerces C++ SAXCount sample application,
and a real-world GML-to-SVG transformation application.
We investigated XML parser performance using an Intel Core i7 quad-core
(Sandy Bridge) processor (3.40 GHz, 4 physical cores, 8 threads (2 per core),
32+32 kB (per core) L1 cache,
256 kB (per core) L2 cache,
8 MB L3 cache) running the 64-bit version of Ubuntu 12.04 (Linux).

We analyzed the execution profile of each XML parser
using the performance counters found in the processor.
We chose several key hardware events that provide insight into the profile of each
application and indicate whether the processor is doing useful work.
The events included in our study are
processor cycles, branch instructions, branch mispredictions,
and cache misses. The Performance Application Programming Interface
(PAPI) Version 5.5.0 \cite{papi} toolkit
was installed on the test system to facilitate the
collection of hardware performance monitoring
statistics. In addition, we used the Linux perf \cite{perf} utility
to collect per-core hardware events.

\subsection{Xerces C++ SAXCount}
Xerces comes with sample applications that demonstrate salient features of the parser.
SAXCount is the simplest such application:
it counts the elements, attributes and characters of a given XML file using the (event-based) SAX API
and prints out the totals.

\begin{table}
\begin{center}
{
\footnotesize
\begin{tabular}{|l||l|l|l|l|}
\hline
File Name         & jaw.xml  & road.gml & po.xml  & soap.xml \\ \hline
File Type         & document & data     & data    & data     \\ \hline
File Size (kB)    & 7343     & 11584    & 76450   & 2717     \\ \hline
Markup Item Count & 74882    & 280724   & 4634110 & 18004    \\ \hline
Markup Density    & 0.13     & 0.57     & 0.76    & 0.87     \\ \hline
\end{tabular}
}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Table \ref{XMLDocChars} shows the document characteristics of the XML input
files selected for the Xerces C++ SAXCount benchmark. The jaw.xml file
represents document-oriented XML input and contains the three-byte and four-byte UTF-8 sequences
required to encode Japanese characters. The remaining files are data-oriented
XML documents and consist entirely of single-byte ASCII characters.

A key predictor of the overall parsing performance of an XML file is markup density\footnote{
  Markup Density: the ratio of the number of markup bytes that define the structure of the document to its total file size.}.
This metric has a substantial influence on the performance of traditional recursive descent XML parsers
because it directly corresponds to the number of state transitions that occur when parsing a document.
We use a mixture of document-oriented and
data-oriented XML files to analyze performance over a spectrum
of markup densities.
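As a sketch of this definition, let $B_{\mathrm{markup}}$ denote the number of markup bytes in a file
and $B_{\mathrm{total}}$ its total size in bytes (both symbols are notation introduced here purely for illustration).
Markup density is then
\[
  \rho = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}}, \qquad 0 \le \rho \le 1 .
\]
Assuming the densities in Table \ref{XMLDocChars} are measured against the listed file sizes,
a density of 0.13 for the 7343 kB jaw.xml would correspond to roughly $0.13 \times 7343 \approx 955$ kB of markup,
while a density of 0.87 for the 2717 kB soap.xml would correspond to roughly $0.87 \times 2717 \approx 2364$ kB.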
%

Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of
CPU cycles per byte for the SAXCount application.
The speedup for \icXML{} over Xerces is 1.3x to 1.8x.
With two threads on the multicore machine, our pipelined version achieves a speedup of up to 2.7x.
Xerces is substantially slowed by dense markup,
but \icXML{} is less affected, owing to a reduction in branches and the use of parallel-processing techniques.
\icXMLp{} performs better as markup density increases because the work performed by each stage is
well balanced in this application.
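One way to read these speedup figures, assuming they are computed as ratios of the per-byte costs plotted in the figure,
is the following sketch. Writing $c_{\mathrm{Xerces}}$ and $c_{\mathrm{icXML}}$ for the measured CPU cycles per byte
(notation introduced here only for illustration), the speedup is
\[
  S = \frac{c_{\mathrm{Xerces}}}{c_{\mathrm{icXML}}} ,
\]
so that a speedup of 1.8x would mean \icXML{} spends about $1/1.8 \approx 0.56$ of the cycles per byte that Xerces does on the same input.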
%

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/perf_SAX.pdf}
\caption{SAXCount Performance Comparison}
\label{perf_SAX}
\end{figure}

\subsection{GML2SVG}

As a more substantial application of XML processing, we chose the GML-to-SVG (GML2SVG) application.
This application transforms geospatial data encoded in an XML format,
the Geography Markup Language (GML) \cite{lake2004geography},
into a different XML format suitable for displaying maps:
Scalable Vector Graphics (SVG) \cite{lu2007advances}. In the GML2SVG benchmark, GML feature element
and GML geometry element tags are matched. GML coordinate data are then extracted
and transformed to the corresponding SVG path data encodings.
Equivalent SVG path elements are generated and output to the destination
SVG document.  The GML2SVG application is thus considered typical of a broad
class of XML applications that parse and extract information from
a known XML format for the purpose of analysis and restructuring to meet
the requirements of an alternative format.

Our GML-to-SVG data translations are executed on GML source data
modelling the city of Vancouver, British Columbia, Canada.
The GML source document set
consists of 46 distinct GML feature layers ranging in size from approximately 9 kB to 125.2 MB,
with an average document size of 18.6 MB. Markup density ranges from approximately 0.045 to 0.719,
with an average markup density of 0.519. In this performance study,
213.4 MB of source GML data generates 91.9 MB of target SVG data.

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/Throughput.pdf}
\caption{Performance Comparison for GML2SVG}
\label{perf_GML2SVG}
\end{figure}

Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
Xerces, \icXML{} and \icXMLp{}.
On the GML workload with this application, single-threaded \icXML{}
was about 50\% faster than Xerces,
increasing throughput on our test machine from 58.3 MB/sec to 87.9 MB/sec.
Using \icXMLp{}, a further throughput increase to 111 MB/sec was recorded,
approximately a 2x speedup over Xerces.
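These ratios follow directly from the throughputs quoted above:
\[
  \frac{87.9}{58.3} \approx 1.51, \qquad
  \frac{111}{58.3} \approx 1.90, \qquad
  \frac{111}{87.9} \approx 1.26 .
\]
In other words, on these figures the pipelined version contributes a further gain of roughly 26\% over single-threaded \icXML{}.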
%

An important aspect of \icXML{} is the replacement of much branch-laden
sequential code inside Xerces with straight-line SIMD code using far
fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
improvement in branching behaviour, with a dramatic reduction in branch misses per kB.
It is also interesting to note that pipelined \icXML{} goes even
further.   In essence, because pipeline parallelism splits the instruction
stream across separate cores, the branch target buffers on each core are
less heavily loaded, which increases the rate of successful branch prediction.

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/BM.pdf}
\caption{Comparative Branch Misprediction Rate}
\label{branchmiss_GML2SVG}
\end{figure}

The behaviour of the three versions with respect to L1 cache misses per kB is shown
in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
and data-cache performance, with the improvements in instruction-cache
behaviour the most dramatic.   Single-threaded \icXML{} shows substantially improved
performance over Xerces on both measures.   The pipelined version shows a slight
worsening in data-cache performance, which is more than offset by a further dramatic
reduction in instruction-cache miss rate.   Again, partitioning the instruction
stream through the pipeline parallelism model has a significant benefit.

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/CM.pdf}
\caption{Comparative Cache Miss Rate}
\label{cachemiss_GML2SVG}
\end{figure}

One caveat with this study is that the GML2SVG application did not exhibit
a relative balance of processing between application code and Xerces library
code that reached the 33\% figure.  This suggests that for this application, and
possibly others, further separating the logical layers of the
\icXML{} engine into different pipeline stages could well offer significant benefit.
This remains an area of ongoing work.