\section{Performance}

We evaluate Xerces C++ 3.1.1, \icXML{}, and pipelined \icXML{} using two benchmark applications:
the Xerces C++ SAXCount sample application and a real-world
GML to SVG format conversion application implemented against the Xerces C++
DocumentHandler interface. We measure XML parser performance
on an Intel Core i7 quad-core
``Sandy Bridge'' processor (3.40 GHz, 4 physical cores/8 threads,
32+32 kB (per core) L1 cache,
256 kB (per core) L2 cache,
8 MB L3 cache) and leverage the SSE2 SIMD instructions
available on modern Intel commodity processors.

We profiled the execution of each XML parser
using the hardware performance counters of the processor.
We chose several key hardware events that provide insight into the profile of each
application and indicate whether the processor is doing useful work.
The events included in our study are
processor cycles, branch instructions, branch mispredictions,
and cache misses.

\subsection{Xerces C++ SAXCount}

SAXCount is a simple sample application that counts the elements, attributes, and characters
of a given XML file using the event-based SAX API
and prints the counts.

\begin{table}
\begin{center}
{
\footnotesize
\begin{tabular}{|l||l|l|l|l|}
\hline
File Name               & jaw.xml               & road.gml      & po.xml        & soap.xml \\ \hline
File Type               & document              & data          & data          & data   \\ \hline
File Size (kB)          & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
\end{tabular}
}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Table~\ref{XMLDocChars} shows the document characteristics of the XML input
files selected for the Xerces C++ SAXCount benchmark. The file jaw.xml
represents document-oriented XML inputs and contains the three-byte and four-byte UTF-8 sequences
required for the UTF-8 encoding of Japanese characters. The remaining files are data-oriented
XML documents and consist entirely of single-byte encoded ASCII characters.

A key predictor of the overall parsing performance
of an XML file is markup density, i.e., the ratio of markup
to total XML document size. This metric has substantial
influence on the performance of traditional recursive descent
XML parsers. We use a mixture of document-oriented and
data-oriented XML files to analyze performance over a spectrum
of markup densities.
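As a minimal formalization of this definition, assuming both quantities are measured in bytes
(the symbols $\rho$, $B_{\mathrm{markup}}$, and $B_{\mathrm{total}}$ are our own notation, introduced only for illustration),
markup density can be written as
\[
\rho = \frac{B_{\mathrm{markup}}}{B_{\mathrm{total}}}, \qquad 0 \le \rho \le 1 .
\]
Under this reading, the density of 0.13 reported for jaw.xml in Table~\ref{XMLDocChars} means that roughly 13\% of the file is markup,
whereas the density of 0.87 reported for soap.xml means that roughly 87\% of the file is markup.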

Figure~\ref{perf_SAX} compares the performance of Xerces, \icXML{}, and pipelined \icXML{} in terms of CPU cycles per byte.
\icXML{} achieves a speedup of 1.3x to 1.8x over Xerces.
With two threads on the multicore machine, the pipelined version achieves a speedup of up to 2.7x.
Xerces is substantially slowed by dense markup,
but \icXML{} is less affected as a result of its parallel processing technique.
Pipelined \icXML{} performs even better on files with higher markup density
because the workload for dense markup files is well balanced between the two threads in this application.
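One plausible reading of these figures, assuming speedup is computed as a ratio of cycles per byte
(the symbols $S$, $C_{\mathrm{Xerces}}$, and $C_{\mathrm{icXML}}$ are our own notation, introduced only for illustration), is
\[
S = \frac{C_{\mathrm{Xerces}}}{C_{\mathrm{icXML}}},
\]
where $C_{\mathrm{Xerces}}$ and $C_{\mathrm{icXML}}$ denote the CPU cycles per byte spent by Xerces and \icXML{} on the same input file.
Under this assumption, a speedup of $S = 1.8$ means that \icXML{} spends $1/1.8 \approx 0.56$ of the cycles per byte that Xerces spends.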

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/perf_SAX.pdf}
\caption{Performance Comparison for SAXCount}
\label{perf_SAX}
\end{figure}

\subsection{GML2SVG}

The visualization of geographic information is a primary goal of on-demand web-based mapping systems \cite{lu2007advances}.
Web-based mapping systems commonly encode spatial data with GML for transmission and with SVG for display \cite{lu2007advances}.
GML is an XML grammar defined by the Open Geospatial Consortium (OGC) to encode geographical features \cite{lake2004geography}.
As an XML grammar, GML is platform neutral and is well suited to the exchange of spatial data over the Internet.
GML, however, is not a visualization format. Rather, GML relies on commercially available viewers for data visualization,
with Scalable Vector Graphics (SVG) viewers being among the most common \cite{lu2007advances}. Large volumes of GML data are
typical in on-demand web-based mapping, and as a consequence, the visualization of GML as SVG requires
high-performance GML to SVG translation.

In this section we present a performance evaluation of the translation of a wide spectrum of Geography Markup Language (GML)
data files to Scalable Vector Graphics (SVG) format for visualization. In the GML to SVG benchmark, the tags of GML feature elements
and GML geometry elements are matched. GML coordinate data are then extracted
and transformed to SVG path data encodings. Equivalent SVG path elements are generated and written to the destination
SVG document. The GML to SVG translations are executed on GML source data modelling the city of Vancouver, British Columbia, Canada.

\subsubsection{Workload}

The GML source document set consists of 46 distinct GML feature layers ranging in size from approximately 9 KB to 125.2 MB,
with an average document size of 18.6 MB. Markup density ranges from approximately 0.0447 to 0.719,
with an average markup density of 0.519. In this performance study,
213.4 MB of source GML data generates 91.9 MB of target SVG data.

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/perf_GML2SVG.pdf}
\caption{Performance Comparison for GML2SVG}
\label{perf_GML2SVG}
\end{figure}