We evaluate Xerces C++ 3.1.1, the \icXML{} Xerces C++ XML parser, and the pipelined
\icXML{} Xerces C++ parser against two benchmark applications: first, the Xerces C++ SAXCount
sample application and, second, a real-world
GML to SVG format conversion application implemented against the Xerces C++
DocumentHandler interface. We measure XML parser performance
on an Intel Core i7 quad-core
``Sandy Bridge'' processor (3.40 GHz, 4 physical cores/8 threads,
32+32 KB (per core) L1 cache,
256 KB (per core) L2 cache,
8 MB L3 cache) and leverage the SSE2 SIMD instructions
available on modern Intel commodity processors.

We investigated the execution profile of each XML parser
using the hardware performance counters of the processor.
We chose several key hardware events that provide insight into the profile of each
application and indicate whether the processor is doing useful work.
The events included in our study are
processor cycles, branch instructions, branch mispredictions,
and cache misses.

\subsection{Xerces C++ SAXCount}

SAXCount is a simple application that counts the elements, attributes and characters
of a given XML file using the event-based SAX API
and prints the resulting counts.

\begin{table}
\begin{center}
{
\footnotesize
\begin{tabular}{|l||l|l|l|l|}
\hline
File Name               & jaw.xml               & road.gml      & po.xml        & soap.xml \\ \hline
File Type               & document              & data          & data          & data \\ \hline
File Size (kB)          & 7343                  & 11584         & 76450         & 2717 \\ \hline
Markup Item Count       & 74882                 & 280724        & 4634110       & 18004 \\ \hline
Markup Density          & 0.13                  & 0.57          & 0.76          & 0.87 \\ \hline
\end{tabular}
}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table}

Table \ref{XMLDocChars} shows the document characteristics of the XML input
files selected for the Xerces C++ SAXCount benchmark. The file jaw.xml
represents document-oriented XML inputs and contains the three-byte and four-byte UTF-8 sequences
required for the UTF-8 encoding of Japanese characters. The remaining files are data-oriented
XML documents and consist entirely of single-byte encoded ASCII characters.

A key predictor of the overall parsing performance
of an XML file is markup density (i.e., the ratio of markup
to the total XML document size). This metric has substantial
influence on the performance of traditional recursive descent
XML parsers. We use a mixture of document-oriented and
data-oriented XML files to analyze performance over a spectrum
of markup densities.

Figure \ref{perf_SAX} compares the performance of Xerces, \icXML{} and pipelined \icXML{} in terms of CPU cycles per byte.
The speedup of \icXML{} over Xerces ranges from 1.3x to 1.8x.
With two threads on the multicore machine, our pipelined version achieves a speedup of up to 2.7x.
Xerces is substantially slowed by dense markup,
but \icXML{} is less affected as a result of its parallel processing technique.
The pipelined \icXML{} performs even better on files with higher markup density
because the workload for dense markup files is well balanced in this application.
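
The two metrics used in this comparison can be stated compactly. The notation here is introduced only for illustration and is an assumption about how the quantities are defined. Let $m$ be the number of bytes of markup in a document and $n$ its total size in bytes. Markup density is then
\[
  d = \frac{m}{n}, \qquad 0 \le d \le 1 .
\]
Assuming the densities in Table \ref{XMLDocChars} are byte ratios of this form, jaw.xml contains roughly $0.13 \times 7343 \approx 955$ kB of markup.
Similarly, if $c_{P}$ denotes the CPU cycles per byte measured for parser $P$, the speedup of \icXML{} over Xerces can be taken to be
\[
  S = \frac{c_{\mathrm{Xerces}}}{c_{\mathrm{icXML}}} ,
\]
so that $S = 1.8$ corresponds to \icXML{} spending about $1/1.8 \approx 0.56$ of the cycles per byte that Xerces requires.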

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/perf_SAX.pdf}
\caption{SAXCount Performance Comparison}
\label{perf_SAX}
\end{figure}

\subsection{GML2SVG}

The visualization of geographic information is a primary goal of on-demand web-based mapping systems \cite{lu2007advances}.
Web-based mapping systems commonly encode spatial data with GML for transmission and with SVG for display \cite{lu2007advances}.
GML is an XML grammar defined by the Open Geospatial Consortium (OGC) to encode geographical features \cite{lake2004geography}.
As an XML grammar, GML is platform neutral and is well suited to the exchange of spatial data over the Internet.
GML, however, is not a visualization format. Rather, GML relies on commercially available viewers for data visualization,
with Scalable Vector Graphics (SVG) viewers being one of the most common \cite{lu2007advances}. Large volumes of GML data are
typical in on-demand web-based mapping and, as a consequence, the visualization of GML as SVG requires
high-performance GML to SVG translation.

In this section we present a performance evaluation of the translation of a wide spectrum of Geography Markup Language (GML)
data files to Scalable Vector Graphics (SVG) format for visualization. In the GML to SVG benchmark, GML feature element
and GML geometry element tags are matched. GML coordinate data are then extracted
and transformed to SVG path data encodings. Equivalent SVG path elements are generated and output to the destination
SVG document. GML to SVG data translations are executed on GML source data modelling the city of Vancouver, British Columbia, Canada.
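
As an illustration of the coordinate transformation, consider a hypothetical GML 2 line string. The coordinate values are invented for this example and are not drawn from the Vancouver data set:
\begin{verbatim}
<gml:LineString>
  <gml:coordinates>10,20 30,40 50,20</gml:coordinates>
</gml:LineString>
\end{verbatim}
Ignoring any coordinate scaling or axis inversion that an actual translator may apply, an equivalent SVG path element would be
\begin{verbatim}
<path d="M 10,20 L 30,40 L 50,20"/>
\end{verbatim}
Here the first coordinate pair becomes a move-to (\texttt{M}) command and each subsequent pair a line-to (\texttt{L}) command.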

\subsubsection{Workload}

The GML source document set consists of 46 distinct GML feature layers ranging in size from approximately 9 KB to 125.2 MB,
with an average document size of 18.6 MB. Markup density ranges from approximately 0.0447 to 0.719,
with an average markup density of 0.519. In this performance study,
213.4 MB of source GML data generates 91.9 MB of target SVG data.

\begin{figure}
\includegraphics[width=0.5\textwidth]{plots/perf_GML2SVG.pdf}
\caption{Performance Comparison for GML2SVG}
\label{perf_GML2SVG}
\end{figure}