% Source: docs/Cmpt886_Project_Report/04-evaluation.tex.backup @ 1122
% Last change on this file since 1122 was 1122, checked in by lindanl, 8 years ago
% Log message: Add project report for cmpt886 on parallelizing Parabix2
\section{Evaluation}
\subsection{Test Data and Platform}

XML documents are commonly divided into two basic classes:
``document-oriented'' XML and ``data-oriented'' XML.
Data-oriented XML is used as an interchange format. Document-oriented
XML is used to impose structure on information that rarely fits neatly
into a relational database, particularly information intended for publishing.
Data-oriented XML is characterized by a higher markup density.
Markup density is defined as the ratio of the total markup contained
within an XML file to the total XML document size. This metric may have
a substantial influence on the performance of XML parsing.
We therefore choose workloads with clearly different markup densities.
Table~\ref{XMLDocChars} shows the document characteristics of the XML
instances selected for this performance study.

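As a sketch of this definition, write $\rho$ for the markup density, $M$ for the number of bytes of markup, and $S$ for the total document size in bytes (these symbols are introduced here for illustration only):
\[
\rho = \frac{M}{S}, \qquad 0 \le \rho \le 1 .
\]
For example, roads.gml in Table~\ref{XMLDocChars} has $S = 11584$ kB and $\rho = 0.57$, which suggests roughly $M \approx 0.57 \times 11584 \approx 6600$ kB of markup, up to the rounding of the reported density.
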
\begin{table*}
\begin{center}
\begin{tabular}{|c||r|r|r|r|r|}
\hline
File Name      & dew.xml  & jaw.xml  & roads.gml & po.xml & soap.xml \\ \hline
File Type      & document & document & data      & data   & data     \\ \hline
File Size (kB) & 66240    & 7343     & 11584     & 76450  & 2717     \\ \hline
Markup Density & 0.07     & 0.13     & 0.57      & 0.76   & 0.87     \\ \hline
\end{tabular}
\end{center}
\caption{XML Document Characteristics}
\label{XMLDocChars}
\end{table*}

All experiments are run on a quad-core machine running Linux kernel 2.6.35.
Table~\ref{machineinfo} describes the hardware of the selected machine.
\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline
Processor & Intel Sandy Bridge i5-2300 (2.80GHz) \\ \hline
L1 Cache & 4 $\times$ 32KB I-Cache, 4 $\times$ 32KB D-Cache \\ \hline
L2 Cache & 4 $\times$ 256KB \\ \hline
L3 Cache & 6MB \\ \hline
Front Side Bus & 1333 MHz \\ \hline
Memory & 6GB DDR \\ \hline
Max TDP & 95W \\ \hline
\end{tabular}
\end{center}
\caption{Machine Configuration}
\label{machineinfo}
\end{table}

\subsection{Parameters}
\subsubsection{Segment Size}
Increasing the segment size reduces the synchronization overhead.
As shown in Figure~\ref{para_segsize}, the processing time drops dramatically as the segment size grows from 128 bytes to 2KB.
However, performance stops improving, and in fact degrades slightly, once the segment size exceeds 16KB because of cache contention.
In real applications, we would like to use as little memory as possible without significantly hurting performance.
The remaining experiments therefore use a segment size of 16KB.

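To illustrate why larger segments reduce synchronization overhead, suppose (as a simplifying assumption, not a measured result) that the threads synchronize a fixed number of times per segment. For a file of $S$ bytes and a segment size of $B$ bytes, the number of segments is then
\[
N_{\mathit{seg}} = \left\lceil \frac{S}{B} \right\rceil .
\]
For dew.xml (66240 kB, taking 1 kB as 1024 bytes), $B = 128$ bytes gives $N_{\mathit{seg}} = 529920$, whereas $B = 16$ KB gives $N_{\mathit{seg}} = 4140$, that is, 128 times fewer synchronization points under this assumption.
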
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/para_segsize.pdf}
\end{center}
\caption{Processing Time with Different Segment Sizes (x axis: bytes, y axis: CPU cycles per byte)}
\label{para_segsize}
\end{figure}

\subsubsection{Circular Array Size}
When the circular array size $C$ is smaller than the number of threads,
only $C$ threads can do useful work at the same time.
As shown in Figure~\ref{para_arrayentry}, when there are only two entries,
the performance is even worse than that of the sequential Parabix.
When the circular array size is larger than the number of threads,
increasing the size of the circular array allows threads to be pushed deeper into the queue.
If one thread then runs faster than the thread that follows it,
it has more time to process before it must stop and wait.
However, allocating a larger memory area can also degrade performance, and
the processing time is determined by the slowest stage in the pipeline, not the fastest.
Therefore, the remaining experiments use only 6 entries.
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/para_arrayentry.pdf}
\end{center}
\caption{Processing Time with Different Circular Array Sizes (x axis: number of entries, y axis: CPU cycles per byte)}
\label{para_arrayentry}
\end{figure}

\subsection{Load Balance}
Figure~\ref{work_balance} shows the work time and stall time of each thread for the different test files.
As discussed in the previous section, the workloads are not evenly divided.
Therefore, threads that process faster must wait for their predecessors to finish
and thus incur a certain amount of stall time.
The overhead introduced by data migration and resource contention is 27\% to 37\%,
calculated as $(\mathit{overall\_work\_time} - \mathit{sequential\_time})/\mathit{sequential\_time}$.
The overhead introduced by synchronization is 15\% to 50\%, calculated as $\mathit{overall\_stall\_time}/\mathit{sequential\_time}$.
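The two overhead measures can be written compactly. In the sketch below, $T_{\mathit{seq}}$, $T_{\mathit{work}}$, and $T_{\mathit{stall}}$ are shorthand introduced here for the sequential time, the overall work time, and the overall stall time:
\[
O_{\mathit{contention}} = \frac{T_{\mathit{work}} - T_{\mathit{seq}}}{T_{\mathit{seq}}},
\qquad
O_{\mathit{sync}} = \frac{T_{\mathit{stall}}}{T_{\mathit{seq}}} .
\]
As a purely hypothetical illustration (these numbers are not taken from the measurements), $T_{\mathit{seq}} = 4.0$, $T_{\mathit{work}} = 5.2$, and $T_{\mathit{stall}} = 1.0$ cycles per byte would give $O_{\mathit{contention}} = 0.30$ and $O_{\mathit{sync}} = 0.25$, both within the ranges reported above.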
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/work_balance.pdf}
\end{center}
\caption{Processing Time of Each Thread (y axis: CPU cycles per byte)}
\label{work_balance}
\end{figure}

\subsection{Performance}
Figure~\ref{perf} compares the XML well-formedness checking performance of
the parallelized Parabix with that of the sequential version.
The parallelized Parabix is more than 2 times faster on the quad-core machine.
With the sequential Parabix, performance decreases as the markup density of the test files increases.
However, the high-density files are better balanced and incur less stall time.
As a result, the parallelized processing time for all the test files is about 2.7 cycles per byte.

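For a rough sense of scale, cycles per byte can be converted to throughput. Assuming the cores run at the nominal 2.80 GHz clock of the test machine (an assumption; frequency scaling is ignored), a cost of $c$ cycles per byte at clock frequency $f$ corresponds to a throughput of
\[
\frac{f}{c} = \frac{2.80 \times 10^{9}\ \mathrm{cycles/s}}{2.7\ \mathrm{cycles/byte}} \approx 1.04 \times 10^{9}\ \mathrm{bytes/s},
\]
or about 1 GB/s. Likewise, writing $S = T_{\mathit{seq}}/T_{\mathit{par}}$ for the speedup (notation introduced here), $S > 2$ on $p = 4$ threads corresponds to a parallel efficiency $S/p$ above 0.5.
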
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/performance.pdf}
\end{center}
\caption{Processing Time (y axis: CPU cycles per byte)}
\label{perf}
\end{figure}
\subsection{Power and Energy}
Figure~\ref{power} shows the average power consumed by the parallelized Parabix in comparison with the sequential version.
Because it runs four threads and uses all the cores at the same time, the parallelized Parabix draws much more power
than the sequential version. However, the energy consumption is about the same, because the parallelized Parabix needs less processing time.
In fact, as shown in Figure~\ref{energy}, parsing soap.xml with the parallelized Parabix consumes less energy than with the sequential Parabix.
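The relation behind this observation is that energy is average power multiplied by time. As a sketch, let $P$ be the average power in watts, $c$ the processing cost in cycles per byte, and $f$ the clock frequency in GHz (symbols introduced here; $f$ is assumed constant during the run). The energy per byte is then
\[
E = \frac{P \cdot c}{f}\ \ \mathrm{nJ\ per\ byte},
\]
since $c/f$ is the processing time in nanoseconds per byte. Energy per byte stays roughly constant when $P_{\mathit{par}}/P_{\mathit{seq}} \approx c_{\mathit{seq}}/c_{\mathit{par}}$, that is, when the increase in power is offset by the reduction in processing time.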
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/power.pdf}
\end{center}
\caption{Average Power (watts)}
\label{power}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/energy.pdf}
\end{center}
\caption{Energy Consumption (nJ per byte)}
\label{energy}
\end{figure}
\subsection{Performance vs. Energy}
Figure~\ref{perf_energy} shows the performance and energy consumption of the sequential and parallelized Parabix
as well as two other XML parsers, Expat and Xerces.
Parabix consumes 25\% of the energy of Xerces and Expat while delivering much better performance.
Although the parallelized Parabix consumes slightly more energy on average than the sequential Parabix,
it is more than 2 times faster.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/perf_energy.pdf}
\end{center}
\caption{Energy vs. Performance (x axis: bytes per cycle, y axis: nJ per byte)}
\label{perf_energy}
\end{figure}