Changeset 2865


Timestamp:
Jan 30, 2013, 2:08:17 PM
Author:
cameron
Message:

GML2SVG performance write-up

File:
1 edited

  • docs/Working/icXML/performance.tex

    r2863 r2865  
    9191the requirements of an alternative format.
    9292
    93 \subsubsection{Workload}
    94 
    9593Our GML to SVG data translations are executed on GML source data
    9694modelling the city of Vancouver, British Columbia, Canada.
     
    10199213.4 MB of source GML data generates 91.9 MB of target SVG data.
    102100
    103 Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
    104 the Xerces, \icXML{} and pipelined \icXML{}.   On the GML workload with this application,
    105 single-thread \icXML{}
    106 achieved about a 50\% acceleration over Xerces, reducing CPU cycles per byte from 62.8 to 40.2.
    107 Total elapsed time reduced from 3.66 seconds to 2.43.   Using pipelined  \icXML{}, the
    108 processing time further reduced to 1.92 seconds.
    109 
    110 
    111 
    112101\begin{figure}
    113 \includegraphics[width=0.5\textwidth]{plots/perf_GML2SVG.pdf}
     102\includegraphics[width=0.5\textwidth]{plots/Throughput.pdf}
    114103\caption{Performance Comparison for GML2SVG}
    115104\label{perf_GML2SVG}
     
    117106
    118107
     108Figure \ref{perf_GML2SVG} compares the performance of the GML2SVG application linked against
     109Xerces, \icXML{}, and pipelined \icXML{}.   On the GML workload with this application,
     110single-threaded \icXML{}
     111achieved about a 50\% acceleration over Xerces,
     112increasing throughput on our test machine from ???  MB/sec to ??? MB/sec.   Using pipelined  \icXML{}, a
     113further throughput increase to ???  MB/sec was recorded.
     114
     115An important aspect of \icXML{} is the replacement of much branch-laden
     116sequential code inside Xerces with straight-line SIMD code using far
     117fewer branches.  Figure \ref{branchmiss_GML2SVG} shows the corresponding
     118improvement in branching behaviour, with a dramatic reduction in branch misses per KB.
     119It is also interesting to note that pipelined \icXML{} goes even
     120further.   In essence, because pipeline parallelism splits the instruction
     121stream across separate cores, the branch target buffers on each core are
     122less overloaded and can achieve a higher rate of successful branch prediction.
     123
     124\begin{figure}
     125\includegraphics[width=0.5\textwidth]{plots/BM.pdf}
     126\caption{Comparative Branch Misprediction Rate}
     127\label{branchmiss_GML2SVG}
     128\end{figure}
     129
     130
     131The behaviour of the three versions with respect to L1 cache misses per MB is shown
     132in Figure \ref{cachemiss_GML2SVG}.   Improvements are shown in both instruction-
     133and data-cache performance, with the improvements in instruction-cache
     134behaviour the most dramatic.   Single-threaded \icXML{} shows substantially improved
     135performance over Xerces on both measures.   The pipelined version shows a slight
     136worsening in data-cache performance, more than offset by a further dramatic
     137reduction in instruction-cache miss rate.   Again, partitioning the instruction
     138stream through the pipeline parallelism model has significant benefit.
     139
     140\begin{figure}
     141\includegraphics[width=0.5\textwidth]{plots/CM.pdf}
     142\caption{Comparative Cache Miss Rate}
     143\label{cachemiss_GML2SVG}
     144\end{figure}
     145
     146One caveat with this study is that the GML2SVG application did not exhibit
     147a relative balance of processing between application code and Xerces library
     148code reaching the 33\% figure.  This suggests that for this application and
     149possibly others, further separating the logical layers of the
     150\icXML{} engine into different pipeline stages could well offer significant benefit.
     151This remains an area of ongoing work.
     152
     153
    119154
    120155 
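
As a rough consistency check on the ``about a 50\% acceleration'' claim, the ratios below use only the figures quoted in the r2863 lines of this changeset (CPU cycles per byte and total elapsed seconds). This is a reader's worked calculation, not text from performance.tex, and it says nothing about the throughput values still marked ??? in r2865.

```latex
% Speedup of single-threaded icXML over Xerces, by CPU cycles per byte
\[
  \frac{62.8}{40.2} \approx 1.56
\]
% Speedup of single-threaded icXML over Xerces, by total elapsed time
\[
  \frac{3.66\ \mathrm{s}}{2.43\ \mathrm{s}} \approx 1.51
\]
% Speedup of pipelined icXML over Xerces, by total elapsed time
\[
  \frac{3.66\ \mathrm{s}}{1.92\ \mathrm{s}} \approx 1.91
\]
```

Both single-threaded ratios fall between 1.5 and 1.6, which is consistent with the stated figure of about 50\%.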