# Changeset 1783 for docs/HPCA2012/final_ieee/04-methodology.tex

Timestamp: Dec 14, 2011, 2:27:41 PM
Message:

Final pass

File:
1 edited

\pagebreak
\section{Evaluation Framework}
\label{section:methodology}

\end{table}

\begin{table}[htbp]
{
  \footnotesize
  \begin{center}
  {
    \begin{tabular}{|l||@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|}
    \hline
    Processor  & Core2 Duo  & i3-530     & Sandybridge \\ \hline
    Frequency  & 2.13GHz    & 2.93GHz    & 2.80GHz     \\ \hline
    L1 D Cache & 32KB       & 32KB       & 32KB        \\ \hline
    L2 Cache   & Shared 2MB & 256KB/core & 256KB/core  \\ \hline
    L3 Cache   & ---        & 4MB        & 6MB         \\ \hline
    Max TDP    & 65W        & 73W        & 95W         \\ \hline
    \end{tabular}
  }
  \end{center}
}
\caption{Platform Hardware Specs}
\label{hwinfo}
\end{table}

\paragraph{Platform Hardware:}
SSE SIMD extensions have been available on commodity Intel processors
for over a decade, since the Pentium III. They have steadily evolved
with improvements in instruction latency, cache interface, register
resources, and the addition of domain-specific instructions. Here we
investigate SIMD extensions across three different generations of
Intel processors (hardware details are given in Table~\ref{hwinfo}). We
compare the energy and performance profile of the Parabix parser on
each of the platforms. We also analyze the implementation specifics of
SIMD extensions under various microarchitectures, as well as the newer
AVX extensions supported by \SB{}.

We investigate the execution profiles of each XML parser using the
performance counters found in the processor. We choose several key
hardware events that provide insight into the profile of each
application and indicate whether the processor is doing useful
work~\cite{bellosa2001, bertran2010}. The events included in our study
are: branch instructions, branch mispredictions, integer instructions,
SIMD instructions, and cache misses. In addition, we characterize the
SIMD operations and study the type and class of SIMD operations using
the Intel Pin binary instrumentation framework.

\paragraph{Energy Measurement:}
A key benefit of the Parabix parser is its more efficient use of the
processor pipeline, which is reflected in the overall energy usage. We
measure the energy consumption of the processor directly using a
current clamp. We apply the Fluke i410 current clamp~\cite{clamp} to
the 12V wires that supply power to the processor sockets. The clamp
detects the current drawn through these wires; we collect samples
throughout the entire execution of the program and then calculate the
overall total energy as
$12V \times \sum_{i=1}^{N_{samples}} Sample_i$.
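
The expression above sums the raw samples. As a sketch only, and assuming (this is not stated in the text) that each $Sample_i$ is a current reading in amperes taken at a fixed sampling interval $\Delta t$, the sum can be converted to energy in joules as
\[
  E \approx 12\,\mathrm{V} \times \Delta t \times \sum_{i=1}^{N_{samples}} Sample_i .
\]
For illustration, with hypothetical values rather than measurements: a constant reading of 5\,A sampled every 0.01\,s for 100 samples gives $E \approx 12 \times 0.01 \times 500 = 60$\,J, that is, 60\,W sustained for one second.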