Changeset 1738


Timestamp: Nov 24, 2011, 11:18:10 AM
Author: lindanl
Message: Figure adjustment and some minor changes
Location: docs/HPCA2012/final_ieee
Files: 26 added, 7 edited

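Legend: lines listed with both revision numbers are unmodified; lines with only the left (old) number were removed; lines with only the right (new) number were added.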
  • docs/HPCA2012/final_ieee/04-methodology.tex

    r1737 r1738
       32    32   entirely of single byte  encoded ASCII characters.
       33    33
       34       - \begin{table*}[htbp]
             34 + \begin{table}[htbp]
       35    35   \begin{center}
       36    36   {
       37    37   \footnotesize
       38       - \begin{tabular}{|l||l|l|l|l|l|}
             38 + \begin{tabular}{|@{~}l@{~}||@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|}
       39    39   \hline
       40       - File Name               & dew.xml               & jaw.xml               & roads.gml     & po.xml        & soap.xml \\ \hline
       41       - File Type               & document              & document              & data          & data          & data   \\ \hline
       42       - File Size (kB)          & 66240                 & 7343                  & 11584         & 76450         & 2717 \\ \hline
       43       - Markup Item Count       & 406792                & 74882                 & 280724        & 4634110       & 18004 \\ \hline
       44       - Markup Density          & 0.07                  & 0.13                  & 0.57          & 0.76          & 0.87  \\ \hline
             40 + File Name               & dew.xml       & jaw.xml       & roads.gml     & po.xml        & soap.xml \\ \hline
             41 + File Type               & doc           & doc           & data          & data          & data   \\ \hline
             42 + File Size (kB)          & 66240         & 7343          & 11584         & 76450         & 2717 \\ \hline
             43 + Markup Density          & 0.07          & 0.13          & 0.57          & 0.76          & 0.87  \\ \hline
       45    44   \end{tabular}
       46    45   }
    ...
       48    47   \caption{XML Document Characteristics}
       49    48   \label{XMLDocChars}
       50       - \end{table*}
             49 + \end{table}
       51    50
       52    51
    ...
       75    74   framework.
       76    75
       77       - \begin{table*}[htbp]
             76 + \begin{table}[htbp]
       78    77   \begin{center}
       79       - \footnotesize
       80       - \begin{tabular}{|l||l|l|l|}
             78 + {
             79 + \begin{tabular}{|l||@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|}
       81    80   \hline
       82       - Processor & Core2 Duo (2.13GHz) & i3-530 (2.93GHz) & Sandybridge (2.80GHz) \\ \hline
             81 + Processor & Core2 Duo & i3-530 & Sandybridge\\ \hline
             82 + Frequency &  2.13GHz & 2.93GHz & 2.80GHz \\ \hline
       83    83   L1 D Cache & 32KB & 32KB & 32KB \\ \hline
       84    84   L2 Cache & Shared 2MB & 256KB/core & 256KB/core \\ \hline
       85    85   L3 Cache & --- & 4MB  & 6MB \\ \hline
       86       - Bus or QPI &  1066Mhz Bus & 1333Mhz QPI & 1333Mhz QPI \\ \hline
       87    86   Memory  & 2GB & 4GB & 6GB\\ \hline
       88    87   Max TDP & 65W & 73W &  95W \\ \hline
       89    88   \end{tabular}
             89 + }
             90 + \end{center}
       90    91   \caption{Platform Hardware Specs}
       91    92   \label{hwinfo}
       92       - \end{center}
       93       - \vspace{-20pt}
       94       - \end{table*}
             93 + \end{table}
       95    94
       96    95
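The table edits above, and the matching ones in the other .tex files, turn two-column table* floats into single-column table floats and narrow the tabular preamble with @{~}. Below is a minimal standalone sketch of the spacing technique. It assumes a plain article document and uses two rows shortened from the XML document table for illustration. The standard classes pad each column with \tabcolsep (6pt) on both sides. An @{...} specifier suppresses that padding and inserts its argument instead, here one non-breaking interword space.

```latex
\documentclass{article}
\begin{document}

% Default spacing: \tabcolsep on both sides of every column.
\begin{tabular}{|l||l|l|}
\hline
File Name & dew.xml & jaw.xml \\ \hline
File Type & doc     & doc     \\ \hline
\end{tabular}

\bigskip

% Tighter spacing in the style of r1738: every @{~} replaces one
% \tabcolsep with a single non-breaking space (~).
\begin{tabular}{|@{~}l@{~}||@{~}l@{~}|@{~}l@{~}|}
\hline
File Name & dew.xml & jaw.xml \\ \hline
File Type & doc     & doc     \\ \hline
\end{tabular}

\end{document}
```

Compiling this prints the two tables one above the other, so the width saved by the narrower preamble can be compared directly. The same @{~} pattern reappears in the SIMD-percentage table added to 05-corei3.tex.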
  • docs/HPCA2012/final_ieee/05-corei3.tex

    r1733 r1738
       12    12   4, 11, and 36 cycles respectively. The L1 (32KB) and L2 cache (256KB)
       13    13   are private per core; L3 (4MB) is shared by all the cores.
       14       - Figure \ref{cache_misses} shows the cache misses per kilobyte
             14 + Table \ref{cache_misses} shows the cache misses per kilobyte
       15    15   of input data. Analytically, the cache misses for the Expat and Xerces
       16    16   parsers represent a 0.5 cycle per XML byte cost. This overhead
    ...
       31    31
       32    32
       33       - \begin{figure*}[htbp]
       34       - \subfigure[L1 Misses]{
       35       - \includegraphics[width=0.32\textwidth]{plots/corei3_L1DM.pdf}
       36       - \label{corei3_L1DM}
       37       - }
       38       - \subfigure[L2 Misses]{
       39       - \includegraphics[width=0.32\textwidth]{plots/corei3_L2DM.pdf}
       40       - \label{corei3_L2DM}
       41       - }
       42       - \subfigure[L3 Misses]{
       43       - \includegraphics[width=0.32\textwidth]{plots/corei3_L3CM.pdf}
       44       - \label{corei3_L3DM}
       45       - }
       46       - \caption{Cache Misses per kB of input data.}
             33 + \begin{table}[htbp]
             34 + \begin{center}
             35 + \begin{tabular}{|c|c|c|c|}
             36 + \hline
             37 +         & Parabix       & Expat         & Xerces  \\ \hline
             38 + L1      & 4.1           & 31.7          & 104.2   \\ \hline
             39 + L2      & 0.1           & 12.0          & 1.7     \\ \hline
             40 + L3      & 0.03          & 3.9           & 0.3     \\ \hline
             41 + \end{tabular}
             42 + \end{center}
             43 + \caption{Cache Misses per kB of input data}
       47    44   \label{cache_misses}
       48       - \end{figure*}
             45 + \end{table}
       49    46
       50    47   \subsection{Branch Mispredictions}
    ...
       85    82
       86    83   \begin{figure}
             84 + \begin{center}
             85 + {
       87    86   \subfigure[Branch Instructions / kB]{
       88    87   \includegraphics[width=0.5\textwidth]{plots/corei3_BR.pdf}
    ...
       94    93   \label{corei3_BM}
       95    94   }
             95 + }
             96 + \end{center}
       96    97   \caption{Branch characteristics on the \CITHREE\ per kB of input data.}
       97    98   \end{figure}
    ...
      144   145   requires less than a single cycle per byte.
      145   146
            147 + \begin{table}[htbp]
            148 + \begin{center}
            149 + {
            150 + \begin{tabular}{|@{~}l@{~}||@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|@{~}l@{~}|}
            151 + \hline
            152 + File Name               & dew.xml       & jaw.xml       & roads.gml     & po.xml        & soap.xml \\ \hline
            153 + SIMD                    & 81.68\%       & 80.59\%       & 70.7\%        & 66.02\%       & 59.9\%   \\ \hline
            154 + Non-SIMD                & 18.32\%       & 19.41\%       & 29.3\%        & 33.98\%       & 40.1\%
            155 +  \\ \hline
            156 + \end{tabular}
            157 + }
            158 + \end{center}
            159 + \caption{SIMD Instruction Percentage}
            160 + \label{corei3_INS_p2}
            161 + \end{table}
            162 +
            163 +
      146   164   \begin{figure}[htbp]
      147       - \centering
      148       - \includegraphics[width=0.5\textwidth]{plots/corei3_INS_p2.pdf}
      149       - \caption{SIMD Instruction Percentage}
      150       - \label{corei3_INS_p2}
      151       - \end{figure}
      152       -
      153       - \begin{figure}[htbp]
            165 + \begin{center}
            166 + {
      154   167   \includegraphics[width=0.5\textwidth]{plots/corei3_TOT.pdf}
            168 + }
            169 + \end{center}
      155   170   \caption{Performance (CPU Cycles per kB)}
      156   171   \label{corei3_TOT}
    ...
      179   194
      180   195
      181       -
      182       -
      183       -
      184       -
      185   196   \begin{figure}
            197 + \begin{center}
            198 + {
      186   199   \subfigure[Avg. Power (Watts)]{
      187   200   \includegraphics[width=0.5\textwidth]{plots/corei3_power.pdf}
    ...
      193   206   \label{corei3_energy}
      194   207   }
            208 + }
            209 + \end{center}
      195   210   \caption{Power profile of Parabix on \CITHREE{}}
      196   211   \end{figure}
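The two panels of the Core-i3 power figure, average power in watts and energy in μJ per kB, are linked by the identity energy = average power × time. Below is a short derivation. It assumes a fixed clock $f$ and writes $\bar{P}$ for the average power, $C_{\mathrm{kB}}$ for the measured cycles per kB and $t_{\mathrm{kB}}$ for the time spent per kB. These symbols are introduced here for illustration and are not taken from the paper.

```latex
\[
E_{\mathrm{kB}} = \bar{P}\, t_{\mathrm{kB}} = \bar{P}\,\frac{C_{\mathrm{kB}}}{f},
\qquad
\frac{E_{\mathrm{mt}}}{E_{\mathrm{st}}}
  = \frac{\bar{P}_{\mathrm{mt}}}{\bar{P}_{\mathrm{st}}}
    \cdot \frac{t_{\mathrm{mt}}}{t_{\mathrm{st}}} .
\]
```

With $\bar{P}$ in watts and $t_{\mathrm{kB}}$ in microseconds, the product comes out directly in μJ per kB. The ratio form offers one way to read the multithreaded result in 09-pipeline.tex: when power rises by about the same factor by which execution time falls, energy stays close to the single-threaded value.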
  • docs/HPCA2012/final_ieee/06-scalability.tex

    r1733 r1738
        4     4   \label{section:scalability:intel}
        5     5   In this section, we study the performance of the XML parsers across
        6       - three generations of Intel architectures.  Figure \ref{ScalabilityA}
              6 + three generations of Intel architectures.  Figure \ref{Parabix_all_platform}
        7     7   shows the average execution time of Parabix-XML (over all workloads).  We analyze the
        8     8   execution time in terms of SIMD operations that operate on ``bit streams''
    ...
       20    20   demonstrate data dependent variance. Performance on the \CITHREE{} increases by
       21    21   27\%--40\% compared to \CO{} whereas \SB{} increases by 16\%--29\%
       22       - compared to \CITHREE{}. For the purpose of comparison, Figure
       23       - \ref{ScalabilityB} shows the performance of the Expat parser.
             22 + compared to \CITHREE{}.
       24    23   \CITHREE\ improves performance only by 29\% over \CO\ while \SB\
       25    24   improves performance by less than 6\% over \CITHREE{}. Note that the
    ...
       27    26   frequency and microarchitecture improvements while \SB{}'s gains can
       28    27   be mainly attributed to the architecture.
       29       - Figure \ref{power_Parabix2} shows the average power consumption of
             28 + Figure \ref{Parabix_all_platform} also shows the average power consumption of
       30    29   Parabix-XML over each workload and as executed on each of the processor
       31    30   cores: \CO{}, \CITHREE\ and \SB{}.  Each
    ...
       35    34
       36    35   \begin{figure}
       37       - \centering
       38       - \subfigure[Parabix]{
       39       - \includegraphics[width=0.22\textwidth]{plots/P2_scalability.pdf}
       40       - \label{ScalabilityA}
             36 + \begin{center}
             37 + {
             38 + \includegraphics[width=0.5\textwidth]{plots/Parabix2_all_platform.pdf}
       41    39   }
       42       - \hfill
       43       - \centering
       44       - \subfigure[Avg. Energy Consumption on various hardware (nJ per kB)]{
       45       - \includegraphics[width=0.22\textwidth]{plots/energy_Parabix2.pdf}
       46       - \label{energy_Parabix2}
       47       - }
       48       - \subfigure[Expat]{
       49       - \includegraphics[width=0.40\textwidth]{plots/Expat_scalability.pdf}
       50       - \label{ScalabilityB}
       51       - }
       52       - \caption{Average Performance Parabix vs. Expat (y-axis: ns per kB)}
       53       - \label{Scalability}
       54       - \end{figure}
       55       -
       56       - \begin{figure}
       57       - \centering
       58       - \subfigure[Avg. Power of Parabix on various hardware (Watts)]{
       59       - \includegraphics[width=85mm]{plots/power_Parabix2.pdf}
       60       - \label{power_Parabix2}
       61       - }
       62       -
       63       - \caption{Energy Profile of Parabix on various hardware platforms}
             40 + \end{center}
             41 + \caption{Parabix on various hardware platforms}
             42 + \label{Parabix_all_platform}
       64    43   \end{figure}
       65    44
    ...
      102    81
      103    82   \begin{figure*}[htbp]
             83 + \begin{center}
             84 + {
      104    85   \subfigure[ARM Neon Performance (cycles per kB)]{
      105    86   \includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
    ...
      116    97   \label{relative_performance_intel}
      117    98   }
             99 + }
            100 + \end{center}
      118   101   \caption{Comparison of Parabix-XML on ARM vs. Intel.}
      119   102   \end{figure*}
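The closing sentences of the performance paragraph attribute the Sandy Bridge gains mainly to the architecture. Splitting the time per kB into cycles and clock rate makes that attribution explicit. The sketch below assumes each processor runs at the nominal frequency listed in the hardware table, ignoring Turbo Boost and other frequency changes. It writes $t$ for the time per kB, $C$ for the cycles per kB and $f$ for the clock, so that $t = C/f$ for any two platforms A and B:

```latex
\[
\frac{t_A}{t_B} = \frac{C_A}{C_B}\cdot\frac{f_B}{f_A},
\qquad
\frac{f_{\mathrm{i3}}}{f_{\mathrm{C2}}} = \frac{2.93}{2.13} \approx 1.38,
\qquad
\frac{f_{\mathrm{SB}}}{f_{\mathrm{i3}}} = \frac{2.80}{2.93} \approx 0.96 .
\]
```

Because $f_{\mathrm{SB}}/f_{\mathrm{i3}} < 1$, any speedup of Sandy Bridge over the Core-i3 has to come from fewer cycles per kB ($C_{\mathrm{i3}}/C_{\mathrm{SB}} > 2.93/2.80 \approx 1.05$). By contrast, part of the Core-i3's advantage over the Core2 can come from its higher clock alone.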
  • docs/HPCA2012/final_ieee/09-pipeline.tex

    r1737 r1738
       25    25   this introduced a significant level of complexity into the overall logic of the program.
       26    26
       27       - \begin{table*}[htbp]
             27 + \begin{table}[htbp]
       28    28   {
       29    29   \centering
       30    30   \footnotesize
       31    31   \begin{center}
       32       - \begin{tabular}{|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|}
             32 + \begin{tabular}{|c|c|c|}
       33    33   \hline
       34       -         &      & & \multicolumn{10}{|c|}{Data Structure Flow / Dependencies}\\ \hline
       35       -         &      &                & data\_buffer& basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & err\_streams\\ \hline
       36       -         & latency(C/B)     & size (B)       & 128         & 128         & 496  & 448   & 80    & 176    & 112    & 176    & 16         & 112           \\ \hline
       37       - Stage1  & 1.97 &read\_data      & write       &             &      &       &       &        &        &        &            &               \\
       38       -         &      &transposition   & read        & write       &      &       &       &        &        &        &            &               \\
       39       -         &      &classification  &             & read        &      & write &       &        &        &        &            &               \\ \hline
       40       - Stage2  & 1.22 &validate\_u8    &             & read        & write&       &       &        &        &        &            &               \\
       41       -         &      &gen\_scope      &             &             &      & read  & write &        &        &        &            &               \\
       42       -         &      &parse\_CtCDPI   &             &             &      & read  & read  & write  &        &        &            & write         \\
       43       -         &      &parse\_ref      &             &             &      & read  & read  & read   & write  &        &            &               \\ \hline
       44       - Stage3  & 2.03 &parse\_tag      &             &             &      & read  & read  & read   &        & write  &            &               \\
       45       -         &      &validate\_name  &             &             & read & read  &       & read   & read   & read   & write      & write         \\
       46       -         &      &gen\_check      &             &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
       47       - Stage4  & 1.32 &postprocessing  & read        &             &      & read  &       & read   & read   &        &            & read          \\ \hline
             34 +
             35 +         & functions                                            & latency(C/B)     \\ \hline
             36 + Stage1  & read\_data, transposition, classification            & 1.97             \\ \hline
             37 + Stage2  & validate\_u8,  gen\_scope, parse\_CtCDPI, parse\_ref & 1.22             \\ \hline
             38 + Stage3  & parse\_tag validate\_name gen\_check                 & 2.03             \\ \hline
             39 + Stage4  & postprocessing                                       & 1.32             \\ \hline
       48    40   \end{tabular}
       49    41   \end{center}
       50       - \caption{Relationship between Each Pass and Data Structures}
             42 + \caption{Stage Division}
       51    43   \label{pass_structure}
       52    44   }
       53       - \end{table*}
             45 + \end{table}
       54    46
       55    47   In contrast to those methods, we adopted a parallelism strategy that requires
    ...
       63    55   should be grouped together. By analyzing the latency and data dependencies of each of
       64    56   the passes in the single-threaded version of Parabix-XML
       65       - (Column 1 in Table~\ref{pass_structure}), and assigned the passes
       66       - to stages such that that provided the maximal throughput.
             57 + (Column 3 in Table~\ref{pass_structure}), and assigned the passes
             58 + to stages such that provided the maximal throughput.
       67    59
       68    60
       69    61   The interface between stages is implemented using a ring buffer, where
       70       - each entry consists of all ten data structures for one segment as
       71       - listed in Table \ref{pass_structure}.  Each pipeline stage $S$ maintains
             62 + each entry consists of all ten data structures for one segment.
             63 + Each pipeline stage $S$ maintains
       72    64   the index of the buffer entry ($I_S$) that is being processed. Before
       73    65   processing the next buffer frame the stage check if the previous stage
    ...
       79    71   controlling the overall size of the ring buffer. Whenever a faster stage
       80    72   runs ahead, it will effectively cause the ring buffer to fill up and
       81       - force that stage to stall. Figure \ref{circular_buffer} shows the performance
       82       - with different number of entries of the circular buffer, where
       83       - 6 entries gives the best performance.
       84       -
       85       - \begin{figure}[htbp]
       86       - \includegraphics[width=0.5\textwidth]{plots/circular_buffer.pdf}
       87       - \caption{Performance (CPU Cycles per kB)}
       88       - \label{circular_buffer}
       89       - \end{figure}
             73 + force that stage to stall. Experiments show that 6 entries of the
             74 + circular buffer gives the best performance.
       90    75
       91    76   Figure~\ref{multithread_perf} demonstrates the performance improvement
    ...
      100    85
      101    86
      102       - Figure \ref{pipeline_power} shows the average power consumed by the
             87 + Figure \ref{multithread_perf} also shows the average power consumed by the
      103    88   multithreaded Parabix. Overall, as expected the power consumption
      104    89   increases in proportion to the number of
      105    90   active cores. Note that the increase is not linear, since shared units
      106    91   such as last-level-caches consume active power even if only one core
      107       - is active. Perhaps more interestingly there is a reduction in execution time, which leads to the energy consumption (see Figure~\ref{pipeline_energy}) being similar to the the single-thread execution (in some cases marginally less energy as shown for soap.xml).
             92 + is active. Perhaps more interestingly there is a reduction in execution time,
             93 + which leads to the energy consumption being similar to the the single-thread execution
             94 + (in some cases marginally less energy e.g., soap.xml).
      108    95
      109       - \begin{figure*}[htbp]
      110       - \subfigure[Performance (Cycles / kB)]{
      111       - \includegraphics[width=0.32\textwidth]{plots/pipeline_performance.pdf}
      112       - \label{pipeline_performance}
             96 + \begin{figure}[htbp]
             97 + \begin{center}
             98 + {
             99 + \includegraphics[width=0.50\textwidth]{plots/pipeline.pdf}
      113   100   }
      114       - \subfigure[Avg. Power Consumption (Watts)]{
      115       - \includegraphics[width=0.32\textwidth]{plots/pipeline_power.pdf}
      116       - \label{pipeline_power}
      117       - }
      118       - \subfigure[Avg. Energy Consumption (nJ / Byte)]{
      119       -   \includegraphics[width=0.32\textwidth]{plots/pipeline_energy.pdf}
      120       - \label{pipeline_energy}
      121       - }
      122       - \caption{Multithreaded Parabix}
            101 + \end{center}
            102 + \caption{Average Statistic of Multithreaded Parabix}
      123   103   \label{multithread_perf}
      124       - \end{figure*}
            104 + \end{figure}
      125   105
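The stall rule in the ring-buffer paragraph can be stated as two conditions on the stage indices. The stage latencies in the new Stage Division table also give a simple throughput bound. This is a sketch under assumptions:

- stages are numbered 1 to 4 as in the table;
- $I_S$ is taken to be a running segment counter such that every segment before $I_S$ is finished in stage $S$;
- segment $i$ occupies buffer entry $i \bmod N$;
- $N = 6$, the buffer size reported as best.

The lines describing the exact check fall outside the hunks shown here.

```latex
\[
\mbox{stage } S>1 \mbox{ may start segment } i \iff I_{S-1} > i,
\qquad
\mbox{stage } 1 \mbox{ may start segment } i \iff i - I_4 < N = 6,
\]
\[
T_{\mathrm{seq}} = 1.97 + 1.22 + 2.03 + 1.32 = 6.54~\mathrm{C/B},
\qquad
\frac{T_{\mathrm{seq}}}{\max_S T_S} = \frac{6.54}{2.03} \approx 3.22 .
\]
```

Each stage waits only on its predecessor, and stage 1 additionally waits for a free entry. The steady-state rate is therefore set by the slowest stage, Stage3 at 2.03 C/B. With the single-threaded latencies from the table, this gives an idealized speedup bound of about 3.2 over running all passes in one thread. The bound ignores synchronization cost and shared-cache contention. Enlarging $N$ lets a fast stage run further ahead but does not raise the bound.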
  • docs/HPCA2012/final_ieee/final.aux

    r1737 r1738
       49    49   \newlabel{workloads}{{5}{6}}
       50    50   \@writefile{toc}{\contentsline {paragraph}{XML Workloads:}{6}}
             51 + \@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{6}}
             52 + \newlabel{XMLDocChars}{{1}{6}}
       51    53   \@writefile{toc}{\contentsline {paragraph}{Platform Hardware:}{6}}
             54 + \@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Platform Hardware Specs\relax }}{6}}
             55 + \newlabel{hwinfo}{{2}{6}}
       52    56   \@writefile{toc}{\contentsline {paragraph}{Energy Measurement:}{6}}
       53       - \@writefile{toc}{\contentsline {section}{\numberline {6}Efficiency of the Parabix-XML Parser}{6}}
       54       - \newlabel{section:baseline}{{6}{6}}
       55       - \@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Cache behavior}{6}}
       56       - \@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{7}}
       57       - \newlabel{XMLDocChars}{{1}{7}}
       58       - \@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Platform Hardware Specs\relax }}{7}}
       59       - \newlabel{hwinfo}{{2}{7}}
             57 + \@writefile{toc}{\contentsline {section}{\numberline {6}Efficiency of the Parabix-XML Parser}{7}}
             58 + \newlabel{section:baseline}{{6}{7}}
             59 + \@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Cache behavior}{7}}
             60 + \@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Cache Misses per kB of input data\relax }}{7}}
             61 + \newlabel{cache_misses}{{3}{7}}
       60    62   \@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Branch Mispredictions}{7}}
       61    63   \newlabel{section:XML-branches}{{6.2}{7}}
       62       - \newlabel{corei3_BR}{{9(a)}{7}}
             64 + \newlabel{corei3_BR}{{8(a)}{7}}
       63    65   \newlabel{sub@corei3_BR}{{(a)}{7}}
       64       - \newlabel{corei3_BM}{{9(b)}{7}}
             66 + \newlabel{corei3_BM}{{8(b)}{7}}
       65    67   \newlabel{sub@corei3_BM}{{(b)}{7}}
       66       - \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Branch characteristics on the Core-i3\ per kB of input data.\relax }}{7}}
             68 + \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Branch characteristics on the Core-i3\ per kB of input data.\relax }}{7}}
       67    69   \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Branch Instructions / kB}}}{7}}
       68    70   \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Branch Misses / kB}}}{7}}
       69    71   \@writefile{toc}{\contentsline {subsection}{\numberline {6.3}SIMD Instructions vs. Total Instructions}{7}}
       70       - \newlabel{corei3_L1DM}{{8(a)}{8}}
       71       - \newlabel{sub@corei3_L1DM}{{(a)}{8}}
       72       - \newlabel{corei3_L2DM}{{8(b)}{8}}
       73       - \newlabel{sub@corei3_L2DM}{{(b)}{8}}
       74       - \newlabel{corei3_L3DM}{{8(c)}{8}}
       75       - \newlabel{sub@corei3_L3DM}{{(c)}{8}}
       76       - \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Cache Misses per kB of input data.\relax }}{8}}
       77       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {L1 Misses}}}{8}}
       78       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {L2 Misses}}}{8}}
       79       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {L3 Misses}}}{8}}
       80       - \newlabel{cache_misses}{{8}{8}}
       81       - \@writefile{toc}{\contentsline {subsection}{\numberline {6.4}CPU Cycles}{8}}
       82       - \@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces SIMD Instruction Percentage\relax }}{8}}
       83       - \newlabel{corei3_INS_p2}{{10}{8}}
       84       - \@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Performance (CPU Cycles per kB)\relax }}{8}}
       85       - \newlabel{corei3_TOT}{{11}{8}}
             72 + \@writefile{toc}{\contentsline {subsection}{\numberline {6.4}CPU Cycles}{7}}
             73 + \@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces SIMD Instruction Percentage\relax }}{8}}
             74 + \newlabel{corei3_INS_p2}{{4}{8}}
             75 + \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Performance (CPU Cycles per kB)\relax }}{8}}
             76 + \newlabel{corei3_TOT}{{9}{8}}
       86    77   \@writefile{toc}{\contentsline {subsection}{\numberline {6.5}Power and Energy}{8}}
       87       - \newlabel{corei3_power}{{12(a)}{8}}
       88       - \newlabel{sub@corei3_power}{{(a)}{8}}
       89       - \newlabel{corei3_energy}{{12(b)}{8}}
       90       - \newlabel{sub@corei3_energy}{{(b)}{8}}
       91       - \@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Power profile of Parabix on Core-i3{}\relax }}{8}}
       92       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Avg. Power (Watts)}}}{8}}
       93       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Energy Consumption ($\mu $J per kB)}}}{8}}
       94    78   \@writefile{toc}{\contentsline {section}{\numberline {7}Evaluation of Parabix across different Hardware}{8}}
       95    79   \newlabel{section:scalability}{{7}{8}}
       96    80   \@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Performance}{8}}
       97    81   \newlabel{section:scalability:intel}{{7.1}{8}}
       98       - \newlabel{ScalabilityA}{{13(a)}{9}}
       99       - \newlabel{sub@ScalabilityA}{{(a)}{9}}
      100       - \newlabel{energy_Parabix2}{{13(b)}{9}}
      101       - \newlabel{sub@energy_Parabix2}{{(b)}{9}}
      102       - \newlabel{ScalabilityB}{{13(c)}{9}}
      103       - \newlabel{sub@ScalabilityB}{{(c)}{9}}
      104       - \@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Average Performance Parabix vs. Expat (y-axis: ns per kB)\relax }}{9}}
      105       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Parabix}}}{9}}
      106       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Avg. Energy Consumption on various hardware (nJ per kB)}}}{9}}
      107       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {Expat}}}{9}}
      108       - \newlabel{Scalability}{{13}{9}}
      109       - \@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Parabix on Mobile processors}{9}}
      110       - \newlabel{section:scalability:Neon{}}{{7.2}{9}}
      111       - \newlabel{power_Parabix2}{{14(a)}{9}}
      112       - \newlabel{sub@power_Parabix2}{{(a)}{9}}
      113       - \@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Energy Profile of Parabix on various hardware platforms\relax }}{9}}
      114       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Avg. Power of Parabix on various hardware (Watts)}}}{9}}
      115       - \newlabel{arm_processing_time}{{15(a)}{10}}
             82 + \newlabel{corei3_power}{{10(a)}{8}}
             83 + \newlabel{sub@corei3_power}{{(a)}{8}}
             84 + \newlabel{corei3_energy}{{10(b)}{8}}
             85 + \newlabel{sub@corei3_energy}{{(b)}{8}}
             86 + \@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Power profile of Parabix on Core-i3{}\relax }}{8}}
             87 + \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Avg. Power (Watts)}}}{8}}
             88 + \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Energy Consumption ($\mu $J per kB)}}}{8}}
             89 + \@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Parabix on Mobile processors}{8}}
             90 + \newlabel{section:scalability:Neon{}}{{7.2}{8}}
             91 + \@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Parabix on various hardware platforms\relax }}{9}}
             92 + \newlabel{Parabix_all_platform}{{11}{9}}
             93 + \@writefile{toc}{\contentsline {section}{\numberline {8}Parabix on AVX}{9}}
             94 + \newlabel{section:avx}{{8}{9}}
             95 + \@writefile{toc}{\contentsline {subsection}{\numberline {8.1}3-Operand Form}{9}}
             96 + \@writefile{toc}{\contentsline {subsection}{\numberline {8.2}256-bit Operations}{9}}
             97 + \@writefile{toc}{\contentsline {subsection}{\numberline {8.3}Performance Results}{9}}
             98 + \citation{dataparallel}
             99 + \citation{Shah:2009}
            100 + \newlabel{arm_processing_time}{{12(a)}{10}}
      116   101   \newlabel{sub@arm_processing_time}{{(a)}{10}}
      117       - \newlabel{relative_performance_arm}{{15(b)}{10}}
            102 + \newlabel{relative_performance_arm}{{12(b)}{10}}
      118   103   \newlabel{sub@relative_performance_arm}{{(b)}{10}}
      119       - \newlabel{relative_performance_intel}{{15(c)}{10}}
            104 + \newlabel{relative_performance_intel}{{12(c)}{10}}
      120   105   \newlabel{sub@relative_performance_intel}{{(c)}{10}}
      121       - \@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Comparison of Parabix-XML on ARM vs. Intel.\relax }}{10}}
            106 + \@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Comparison of Parabix-XML on ARM vs. Intel.\relax }}{10}}
      122   107   \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {ARM Neon Performance (cycles per kB)}}}{10}}
      123   108   \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {ARM Neon}}}{10}}
      124   109   \@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {Core i3}}}{10}}
      125       - \@writefile{toc}{\contentsline {section}{\numberline {8}Parabix on AVX}{10}}
      126       - \newlabel{section:avx}{{8}{10}}
      127       - \@writefile{toc}{\contentsline {subsection}{\numberline {8.1}3-Operand Form}{10}}
      128       - \@writefile{toc}{\contentsline {subsection}{\numberline {8.2}256-bit Operations}{10}}
      129       - \@writefile{toc}{\contentsline {subsection}{\numberline {8.3}Performance Results}{10}}
      130       - \@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Parabix Performance (y-axis: ns per kB)\relax }}{10}}
      131       - \newlabel{avx}{{17}{10}}
      132       - \citation{dataparallel}
      133       - \citation{Shah:2009}
      134       - \@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Parabix Instruction Counts (y-axis: Instructions per kB)\relax }}{11}}
      135       - \newlabel{insmix}{{16}{11}}
      136       - \@writefile{toc}{\contentsline {section}{\numberline {9}Multithreaded Parabix}{11}}
      137       - \newlabel{section:multithread}{{9}{11}}
            110 + \@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Parabix Performance (y-axis: ns per kB)\relax }}{10}}
            111 + \newlabel{avx}{{14}{10}}
            112 + \@writefile{toc}{\contentsline {section}{\numberline {9}Multithreaded Parabix}{10}}
            113 + \newlabel{section:multithread}{{9}{10}}
      138   114   \citation{DaiNiZhu2010}
      139   115   \citation{NicolaJohn03}
    ...
      146   122   \citation{CameronLin2009}
      147   123   \citation{tan-sherwood-isca-2005}
      148       - \@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Relationship between Each Pass and Data Structures\relax }}{12}}
      149       - \newlabel{pass_structure}{{3}{12}}
      150       - \@writefile{lof}{\contentsline {figure}{\numberline {18}{\ignorespaces Performance (CPU Cycles per kB)\relax }}{12}}
      151       - \newlabel{circular_buffer}{{18}{12}}
      152       - \@writefile{toc}{\contentsline {section}{\numberline {10}Related Work}{12}}
      153       - \newlabel{section:related}{{10}{12}}
      154       - \@writefile{toc}{\contentsline {section}{\numberline {11}Conclusion}{12}}
      155       - \newlabel{section:conclusion}{{11}{12}}
            124 + \@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Parabix Instruction Counts (y-axis: Instructions per kB)\relax }}{11}}
            125 + \newlabel{insmix}{{13}{11}}
            126 + \@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Stage Division\relax }}{11}}
            127 + \newlabel{pass_structure}{{5}{11}}
            128 + \@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Average Statistic of Multithreaded Parabix\relax }}{11}}
            129 + \newlabel{multithread_perf}{{15}{11}}
            130 + \@writefile{toc}{\contentsline {section}{\numberline {10}Related Work}{11}}
            131 + \newlabel{section:related}{{10}{11}}
      156   132   \bibstyle{ieee/latex8}
      157   133   \bibdata{reference}
    ...
      174   150   \bibcite{Leventhal2009}{17}
      175   151   \bibcite{xmlchip}{18}
            152 + \@writefile{toc}{\contentsline {section}{\numberline {11}Conclusion}{12}}
            153 + \newlabel{section:conclusion}{{11}{12}}
      176   154   \bibcite{LiWangLiuLi2009}{19}
      177   155   \bibcite{dataparallel}{20}
    ...
      180   158   \bibcite{Shah:2009}{23}
      181   159   \bibcite{tan-sherwood-isca-2005}{24}
      182       - \newlabel{pipeline_performance}{{19(a)}{13}}
      183       - \newlabel{sub@pipeline_performance}{{(a)}{13}}
      184       - \newlabel{pipeline_power}{{19(b)}{13}}
      185       - \newlabel{sub@pipeline_power}{{(b)}{13}}
      186       - \newlabel{pipeline_energy}{{19(c)}{13}}
      187       - \newlabel{sub@pipeline_energy}{{(c)}{13}}
      188       - \@writefile{lof}{\contentsline {figure}{\numberline {19}{\ignorespaces Multithreaded Parabix\relax }}{13}}
      189       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Performance (Cycles / kB)}}}{13}}
      190       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Avg. Power Consumption (Watts)}}}{13}}
      191       - \@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {Avg. Energy Consumption (nJ / Byte)}}}{13}}
      192       - \newlabel{multithread_perf}{{19}{13}}
      193   160   \bibcite{ZhangPanChiu09}{25}
  • docs/HPCA2012/final_ieee/final.log

    r1737 r1738
        1       - This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.10.18)  22 NOV 2011 18:39
              1 + This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.4.5)  24 NOV 2011 11:16
        2     2   entering extended mode
        3     3    %&-line parsing enabled.
    ...
        6     6   LaTeX2e <2009/09/24>
        7     7   Babel <v3.8l> and hyphenation patterns for english, usenglishmax, dumylang, noh
        8       - yphenation, farsi, arabic, croatian, bulgarian, ukrainian, russian, czech, slov
        9       - ak, danish, dutch, finnish, french, basque, ngerman, german, german-x-2009-06-1
       10       - 9, ngerman-x-2009-06-19, ibycus, monogreek, greek, ancientgreek, hungarian, san
       11       - skrit, italian, latin, latvian, lithuanian, mongolian2a, mongolian, bokmal, nyn
       12       - orsk, romanian, irish, coptic, serbian, turkish, welsh, esperanto, uppersorbian
       13       - , estonian, indonesian, interlingua, icelandic, kurmanji, slovenian, polish, po
       14       - rtuguese, spanish, galician, catalan, swedish, ukenglish, pinyin, loaded.
              8 + yphenation, loaded.
       15     9   (./preamble-final-ieee.tex (/usr/share/texmf-texlive/tex/latex/base/article.cls
       16    10   Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
    ...
      457   451
      458   452   [5]
      459       - Underfull \hbox (badness 1286) in paragraph at lines 99--112
            453 + Underfull \hbox (badness 1286) in paragraph at lines 98--111
      460   454   [] \OT1/ptm/b/n/10 En-ergy Mea-sure-ment:[] \OT1/ptm/m/n/10 A key ben-e-fit of
      461   455   the Para-bix
    ...
      463   457
      464   458   ) (./05-corei3.tex [6 <./plots/parabix_arch.pdf>]
      465       - <plots/corei3_L1DM.pdf, id=68, 442.65375pt x 226.8475pt>
      466       - File: plots/corei3_L1DM.pdf Graphic file (type pdf)
      467       -
      468       - <use plots/corei3_L1DM.pdf>
      469       - <plots/corei3_L2DM.pdf, id=70, 456.70625pt x 231.86626pt>
      470       - File: plots/corei3_L2DM.pdf Graphic file (type pdf)
      471       -
      472       - <use plots/corei3_L2DM.pdf>
      473       - <plots/corei3_L3CM.pdf, id=72, 456.70625pt x 233.87375pt>
      474       - File: plots/corei3_L3CM.pdf Graphic file (type pdf)
      475       -
      476       - <use plots/corei3_L3CM.pdf>
      477       - <plots/corei3_BR.pdf, id=74, 454.69875pt x 206.7725pt>
            459 + <plots/corei3_BR.pdf, id=68, 454.69875pt x 206.7725pt>
      478   460   File: plots/corei3_BR.pdf Graphic file (type pdf)
      479   461
      480   462   <use plots/corei3_BR.pdf>
      481       - <plots/corei3_BM.pdf, id=76, 440.64626pt x 202.7575pt>
            463 + <plots/corei3_BM.pdf, id=70, 440.64626pt x 202.7575pt>
      482   464   File: plots/corei3_BM.pdf Graphic file (type pdf)
      483   465
      484   466   <use plots/corei3_BM.pdf>
      485       - Overfull \hbox (12.22688pt too wide) in paragraph at lines 90--96
      486       - [][]
      487       -  []
      488       -
      489       -
      490       - Overfull \hbox (12.22688pt too wide) in paragraph at lines 90--96
      491       - []
            467 + Overfull \hbox (12.22688pt too wide) in paragraph at lines 89--96
            468 +  []
            469 +  []
            470 +
            471 +
            472 + Overfull \hbox (12.22688pt too wide) in paragraph at lines 89--96
            473 +  []
      492   474    []
      493   475
      494   476   [7 <./plots/corei3_BR.pdf> <./plots/corei3_BM.pdf>]
      495       - <plots/corei3_INS_p2.pdf, id=110, 448.67625pt x 210.7875pt>
      496       - File: plots/corei3_INS_p2.pdf Graphic file (type pdf)
      497       -
      498       - <use plots/corei3_INS_p2.pdf>
      499       - Overfull \hbox (7.22688pt too wide) in paragraph at lines 148--149
      500       -  [][]
      501       -  []
      502       -
      503       - <plots/corei3_TOT.pdf, id=112, 457.71pt x 209.78375pt>
            477 + Overfull \hbox (7.49034pt too wide) in paragraph at lines 150--158
            478 +  []
            479 +  []
            480 +
            481 + <plots/corei3_TOT.pdf, id=104, 457.71pt x 209.78375pt>
      504   482   File: plots/corei3_TOT.pdf Graphic file (type pdf)
      505   483
      506   484   <use plots/corei3_TOT.pdf>
      507       - Overfull \hbox (7.22688pt too wide) in paragraph at lines 154--155
      508       - [][]
      509       -  []
      510       -
      511       - <plots/corei3_power.pdf, id=114, 451.6875pt x 208.78pt>
            485 + Overfull \hbox (7.22688pt too wide) in paragraph at lines 167--169
            486 +  []
            487 +  []
            488 +
            489 + <plots/corei3_power.pdf, id=106, 451.6875pt x 208.78pt>
      512   490   File: plots/corei3_power.pdf Graphic file (type pdf)
      513   491
      514   492   <use plots/corei3_power.pdf>
      515       - <plots/corei3_energy.pdf, id=116, 454.69875pt x 203.76125pt>
            493 + <plots/corei3_energy.pdf, id=108, 454.69875pt x 203.76125pt>
      516   494   File: plots/corei3_energy.pdf Graphic file (type pdf)
      517   495
      518   496   <use plots/corei3_energy.pdf>
      519       - Overfull \hbox (12.22688pt too wide) in paragraph at lines 189--195
      520       - [][]
      521       -  []
      522       -
      523       -
      524       - Overfull \hbox (12.22688pt too wide) in paragraph at lines 189--195
      525       - []
      526       -  []
      527       -
      528       - ) (./06-scalability.tex [8 <./plots/corei3_L1DM.pdf> <./plots/corei3_L2DM.pdf>
    529 <./plots/corei3_L3CM.pdf> <./plots/corei3_INS_p2.pdf> <./plots/corei3_TOT.pdf>
    530 <./plots/corei3_power.pdf> <./plots/corei3_energy.pdf>]
    531 <plots/P2_scalability.pdf, id=205, 744.7825pt x 466.74374pt>
    532 File: plots/P2_scalability.pdf Graphic file (type pdf)
    533 
    534 <use plots/P2_scalability.pdf>
    535 <plots/energy_Parabix2.pdf, id=206, 454.69875pt x 210.7875pt>
    536 File: plots/energy_Parabix2.pdf Graphic file (type pdf)
    537 
    538 <use plots/energy_Parabix2.pdf>
    539 <plots/Expat_scalability.pdf, id=208, 744.7825pt x 456.70625pt>
    540 File: plots/Expat_scalability.pdf Graphic file (type pdf)
    541 
    542 <use plots/Expat_scalability.pdf>
    543 <plots/power_Parabix2.pdf, id=209, 453.695pt x 211.79124pt>
    544 File: plots/power_Parabix2.pdf Graphic file (type pdf)
    545 
    546 <use plots/power_Parabix2.pdf>
    547 Overfull \hbox (1.13031pt too wide) in paragraph at lines 61--62
    548  [][]
    549  []
    550 
    551 <plots/arm_TOT.pdf, id=211, 424.58624pt x 283.0575pt>
     497Overfull \hbox (12.22688pt too wide) in paragraph at lines 202--209
     498 []
     499 []
     500
     501
     502Overfull \hbox (12.22688pt too wide) in paragraph at lines 202--209
     503 []
     504 []
     505
     506) (./06-scalability.tex
     507<plots/Parabix2_all_platform.pdf, id=110, 432.61626pt x 263.98625pt>
     508File: plots/Parabix2_all_platform.pdf Graphic file (type pdf)
     509
     510<use plots/Parabix2_all_platform.pdf>
     511Overfull \hbox (7.22688pt too wide) in paragraph at lines 38--40
     512 []
     513 []
     514
     515[8 <./plots/corei3_TOT.pdf> <./plots/corei3_power.pdf> <./plots/corei3_energy.p
     516df>] <plots/arm_TOT.pdf, id=157, 424.58624pt x 283.0575pt>
    552517File: plots/arm_TOT.pdf Graphic file (type pdf)
    553  <use plots/arm_TOT.pdf>
    554 <plots/Markup_density_Arm.pdf, id=213, 369.38pt x 252.945pt>
     518
     519<use plots/arm_TOT.pdf>
     520<plots/Markup_density_Arm.pdf, id=159, 369.38pt x 252.945pt>
    555521File: plots/Markup_density_Arm.pdf Graphic file (type pdf)
    556522
    557523<use plots/Markup_density_Arm.pdf>
    558 <plots/Markup_density_Intel.pdf, id=215, 370.38374pt x 252.945pt>
     524<plots/Markup_density_Intel.pdf, id=161, 370.38374pt x 252.945pt>
    559525File: plots/Markup_density_Intel.pdf Graphic file (type pdf)
    560526
    561 <use plots/Markup_density_Intel.pdf>) (./07-avx.tex [9 <./plots/P2_scalability.
    562 pdf> <./plots/energy_Parabix2.pdf> <./plots/Expat_scalability.pdf> <./plots/pow
    563 er_Parabix2.pdf>] <plots/InsMix.pdf, id=278, 744.7825pt x 261.97874pt>
     527<use plots/Markup_density_Intel.pdf>) (./07-avx.tex [9 <./plots/Parabix2_all_pl
     528atform.pdf>] <plots/InsMix.pdf, id=190, 744.7825pt x 261.97874pt>
    564529File: plots/InsMix.pdf Graphic file (type pdf)
    565530
    566 <use plots/InsMix.pdf> <plots/avx.pdf, id=279, 424.58624pt x 212.795pt>
     531<use plots/InsMix.pdf> <plots/avx.pdf, id=191, 424.58624pt x 212.795pt>
    567532File: plots/avx.pdf Graphic file (type pdf)
    568533
     
    572537 []
    573538
    574 [10 <./plots/arm_TOT.pdf> <./plots/Markup_density_Arm.pdf> <./plots/Markup_dens
    575 ity_Intel.pdf> <./plots/avx.pdf>]) (./09-pipeline.tex [11 <./plots/InsMix.pdf>]
    576 <plots/circular_buffer.pdf, id=342, 496.85625pt x 218.8175pt>
    577 File: plots/circular_buffer.pdf Graphic file (type pdf)
    578 
    579 <use plots/circular_buffer.pdf>
    580 Overfull \hbox (7.22688pt too wide) in paragraph at lines 86--87
    581 [][]
    582  []
    583 
    584 
    585 Underfull \hbox (badness 1072) in paragraph at lines 91--100
    586 []\OT1/ptm/m/n/10 Figure 19[] demon-strates the per-for-mance im-prove-ment
    587  []
    588 
    589 <plots/pipeline_performance.pdf, id=343, 489.83pt x 259.97125pt>
    590 File: plots/pipeline_performance.pdf Graphic file (type pdf)
    591 
    592 <use plots/pipeline_performance.pdf>
    593 <plots/pipeline_power.pdf, id=345, 481.8pt x 252.945pt>
    594 File: plots/pipeline_power.pdf Graphic file (type pdf)
    595 
    596 <use plots/pipeline_power.pdf>
    597 <plots/pipeline_energy.pdf, id=347, 478.78876pt x 253.94875pt>
    598 File: plots/pipeline_energy.pdf Graphic file (type pdf)
    599 
    600 <use plots/pipeline_energy.pdf>) (./10-related.tex) (./11-conclusions.tex
    601 [12 <./plots/circular_buffer.pdf>]) (./final.bbl
     539) (./09-pipeline.tex [10 <./plots/arm_TOT.pdf> <./plots/Markup_density_Arm.pdf>
     540 <./plots/Markup_density_Intel.pdf> <./plots/avx.pdf>]
     541Overfull \hbox (9.70384pt too wide) in paragraph at lines 32--41
     542 []
     543 []
     544
     545
     546Underfull \hbox (badness 1072) in paragraph at lines 76--85
     547[]\OT1/ptm/m/n/10 Figure 15[] demon-strates the per-for-mance im-prove-ment
     548 []
     549
     550<plots/pipeline.pdf, id=237, 471.7625pt x 275.0275pt>
     551File: plots/pipeline.pdf Graphic file (type pdf)
     552 <use plots/pipeline.pdf>
     553Overfull \hbox (7.22688pt too wide) in paragraph at lines 99--101
     554 []
     555 []
     556
     557) (./10-related.tex [11 <./plots/InsMix.pdf> <./plots/pipeline.pdf>])
     558(./11-conclusions.tex) (./final.bbl
    602559Underfull \hbox (badness 1137) in paragraph at lines 17--22
    603560[]\OT1/ptm/m/n/9 R. Bertran, M. Gon-za-lez, X. Mar-torell, N. Navarro, and
     
    619576 []
    620577
     578[12]
    621579Missing character: There is no à in font ptmr7t!
    622580Missing character: There is no š in font ptmr7t!
    623 [13 <./plots/pipeline_performance.pdf> <./plots/pipeline_power.pdf> <./plots/pi
    624 peline_energy.pdf>]) [14
     581) [13
    625582
    626583] (./final.aux) )
    627584Here is how much of TeX's memory you used:
    628  4017 strings out of 493848
    629  56933 string characters out of 1152822
    630  123580 words of memory out of 3000000
    631  7113 multiletter control sequences out of 15000+50000
     585 3946 strings out of 495061
     586 55240 string characters out of 1182622
     587 121364 words of memory out of 3000000
     588 6953 multiletter control sequences out of 15000+50000
    632589 68455 words of font info for 164 fonts, out of 3000000 for 9000
    633  717 hyphenation exceptions out of 8191
     590 31 hyphenation exceptions out of 8191
    634591 38i,12n,38p,1456b,370s stack positions out of 5000i,500n,10000p,200000b,50000s
    635592{/usr/share/texmf-texlive/fonts/enc/dvips/base/8r.enc}</usr/sh
     
    644601/share/texmf-texlive/fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-texlive
    645602/fonts/type1/urw/times/utmri8a.pfb>
    646 Output written on final.pdf (14 pages, 694749 bytes).
     603Output written on final.pdf (13 pages, 553842 bytes).
    647604PDF statistics:
    648  442 PDF objects out of 1000 (max. 8388607)
     605 311 PDF objects out of 1000 (max. 8388607)
    649606 0 named destinations out of 1000 (max. 500000)
    650  121 words of extra memory for PDF output out of 10000 (max. 10000000)
    651 
     607 71 words of extra memory for PDF output out of 10000 (max. 10000000)
     608