Timestamp:
Nov 24, 2011, 11:18:10 AM (8 years ago)
Author:
lindanl
Message:

Figure adjustment and some minor changes

File:
1 edited

  • docs/HPCA2012/final_ieee/09-pipeline.tex

    r1737 r1738  
    2525this introduced a significant level of complexity into the overall logic of the program.
    2626
    27 \begin{table*}[htbp]
     27\begin{table}[htbp]
    2828{
    2929\centering
    3030\footnotesize
    3131\begin{center}
    32 \begin{tabular}{|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|}
     32\begin{tabular}{|c|c|c|}
    3333\hline
    34         &      & & \multicolumn{10}{|c|}{Data Structure Flow / Dependencies}\\ \hline
    35         &      &                & data\_buffer& basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & err\_streams\\ \hline
    36         & latency(C/B)     & size (B)       & 128         & 128         & 496  & 448   & 80    & 176    & 112    & 176    & 16         & 112           \\ \hline
    37 Stage1  & 1.97 &read\_data      & write       &             &      &       &       &        &        &        &            &               \\
    38         &      &transposition   & read        & write       &      &       &       &        &        &        &            &               \\
    39         &      &classification  &             & read        &      & write &       &        &        &        &            &               \\ \hline
    40 Stage2  & 1.22 &validate\_u8    &             & read        & write&       &       &        &        &        &            &               \\
    41         &      &gen\_scope      &             &             &      & read  & write &        &        &        &            &               \\
    42         &      &parse\_CtCDPI   &             &             &      & read  & read  & write  &        &        &            & write         \\
    43         &      &parse\_ref      &             &             &      & read  & read  & read   & write  &        &            &               \\ \hline
    44 Stage3  & 2.03 &parse\_tag      &             &             &      & read  & read  & read   &        & write  &            &               \\
    45         &      &validate\_name  &             &             & read & read  &       & read   & read   & read   & write      & write         \\
    46         &      &gen\_check      &             &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
    47 Stage4  & 1.32 &postprocessing  & read        &             &      & read  &       & read   & read   &        &            & read          \\ \hline
     34
     35        & functions                                            & latency(C/B)     \\ \hline
     36Stage1  & read\_data, transposition, classification            & 1.97             \\ \hline
     37Stage2  & validate\_u8,  gen\_scope, parse\_CtCDPI, parse\_ref & 1.22             \\ \hline
     38Stage3  & parse\_tag, validate\_name, gen\_check               & 2.03             \\ \hline
     39Stage4  & postprocessing                                       & 1.32             \\ \hline
    4840\end{tabular}
    4941\end{center}
    50 \caption{Relationship between Each Pass and Data Structures}
     42\caption{Stage Division}
    5143\label{pass_structure}
    5244}
    53 \end{table*}
     45\end{table}
    5446
    5547In contrast to those methods, we adopted a parallelism strategy that requires
     
    6355should be grouped together. By analyzing the latency and data dependencies of each of
    6456the passes in the single-threaded version of Parabix-XML
    65 (Column 1 in Table~\ref{pass_structure}), and assigned the passes
    66 to stages such that that provided the maximal throughput.
     57(Column 3 in Table~\ref{pass_structure}), we assigned the passes
     58to stages so as to provide the maximal throughput.
    6759
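
A rough way to see why the stage assignment matters is to compare the summed per-stage latencies with the slowest stage. The sketch below uses only the four latency values from the Stage Division table. It assumes that the single-threaded cost is the sum of the stage latencies and that an ideal pipeline is limited by its slowest stage, ignoring synchronization and cache effects. The function name `pipeline_bound` is hypothetical and is not taken from the Parabix sources.

```python
# Per-stage latencies in CPU cycles per byte (C/B), copied from the Stage Division table.
stage_latency = {"Stage1": 1.97, "Stage2": 1.22, "Stage3": 2.03, "Stage4": 1.32}


def pipeline_bound(latencies):
    """Return (serial_cost, bottleneck_stage, bottleneck_cost, ideal_speedup).

    Assumption: serial cost is the sum of all stage latencies, and an ideal
    pipeline runs at the pace of its slowest stage (no synchronization cost).
    """
    serial = sum(latencies.values())
    bottleneck = max(latencies, key=latencies.get)
    return serial, bottleneck, latencies[bottleneck], serial / latencies[bottleneck]


serial, stage, cost, speedup = pipeline_bound(stage_latency)
print(f"serial: {serial:.2f} C/B, bottleneck: {stage} at {cost:.2f} C/B, "
      f"ideal speedup: {speedup:.2f}x")
```

Under these assumptions the four stages sum to 6.54 C/B and Stage3 (2.03 C/B) is the bottleneck, so the idealized speedup on four cores is about 3.2x rather than 4x. Measured results can be expected to fall below this bound.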
    6860
    6961The interface between stages is implemented using a ring buffer, where
    70 each entry consists of all ten data structures for one segment as
    71 listed in Table \ref{pass_structure}.  Each pipeline stage $S$ maintains
     62each entry consists of all ten data structures for one segment.
     63Each pipeline stage $S$ maintains
    7264the index of the buffer entry ($I_S$) that is being processed. Before
    7365processing the next buffer frame, the stage checks whether the previous stage
     
    7971controlling the overall size of the ring buffer. Whenever a faster stage
    8072runs ahead, it will effectively cause the ring buffer to fill up and
    81 force that stage to stall. Figure \ref{circular_buffer} shows the performance
    82 with different number of entries of the circular buffer, where
    83 6 entries gives the best performance.
    84 
    85 \begin{figure}[htbp]
    86 \includegraphics[width=0.5\textwidth]{plots/circular_buffer.pdf}
    87 \caption{Performance (CPU Cycles per kB)}
    88 \label{circular_buffer}
    89 \end{figure}
     73force that stage to stall. Experiments show that a circular buffer
     74with 6 entries gives the best performance.
    9075
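
The ring-buffer handoff described above can be sketched as follows. This is a minimal illustration and not the Parabix implementation. It assumes one thread per stage, a per-stage counter `done[s]` standing in for the index $I_S$, and a condition variable in place of whatever waiting mechanism the real code uses. `NUM_SEGMENTS`, `can_run`, and the list-based entries are hypothetical stand-ins for the per-segment data structures.

```python
import threading

NUM_STAGES = 4      # the four stages from the Stage Division table
NUM_ENTRIES = 6     # ring buffer size reported as best in the text
NUM_SEGMENTS = 20   # hypothetical input length, in segments

ring = [None] * NUM_ENTRIES   # each entry stands in for one segment's data structures
done = [0] * NUM_STAGES       # done[s] = number of segments stage s has finished
cond = threading.Condition()
output = []


def can_run(s, n):
    """True if stage s may start segment n."""
    if s == 0:
        # The first stage stalls while the ring is full, i.e. while the
        # entry it wants to reuse has not yet been released by the last stage.
        return n - done[NUM_STAGES - 1] < NUM_ENTRIES
    # Later stages wait until the previous stage has finished segment n.
    return done[s - 1] > n


def stage(s):
    for n in range(NUM_SEGMENTS):
        with cond:
            cond.wait_for(lambda: can_run(s, n))
        slot = n % NUM_ENTRIES
        if s == 0:
            ring[slot] = [n]                 # start a fresh entry for segment n
        ring[slot].append(f"stage{s + 1}")   # placeholder for the stage's real work
        if s == NUM_STAGES - 1:
            output.append(ring[slot])
        with cond:
            done[s] += 1                     # publish progress to the other stages
            cond.notify_all()


threads = [threading.Thread(target=stage, args=(s,)) for s in range(NUM_STAGES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(output), output[0])
```

Because a stage only touches an entry between the moment its predecessor finishes that segment and the moment it publishes its own counter, the entries themselves need no locking. The size of the ring (`NUM_ENTRIES`) is the only thing that limits how far a fast stage can run ahead.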
    9176Figure~\ref{multithread_perf} demonstrates the performance improvement
     
    10085
    10186
    102 Figure \ref{pipeline_power} shows the average power consumed by the
     87Figure \ref{multithread_perf} also shows the average power consumed by the
    10388multithreaded Parabix. Overall, as expected the power consumption
    10489increases with the number of
    10590active cores. Note that the increase is not linear, since shared units
    10691such as last-level-caches consume active power even if only one core
    107 is active. Perhaps more interestingly there is a reduction in execution time, which leads to the energy consumption (see Figure~\ref{pipeline_energy}) being similar to the the single-thread execution (in some cases marginally less energy as shown for soap.xml). 
     92is active. Perhaps more interestingly there is a reduction in execution time,
     93which leads to energy consumption similar to that of single-thread execution
     94(in some cases marginally less energy, e.g., for soap.xml). 
    10895
    109 \begin{figure*}[htbp]
    110 \subfigure[Performance (Cycles / kB)]{
    111 \includegraphics[width=0.32\textwidth]{plots/pipeline_performance.pdf}
    112 \label{pipeline_performance}
     96\begin{figure}[htbp]
     97\begin{center}
     98{
     99\includegraphics[width=0.50\textwidth]{plots/pipeline.pdf}
    113100}
    114 \subfigure[Avg. Power Consumption (Watts)]{
    115 \includegraphics[width=0.32\textwidth]{plots/pipeline_power.pdf}
    116 \label{pipeline_power}
    117 }
    118 \subfigure[Avg. Energy Consumption (nJ / Byte)]{
    119   \includegraphics[width=0.32\textwidth]{plots/pipeline_energy.pdf}
    120 \label{pipeline_energy}
    121 }
    122 \caption{Multithreaded Parabix}
     101\end{center}
     102\caption{Average Statistics of Multithreaded Parabix}
    123103\label{multithread_perf}
    124 \end{figure*}
     104\end{figure}
    125105