Changeset 1331 for docs/HPCA2012


Timestamp: Aug 20, 2011, 7:51:52 PM
Author: lindanl
Message: section 4
Location: docs/HPCA2012
Files: 1 added, 6 edited

  • docs/HPCA2012/03-research.tex

    r1302 r1331  
    113113processed as a final post-processing step. A further aspect of the parallel cursor method with bit stream addition is that the conditional branch statements used to identify syntax errors at each parsing position are eliminated. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple-cursor parsing and further reduces branch misprediction penalties.
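The parallel cursor idea can be illustrated with a minimal Python sketch. It assumes the usual bit stream convention that bit $i$ of an unbounded Python integer marks input position $i$. The helper names `char_class` and `scan_thru`, the toy input, and the "a name must be followed by a space or `>`" rule are illustrative assumptions, not the Parabix2 source. Adding the cursor stream to a character-class stream lets each carry ripple through its own run, so all cursors advance at once. An error stream is then formed with a single mask rather than a branch at each position.

```python
def char_class(text: str, chars: str) -> int:
    """Bit stream with bit i set iff text[i] is one of `chars` (bit 0 = first byte)."""
    stream = 0
    for i, ch in enumerate(text):
        if ch in chars:
            stream |= 1 << i
    return stream


def scan_thru(cursors: int, span: int) -> int:
    """Move every cursor to the first position past the run of `span` bits it sits in.

    The carry of the addition ripples through each run independently, so all
    cursors advance in parallel without any per-position branch.
    """
    return (cursors + span) & ~span


# Toy input: two start tags, each followed by a (hypothetical) name run.
text = "<ab <cde>"
lt = char_class(text, "<")
name_chars = char_class(text, "abcde")
name_starts = lt << 1                      # one cursor just after each '<'
name_ends = scan_thru(name_starts, name_chars)

# Assumed toy rule: a name must be followed by a space or '>'.
# One AND-NOT marks every violation at once; there is no branch per position.
errors = name_ends & ~char_class(text, " >")
```

In this example, `name_ends` marks positions 3 and 8 (the space and the final `>`), and `errors` is zero. If the input were `"<ab=<cde>"`, bit 3 of `errors` would be set instead.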
    114114
    115 \subsection{Parallel Bit Stream Compilation}
    116 
    117 While the parallel bit stream parsing described in the previous section works conceptually on
    118 unbounded bit streams, in practice a C implementation must process input streams in blocks
    119 equal in size to the SIMD register width of the target processor. In our work, we leverage the unbounded
    120 integer type of the Python programming language. Using a restricted subset of Python, we prototype and validate the
    121 functionality of applications, such as XML validation and UTF-8 to UTF-16 transcoding. We then compile this Python code
    122 into equivalent block-at-a-time C code. The key question becomes how to transfer information from one block to the next whenever
    123 token scans cross block boundaries.
    124 
    125 The answer lies in carry bit propagation. Since the parallel $scanto$ operation relies solely on bit-wise addition and logical operations,
    126 block-to-block information transfer is captured in its entirety by the carry bit associated with each underlying addition operation. Logical operations
    127 do not require information flow across block boundaries. Properly determining, initializing and inserting carry bits into a block-by-block
    128 implementation is tedious and error-prone. Thus, we have developed compiler technology to automatically transform parallel bit stream
    129 Python code to block-at-a-time C implementations. Details are beyond the scope of this paper, but are described in the on-line
    130 source code repository at parabix.costar.sfu.ca.
    131115
    132116
  • docs/HPCA2012/03b-research.tex

    r1302 r1331  
    1 \section{Parabix2}
    2 Talk about where Parabix1 may be improved.
     1\section{Parabix}
    32
    4 Talk about compiler effort.
    5 
    6 Talk about usage of new SIMD instructions.
    7 
    8 Describe Parabix 2.
    9 
    10 Describe differences between Parabix1 and Parabix2.
    11 
    12 Why is Parabix2 better?
     3\subsection{Parabix Architecture}
     4\begin{figure}
     5\begin{center}
     6\includegraphics[width=0.5\textwidth]{plots/parabix_arch.pdf}
     7\end{center}
     8\caption{Parabix2 Architecture}
     9\label{parabix_arch}
     10\end{figure}
    1311
    1412
     13Figure \ref{parabix_arch} shows the overall architecture of Parabix2 for XML well-formedness checking.
     14The input file is processed by seven modules comprising eleven stages; if an error is found, its position is reported at the end.
     15The first stage, Read\_Data, loads a chunk of data from the input file into data\_buffer.
     16The data is then transposed into eight parallel basis bit streams (basis\_bits) in the Transposition stage.
     17The eight basis bit streams are used in the Classification stage to generate all the XML lexical item streams (lex),
     18as well as in the U8\_Validation stage to validate UTF-8 characters.
     19The lexical item streams and the scope streams (scope) generated in the Gen\_Scope stage
     20are supplied to the parsing module, which consists of three stages: Parse\_CtCDPI, Parse\_Ref and Parse\_tag.
     21After comments, CDATA sections, processing instructions, references and tags have been parsed,
     22the resulting information is gathered by the Name\_Validation and Err\_Check stages,
     23which compute the name streams and error streams and pass them to the final stage, Postprocessing.
     24Any errors that cannot be detected with bit stream operations are checked in this last stage, and
     25the error type is reported together with its line and column number.
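The data flow described above can be checked with a small Python sketch. It transcribes the read/write relationships from the pass-versus-data-structure table in 09-pipeline.tex (this changeset's version) and verifies that every structure is written before any stage reads it. The list layout and the helper names `check_dataflow` and `last_reader` are assumptions made for illustration and are not part of Parabix. Stage names are lowercased as in that table.

```python
# (stage, structures read, structures written), in pipeline order.
# Transcribed by hand from the pass/data-structure table; illustrative only.
STAGES = [
    ("read_data",      set(),                                   {"data_buffer"}),
    ("transposition",  {"data_buffer"},                         {"basis_bits"}),
    ("classification", {"basis_bits"},                          {"lex"}),
    ("validate_u8",    {"basis_bits"},                          {"u8"}),
    ("gen_scope",      {"lex"},                                 {"scope"}),
    ("parse_CtCDPI",   {"lex", "scope"},                        {"ctCDPI", "err_streams"}),
    ("parse_ref",      {"lex", "scope", "ctCDPI"},              {"ref"}),
    ("parse_tag",      {"lex", "scope", "ctCDPI"},              {"tag"}),
    ("validate_name",  {"u8", "lex", "ctCDPI", "ref", "tag"},   {"xml_names", "err_streams"}),
    ("gen_check",      {"u8", "lex", "scope", "ctCDPI", "tag", "xml_names"}, {"err_streams"}),
    ("postprocessing", {"data_buffer", "lex", "ctCDPI", "ref", "err_streams"}, set()),
]


def check_dataflow(stages):
    """Return (stage, missing structures) for every read of a not-yet-written structure."""
    written = set()
    violations = []
    for name, reads, writes in stages:
        missing = reads - written
        if missing:
            violations.append((name, sorted(missing)))
        written |= writes
    return violations


def last_reader(stages):
    """Map each structure to the last stage that reads it, i.e. where its lifetime ends."""
    last = {}
    for name, reads, _ in stages:
        for structure in reads:
            last[structure] = name
    return last


violations = check_dataflow(STAGES)
lifetimes = last_reader(STAGES)
```

Under this transcription, the check reports no violations. `last_reader` shows, for example, that `data_buffer` stays live until Postprocessing. That is consistent with the last stage needing the original bytes to report line and column numbers.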
     26
     27\subsection{Parallel Bit Stream Compilation}
     28
     29
     30While the parallel bit stream parsing described in the previous section works conceptually on
     31unbounded bit streams, in practice a C implementation must process input streams in blocks
     32equal in size to the SIMD register width of the target processor. In our work, we leverage the unbounded
     33integer type of the Python programming language. Using a restricted subset of Python, we prototype and validate the
     34functionality of applications, such as XML validation and UTF-8 to UTF-16 transcoding. We then compile this Python code
     35into equivalent block-at-a-time C code. The key question becomes how to transfer information from one block to the next whenever
     36token scans cross block boundaries.
     37
     38The answer lies in carry bit propagation. Since the parallel $scanto$ operation relies solely on bit-wise addition and logical operations,
     39block-to-block information transfer is captured in its entirety by the carry bit associated with each underlying addition operation. Logical operations
     40do not require information flow across block boundaries. Properly determining, initializing and inserting carry bits into a block-by-block
     41implementation is tedious and error-prone. Thus, we have developed compiler technology to automatically transform parallel bit stream
     42Python code to block-at-a-time C implementations. Details are beyond the scope of this paper, but are described in the on-line
     43source code repository at parabix.costar.sfu.ca.
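Carry propagation can be sketched in Python by simulating block-at-a-time code. Each block is a fixed-width slice of the stream, and the only state passed from one block to the next is the carry out of the addition. `BLOCK = 128` (an SSE-sized register) and the function names are assumptions for illustration; this is not the output of the Parabix compiler. `cross_check` compares the blocked result with the unbounded-integer version on random streams.

```python
import random

BLOCK = 128                      # assumed SIMD register width in bits
MASK = (1 << BLOCK) - 1


def scan_thru(cursors: int, span: int) -> int:
    """Unbounded reference version: one Python integer holds the whole stream."""
    return (cursors + span) & ~span


def add_with_carry(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two BLOCK-bit values plus a carry bit; return (block sum, carry out)."""
    total = a + b + carry_in
    return total & MASK, total >> BLOCK


def scan_thru_blocks(cursors: int, span: int, n_bits: int) -> int:
    """Block-at-a-time version: the carry is the only state passed between blocks."""
    result, carry = 0, 0
    for base in range(0, n_bits, BLOCK):
        c = (cursors >> base) & MASK
        s = (span >> base) & MASK
        block_sum, carry = add_with_carry(c, s, carry)
        # The AND-NOT is purely block-local and needs no cross-block information.
        result |= (block_sum & ~s & MASK) << base
    return result


def cross_check(trials: int, n_bits: int, seed: int = 0) -> bool:
    """Compare blocked and unbounded results on random streams of n_bits positions."""
    rng = random.Random(seed)
    keep = (1 << n_bits) - 1
    for _ in range(trials):
        c, s = rng.getrandbits(n_bits), rng.getrandbits(n_bits)
        if scan_thru_blocks(c, s, n_bits) & keep != scan_thru(c, s) & keep:
            return False
    return True
```

Consider a cursor at position 120 on a run covering positions 120 to 135. It lands at position 136 only because the carry out of block 0 is fed into block 1. If that carry were dropped, the cursor would be silently lost. Tracking this kind of state for every addition is the bookkeeping the compiler automates.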
     44
  • docs/HPCA2012/09-pipeline.tex

    r1329 r1331  
    3535\hline
    3636       & & \multicolumn{10}{|c|}{Data Structures}\\ \hline
    37        &                & srcbuf & basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & check\_streams\\ \hline
    38 Stage1 &fill\_buffer    & write  &             &      &       &       &        &        &        &            &               \\
    39        &s2p             & read   & write       &      &       &       &        &        &        &            &               \\
    40        &classify\_bytes &        & read        &      & write &       &        &        &        &            &               \\ \hline
    41 Stage2 &validate\_u8    &        & read        & write&       &       &        &        &        &            &               \\
    42        &gen\_scope      &        &             &      & read  & write &        &        &        &            &               \\
    43        &parse\_CtCDPI   &        &             &      & read  & read  & write  &        &        &            & write         \\
    44        &parse\_ref      &        &             &      & read  & read  & read   & write  &        &            &               \\ \hline
    45 Stage3 &parse\_tag      &        &             &      & read  & read  & read   &        & write  &            &               \\
    46        &validate\_name  &        &             & read & read  &       & read   & read   & read   & write      & write         \\
    47        &gen\_check      &        &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
    48 Stage4 &postprocessing  & read   &             &      & read  &       & read   & read   &        &            & read          \\ \hline
     37       &                & data\_buffer& basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & err\_streams\\ \hline
     38Stage1 &read\_data      & write       &             &      &       &       &        &        &        &            &               \\
     39       &transposition   & read        & write       &      &       &       &        &        &        &            &               \\
     40       &classification  &             & read        &      & write &       &        &        &        &            &               \\ \hline
     41Stage2 &validate\_u8    &             & read        & write&       &       &        &        &        &            &               \\
     42       &gen\_scope      &             &             &      & read  & write &        &        &        &            &               \\
     43       &parse\_CtCDPI   &             &             &      & read  & read  & write  &        &        &            & write         \\
     44       &parse\_ref      &             &             &      & read  & read  & read   & write  &        &            &               \\ \hline
     45Stage3 &parse\_tag      &             &             &      & read  & read  & read   &        & write  &            &               \\
     46       &validate\_name  &             &             & read & read  &       & read   & read   & read   & write      & write         \\
     47       &gen\_check      &             &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
     48Stage4 &postprocessing  & read        &             &      & read  &       & read   & read   &        &            & read          \\ \hline
    4949\end{tabular}
    5050\end{center}
  • docs/HPCA2012/main.aux

    r1329 r1331  
    6969\@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Parabix2 Start Tag Validation\relax }}{9}{figure.caption.5}}
    7070\newlabel{fig:Parabix2StarttagExample}{{5}{9}{Parabix2 Start Tag Validation\relax \relax }{figure.caption.5}{}}
    71 \@writefile{toc}{\contentsline {subsection}{\numberline {3.3}Parallel Bit Stream Compilation}{9}{subsection.3.3}}
     71\@writefile{toc}{\contentsline {section}{\numberline {4}Parabix}{9}{section.4}}
     72\@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Parabix Architecture}{9}{subsection.4.1}}
     73\@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Parabix2 Architecture\relax }}{10}{figure.caption.6}}
     74\newlabel{parabix_arch}{{6}{10}{Parabix2 Architecture\relax \relax }{figure.caption.6}{}}
     75\@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Parallel Bit Stream Compilation}{10}{subsection.4.2}}
    7276\citation{bellosa2001,bertran2010,bircher2007}
    7377\citation{bellosa2001}
    7478\citation{bircher2007,bertran2010}
    7579\citation{bellosa2001,bircher2007,bertran2010}
     80\@writefile{toc}{\contentsline {section}{\numberline {5}Methodology}{11}{section.5}}
     81\@writefile{brf}{\backcite{bellosa2001, bertran2010, bircher2007}{{11}{5}{section.5}}}
     82\@writefile{brf}{\backcite{bellosa2001}{{11}{5}{section.5}}}
     83\@writefile{brf}{\backcite{bircher2007, bertran2010}{{11}{5}{section.5}}}
    7684\citation{Parabix1}
    7785\citation{parabix2}
    7886\citation{xerces}
    7987\citation{expat}
    80 \@writefile{toc}{\contentsline {section}{\numberline {4}Methodology}{10}{section.4}}
    81 \@writefile{brf}{\backcite{bellosa2001, bertran2010, bircher2007}{{10}{4}{section.4}}}
    82 \@writefile{brf}{\backcite{bellosa2001}{{10}{4}{section.4}}}
    83 \@writefile{brf}{\backcite{bircher2007, bertran2010}{{10}{4}{section.4}}}
    84 \@writefile{brf}{\backcite{bellosa2001, bircher2007, bertran2010}{{10}{4}{section.4}}}
    85 \@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{11}{table.caption.6}}
    86 \newlabel{XMLDocChars}{{1}{11}{XML Document Characteristics\relax \relax }{table.caption.6}{}}
    87 \@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Parsers}{11}{subsection.4.1}}
    88 \newlabel{parsers}{{4.1}{11}{Parsers\relax }{subsection.4.1}{}}
    89 \@writefile{brf}{\backcite{Parabix1}{{11}{4.1}{subsection.4.1}}}
    90 \@writefile{brf}{\backcite{parabix2}{{11}{4.1}{subsection.4.1}}}
    91 \@writefile{brf}{\backcite{xerces}{{11}{4.1}{subsection.4.1}}}
    92 \@writefile{brf}{\backcite{expat}{{11}{4.1}{subsection.4.1}}}
    93 \@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Workloads}{11}{subsection.4.2}}
    94 \newlabel{workloads}{{4.2}{11}{Workloads\relax }{subsection.4.2}{}}
    95 \@writefile{toc}{\contentsline {subsection}{\numberline {4.3}Platform Hardware}{12}{subsection.4.3}}
    96 \@writefile{toc}{\contentsline {paragraph}{Intel Core2{}}{12}{section*.7}}
    97 \@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Core2{}\relax }}{12}{table.caption.8}}
    98 \newlabel{core2info}{{2}{12}{\CO {}\relax \relax }{table.caption.8}{}}
    99 \@writefile{toc}{\contentsline {paragraph}{Intel Core-i3{}}{12}{section*.9}}
    100 \@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Core-i3{}\relax }}{12}{table.caption.10}}
    101 \newlabel{i3info}{{3}{12}{\CITHREE {}\relax \relax }{table.caption.10}{}}
    102 \@writefile{toc}{\contentsline {paragraph}{Intel Core-i5{}}{12}{section*.11}}
    103 \@writefile{toc}{\contentsline {subsection}{\numberline {4.4}PMC Hardware Events}{12}{subsection.4.4}}
    104 \newlabel{events}{{4.4}{12}{PMC Hardware Events\relax }{subsection.4.4}{}}
     88\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{12}{table.caption.7}}
     89\newlabel{XMLDocChars}{{1}{12}{XML Document Characteristics\relax \relax }{table.caption.7}{}}
     90\@writefile{brf}{\backcite{bellosa2001, bircher2007, bertran2010}{{12}{5}{section.5}}}
     91\@writefile{toc}{\contentsline {subsection}{\numberline {5.1}Parsers}{12}{subsection.5.1}}
     92\newlabel{parsers}{{5.1}{12}{Parsers\relax }{subsection.5.1}{}}
     93\@writefile{brf}{\backcite{Parabix1}{{12}{5.1}{subsection.5.1}}}
     94\@writefile{brf}{\backcite{parabix2}{{12}{5.1}{subsection.5.1}}}
     95\@writefile{brf}{\backcite{xerces}{{12}{5.1}{subsection.5.1}}}
     96\@writefile{brf}{\backcite{expat}{{12}{5.1}{subsection.5.1}}}
     97\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Workloads}{12}{subsection.5.2}}
     98\newlabel{workloads}{{5.2}{12}{Workloads\relax }{subsection.5.2}{}}
     99\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}Platform Hardware}{13}{subsection.5.3}}
     100\@writefile{toc}{\contentsline {paragraph}{Intel Core2{}}{13}{section*.8}}
     101\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Core2{}\relax }}{13}{table.caption.9}}
     102\newlabel{core2info}{{2}{13}{\CO {}\relax \relax }{table.caption.9}{}}
     103\@writefile{toc}{\contentsline {paragraph}{Intel Core-i3{}}{13}{section*.10}}
     104\@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Core-i3{}\relax }}{13}{table.caption.11}}
     105\newlabel{i3info}{{3}{13}{\CITHREE {}\relax \relax }{table.caption.11}{}}
     106\@writefile{toc}{\contentsline {paragraph}{Intel Core-i5{}}{13}{section*.12}}
    105107\citation{clamp}
    106 \@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces SandyBridge{}\relax }}{13}{table.caption.12}}
    107 \newlabel{sandybridgeinfo}{{4}{13}{\SB {}\relax \relax }{table.caption.12}{}}
    108 \@writefile{toc}{\contentsline {subsection}{\numberline {4.5}Energy Measurement}{13}{subsection.4.5}}
    109 \@writefile{brf}{\backcite{clamp}{{13}{4.5}{subsection.4.5}}}
    110 \@writefile{toc}{\contentsline {section}{\numberline {5}Baseline Evaluation on Core-i3{}}{13}{section.5}}
    111 \@writefile{toc}{\contentsline {subsection}{\numberline {5.1}Cache behavior}{13}{subsection.5.1}}
    112 \@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Core-i3\ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{14}{figure.caption.13}}
    113 \newlabel{corei3_L1DM}{{6}{14}{\CITHREE \ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.13}{}}
    114 \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Core-i3\ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{14}{figure.caption.14}}
    115 \newlabel{corei3_L2DM}{{7}{14}{\CITHREE \ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.14}{}}
    116 \@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Branch Mispredictions}{14}{subsection.5.2}}
    117 \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Core-i3\ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax }}{15}{figure.caption.15}}
    118 \newlabel{corei3_L3TM}{{8}{15}{\CITHREE \ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.15}{}}
    119 \@writefile{toc}{\contentsline {subsection}{\numberline {5.3}SIMD Instructions vs. Total Instructions}{15}{subsection.5.3}}
     108\@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces SandyBridge{}\relax }}{14}{table.caption.13}}
     109\newlabel{sandybridgeinfo}{{4}{14}{\SB {}\relax \relax }{table.caption.13}{}}
     110\@writefile{toc}{\contentsline {subsection}{\numberline {5.4}PMC Hardware Events}{14}{subsection.5.4}}
     111\newlabel{events}{{5.4}{14}{PMC Hardware Events\relax }{subsection.5.4}{}}
     112\@writefile{toc}{\contentsline {subsection}{\numberline {5.5}Energy Measurement}{14}{subsection.5.5}}
     113\@writefile{brf}{\backcite{clamp}{{14}{5.5}{subsection.5.5}}}
     114\@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Core-i3\ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{15}{figure.caption.14}}
     115\newlabel{corei3_L1DM}{{7}{15}{\CITHREE \ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.14}{}}
     116\@writefile{toc}{\contentsline {section}{\numberline {6}Baseline Evaluation on Core-i3{}}{15}{section.6}}
     117\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Cache behavior}{15}{subsection.6.1}}
     118\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Branch Mispredictions}{15}{subsection.6.2}}
     119\@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Core-i3\ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{16}{figure.caption.15}}
     120\newlabel{corei3_L2DM}{{8}{16}{\CITHREE \ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.15}{}}
     121\@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Core-i3\ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax }}{16}{figure.caption.16}}
     122\newlabel{corei3_L3TM}{{9}{16}{\CITHREE \ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.16}{}}
     123\@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Core-i3\ --- Branch Instructions (y-axis: Branches per kB)\relax }}{17}{figure.caption.17}}
     124\newlabel{corei3_BR}{{10}{17}{\CITHREE \ --- Branch Instructions (y-axis: Branches per kB)\relax \relax }{figure.caption.17}{}}
     125\@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Core-i3\ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax }}{17}{figure.caption.18}}
     126\newlabel{corei3_BM}{{11}{17}{\CITHREE \ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax \relax }{figure.caption.18}{}}
     127\@writefile{toc}{\contentsline {subsection}{\numberline {6.3}SIMD Instructions vs. Total Instructions}{17}{subsection.6.3}}
    120128\citation{Cameron2008}
    121 \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Core-i3\ --- Branch Instructions (y-axis: Branches per kB)\relax }}{16}{figure.caption.16}}
    122 \newlabel{corei3_BR}{{9}{16}{\CITHREE \ --- Branch Instructions (y-axis: Branches per kB)\relax \relax }{figure.caption.16}{}}
    123 \@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Core-i3\ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax }}{16}{figure.caption.17}}
    124 \newlabel{corei3_BM}{{10}{16}{\CITHREE \ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax \relax }{figure.caption.17}{}}
    125 \@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax }}{17}{figure.caption.18}}
    126 \newlabel{corei3_INS_p1}{{11}{17}{Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax \relax }{figure.caption.18}{}}
    127 \@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax }}{17}{figure.caption.19}}
    128 \newlabel{corei3_INS_p2}{{12}{17}{Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax \relax }{figure.caption.19}{}}
    129 \@writefile{toc}{\contentsline {subsection}{\numberline {5.4}CPU Cycles}{17}{subsection.5.4}}
    130 \@writefile{brf}{\backcite{Cameron2008}{{17}{5.4}{subsection.5.4}}}
    131 \@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Core-i3\ --- Performance (y-axis: CPU Cycles per kB)\relax }}{18}{figure.caption.20}}
    132 \newlabel{corei3_TOT}{{13}{18}{\CITHREE \ --- Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.20}{}}
    133 \@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Core-i3\ --- Average Power Consumption (watts)\relax }}{18}{figure.caption.21}}
    134 \newlabel{corei3_power}{{14}{18}{\CITHREE \ --- Average Power Consumption (watts)\relax \relax }{figure.caption.21}{}}
    135 \@writefile{toc}{\contentsline {subsection}{\numberline {5.5}Power and Energy}{18}{subsection.5.5}}
    136 \@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Core-i3\ --- Energy Consumption ($\mu $J per kB)\relax }}{19}{figure.caption.22}}
    137 \newlabel{corei3_energy}{{15}{19}{\CITHREE \ --- Energy Consumption ($\mu $J per kB)\relax \relax }{figure.caption.22}{}}
    138 \@writefile{toc}{\contentsline {section}{\numberline {6}Scalability}{19}{section.6}}
    139 \@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Performance}{19}{subsection.6.1}}
    140 \@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax }}{20}{figure.caption.23}}
    141 \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Parabix2}}}{20}{figure.caption.23}}
    142 \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Expat}}}{20}{figure.caption.23}}
    143 \newlabel{Scalability}{{16}{20}{Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.23}{}}
    144 \@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Average Power of Parabix2 (watts)\relax }}{20}{figure.caption.24}}
    145 \newlabel{power_Parabix2}{{17}{20}{Average Power of Parabix2 (watts)\relax \relax }{figure.caption.24}{}}
    146 \@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Power and Energy}{20}{subsection.6.2}}
    147 \@writefile{lof}{\contentsline {figure}{\numberline {18}{\ignorespaces Energy consumption of Parabix2 (nJ/B)\relax }}{21}{figure.caption.25}}
    148 \newlabel{energy_Parabix2}{{18}{21}{Energy consumption of Parabix2 (nJ/B)\relax \relax }{figure.caption.25}{}}
    149 \@writefile{lof}{\contentsline {figure}{\numberline {19}{\ignorespaces Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax }}{21}{figure.caption.26}}
    150 \newlabel{insmix}{{19}{21}{Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax \relax }{figure.caption.26}{}}
    151 \@writefile{toc}{\contentsline {section}{\numberline {7}Scaling Parabix2 for AVX}{21}{section.7}}
    152 \@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Three Operand Form}{21}{subsection.7.1}}
    153 \@writefile{lof}{\contentsline {figure}{\numberline {20}{\ignorespaces Parabix2 Performance (y-axis: CPU Cycles per kB)\relax }}{22}{figure.caption.27}}
    154 \newlabel{avx}{{20}{22}{Parabix2 Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.27}{}}
    155 \@writefile{toc}{\contentsline {subsection}{\numberline {7.2}256-bit AVX Operations}{22}{subsection.7.2}}
    156 \@writefile{toc}{\contentsline {subsection}{\numberline {7.3}Performance Results}{22}{subsection.7.3}}
    157 \@writefile{toc}{\contentsline {section}{\numberline {8}Parabix2 on GT-P1000M}{24}{section.8}}
    158 \@writefile{toc}{\contentsline {subsection}{\numberline {8.1}Platform Hardware}{24}{subsection.8.1}}
    159 \@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces GT-P1000M\relax }}{24}{table.caption.28}}
    160 \newlabel{arminfo}{{5}{24}{GT-P1000M\relax \relax }{table.caption.28}{}}
    161 \@writefile{toc}{\contentsline {subsection}{\numberline {8.2}Performance Results}{24}{subsection.8.2}}
    162 \@writefile{lof}{\contentsline {figure}{\numberline {21}{\ignorespaces Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax }}{25}{figure.caption.29}}
    163 \newlabel{arm_processing_time}{{21}{25}{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.29}{}}
     129\@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax }}{18}{figure.caption.19}}
     130\newlabel{corei3_INS_p1}{{12}{18}{Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax \relax }{figure.caption.19}{}}
     131\@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax }}{18}{figure.caption.20}}
     132\newlabel{corei3_INS_p2}{{13}{18}{Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax \relax }{figure.caption.20}{}}
     133\@writefile{toc}{\contentsline {subsection}{\numberline {6.4}CPU Cycles}{18}{subsection.6.4}}
     134\@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Core-i3\ --- Performance (y-axis: CPU Cycles per kB)\relax }}{19}{figure.caption.21}}
     135\newlabel{corei3_TOT}{{14}{19}{\CITHREE \ --- Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.21}{}}
     136\@writefile{brf}{\backcite{Cameron2008}{{19}{6.4}{subsection.6.4}}}
     137\@writefile{toc}{\contentsline {subsection}{\numberline {6.5}Power and Energy}{19}{subsection.6.5}}
     138\@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Core-i3\ --- Average Power Consumption (watts)\relax }}{20}{figure.caption.22}}
     139\newlabel{corei3_power}{{15}{20}{\CITHREE \ --- Average Power Consumption (watts)\relax \relax }{figure.caption.22}{}}
     140\@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Core-i3\ --- Energy Consumption ($\mu $J per kB)\relax }}{20}{figure.caption.23}}
     141\newlabel{corei3_energy}{{16}{20}{\CITHREE \ --- Energy Consumption ($\mu $J per kB)\relax \relax }{figure.caption.23}{}}
     142\@writefile{toc}{\contentsline {section}{\numberline {7}Scalability}{20}{section.7}}
     143\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Performance}{20}{subsection.7.1}}
     144\@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax }}{21}{figure.caption.24}}
     145\@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Parabix2}}}{21}{figure.caption.24}}
     146\@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Expat}}}{21}{figure.caption.24}}
     147\newlabel{Scalability}{{17}{21}{Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.24}{}}
     148\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Power and Energy}{21}{subsection.7.2}}
     149\@writefile{lof}{\contentsline {figure}{\numberline {18}{\ignorespaces Average Power of Parabix2 (watts)\relax }}{22}{figure.caption.25}}
     150\newlabel{power_Parabix2}{{18}{22}{Average Power of Parabix2 (watts)\relax \relax }{figure.caption.25}{}}
     151\@writefile{lof}{\contentsline {figure}{\numberline {19}{\ignorespaces Energy consumption of Parabix2 (nJ/B)\relax }}{22}{figure.caption.26}}
     152\newlabel{energy_Parabix2}{{19}{22}{Energy consumption of Parabix2 (nJ/B)\relax \relax }{figure.caption.26}{}}
     153\@writefile{toc}{\contentsline {section}{\numberline {8}Scaling Parabix2 for AVX}{22}{section.8}}
     154\@writefile{toc}{\contentsline {subsection}{\numberline {8.1}Three Operand Form}{22}{subsection.8.1}}
     155\@writefile{lof}{\contentsline {figure}{\numberline {20}{\ignorespaces Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax }}{23}{figure.caption.27}}
     156\newlabel{insmix}{{20}{23}{Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax \relax }{figure.caption.27}{}}
     157\@writefile{lof}{\contentsline {figure}{\numberline {21}{\ignorespaces Parabix2 Performance (y-axis: CPU Cycles per kB)\relax }}{23}{figure.caption.28}}
     158\newlabel{avx}{{21}{23}{Parabix2 Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.28}{}}
     159\@writefile{toc}{\contentsline {subsection}{\numberline {8.2}256-bit AVX Operations}{23}{subsection.8.2}}
     160\@writefile{toc}{\contentsline {subsection}{\numberline {8.3}Performance Results}{24}{subsection.8.3}}
     161\@writefile{toc}{\contentsline {section}{\numberline {9}Parabix2 on GT-P1000M}{25}{section.9}}
     162\@writefile{toc}{\contentsline {subsection}{\numberline {9.1}Platform Hardware}{25}{subsection.9.1}}
     163\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces GT-P1000M\relax }}{26}{table.caption.29}}
     164\newlabel{arminfo}{{5}{26}{GT-P1000M\relax \relax }{table.caption.29}{}}
     165\@writefile{lof}{\contentsline {figure}{\numberline {22}{\ignorespaces Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax }}{26}{figure.caption.30}}
     166\newlabel{arm_processing_time}{{22}{26}{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.30}{}}
     167\@writefile{toc}{\contentsline {subsection}{\numberline {9.2}Performance Results}{26}{subsection.9.2}}
    164168\citation{dataparallel}
    165169\citation{Shah:2009}
    166 \@writefile{lof}{\contentsline {figure}{\numberline {22}{\ignorespaces Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. Core-i3{} \relax }}{26}{figure.caption.30}}
    167 \newlabel{relative_performance_arm_vs_i3}{{22}{26}{Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. \CITHREE {} \relax \relax }{figure.caption.30}{}}
    168 \@writefile{toc}{\contentsline {section}{\numberline {9}Multi-threaded Parabix}{26}{section.9}}
    169 \@writefile{brf}{\backcite{dataparallel}{{26}{9}{section.9}}}
    170 \@writefile{brf}{\backcite{Shah:2009}{{26}{9}{section.9}}}
    171 \@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Relationship between Each Pass and Data Structures\relax }}{27}{table.caption.31}}
    172 \newlabel{pass_structure}{{6}{27}{Relationship between Each Pass and Data Structures\relax \relax }{table.caption.31}{}}
    173 \@writefile{lof}{\contentsline {figure}{\numberline {23}{\ignorespaces Processing Time (y axis: CPU cycles per byte)\relax }}{27}{figure.caption.32}}
    174 \newlabel{multithread_perf}{{23}{27}{Processing Time (y axis: CPU cycles per byte)\relax \relax }{figure.caption.32}{}}
    175 \@writefile{lof}{\contentsline {figure}{\numberline {24}{\ignorespaces Average Power (watts)\relax }}{28}{figure.caption.33}}
    176 \newlabel{power}{{24}{28}{Average Power (watts)\relax \relax }{figure.caption.33}{}}
    177 \@writefile{lof}{\contentsline {figure}{\numberline {25}{\ignorespaces Energy Consumption (nJ per byte)\relax }}{28}{figure.caption.34}}
    178 \newlabel{energy}{{25}{28}{Energy Consumption (nJ per byte)\relax \relax }{figure.caption.34}{}}
    179 \@writefile{toc}{\contentsline {section}{\numberline {10}Conclusion}{28}{section.10}}
     170\@writefile{lof}{\contentsline {figure}{\numberline {23}{\ignorespaces Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. Core-i3{} \relax }}{27}{figure.caption.31}}
     171\newlabel{relative_performance_arm_vs_i3}{{23}{27}{Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. \CITHREE {} \relax \relax }{figure.caption.31}{}}
     172\@writefile{toc}{\contentsline {section}{\numberline {10}Multi-threaded Parabix}{27}{section.10}}
     173\@writefile{brf}{\backcite{dataparallel}{{27}{10}{section.10}}}
     174\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Relationship between Each Pass and Data Structures\relax }}{28}{table.caption.32}}
     175\newlabel{pass_structure}{{6}{28}{Relationship between Each Pass and Data Structures\relax \relax }{table.caption.32}{}}
     176\@writefile{brf}{\backcite{Shah:2009}{{28}{10}{section.10}}}
     177\@writefile{lof}{\contentsline {figure}{\numberline {24}{\ignorespaces Processing Time (y axis: CPU cycles per byte)\relax }}{29}{figure.caption.33}}
     178\newlabel{multithread_perf}{{24}{29}{Processing Time (y axis: CPU cycles per byte)\relax \relax }{figure.caption.33}{}}
     179\@writefile{lof}{\contentsline {figure}{\numberline {25}{\ignorespaces Average Power (watts)\relax }}{29}{figure.caption.34}}
     180\newlabel{power}{{25}{29}{Average Power (watts)\relax \relax }{figure.caption.34}{}}
     181\@writefile{lof}{\contentsline {figure}{\numberline {26}{\ignorespaces Energy Consumption (nJ per byte)\relax }}{29}{figure.caption.35}}
     182\newlabel{energy}{{26}{29}{Energy Consumption (nJ per byte)\relax \relax }{figure.caption.35}{}}
    180183\bibstyle{abbrv}
    181184\bibdata{reference}
     
    189192\bibcite{Cameron2010}{8}
    190193\bibcite{CameronHerdyLin2008}{9}
     194\@writefile{toc}{\contentsline {section}{\numberline {11}Conclusion}{30}{section.11}}
    191195\bibcite{expat}{10}
    192196\bibcite{clamp}{11}
  • docs/HPCA2012/main.tex

    r1327 r1331  
    176176\input{02-background.tex}
    177177\input{03-research.tex}
     178\input{03b-research.tex}
    178179\input{04-methodology.tex}
    179180\input{05-corei3.tex}