Changeset 1329 for docs


Timestamp:
Aug 20, 2011, 10:49:39 AM (8 years ago)
Author:
lindanl
Message:

section 9

Location:
docs/HPCA2012
Files:
5 edited

  • docs/HPCA2012/09-pipeline.tex

    r1326 r1329  
    11\section{Multi-threaded Parabix}
     2The general problem with addressing performance through multicore parallelism
     3is the increasing energy cost. As discussed in previous sections,
     4Parabix, which applies SIMD-based techniques, not only achieves better performance but also consumes less energy.
     5Moreover, using multiple cores, we can further improve the performance of Parabix while keeping the energy consumption at the same level.
     6
     7A typical approach to parallelizing software, data parallelism, requires nearly independent data.
     8However, the nature of XML files makes them hard to partition nicely for data parallelism.
     9Several approaches have been used to address this problem.
     10A preparsing phase has been proposed to help partition the XML document \cite{dataparallel}.
     11The goal of this preparsing is to determine the tree structure of the XML document
     12so that it can be used to guide the full parsing in the next phase.
     13Another data parallel algorithm is called ParDOM \cite{Shah:2009}.
     14It first builds partial DOM node tree structures for each data segment and then links them
     15using preorder numbers that have been assigned to each start element to determine the ordering among siblings
     16and a stack to manage the parent-child relationship between elements.
     17
     18Data parallelism approaches introduce considerable overhead to resolve the data dependencies between segments.
     19Therefore, instead of partitioning the data into segments and assigning different data segments to different cores,
     20we propose a pipeline parallelism strategy that partitions the process into several stages and lets each core work on a single stage.
     21
     22The interface between stages is implemented using a circular array,
     23where each entry consists of all ten data structures for one segment as listed in Table \ref{pass_structure}.
     24Each thread keeps an index into the array ($I_N$),
     25which is compared with the index ($I_{N-1}$) kept by the preceding thread before processing the segment.
     26If $I_N$ is smaller than $I_{N-1}$, thread $N$ can start processing segment $I_N$;
     27otherwise the thread keeps reading $I_{N-1}$ until $I_{N-1}$ is larger than $I_N$.
     28The time consumed by continuously loading the value of $I_{N-1}$ and
     29comparing it with $I_N$ is referred to later as stall time.
     30When a thread finishes processing the segment, it increases its index by one.
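
The index-comparison protocol described above can be illustrated with a minimal C++ sketch. It is not the Parabix implementation: the names `run_pipeline`, `kStages` and `kSlots` are hypothetical, each stage's work is reduced to incrementing a counter, and the check that stops the first stage from lapping the last one is an assumption added here so that the circular array is never overwritten early.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical names and sizes, chosen only for illustration.
constexpr std::size_t kStages = 4;  // Stage1..Stage4, one thread per stage
constexpr std::size_t kSlots  = 8;  // assumed capacity of the circular array

// Pushes `segments` segments through the pipeline. Every stage adds 1 to the
// segment's slot, so each finished segment holds kStages. Returns the sum
// collected by the last stage.
long run_pipeline(std::size_t segments) {
    std::vector<long> ring(kSlots, 0);  // stand-in for the per-segment data structures
    std::array<std::atomic<std::size_t>, kStages> index{};
    for (auto& i : index) i.store(0);
    long total = 0;  // written only by the last stage, read after join

    auto stage = [&](std::size_t n) {
        for (std::size_t seg = 0; seg < segments; ++seg) {
            if (n == 0) {
                // Assumption (not in the text): wait until the last stage has
                // released the slot that is about to be reused.
                while (seg - index[kStages - 1].load(std::memory_order_acquire) >= kSlots)
                    std::this_thread::yield();
                ring[seg % kSlots] = 0;  // "fill" a fresh segment
            } else {
                // Stall: keep reading I_{N-1} until it is larger than I_N.
                while (index[n - 1].load(std::memory_order_acquire) <= seg)
                    std::this_thread::yield();
            }
            ring[seg % kSlots] += 1;  // this stage's work on the segment
            if (n == kStages - 1) {
                assert(ring[seg % kSlots] == static_cast<long>(kStages));
                total += ring[seg % kSlots];
            }
            // Publish completion: increase this thread's index by one.
            index[n].store(seg + 1, std::memory_order_release);
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t n = 0; n < kStages; ++n) threads.emplace_back(stage, n);
    for (auto& t : threads) t.join();
    return total;
}
```

Calling `run_pipeline(100)` runs four threads over 100 segments. The loops that spin on the neighbouring index correspond to the stall time defined above; the sketch yields while waiting, whereas the text describes continuous reloading of the index.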
    231
    332\begin{table*}[t]
     
    534\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
    635\hline
    7 Stage Name & \multicolumn{10}{|c|}{Data Structures}\\ \hline
    8                 & srcbuf & basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & check\_streams\\ \hline
    9 fill\_buffer    & write  &             &      &       &       &        &        &        &            &               \\ \hline
    10 s2p             & read   & write       &      &       &       &        &        &        &            &               \\ \hline
    11 classify\_bytes &        & read        &      & write &       &        &        &        &            &               \\ \hline
    12 validate\_u8    &        & read        & write&       &       &        &        &        &            &               \\ \hline
    13 gen\_scope      &        &             &      & read  & write &        &        &        &            &               \\ \hline
    14 parse\_CtCDPI   &        &             &      & read  & read  & write  &        &        &            & write         \\ \hline
    15 parse\_ref      &        &             &      & read  & read  & read   & write  &        &            &               \\ \hline
    16 parse\_tag      &        &             &      & read  & read  & read   &        & write  &            &               \\ \hline
    17 validate\_name  &        &             & read & read  &       & read   & read   & read   & write      & write         \\ \hline
    18 gen\_check      &        &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
    19 postprocessing  & read   &             &      & read  &       & read   & read   &        &            & read          \\ \hline
     36       & & \multicolumn{10}{|c|}{Data Structures}\\ \hline
     37       &                & srcbuf & basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & check\_streams\\ \hline
     38Stage1 &fill\_buffer    & write  &             &      &       &       &        &        &        &            &               \\
     39       &s2p             & read   & write       &      &       &       &        &        &        &            &               \\
     40       &classify\_bytes &        & read        &      & write &       &        &        &        &            &               \\ \hline
     41Stage2 &validate\_u8    &        & read        & write&       &       &        &        &        &            &               \\
     42       &gen\_scope      &        &             &      & read  & write &        &        &        &            &               \\
     43       &parse\_CtCDPI   &        &             &      & read  & read  & write  &        &        &            & write         \\
     44       &parse\_ref      &        &             &      & read  & read  & read   & write  &        &            &               \\ \hline
     45Stage3 &parse\_tag      &        &             &      & read  & read  & read   &        & write  &            &               \\
     46       &validate\_name  &        &             & read & read  &       & read   & read   & read   & write      & write         \\
     47       &gen\_check      &        &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
     48Stage4 &postprocessing  & read   &             &      & read  &       & read   & read   &        &            & read          \\ \hline
    2049\end{tabular}
    2150\end{center}
     
    2453\end{table*}
    2554
     55Figure \ref{multithread_perf} demonstrates the XML well-formedness checking performance of
     56the multi-threaded Parabix in comparison with the single-threaded version.
     57The multi-threaded Parabix is more than twice as fast and runs at 2.7 cycles per input byte on the \SB{} machine.
    2658
    2759\begin{figure}
     
    3062\end{center}
    3163\caption{Processing Time (y axis: CPU cycles per byte)}
    32 \label{perf}
     64\label{multithread_perf}
    3365\end{figure}
     66
     67Figure \ref{power} shows the average power consumed by the multi-threaded Parabix in comparison with the single-threaded version.
     68By running four threads and using all the cores at the same time, the multi-threaded Parabix draws much more power
     69than the single-threaded version. However, the energy consumption is about the same, because the multi-threaded Parabix needs less processing time.
     70In fact, as shown in Figure \ref{energy}, parsing soap.xml using multi-threaded Parabix consumes less energy than using single-threaded Parabix.
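
The trade-off between power and processing time can be written out explicitly. The notation below is introduced here for illustration and does not appear in the paper: $\bar{P}$ is average power, $t$ is processing time, and the subscripts mt and st denote the multi-threaded and single-threaded versions.

```latex
\begin{equation*}
E = \bar{P}\, t,
\qquad
\frac{E_{\mathrm{mt}}}{E_{\mathrm{st}}}
  = \frac{\bar{P}_{\mathrm{mt}}}{\bar{P}_{\mathrm{st}}}
    \cdot \frac{t_{\mathrm{mt}}}{t_{\mathrm{st}}}
\end{equation*}
```

Under this relation the multi-threaded version uses less energy whenever its increase in average power is smaller than its speedup.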
    3471
    3572\begin{figure}
    3673\begin{center}
    37 \includegraphics[width=0.5\textwidth]{plots/perf_energy.pdf}
     74\includegraphics[width=0.5\textwidth]{plots/power.pdf}
    3875\end{center}
    39 \caption{Energy vs. Performance (x axis: bytes per cycle, y axis: nJ per byte)}
    40 \label{perf_energy}
     76\caption{Average Power (watts)}
     77\label{power}
     78\end{figure}
     79\begin{figure}
     80\begin{center}
     81\includegraphics[width=0.5\textwidth]{plots/energy.pdf}
     82\end{center}
     83\caption{Energy Consumption (nJ per byte)}
     84\label{energy}
    4185\end{figure}
    4286
  • docs/HPCA2012/main.aux

    r1327 r1329  
    1616\citation{esmaeilzadeh-isca-2011}
    1717\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}{section.1}}
     18\@writefile{brf}{\backcite{}{{1}{1}{section.1}}}
    1819\citation{venkatesh-asplos-2010,hameed-isca-2010}
    1920\citation{}
     21\@writefile{brf}{\backcite{blake-isca-2010}{{2}{1}{section.1}}}
     22\@writefile{brf}{\backcite{esmaeilzadeh-isca-2011}{{2}{1}{section.1}}}
     23\@writefile{brf}{\backcite{venkatesh-asplos-2010, hameed-isca-2010}{{2}{1}{section.1}}}
     24\@writefile{brf}{\backcite{}{{2}{1}{section.1}}}
    2025\citation{TR:XML}
    2126\citation{DuCharme04}
    2227\citation{TR:XML}
    23 \@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces XML Parser Technology Energy vs. Performance}}{4}{figure.1}}
    24 \newlabel{perf-energy}{{1}{4}{XML Parser Technology Energy vs. Performance\relax }{figure.1}{}}
     28\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces XML Parser Technology Energy vs. Performance\relax }}{4}{figure.caption.1}}
     29\providecommand*\caption@xref[2]{\@setref\relax\@undefined{#1}}
     30\newlabel{perf-energy}{{1}{4}{XML Parser Technology Energy vs. Performance\relax \relax }{figure.caption.1}{}}
    2531\@writefile{toc}{\contentsline {section}{\numberline {2}Background}{4}{section.2}}
    2632\newlabel{section:background}{{2}{4}{Background\relax }{section.2}{}}
    2733\@writefile{toc}{\contentsline {subsection}{\numberline {2.1}XML}{4}{subsection.2.1}}
     34\@writefile{brf}{\backcite{TR:XML}{{4}{2.1}{subsection.2.1}}}
     35\@writefile{brf}{\backcite{DuCharme04}{{4}{2.1}{subsection.2.1}}}
     36\@writefile{brf}{\backcite{TR:XML}{{4}{2.1}{subsection.2.1}}}
    2837\citation{Cameron2010}
    2938\citation{expat}
     
    3241\citation{ZhangPanChiu09}
    3342\citation{ZhangPanChiu09}
    34 \@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Example XML Document}}{5}{figure.2}}
    35 \newlabel{fig:sample_xml}{{2}{5}{Example XML Document\relax }{figure.2}{}}
     43\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Example XML Document\relax }}{5}{figure.caption.2}}
     44\newlabel{fig:sample_xml}{{2}{5}{Example XML Document\relax \relax }{figure.caption.2}{}}
    3645\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}Traditional XML Parsers}{5}{subsection.2.2}}
     46\@writefile{brf}{\backcite{Cameron2010}{{5}{2.2}{subsection.2.2}}}
     47\@writefile{brf}{\backcite{expat}{{5}{2.2}{subsection.2.2}}}
     48\@writefile{brf}{\backcite{xerces}{{5}{2.2}{subsection.2.2}}}
     49\@writefile{brf}{\backcite{CameronHerdyLin2008}{{5}{2.2}{subsection.2.2}}}
    3750\citation{Cameron2010}
    3851\citation{CameronHerdyLin2008}
    3952\@writefile{toc}{\contentsline {subsection}{\numberline {2.3}Parallel XML Parsing}{6}{subsection.2.3}}
     53\@writefile{brf}{\backcite{ZhangPanChiu09}{{6}{2.3}{subsection.2.3}}}
     54\@writefile{brf}{\backcite{ZhangPanChiu09}{{6}{2.3}{subsection.2.3}}}
    4055\@writefile{toc}{\contentsline {section}{\numberline {3}Parabix}{6}{section.3}}
    4156\newlabel{section:parabix}{{3}{6}{Parabix\relax }{section.3}{}}
     57\@writefile{brf}{\backcite{Cameron2010}{{6}{3}{section.3}}}
    4258\@writefile{toc}{\contentsline {subsection}{\numberline {3.1}Parabix1}{6}{subsection.3.1}}
    4359\citation{CameronHerdyLin2008,Herdy2008,Cameron2009}
    44 \@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces Example 8-bit ASCII Character Basis Bit Streams}}{7}{figure.3}}
    45 \newlabel{fig:BitstreamsExample}{{3}{7}{Example 8-bit ASCII Character Basis Bit Streams\relax }{figure.3}{}}
     60\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces Example 8-bit ASCII Character Basis Bit Streams\relax }}{7}{figure.caption.3}}
     61\newlabel{fig:BitstreamsExample}{{3}{7}{Example 8-bit ASCII Character Basis Bit Streams\relax \relax }{figure.caption.3}{}}
     62\@writefile{brf}{\backcite{CameronHerdyLin2008}{{7}{3.1}{figure.caption.3}}}
    4663\citation{Cameron2010}
    47 \@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Parabix1 Start Tag Validation}}{8}{figure.4}}
    48 \newlabel{fig:Parabix1StarttagExample}{{4}{8}{Parabix1 Start Tag Validation\relax }{figure.4}{}}
     64\@writefile{brf}{\backcite{CameronHerdyLin2008, Herdy2008, Cameron2009}{{8}{3.1}{figure.caption.3}}}
     65\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Parabix1 Start Tag Validation\relax }}{8}{figure.caption.4}}
     66\newlabel{fig:Parabix1StarttagExample}{{4}{8}{Parabix1 Start Tag Validation\relax \relax }{figure.caption.4}{}}
    4967\@writefile{toc}{\contentsline {subsection}{\numberline {3.2}Parabix2}{8}{subsection.3.2}}
    50 \@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Parabix2 Start Tag Validation}}{9}{figure.5}}
    51 \newlabel{fig:Parabix2StarttagExample}{{5}{9}{Parabix2 Start Tag Validation\relax }{figure.5}{}}
     68\@writefile{brf}{\backcite{Cameron2010}{{9}{3.2}{subsection.3.2}}}
     69\@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Parabix2 Start Tag Validation\relax }}{9}{figure.caption.5}}
     70\newlabel{fig:Parabix2StarttagExample}{{5}{9}{Parabix2 Start Tag Validation\relax \relax }{figure.caption.5}{}}
    5271\@writefile{toc}{\contentsline {subsection}{\numberline {3.3}Parallel Bit Stream Compilation}{9}{subsection.3.3}}
    5372\citation{bellosa2001,bertran2010,bircher2007}
     
    6079\citation{expat}
    6180\@writefile{toc}{\contentsline {section}{\numberline {4}Methodology}{10}{section.4}}
    62 \@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics}}{11}{table.1}}
    63 \newlabel{XMLDocChars}{{1}{11}{XML Document Characteristics\relax }{table.1}{}}
     81\@writefile{brf}{\backcite{bellosa2001, bertran2010, bircher2007}{{10}{4}{section.4}}}
     82\@writefile{brf}{\backcite{bellosa2001}{{10}{4}{section.4}}}
     83\@writefile{brf}{\backcite{bircher2007, bertran2010}{{10}{4}{section.4}}}
     84\@writefile{brf}{\backcite{bellosa2001, bircher2007, bertran2010}{{10}{4}{section.4}}}
     85\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{11}{table.caption.6}}
     86\newlabel{XMLDocChars}{{1}{11}{XML Document Characteristics\relax \relax }{table.caption.6}{}}
    6487\@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Parsers}{11}{subsection.4.1}}
    6588\newlabel{parsers}{{4.1}{11}{Parsers\relax }{subsection.4.1}{}}
     89\@writefile{brf}{\backcite{Parabix1}{{11}{4.1}{subsection.4.1}}}
     90\@writefile{brf}{\backcite{parabix2}{{11}{4.1}{subsection.4.1}}}
     91\@writefile{brf}{\backcite{xerces}{{11}{4.1}{subsection.4.1}}}
     92\@writefile{brf}{\backcite{expat}{{11}{4.1}{subsection.4.1}}}
    6693\@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Workloads}{11}{subsection.4.2}}
    6794\newlabel{workloads}{{4.2}{11}{Workloads\relax }{subsection.4.2}{}}
    6895\@writefile{toc}{\contentsline {subsection}{\numberline {4.3}Platform Hardware}{12}{subsection.4.3}}
    69 \@writefile{toc}{\contentsline {paragraph}{Intel Core2{}}{12}{section*.1}}
    70 \@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Core2{}}}{12}{table.2}}
    71 \newlabel{core2info}{{2}{12}{\CO {}\relax }{table.2}{}}
    72 \@writefile{toc}{\contentsline {paragraph}{Intel Core-i3{}}{12}{section*.2}}
    73 \@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Core-i3{}}}{12}{table.3}}
    74 \newlabel{i3info}{{3}{12}{\CITHREE {}\relax }{table.3}{}}
    75 \@writefile{toc}{\contentsline {paragraph}{Intel Core-i5{}}{12}{section*.3}}
     96\@writefile{toc}{\contentsline {paragraph}{Intel Core2{}}{12}{section*.7}}
     97\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Core2{}\relax }}{12}{table.caption.8}}
     98\newlabel{core2info}{{2}{12}{\CO {}\relax \relax }{table.caption.8}{}}
     99\@writefile{toc}{\contentsline {paragraph}{Intel Core-i3{}}{12}{section*.9}}
     100\@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Core-i3{}\relax }}{12}{table.caption.10}}
     101\newlabel{i3info}{{3}{12}{\CITHREE {}\relax \relax }{table.caption.10}{}}
     102\@writefile{toc}{\contentsline {paragraph}{Intel Core-i5{}}{12}{section*.11}}
    76103\@writefile{toc}{\contentsline {subsection}{\numberline {4.4}PMC Hardware Events}{12}{subsection.4.4}}
    77104\newlabel{events}{{4.4}{12}{PMC Hardware Events\relax }{subsection.4.4}{}}
    78105\citation{clamp}
    79 \@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces SandyBridge{}}}{13}{table.4}}
    80 \newlabel{sandybridgeinfo}{{4}{13}{\SB {}\relax }{table.4}{}}
     106\@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces SandyBridge{}\relax }}{13}{table.caption.12}}
     107\newlabel{sandybridgeinfo}{{4}{13}{\SB {}\relax \relax }{table.caption.12}{}}
    81108\@writefile{toc}{\contentsline {subsection}{\numberline {4.5}Energy Measurement}{13}{subsection.4.5}}
     109\@writefile{brf}{\backcite{clamp}{{13}{4.5}{subsection.4.5}}}
    82110\@writefile{toc}{\contentsline {section}{\numberline {5}Baseline Evaluation on Core-i3{}}{13}{section.5}}
    83111\@writefile{toc}{\contentsline {subsection}{\numberline {5.1}Cache behavior}{13}{subsection.5.1}}
    84 \@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Core-i3\ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)}}{14}{figure.6}}
    85 \newlabel{corei3_L1DM}{{6}{14}{\CITHREE \ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax }{figure.6}{}}
    86 \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Core-i3\ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)}}{14}{figure.7}}
    87 \newlabel{corei3_L2DM}{{7}{14}{\CITHREE \ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax }{figure.7}{}}
     112\@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Core-i3\ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{14}{figure.caption.13}}
     113\newlabel{corei3_L1DM}{{6}{14}{\CITHREE \ --- L1 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.13}{}}
     114\@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Core-i3\ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax }}{14}{figure.caption.14}}
     115\newlabel{corei3_L2DM}{{7}{14}{\CITHREE \ --- L2 Data Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.14}{}}
    88116\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Branch Mispredictions}{14}{subsection.5.2}}
    89 \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Core-i3\ --- L3 Cache Misses (y-axis: Cache Misses per kB)}}{15}{figure.8}}
    90 \newlabel{corei3_L3TM}{{8}{15}{\CITHREE \ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax }{figure.8}{}}
     117\@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Core-i3\ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax }}{15}{figure.caption.15}}
     118\newlabel{corei3_L3TM}{{8}{15}{\CITHREE \ --- L3 Cache Misses (y-axis: Cache Misses per kB)\relax \relax }{figure.caption.15}{}}
    91119\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}SIMD Instructions vs. Total Instructions}{15}{subsection.5.3}}
    92120\citation{Cameron2008}
    93 \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Core-i3\ --- Branch Instructions (y-axis: Branches per kB)}}{16}{figure.9}}
    94 \newlabel{corei3_BR}{{9}{16}{\CITHREE \ --- Branch Instructions (y-axis: Branches per kB)\relax }{figure.9}{}}
    95 \@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Core-i3\ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)}}{16}{figure.10}}
    96 \newlabel{corei3_BM}{{10}{16}{\CITHREE \ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax }{figure.10}{}}
    97 \@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions}}{17}{figure.11}}
    98 \newlabel{corei3_INS_p1}{{11}{17}{Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax }{figure.11}{}}
    99 \@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)}}{17}{figure.12}}
    100 \newlabel{corei3_INS_p2}{{12}{17}{Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax }{figure.12}{}}
     121\@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Core-i3\ --- Branch Instructions (y-axis: Branches per kB)\relax }}{16}{figure.caption.16}}
     122\newlabel{corei3_BR}{{9}{16}{\CITHREE \ --- Branch Instructions (y-axis: Branches per kB)\relax \relax }{figure.caption.16}{}}
     123\@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Core-i3\ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax }}{16}{figure.caption.17}}
     124\newlabel{corei3_BM}{{10}{16}{\CITHREE \ --- Branch Mispredictions (y-axis: Branch Mispredictions per kB)\relax \relax }{figure.caption.17}{}}
     125\@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax }}{17}{figure.caption.18}}
     126\newlabel{corei3_INS_p1}{{11}{17}{Parabix1 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions\relax \relax }{figure.caption.18}{}}
     127\@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax }}{17}{figure.caption.19}}
     128\newlabel{corei3_INS_p2}{{12}{17}{Parabix2 --- SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)\relax \relax }{figure.caption.19}{}}
    101129\@writefile{toc}{\contentsline {subsection}{\numberline {5.4}CPU Cycles}{17}{subsection.5.4}}
    102 \@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Core-i3\ --- Performance (y-axis: CPU Cycles per kB)}}{18}{figure.13}}
    103 \newlabel{corei3_TOT}{{13}{18}{\CITHREE \ --- Performance (y-axis: CPU Cycles per kB)\relax }{figure.13}{}}
    104 \@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Core-i3\ --- Average Power Consumption (watts)}}{18}{figure.14}}
    105 \newlabel{corei3_power}{{14}{18}{\CITHREE \ --- Average Power Consumption (watts)\relax }{figure.14}{}}
     130\@writefile{brf}{\backcite{Cameron2008}{{17}{5.4}{subsection.5.4}}}
     131\@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Core-i3\ --- Performance (y-axis: CPU Cycles per kB)\relax }}{18}{figure.caption.20}}
     132\newlabel{corei3_TOT}{{13}{18}{\CITHREE \ --- Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.20}{}}
     133\@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Core-i3\ --- Average Power Consumption (watts)\relax }}{18}{figure.caption.21}}
     134\newlabel{corei3_power}{{14}{18}{\CITHREE \ --- Average Power Consumption (watts)\relax \relax }{figure.caption.21}{}}
    106135\@writefile{toc}{\contentsline {subsection}{\numberline {5.5}Power and Energy}{18}{subsection.5.5}}
    107 \@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Core-i3\ --- Energy Consumption ($\mu $J per kB)}}{19}{figure.15}}
    108 \newlabel{corei3_energy}{{15}{19}{\CITHREE \ --- Energy Consumption ($\mu $J per kB)\relax }{figure.15}{}}
     136\@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Core-i3\ --- Energy Consumption ($\mu $J per kB)\relax }}{19}{figure.caption.22}}
     137\newlabel{corei3_energy}{{15}{19}{\CITHREE \ --- Energy Consumption ($\mu $J per kB)\relax \relax }{figure.caption.22}{}}
    109138\@writefile{toc}{\contentsline {section}{\numberline {6}Scalability}{19}{section.6}}
    110139\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}Performance}{19}{subsection.6.1}}
    111 \@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)}}{20}{figure.16}}
    112 \newlabel{Scalability}{{16}{20}{Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax }{figure.16}{}}
    113 \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Parabix2}}}{20}{figure.16}}
    114 \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Expat}}}{20}{figure.16}}
    115 \@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Average Power of Parabix2 (watts)}}{20}{figure.17}}
    116 \newlabel{power_Parabix2}{{17}{20}{Average Power of Parabix2 (watts)\relax }{figure.17}{}}
     140\@writefile{lof}{\contentsline {figure}{\numberline {16}{\ignorespaces Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax }}{20}{figure.caption.23}}
     141\@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Parabix2}}}{20}{figure.caption.23}}
     142\@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Expat}}}{20}{figure.caption.23}}
     143\newlabel{Scalability}{{16}{20}{Average Performance Parabix vs. Expat (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.23}{}}
     144\@writefile{lof}{\contentsline {figure}{\numberline {17}{\ignorespaces Average Power of Parabix2 (watts)\relax }}{20}{figure.caption.24}}
     145\newlabel{power_Parabix2}{{17}{20}{Average Power of Parabix2 (watts)\relax \relax }{figure.caption.24}{}}
    117146\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Power and Energy}{20}{subsection.6.2}}
    118 \@writefile{lof}{\contentsline {figure}{\numberline {18}{\ignorespaces Energy consumption of Parabix2 (nJ/B)}}{21}{figure.18}}
    119 \newlabel{energy_Parabix2}{{18}{21}{Energy consumption of Parabix2 (nJ/B)\relax }{figure.18}{}}
    120 \@writefile{lof}{\contentsline {figure}{\numberline {19}{\ignorespaces Parabix2 Instruction Counts (y-axis: Instructions per kB)}}{21}{figure.19}}
    121 \newlabel{insmix}{{19}{21}{Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax }{figure.19}{}}
     147\@writefile{lof}{\contentsline {figure}{\numberline {18}{\ignorespaces Energy consumption of Parabix2 (nJ/B)\relax }}{21}{figure.caption.25}}
     148\newlabel{energy_Parabix2}{{18}{21}{Energy consumption of Parabix2 (nJ/B)\relax \relax }{figure.caption.25}{}}
     149\@writefile{lof}{\contentsline {figure}{\numberline {19}{\ignorespaces Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax }}{21}{figure.caption.26}}
     150\newlabel{insmix}{{19}{21}{Parabix2 Instruction Counts (y-axis: Instructions per kB)\relax \relax }{figure.caption.26}{}}
    122151\@writefile{toc}{\contentsline {section}{\numberline {7}Scaling Parabix2 for AVX}{21}{section.7}}
    123152\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Three Operand Form}{21}{subsection.7.1}}
    124 \@writefile{lof}{\contentsline {figure}{\numberline {20}{\ignorespaces Parabix2 Performance (y-axis: CPU Cycles per kB)}}{22}{figure.20}}
    125 \newlabel{avx}{{20}{22}{Parabix2 Performance (y-axis: CPU Cycles per kB)\relax }{figure.20}{}}
     153\@writefile{lof}{\contentsline {figure}{\numberline {20}{\ignorespaces Parabix2 Performance (y-axis: CPU Cycles per kB)\relax }}{22}{figure.caption.27}}
     154\newlabel{avx}{{20}{22}{Parabix2 Performance (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.27}{}}
    126155\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}256-bit AVX Operations}{22}{subsection.7.2}}
    127156\@writefile{toc}{\contentsline {subsection}{\numberline {7.3}Performance Results}{22}{subsection.7.3}}
    128157\@writefile{toc}{\contentsline {section}{\numberline {8}Parabix2 on GT-P1000M}{24}{section.8}}
    129158\@writefile{toc}{\contentsline {subsection}{\numberline {8.1}Platform Hardware}{24}{subsection.8.1}}
    130 \@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces GT-P1000M}}{24}{table.5}}
    131 \newlabel{arminfo}{{5}{24}{GT-P1000M\relax }{table.5}{}}
     159\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces GT-P1000M\relax }}{24}{table.caption.28}}
     160\newlabel{arminfo}{{5}{24}{GT-P1000M\relax \relax }{table.caption.28}{}}
    132161\@writefile{toc}{\contentsline {subsection}{\numberline {8.2}Performance Results}{24}{subsection.8.2}}
    133 \@writefile{lof}{\contentsline {figure}{\numberline {21}{\ignorespaces Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)}}{25}{figure.21}}
    134 \newlabel{arm_processing_time}{{21}{25}{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax }{figure.21}{}}
    135 \@writefile{lof}{\contentsline {figure}{\numberline {22}{\ignorespaces Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. Core-i3{} }}{26}{figure.22}}
    136 \newlabel{relative_performance_arm_vs_i3}{{22}{26}{Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. \CITHREE {} \relax }{figure.22}{}}
    137 \@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Relationship between Each Pass and Data Structures}}{26}{table.6}}
    138 \newlabel{pass_structure}{{6}{26}{Relationship between Each Pass and Data Structures\relax }{table.6}{}}
    139 \@writefile{lof}{\contentsline {figure}{\numberline {23}{\ignorespaces Processing Time (y axis: CPU cycles per byte)}}{26}{figure.23}}
    140 \newlabel{perf}{{23}{26}{Processing Time (y axis: CPU cycles per byte)\relax }{figure.23}{}}
     162\@writefile{lof}{\contentsline {figure}{\numberline {21}{\ignorespaces Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax }}{25}{figure.caption.29}}
     163\newlabel{arm_processing_time}{{21}{25}{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)\relax \relax }{figure.caption.29}{}}
     164\citation{dataparallel}
     165\citation{Shah:2009}
     166\@writefile{lof}{\contentsline {figure}{\numberline {22}{\ignorespaces Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. Core-i3{} \relax }}{26}{figure.caption.30}}
     167\newlabel{relative_performance_arm_vs_i3}{{22}{26}{Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. \CITHREE {} \relax \relax }{figure.caption.30}{}}
     168\@writefile{toc}{\contentsline {section}{\numberline {9}Multi-threaded Parabix}{26}{section.9}}
     169\@writefile{brf}{\backcite{dataparallel}{{26}{9}{section.9}}}
     170\@writefile{brf}{\backcite{Shah:2009}{{26}{9}{section.9}}}
     171\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Relationship between Each Pass and Data Structures\relax }}{27}{table.caption.31}}
     172\newlabel{pass_structure}{{6}{27}{Relationship between Each Pass and Data Structures\relax \relax }{table.caption.31}{}}
     173\@writefile{lof}{\contentsline {figure}{\numberline {23}{\ignorespaces Processing Time (y axis: CPU cycles per byte)\relax }}{27}{figure.caption.32}}
     174\newlabel{multithread_perf}{{23}{27}{Processing Time (y axis: CPU cycles per byte)\relax \relax }{figure.caption.32}{}}
     175\@writefile{lof}{\contentsline {figure}{\numberline {24}{\ignorespaces Average Power (watts)\relax }}{28}{figure.caption.33}}
     176\newlabel{power}{{24}{28}{Average Power (watts)\relax \relax }{figure.caption.33}{}}
     177\@writefile{lof}{\contentsline {figure}{\numberline {25}{\ignorespaces Energy Consumption (nJ per byte)\relax }}{28}{figure.caption.34}}
     178\newlabel{energy}{{25}{28}{Energy Consumption (nJ per byte)\relax \relax }{figure.caption.34}{}}
     179\@writefile{toc}{\contentsline {section}{\numberline {10}Conclusion}{28}{section.10}}
    141180\bibstyle{abbrv}
    142181\bibdata{reference}
    143182\bibcite{bellosa2001}{1}
    144183\bibcite{bertran2010}{2}
    145 \@writefile{lof}{\contentsline {figure}{\numberline {24}{\ignorespaces Energy vs. Performance (x axis: bytes per cycle, y axis: nJ per byte)}}{27}{figure.24}}
    146 \newlabel{perf_energy}{{24}{27}{Energy vs. Performance (x axis: bytes per cycle, y axis: nJ per byte)\relax }{figure.24}{}}
    147 \@writefile{toc}{\contentsline {section}{\numberline {9}Multi-threaded Parabix}{27}{section.9}}
    148 \@writefile{toc}{\contentsline {section}{\numberline {10}Conclusion}{27}{section.10}}
    149184\bibcite{bircher2007}{3}
    150185\bibcite{blake-isca-2010}{4}
     
    163198\bibcite{hameed-isca-2010}{17}
    164199\bibcite{Herdy2008}{18}
    165 \bibcite{venkatesh-asplos-2010}{19}
    166 \bibcite{ZhangPanChiu09}{20}
     200\bibcite{dataparallel}{19}
     201\bibcite{Shah:2009}{20}
     202\bibcite{venkatesh-asplos-2010}{21}
     203\bibcite{ZhangPanChiu09}{22}
  • docs/HPCA2012/main.bbl

    r1326 r1329  
    110110\newblock In {\em Proceedings of {SVG} Open 2008}, August 2008.
    111111
     112\bibitem{dataparallel}
     113W.~Lu, Y.~Pan, and K.~Chiu.
     114\newblock A parallel approach to xml parsing.
     115\newblock {\em The 7th IEEE/ACM International Conference on Grid Computing},
     116  2006.
     117
     118\bibitem{Shah:2009}
     119B.~Shah, P.~R. Rao, B.~Moon, and M.~Rajagopalan.
     120\newblock A data parallel algorithm for xml dom parsing.
     121\newblock In {\em Proceedings of the 6th International XML Database Symposium
     122  on Database and XML Technologies}, XSym '09, pages 75--90, Berlin,
     123  Heidelberg, 2009. Springer-Verlag.
     124
    112125\bibitem{venkatesh-asplos-2010}
    113126G.~Venkatesh, J.~Sampson, N.~Goulding, S.~Garcia, V.~Bryksin, J.~Lugo-Martinez,
  • docs/HPCA2012/reference.bib

    r1326 r1329  
    511511 year = {2011}
    512512}
     513
     514@inproceedings{Shah:2009,
     515 author = {Shah, Bhavik and Rao, Praveen R. and Moon, Bongki and Rajagopalan, Mohan},
     516 title = {A Data Parallel Algorithm for XML DOM Parsing},
     517 booktitle = {Proceedings of the 6th International XML Database Symposium on Database and XML Technologies},
     518 series = {XSym '09},
     519 year = {2009},
     520 isbn = {978-3-642-03554-8},
     521 location = {Lyon, France},
     522 pages = {75--90},
     523 numpages = {16},
     524 acmid = {1616486},
     525 publisher = {Springer-Verlag},
     526 address = {Berlin, Heidelberg},
     527}
     528
     529@article{dataparallel,
     530 author = {Wei Lu and Yinfei Pan and Kenneth Chiu},
     531 title = {A Parallel Approach to XML Parsing},
     532 journal = {The 7th IEEE/ACM International Conference on Grid Computing},
     533 year = {2006}
     534 }