Changeset 1775 for docs/HPCA2012


Timestamp: Dec 13, 2011, 5:37:36 PM
Author: cameron
Message: Minor fixes; figure placement
Location: docs/HPCA2012/final_ieee
Files: 7 edited

  • docs/HPCA2012/final_ieee/03-research.tex

    r1774 r1775  
    137137Unlike the single-cursor approach of traditional text parsers, the marking of multiple lexical items allows Parabix to process multiple items in parallel.
    138138Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness
    139 issues found during the parsing process. A $\tt 1$ bit in an error stream indicates the precense of a potential error that may require further
     139issues found during the parsing process. A $\tt 1$ bit in an error stream indicates the presence of a potential error that may require further
    140140processing to determine cause and severity.
    141141
  • docs/HPCA2012/final_ieee/05-corei3.tex

    r1774 r1775  
    163163\subsection{Performance and Energy Characteristics}
    164164
     165\begin{figure*}[!htbp]
     166\begin{center}
     167\subfigure[Performance (CPU Cycles per kB)]{
     168\includegraphics[width=0.45\textwidth]{plots/corei3_TOT.pdf}
     169\label{corei3_TOT}
     170}
     171\subfigure[Energy Consumption ($\mu$J per kB)]{
     172\includegraphics[width=0.45\textwidth]{plots/corei3_energy.pdf}
     173\label{corei3_energy}
     174}
     175\caption{Performance and Energy profile of Parabix on Core i3}
     176\end{center}
     177\end{figure*}
     178
    165179Figure \ref{corei3_TOT} shows overall parser performance in
    166180terms of CPU cycles per kB. Parabix-XML  is 2.5
     
    188202 is significantly lower resulting in an overall improvement in energy.
    189203
    190 \begin{figure*}[!htbp]
    191 \begin{center}
    192 \subfigure[Performance (CPU Cycles per kB)]{
    193 \includegraphics[width=0.45\textwidth]{plots/corei3_TOT.pdf}
    194 \label{corei3_TOT}
    195 }
    196 \subfigure[Energy Consumption ($\mu$J per kB)]{
    197 \includegraphics[width=0.45\textwidth]{plots/corei3_energy.pdf}
    198 \label{corei3_energy}
    199 }
    200 \caption{Performance and Energy profile of Parabix on Core i3}
    201 \end{center}
    202 \end{figure*}
    203 
    204 
     204
     205
  • docs/HPCA2012/final_ieee/06-scalability.tex

    r1774 r1775  
    1010processing'' operations on the source input.
    1111
    12 Our results demonstrate that Parabix-XML's optimizations complement
    13 newer hardware improvements. For bit stream processing,
    14 \CITHREE{} has a 40\% performance increase over \CO{};
    15 similarly, \SB{} has a 20\% improvement compared to
    16 \CITHREE{}. These gains appear independent of the markup.
    17 Postprocessing operations
    18 demonstrate data dependent variance. Performance on the \CITHREE{} increases by
    19 27\%--40\% compared to \CO{} whereas \SB{} increases by 16\%--29\%
    20 compared to \CITHREE{}.
    21 \CITHREE\ improves performance only by 29\% over \CO\ while \SB\
    22 improves performance by less than 6\% over \CITHREE{}. Note that the
    23 gains of \CITHREE\ over \CO\ includes an improvement both in clock
    24 frequency and microarchitecture while \SB{}'s gains are mainly attributed to the architecture.
    25 Figure \ref{Parabix_all_platform} also shows the average power consumption of
    26 Parabix-XML over each workload and as executed on each of the processors:
    27 \CO{}, \CITHREE\ and \SB{}.  Each generation of processor appears to bring a 25--30\% improvement
    28 in power consumption over the previous generation. Parabix-XML on \SB\ consumes 72\%--75\% less energy than it did on \CO{}.
    2912
    30 \begin{figure}[!htb]
     13\begin{figure}[htb]
    3114\begin{center}
    3215{
     
    6144\end{figure*}
    6245
     46Our results demonstrate that Parabix-XML's optimizations complement
     47newer hardware improvements. For bit stream processing,
     48\CITHREE{} has a 40\% performance increase over \CO{};
     49similarly, \SB{} has a 20\% improvement compared to
     50\CITHREE{}. These gains appear independent of the markup.
     51Postprocessing operations
     52demonstrate data dependent variance. Performance on the \CITHREE{} increases by
     5327\%--40\% compared to \CO{} whereas \SB{} increases by 16\%--29\%
     54compared to \CITHREE{}.
     55\CITHREE\ improves performance only by 29\% over \CO\ while \SB\
     56improves performance by less than 6\% over \CITHREE{}. Note that the
     57gains of \CITHREE\ over \CO\ includes an improvement both in clock
     58frequency and microarchitecture while \SB{}'s gains are mainly attributed to the architecture.
     59Figure \ref{Parabix_all_platform} also shows the average power consumption of
     60Parabix-XML over each workload and as executed on each of the processors:
     61\CO{}, \CITHREE\ and \SB{}.  Each generation of processor appears to bring a 25--30\% improvement
     62in power consumption over the previous generation. Parabix-XML on \SB\ consumes 72\%--75\% less energy than it did on \CO{}.
    6363
    6464\def\CORTEXA8{Cortex-A8}
  • docs/HPCA2012/final_ieee/11-conclusions.tex

    r1774 r1775  
    1010% Future research
    1111
    12 In this paper we presented Parabix, a software runtime framework for
     12This paper presents Parabix as a software runtime framework for
    1313exploiting SIMD data units found on commodity processors for text
    1414processing.  The Parabix framework allows programmers to focus on exposing the
     
    1616abstract SIMD machine without worrying about or having to change code
    1717to handle processor specifics (e.g., 128-bit SIMD SSE vs 256-bit SIMD
    18 on AVX). We applied Parabix technology to a widely deployed
    19 application, XML parsing and demonstrate the efficiency gains that can
     18on AVX). Parabix technology was applied to XML parsing
     19to demonstrate the efficiency gains that can
    2020be obtained on commodity processors. Compared to the conventional XML
    21 parsers, Expat and Xerces, we achieve 2$\times$---7$\times$
     21parsers, Expat and Xerces, a 2$\times$---7$\times$
    2222improvement in performance and average 4$\times$ improvement in
    23 energy. We achieve high compute efficiency with an overall 9$\times$---15$\times$
    24 reduction in branches, 7$\times$---15$\times$ reduction in branch mispredictions,
    25 % ?\times$ reduction in LLC misses, and increase in data parallelism
    26 and process up to 128 characters with a single operation. We used the
    27 Parabix framework and XML parsers to study the features of the new 256-bit
    28 AVX extension in Intel processors. We find that while the move to
    29 3-operand instructions deliver significant benefit the wider
    30 operations in some cases have higher overheads compared to the
    31 existing 128-bit SSE operations. We also compare Intel's SIMD
    32 extensions against the ARM \NEON{}. Note that Parabix allowed us to
     23energy was achieved. Furthermore, computational efficiency was
     24greatly increased, with an overall 9$\times$---15$\times$
     25reduction in branches and 7$\times$---15$\times$ reduction in branch mispredictions.
     26
     27The Parabix framework and XML parsers was also used to study the
     28features of the new 256-bit AVX extension in Intel processors.  While the move to
     293-operand instructions delivers significant benefits, the
     30advantage of loads and bitwise logic with 256 bits at a time was
     31negated by the need to convert to 128 bit SIMD registers for
     32integer operations.  We expect this will be remedied with AVX2.
     33Intel's SIMD
     34extensions were also compared with the ARM \NEON{}. Note that Parabix allowed us to
    3335perform these studies without having to change the application source.
    34 Finally, we parallelized the Parabix XML parser to take advantage of
    35 the SIMD units in every core on the chip. We demonstrate that the
     36Finally, the Parabix XML parser was parallelized
     37to take advantage of the SIMD units in every core on the chip, demonstrating that the
    3638benefits of thread-level-parallelism are complementary to the
    37 fine-grain parallelism we exploit; parallelized Parabix achieves a
     39fine-grain parallelism we exploit.   In this study, our parallelized Parabix achieves a
    3840further 2$\times$ improvement in performance.
    3941
  • docs/HPCA2012/final_ieee/final.aux

    r1774 r1775  
    6666\newlabel{corei3_INS_p2}{{4}{7}}
    6767\@writefile{toc}{\contentsline {subsection}{\numberline {6.4}Performance and Energy Characteristics}{7}}
     68\newlabel{corei3_TOT}{{9(a)}{8}}
     69\newlabel{sub@corei3_TOT}{{(a)}{8}}
     70\newlabel{corei3_energy}{{9(b)}{8}}
     71\newlabel{sub@corei3_energy}{{(b)}{8}}
     72\@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Performance and Energy profile of Parabix on Core i3\relax }}{8}}
     73\@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Performance (CPU Cycles per kB)}}}{8}}
     74\@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Energy Consumption ($\mu $J per kB)}}}{8}}
    6875\@writefile{toc}{\contentsline {section}{\numberline {7}Parabix on different platforms}{8}}
    6976\newlabel{section:scalability}{{7}{8}}
     
    7481\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Parabix on Mobile Processors}{8}}
    7582\newlabel{section:scalability:Neon{}}{{7.2}{8}}
    76 \@writefile{toc}{\contentsline {section}{\numberline {8}Parabix on AVX}{8}}
    77 \newlabel{section:avx}{{8}{8}}
    78 \newlabel{corei3_TOT}{{9(a)}{9}}
    79 \newlabel{sub@corei3_TOT}{{(a)}{9}}
    80 \newlabel{corei3_energy}{{9(b)}{9}}
    81 \newlabel{sub@corei3_energy}{{(b)}{9}}
    82 \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Performance and Energy profile of Parabix on Core i3\relax }}{9}}
    83 \@writefile{lof}{\contentsline {subfigure}{\numberline{(a)}{\ignorespaces {Performance (CPU Cycles per kB)}}}{9}}
    84 \@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {Energy Consumption ($\mu $J per kB)}}}{9}}
    8583\newlabel{arm_processing_time}{{11(a)}{9}}
    8684\newlabel{sub@arm_processing_time}{{(a)}{9}}
     
    9391\@writefile{lof}{\contentsline {subfigure}{\numberline{(b)}{\ignorespaces {ARM Neon}}}{9}}
    9492\@writefile{lof}{\contentsline {subfigure}{\numberline{(c)}{\ignorespaces {Core i3}}}{9}}
    95 \@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix Instruction Counts (y-axis: Instructions per kB)\relax }}{9}}
    96 \newlabel{insmix}{{12}{9}}
    97 \@writefile{toc}{\contentsline {subsection}{\numberline {8.1}3-Operand Form}{10}}
    98 \@writefile{toc}{\contentsline {subsection}{\numberline {8.2}256-bit Operations}{10}}
    99 \@writefile{toc}{\contentsline {subsection}{\numberline {8.3}Performance Results}{10}}
     93\@writefile{toc}{\contentsline {section}{\numberline {8}Parabix on AVX}{9}}
     94\newlabel{section:avx}{{8}{9}}
     95\@writefile{toc}{\contentsline {subsection}{\numberline {8.1}3-Operand Form}{9}}
     96\@writefile{toc}{\contentsline {subsection}{\numberline {8.2}256-bit Operations}{9}}
     97\@writefile{toc}{\contentsline {subsection}{\numberline {8.3}Performance Results}{9}}
     98\citation{dataparallel}
     99\citation{Shah:2009}
     100\@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Parabix Instruction Counts (y-axis: Instructions per kB)\relax }}{10}}
     101\newlabel{insmix}{{12}{10}}
     102\@writefile{toc}{\contentsline {section}{\numberline {9}Multithreaded Parabix}{10}}
     103\newlabel{section:multithread}{{9}{10}}
    100104\@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Parabix Performance (y-axis: ns per kB)\relax }}{10}}
    101105\newlabel{avx}{{13}{10}}
    102 \@writefile{toc}{\contentsline {section}{\numberline {9}Multithreaded Parabix}{10}}
    103 \newlabel{section:multithread}{{9}{10}}
    104 \citation{dataparallel}
    105 \citation{Shah:2009}
     106\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Stage Division\relax }}{10}}
     107\newlabel{pass_structure}{{5}{10}}
    106108\citation{DaiNiZhu2010}
    107109\citation{NicolaJohn03}
     
    115117\citation{cameron-EuroPar2011}
    116118\citation{CameronLin2009}
    117 \@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Stage Division\relax }}{11}}
    118 \newlabel{pass_structure}{{5}{11}}
     119\@writefile{toc}{\contentsline {section}{\numberline {10}Related Work}{11}}
     120\newlabel{section:related}{{10}{11}}
    119121\@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Average Statistic of Multithreaded Parabix\relax }}{11}}
    120122\newlabel{multithread_perf}{{14}{11}}
    121 \@writefile{toc}{\contentsline {section}{\numberline {10}Related Work}{11}}
    122 \newlabel{section:related}{{10}{11}}
     123\@writefile{toc}{\contentsline {section}{\numberline {11}Conclusion}{11}}
     124\newlabel{section:conclusion}{{11}{11}}
    123125\bibstyle{ieee/latex8}
    124126\bibdata{reference}
     
    142144\bibcite{NicolaJohn03}{18}
    143145\bibcite{JMBE:31@99}{19}
    144 \@writefile{toc}{\contentsline {section}{\numberline {11}Conclusion}{12}}
    145 \newlabel{section:conclusion}{{11}{12}}
    146146\bibcite{ParaDOM2009}{20}
    147147\bibcite{Shah:2009}{21}
  • docs/HPCA2012/final_ieee/final.log

    r1774 r1775  
    1 This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.4.5)  13 DEC 2011 16:49
     1This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.5.12)  13 DEC 2011 17:34
    22entering extended mode
    33 %&-line parsing enabled.
    4 **final.tex
     4**final
    55(./final.tex
    66LaTeX2e <2009/09/24>
    77Babel <v3.8l> and hyphenation patterns for english, usenglishmax, dumylang, noh
    8 yphenation, loaded.
     8yphenation, farsi, arabic, croatian, bulgarian, ukrainian, russian, czech, slov
     9ak, danish, dutch, finnish, french, basque, ngerman, german, german-x-2009-06-1
     109, ngerman-x-2009-06-19, ibycus, monogreek, greek, ancientgreek, hungarian, san
     11skrit, italian, latin, latvian, lithuanian, mongolian2a, mongolian, bokmal, nyn
     12orsk, romanian, irish, coptic, serbian, turkish, welsh, esperanto, uppersorbian
     13, estonian, indonesian, interlingua, icelandic, kurmanji, slovenian, polish, po
     14rtuguese, spanish, galician, catalan, swedish, ukenglish, pinyin, loaded.
    915(./preamble-final-ieee.tex (/usr/share/texmf-texlive/tex/latex/base/article.cls
    1016Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
     
    474480 []
    475481
    476 [7 <./plots/corei3_BM.pdf>]
    477 <plots/corei3_TOT.pdf, id=88, 457.71pt x 209.78375pt>
     482<plots/corei3_TOT.pdf, id=70, 457.71pt x 209.78375pt>
    478483File: plots/corei3_TOT.pdf Graphic file (type pdf)
    479484
    480485<use plots/corei3_TOT.pdf>
    481 <plots/corei3_energy.pdf, id=90, 454.69875pt x 203.76125pt>
     486<plots/corei3_energy.pdf, id=72, 454.69875pt x 203.76125pt>
    482487File: plots/corei3_energy.pdf Graphic file (type pdf)
    483488
    484 <use plots/corei3_energy.pdf>) (./06-scalability.tex
     489<use plots/corei3_energy.pdf> [7 <./plots/corei3_BM.pdf>])
     490(./06-scalability.tex
    485491<plots/Parabix2_all_platform.pdf, id=92, 432.61626pt x 263.98625pt>
    486492File: plots/Parabix2_all_platform.pdf Graphic file (type pdf)
    487493
    488494<use plots/Parabix2_all_platform.pdf>
    489 Overfull \hbox (7.22688pt too wide) in paragraph at lines 33--35
     495Overfull \hbox (7.22688pt too wide) in paragraph at lines 16--18
    490496 []
    491497 []
     
    507513 []
    508514
    509 <plots/InsMix.pdf, id=99, 744.7825pt x 261.97874pt>
     515[8 <./plots/corei3_TOT.pdf> <./plots/corei3_energy.pdf> <./plots/Parabix2_all_p
     516latform.pdf>] <plots/InsMix.pdf, id=155, 744.7825pt x 261.97874pt>
    510517File: plots/InsMix.pdf Graphic file (type pdf)
    511  <use plots/InsMix.pdf>)
    512 (./07-avx.tex [8 <./plots/Parabix2_all_platform.pdf>] [9 <./plots/corei3_TOT.pd
    513 f> <./plots/corei3_energy.pdf> <./plots/arm_TOT.pdf> <./plots/Markup_density_Ar
    514 m.pdf> <./plots/Markup_density_Intel.pdf> <./plots/InsMix.pdf>]
    515 <plots/avx.pdf, id=200, 424.58624pt x 212.795pt>
     518
     519<use plots/InsMix.pdf>) (./07-avx.tex [9 <./plots/arm_TOT.pdf> <./plots/Markup_
     520density_Arm.pdf> <./plots/Markup_density_Intel.pdf>]
     521<plots/avx.pdf, id=186, 424.58624pt x 212.795pt>
    516522File: plots/avx.pdf Graphic file (type pdf)
    517523 <use plots/avx.pdf>
     
    520526 []
    521527
    522 ) (./09-pipeline.tex [10 <./plots/avx.pdf>]
     528) (./09-pipeline.tex [10 <./plots/InsMix.pdf> <./plots/avx.pdf>]
    523529Underfull \hbox (badness 1072) in paragraph at lines 75--84
    524530[]\OT1/ptm/m/n/10 Figure 14[] demon-strates the per-for-mance im-prove-ment
     
    532538 []
    533539
    534 ) (./10-related.tex [11 <./plots/pipeline.pdf>]) (./11-conclusions.tex)
     540) (./10-related.tex) (./11-conclusions.tex [11 <./plots/pipeline.pdf>])
    535541(./final.bbl
    536542Underfull \hbox (badness 1137) in paragraph at lines 17--22
     
    553559 []
    554560
    555 [12]
    556561Missing character: There is no à in font ptmr7t!
    557562Missing character: There is no š in font ptmr7t!
    558 ) [13
    559 
    560 ] (./final.aux) )
     563) [12] (./final.aux) )
    561564Here is how much of TeX's memory you used:
    562  3934 strings out of 495061
    563  54935 string characters out of 1182622
    564  118305 words of memory out of 3000000
    565  6940 multiletter control sequences out of 15000+50000
     565 3934 strings out of 493848
     566 54935 string characters out of 1152823
     567 119286 words of memory out of 3000000
     568 7039 multiletter control sequences out of 15000+50000
    566569 69892 words of font info for 168 fonts, out of 3000000 for 9000
    567  31 hyphenation exceptions out of 8191
    568  38i,12n,38p,1456b,370s stack positions out of 5000i,500n,10000p,200000b,50000s
    569 {/usr/share/texmf-texlive/fonts/enc/dvips/base/8r.enc}</usr/sh
    570 are/texmf-texlive/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/share/texmf-t
    571 exlive/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/share/texmf-texlive/fonts
    572 /type1/public/amsfonts/cm/cmsy10.pfb></usr/share/texmf-texlive/fonts/type1/publ
    573 ic/amsfonts/cm/cmtt10.pfb></usr/share/texmf-texlive/fonts/type1/public/amsfonts
    574 /cm/cmtt8.pfb></usr/share/texmf-texlive/fonts/type1/urw/courier/ucrb8a.pfb></us
    575 r/share/texmf-texlive/fonts/type1/urw/courier/ucrr8a.pfb></usr/share/texmf-texl
    576 ive/fonts/type1/urw/symbol/usyr.pfb></usr/share/texmf-texlive/fonts/type1/urw/s
    577 ymbol/usyr.pfb></usr/share/texmf-texlive/fonts/type1/urw/times/utmb8a.pfb></usr
    578 /share/texmf-texlive/fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-texlive
    579 /fonts/type1/urw/times/utmri8a.pfb>
    580 Output written on final.pdf (13 pages, 518284 bytes).
     570 717 hyphenation exceptions out of 8191
     571 38i,12n,38p,1452b,370s stack positions out of 5000i,500n,10000p,200000b,50000s
     572{/usr/share/texmf-texlive/fonts/enc/dvips/base/8r.enc}</u
     573sr/share/texmf-texlive/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/share/te
     574xmf-texlive/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/share/texmf-texlive/
     575fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/share/texmf-texlive/fonts/type1
     576/public/amsfonts/cm/cmtt10.pfb></usr/share/texmf-texlive/fonts/type1/public/ams
     577fonts/cm/cmtt8.pfb></usr/share/texmf-texlive/fonts/type1/urw/courier/ucrb8a.pfb
     578></usr/share/texmf-texlive/fonts/type1/urw/courier/ucrr8a.pfb></usr/share/texmf
     579-texlive/fonts/type1/urw/symbol/usyr.pfb></usr/share/texmf-texlive/fonts/type1/
     580urw/symbol/usyr.pfb></usr/share/texmf-texlive/fonts/type1/urw/times/utmb8a.pfb>
     581</usr/share/texmf-texlive/fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-te
     582xlive/fonts/type1/urw/times/utmri8a.pfb>
     583Output written on final.pdf (12 pages, 518018 bytes).
    581584PDF statistics:
    582  279 PDF objects out of 1000 (max. 8388607)
     585 275 PDF objects out of 1000 (max. 8388607)
    583586 0 named destinations out of 1000 (max. 500000)
    584587 61 words of extra memory for PDF output out of 10000 (max. 10000000)