Changeset 1768 for docs/HPCA2012


Timestamp:
Dec 12, 2011, 3:33:33 PM
Author:
ashriram
Message:

Version sent to Martha

Location:
docs/HPCA2012/final_ieee
Files:
6 edited

  • docs/HPCA2012/final_ieee/01-intro.tex

    r1752 r1768
      42   42
      43   43   Parallel bit stream (Parabix) technology is a promising new approach
      44      - for high performance text processing taking advantage of the SIMD
      45      - capabilities of commodity processors.   Based on the transposition
      46      - of byte-oriented character data into parallel bit streams each
      47      - with one bit per input byte,
      48      - first-generation Parabix technology has been
      49      - applied to accelerate UTF-8 to UTF-16 transcoding \cite{Cameron2008} as
      50      - well as exact string matching in protein identification \cite{JMBE:31@99}.
      51      - It has also been applied to the problem of XML parsing using
      52      - a traditional recursive-descent parser accelerated with
      53      - sequential bit scans.  Most recently, the foundation of
      54      - second-generation Parabix technology and the toolchain
      55      - described in this paper has been established with the
      56      - introduction of a parallel scanning primitive to replace
      57      - sequential bit scans \cite{cameron-EuroPar2011}.
           44 + for high performance text processing. The key insight is based on the transposition of
           45 + byte-oriented character data into parallel bit streams (each with one
           46 + bit per input byte) which permits text processing to exploit SIMD
           47 + operations on modern processors. Our earlier work on inductive doubling
           48 + instructions~\cite{CameronLin2009} discusses effective techniques to
           49 + transform the text into the Parabix representation.  We
           50 + have used Parabix to accelerate UTF-8 to UTF-16 transcoding
           51 + \cite{Cameron2008}, string matching in protein identification
           52 + \cite{JMBE:31@99}, and specific parts of a traditional
           53 + recursive-descent XML parser~\cite{cameron-EuroPar2011}.
      58   54
      59   55
  • docs/HPCA2012/final_ieee/05-corei3.tex

    r1744 r1768
       9    9   %some of the numbers are roughly calculated, needs to be recalculated for final version
      10   10   \subsection{Cache behavior}
           11 +
           12 +
           13 +
           14 + Table \ref{cache_misses} shows the cache misses per kilobyte of input
           15 + data. Analytically, the cache misses for the Expat and Xerces parsers
           16 + represent a 0.5 cycle per XML byte cost.\footnote{The approximate miss penalty on the \CITHREE\ for L1, L2 and L3 caches is
           17 + 4, 11, and 36 cycles respectively.}
           18 +
      11   19
      12   20
       …    …
      24   32   \label{cache_misses}
      25   33   \end{table}
      26      -
      27      -
      28      - Table \ref{cache_misses} shows the cache misses per kilobyte of input
      29      - data. Analytically, the cache misses for the Expat and Xerces parsers
      30      - represent a 0.5 cycle per XML byte cost.\footnote{The approximate miss penalty on the \CITHREE\ for L1, L2 and L3 caches is
      31      - 4, 11, and 36 cycles respectively.}
      32      -
      33   34
      34   35
  • docs/HPCA2012/final_ieee/10-related.tex

    r1743 r1768
      25   25   instructions~\cite{sse4}.
      26   26
      27      - Recently, Cameron et al.~\cite{CameronHerdyLin2008,
      28      -   cameron-EuroPar2011} accelerated specific phases in an XML parser
      29      - using widely available SSE2 instructions and proposed an inductive
      30      - doubling instruction set ~\cite{CameronLin2009}. In this paper, we
      31      - have developed a generalized parabix architecture and have described
      32      - the software tool chain that programmers can use to build scalable
      33      - text processing applications on commodity multicores. We have explored
      34      - in the detail the tradeoffs between the SIMD implementations across
      35      - processor generations (i.e., SSE vs AVX) and multiple platfoms (ARM vs
      36      - Intel). Finally, we have also explored the benefits of pipeline parallelism.
           27 + Parallel bitstreams were introduced by Cameron et
           28 + al.~\cite{CameronHerdyLin2008} and used it to implement an efficient
           29 + UTF-8 to 16 parser. Subsequent work ~\cite{cameron-EuroPar2011}
           30 + accelerated specific phases in an XML parser using widely available
           31 + SSE2 instructions and proposed an inductive doubling instruction set
           32 + ~\cite{CameronLin2009}. In this paper, we have developed a generalized
           33 + parabix architecture and have described the software tool chain that
           34 + programmers can use to build scalable text processing applications on
           35 + commodity multicores. We have explored in the detail the tradeoffs
           36 + between the SIMD implementations across processor generations (i.e.,
           37 + SSE vs AVX) and multiple platfoms (ARM vs Intel). Finally, we have
           38 + also explored the benefits of using pipeline-based multicore
           39 + parallelism as a technique to eliminate imbalances in SIMD
           40 + bitstream-based parallelization and improve overall efficiency.
      37   41
      38   42
  • docs/HPCA2012/final_ieee/final.aux

    r1752 r1768
       2    2   \citation{Asanovic:EECS-2006-183}
       3    3   \citation{xmlchip}
            4 + \citation{CameronLin2009}
       4    5   \citation{Cameron2008}
       5    6   \citation{JMBE:31@99}
       …    …
      40   41   \newlabel{parsers}{{5}{5}}
      41   42   \@writefile{toc}{\contentsline {paragraph}{XML Parsers:}{5}}
           43 + \newlabel{workloads}{{5}{5}}
      42   44   \citation{bellosa2001,bertran2010}
      43   45   \citation{clamp}
      44   46   \@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Parabix XML Parser Structure\relax }}{6}}
      45   47   \newlabel{parabix_arch}{{7}{6}}
      46      - \newlabel{workloads}{{5}{6}}
      47   48   \@writefile{toc}{\contentsline {paragraph}{XML Workloads:}{6}}
      48   49   \@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces XML Document Characteristics\relax }}{6}}
       …    …
     113  114   \citation{tan-sherwood-isca-2005}
     114  115   \citation{sse4}
     115      - \citation{CameronHerdyLin2008,cameron-EuroPar2011}
          116 + \citation{CameronHerdyLin2008}
          117 + \citation{cameron-EuroPar2011}
     116  118   \citation{CameronLin2009}
     117  119   \@writefile{toc}{\contentsline {section}{\numberline {10}Related Work}{11}}
  • docs/HPCA2012/final_ieee/final.log

    r1752 r1768
       1      - This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.5.12)  7 DEC 2011 08:02
            1 + This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2011.10.18)  8 DEC 2011 12:16
       2    2   entering extended mode
       3    3    %&-line parsing enabled.
       4      - **final
            4 + **final.tex
       5    5   (./final.tex
       6    6   LaTeX2e <2009/09/24>
       …    …
     412  412    <./plots/performance_energy_chart.pdf>]
     413  413   LaTeX Font Info:    Font shape `OT1/ptm/bx/n' in size <8> not available
     414      - (Font)              Font shape `OT1/ptm/b/n' tried instead on input line 104.
          414 + (Font)              Font shape `OT1/ptm/b/n' tried instead on input line 100.
     415  415   LaTeX Font Info:    Font shape `OT1/ptm/bx/n' in size <5> not available
     416      - (Font)              Font shape `OT1/ptm/b/n' tried instead on input line 104.
          416 + (Font)              Font shape `OT1/ptm/b/n' tried instead on input line 100.
     417  417   ) (./02-background.tex
     418  418   LaTeX Font Info:    Try loading font information for OT1+pcr on input line 55.
       …    …
     468  468
     469  469   <use plots/corei3_BM.pdf>
     470      - Overfull \hbox (7.22688pt too wide) in paragraph at lines 100--102
          470 + Overfull \hbox (7.22688pt too wide) in paragraph at lines 101--103
     471  471    []
     472  472    []
       …    …
     476  476
     477  477
     478      - Overfull \hbox (7.49034pt too wide) in paragraph at lines 147--155
          478 + Overfull \hbox (7.49034pt too wide) in paragraph at lines 148--156
     479  479    []
     480  480    []
       …    …
     558  558   Here is how much of TeX's memory you used:
     559  559    3934 strings out of 493848
     560      -  54935 string characters out of 1152823
     561      -  120286 words of memory out of 3000000
          560 +  54935 string characters out of 1152822
          561 +  119286 words of memory out of 3000000
     562  562    7039 multiletter control sequences out of 15000+50000
     563  563    69892 words of font info for 168 fonts, out of 3000000 for 9000
     564  564    717 hyphenation exceptions out of 8191
     565      -  38i,12n,38p,1452b,370s stack positions out of 5000i,500n,10000p,200000b,50000s
          565 +  38i,12n,38p,1456b,370s stack positions out of 5000i,500n,10000p,200000b,50000s
     566  566   {/usr/share/texmf-texlive/fonts/enc/dvips/base/8r.enc}</u
     567  567   sr/share/texmf-texlive/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/share/te
       …    …
     575  575   </usr/share/texmf-texlive/fonts/type1/urw/times/utmr8a.pfb></usr/share/texmf-te
     576  576   xlive/fonts/type1/urw/times/utmri8a.pfb>
     577      - Output written on final.pdf (12 pages, 517871 bytes).
          577 + Output written on final.pdf (12 pages, 517924 bytes).
     578  578   PDF statistics:
     579  579    275 PDF objects out of 1000 (max. 8388607)