Changeset 1350 for docs


Timestamp:
Aug 23, 2011, 11:42:04 AM
Author:
ashriram
Message:

New conclusion

Location:
docs/HPCA2012
Files:
6 edited

  • docs/HPCA2012/00-abstract.tex

    r1349 r1350  
    11In modern applications text files are employed widely. For example,
    22XML files provide data storage in human readable format and are widely
    3 used in web services, database systems, and mobile phone SDKs.
    4 Traditional text processing tools are built around a byte-at-a-time
    5 processing model where each character token of a document is
    6 examined. The byte-at-a-time model is highly challenging for commodity
    7 processors. It includes many unpredictable input-dependent branches
    8 which cause pipeline squashes and stalls. Furthermore, typical text
    9 processing tools perform few operations per processed character and
    10 experience high cache miss rates when parsing the file. Overall,
    11 parsing text in important domains like XML processing requires high
    12 performance motivating hardware designers to adopt customized hardware
    13 and ASIC solutions.
     3used in applications ranging from database systems to mobile phone
     4SDKs.  Traditional text processing tools are built around a
     5byte-at-a-time processing model where each character token of a
     6document is examined. The byte-at-a-time model is highly challenging
     7for commodity processors. It includes many unpredictable
     8input-dependent branches which cause pipeline squashes and
     9stalls. Furthermore, typical text processing tools perform few
     10operations per processed character and experience high cache miss
     11rates. Overall, parsing text in important domains like XML processing
     12requires high performance motivating hardware designers to adopt ASIC
     13solutions.
    1414
    1515% In this paper on commodity.
     
    2121In this paper, we enable text processing applications to effectively
    2222use commodity processors. We introduce Parabix (Parallel Bitstream)
    23 technology, a software runtime and execution model that allows applications
    24 to exploit modern SIMD instructions extensions for high performance
    25 text processing. Parabix enables the application developer to write
    26 constructs assuming unlimited SIMD data parallelism. Our runtime
    27 translator generates code based on machine specifics (e.g., SIMD
    28 register widths) to realize the programmer specifications.  The key
    29 insight into efficient text processing in Parabix is the data
    30 organization. It transposes the sequence of 8-bit characters into
    31 sets of 8 parallel bit streams which then enables us to operate on
    32 multiple characters with single bit-parallel SIMD operators. We
    33 demonstrate the features and efficiency of parabix with a XML parsing
    34 application. We evaluate a Parabix-based XML parser against two widely
    35 used XML parsers, Expat and Apache's Xerces, and across three
    36 generations of x86 processors, including the new Intel \SB{}.  We show
    37 that Parabix's speedup is 2$\times$--7$\times$ over Expat and
    38 Xerces. We observe that Parabix overall makes efficient use of
    39 intra-core parallel hardware on commodity processors and supports
    40 significant gains in energy. Using Parabix, we assess the scalability
    41 advantages of SIMD processor improvements across Intel processor
    42 generations, culminating with a look at the latex 256-bit AVX
    43 technology in \SB{} versus the now legacy 128-bit SSE technology. As
    44 part of this study we also preview the Neon extensions on ARM
    45 processors. Finally, we partition the XML program into pipeline stages
    46 and demonstrate that thread-level parallelism exploits SIMD units
    47 scattered across the different cores and improves performance
    48 (2$\times$ on 4 cores) at same energy levels as the single-thread
    49 version.
     23technology, a software runtime and execution model that allows
 24applications to exploit modern SIMD instruction extensions for high
     25performance text processing. Parabix enables the application developer
     26to write constructs assuming unlimited SIMD data parallelism and
     27Parabix's runtime translator generates code based on machine specifics
     28(e.g., SIMD register widths).  The key insight into efficient text
     29processing in Parabix is the data organization. Parabix transposes the
     30sequence of character bytes into sets of 8 parallel bit streams which
     31then enables us to operate on multiple characters with single
     32bit-parallel SIMD operators. We demonstrate the features and
 33efficiency of Parabix with an XML parsing application. We evaluate a
     34Parabix-based XML parser against two widely used XML parsers, Expat
     35and Apache's Xerces, and across three generations of x86 processors,
     36including the new Intel \SB{}.  We show that Parabix's speedup is
     372$\times$--7$\times$ over Expat and Xerces. We observe that Parabix
     38overall makes efficient use of intra-core parallel hardware on
 39commodity processors and delivers significant gains in energy efficiency. Using
     40Parabix, we assess the scalability advantages of SIMD processor
     41improvements across Intel processor generations, culminating with a
 42look at the latest 256-bit AVX technology in \SB{} versus the now
     43legacy 128-bit SSE technology. Finally, we partition the XML
     44program into pipeline stages and demonstrate that thread-level
     45parallelism exploits SIMD units scattered across the different cores
 46and improves performance (2$\times$ on 4 cores) at the same energy levels
     47as the single-thread version.
    5048
    5149
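The transposition the abstract describes (a sequence of character bytes turned into 8 parallel bit streams, operated on with bit-parallel operators) can be sketched in plain Python. This is an illustrative serial model only, not Parabix's SIMD implementation; the function names and the use of Python big integers as stand-ins for wide SIMD registers are assumptions for the sketch:

```python
def transpose(data: bytes):
    """Transpose a byte sequence into 8 parallel bit streams.

    Stream k is an integer whose i-th bit is bit k of data[i]
    (a big int models an arbitrarily wide SIMD register).
    """
    streams = [0] * 8
    for i, b in enumerate(data):
        for k in range(8):
            if (b >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams, n, target):
    """Bit-parallel byte match: returns an int whose i-th bit is 1
    iff data[i] == target, using one AND (or AND-NOT) per bit
    position regardless of how many characters are processed."""
    mask = (1 << n) - 1
    result = mask
    for k in range(8):
        if (target >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


data = b"<a><b/></a>"
streams = transpose(data)
marks = match_byte(streams, len(data), ord("<"))
# bit i of `marks` is set exactly at the positions of '<' in data
```

With 128-bit SSE registers in place of big integers, the same eight AND/AND-NOT operations classify 128 characters at once, which is the source of the data parallelism the paper measures.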
  • docs/HPCA2012/10-conclusions.tex

    r1339 r1350  
    11\section{Conclusion}
    22\label{section:conclusion}
    3 This paper has examined energy efficiency and performance
    4 characteristics of four XML parsers considered over three
    5 generations of Intel processor architecture and shown that
    6 parsers based on parallel bit stream technology have dramatically
    7 better performance, energy efficiency and scalability than
    8 traditional byte-at-a-time parsers widely deployed in current
    9 software.  Based on a novel application of the short vector
    10 SIMD technology commonly found in commodity processors of
    11 all kinds, parallel bit stream technology scales well with
    12 improvements in processor SIMD capabilities.  With the recent
    13 introduction of the first generation of Intel processors that
    14 incorporate AVX technology, the change to 3-operand
    15 form SIMD operations has delivered a substantial benefit
    16 for the Parabix2 parsers simply through recompilation.
    17 Restructuring of Parabix2 to take advantage of the 256-bit SIMD
    18 capabilities also delivered a substantial reduction in
    19 instruction count, but without corresponding performance
    20 benefits in the first generation of AVX implementations.
     3% In this paper we presented a framework.
     4% We demonstrated on XML.
     5% We showed benefits
     6% We analyzed SIMD
     7% We stacked multithreading
     8% We have released it.
    219
     10% Future research
    2211
    23 There are many directions for further research. These
    24 include compiler and tools technology to automate the low-level
    25 programming tasks inherent in building parallel bit stream
    26 applications, widening the research by applying the techniques
    27 to other forms of text analysis and parsing, and further
    28 investigation of the interaction between parallel bit
    29 stream technology and processor architecture.  Two promising
    30 avenues include investigation of GPGPU approaches to parallel
    31 bit stream technology and the leveraging of the intraregister parallelism
    32 inherent in this approach to also take advantage of the intrachip
    33 parallelism of multicore processors.
 12In this paper we presented Parabix, a software runtime framework for
 13exploiting the SIMD units found on commodity processors for text
 14processing.  The Parabix framework allows developers to focus on
 15exposing the parallelism in their applications, assuming an abstract
 16SIMD machine with unlimited resources, without having to change code
 17to handle processor specifics (e.g., 128-bit SSE vs. 256-bit
 18AVX). We applied Parabix technology to a widely deployed
 19application, XML parsing, and demonstrated the efficiency gains that
 20can be obtained on commodity processors. Compared to the conventional
 21XML parsers, Expat and Xerces, we achieve a 2$\times$--7$\times$
 22improvement in performance and an average x$\times$ improvement in
 23energy. We achieve high compute efficiency with an overall ?$\times$
 24reduction in branches, a ?$\times$ reduction in branch mispredictions,
 25a ?$\times$ reduction in LLC misses, and an increase in data
 26parallelism, processing up to 128 characters with a single
 27operation. We used the Parabix framework and XML parsers to study the
 28features of the new 256-bit AVX extensions in Intel processors. We
 29find that while the move to 3-operand instructions delivers a
 30significant benefit, the wider operations in some cases have higher
 31overheads than the existing 128-bit SSE operations. We also compare
 32Intel's SIMD extensions against the ARM Neon. Note that Parabix
 33allowed us to perform these studies without changing the application
 34source. Finally, we parallelized the Parabix XML parser to take
 35advantage of the SIMD units in every core on the chip. We demonstrate
 36that the benefits of thread-level parallelism are complementary to the
 37fine-grained parallelism we exploit; the parallelized Parabix achieves
 38a further 2$\times$ improvement in performance.
  • docs/HPCA2012/latex/iccv.sty

    r1335 r1350  
    9999      \begin{tabular}[t]{c}
    100100 %     \ificcvfinal\@author\else Anonymous HPCA submission\\
    101         \vspace*{1pt}\\%This space will need to be here in the final copy, so don't squeeze it out for the review copy.
     101 %    \vspace*{1pt}\\%This space will need to be here in the final copy, so don't squeeze it out for the review copy.
    102102Paper ID \iccvPaperID %\fi
    103103      \end{tabular}
     
    114114   {%
    115115   \centerline{\large\bf Abstract}%
    116   \vspace*{12pt}%
     116  % \vspace*{12pt}%
    117117   \it%
    118118   }
  • docs/HPCA2012/main.tex

    r1348 r1350  
    4242\renewcommand\section{\@startsection
    4343{section}{1}{0pt}%
    44 {0.2\baselineskip}%
     44{0.1\baselineskip}%
    4545{0.1\baselineskip}%
    4646{\normalfont\Large\bfseries\raggedright}%
     
    164164\maketitle
    165165\begin{abstract}
    166 %\centerline{\large\textbf{Abstract}}
    167166\begin{singlespace}
    168 %\bigskip
    169 %\begin{quotation}
    170167\emph{\input{00-abstract.tex}}
    171 %\end{quotation}
    172168\end{singlespace}
    173169\end{abstract}
     
    184180\input{09-pipeline.tex}
    185181\input{10-conclusions.tex}
     182
     183
     184
    186185% tighten spacing:
    187186\let\oldthebibliography\thebibliography
  • docs/HPCA2012/preamble-submit.tex

    r1335 r1350  
    6060\marginparsep 0in
    6161\marginparwidth 0in
    62 \topmargin -0.2in
     62\topmargin -0.1in
    6363%\headheight 0in
    6464%\headsep 0in