Changeset 1411


Timestamp: Aug 31, 2011, 6:14:20 PM
Author: ashriram
Message: spell checked evaluation
Location: docs/HPCA2012
Files: 11 edited

Legend (left number column: line in the old revision; right number column: line in the new revision):

  Unmodified
+ Added
- Removed

  • docs/HPCA2012/01-intro.tex

    r1407 r1411
       91    91   \footnote{The actual energy consumption of the XML
       92    92     ASIC chips is not published by the companies.}
       93       -
       94       - Overall we make the following contributions in this paper.
             93 + %
             94 + Overall we make the following contributions:
       95    95
       96    96   1) We outline the Parabix architecture, tool chain and run-time
      ...   ...
       99    99   While studied in the context of XML parsing, the Parabix framework
      100   100   can be widely applied to many problems in text processing and
      101       - parsing.
            101 + parsing.  We have realeased Parabix completely open source
            102 +  and are interested in exploring the applications that can take
            103 +  advantage of our tool chain(\textit{http://anonymous}).
      102   104
      103       - 2) We compare Parabix XML parsers against conventional parsers and
            105 +
            106 + 2) We compare the Parabix XML parser against conventional parsers and
      104   107   assess the improvement in overall performance and energy efficiency on
      105       - each platform.  We are the first to compare and contrast SSE/AVX
      106       - extensions across multiple generation of Intel processors and show
      107       - that there are performance challenges when using newer generation SIMD
      108       - extensions. We compare the ARM Neon extensions against the x86 SIMD
      109       - extensions and comment on the latency of SIMD operations across these
      110       - architectures.
            108 + variety of hardware platforms.  We are the first to compare and
            109 + contrast SSE/AVX extensions across multiple generation of Intel
            110 + processors and show that there are performance challenges when using
            111 + newer generation SIMD extensions. We compare the ARM Neon extensions
            112 + against the x86 SIMD extensions and comment on the latency of SIMD
            113 + operations.
      111   114
      112   115   3) Finally, building on the SIMD parallelism of Parabix technology,
      ...   ...
      120   123   Section~\ref{section:background} presents background material on XML
      121   124   parsing and provides insight into the inefficiency of traditional
      122       - parsers on mainstream processors.  Section~\ref{section:parabix}
      123       - describes the Parabix architecture, tool chain and run-time
      124       - environment.  Section~\ref{section:parser} describes the application
      125       - of the Parabix framework to the construction of an XML parser
      126       - enforcing all the well-formedness rules of the XML specification.
      127       - Section~\ref{section:baseline} presents a detailed performance
      128       - analysis of Parabix on a \CITHREE\ system using hardware performance
      129       - counters and compares it against conventional parsers.
            125 + parsers.  Section~\ref{section:parabix} describes the Parabix
            126 + architecture, tool chain and run-time environment.
            127 + Section~\ref{section:parser} describes the our design of an XML parser
            128 + based on the Parabix framework.  Section~\ref{section:baseline}
            129 + presents a detailed performance analysis of Parabix on a
            130 + \CITHREE\ system using hardware performance counters.
      130   131   Section~\ref{section:scalability} compares the performance and energy
      131   132   efficiency of 128 bit SIMD extensions across three generations of

  • docs/HPCA2012/02-background.tex

    r1393 r1411
       93    93   \cite{xerces}, uses a series of nested switch statements and
       94    94   state-dependent flag tests to control the parsing logic of the
       95       - program.  Our analysis, which we detail in Section
       96       - \ref{section:XML-branches}, found that Xerces requires between 6 - 13
       97       - branches per byte of XML to support this form of control flow,
       98       - depending on the fraction of markup in the overall document.  Cache
             95 + program. Xerces's complex data dependent control flow requires between
             96 + 6 --- 13 branches per byte of XML input, depending on the markup in
             97 + the file (details in Section~\ref{section:XML-branches}).  Cache
       99    98   utilization is also significantly reduced due to the manner in which
      100    99   markup and content must be scanned and buffered for future use.  For

  • docs/HPCA2012/05-corei3.tex

    r1407 r1411
       51    51   \label{section:XML-branches}
       52    52   In general, performance is limited by branch mispredictions.
       53       - Unfortunetly, it is difficult to reduce the branch misprediction rate of
             53 + Unfortunately, it is difficult to reduce the branch misprediction rate of
       54    54   traditional XML parsers due to:
       55    55   (1) the variable length nature of the syntactic elements contained within XML documents;

  • docs/HPCA2012/06-scalability.tex

    r1409 r1411
        1       - \section{Evaluation of Parabix accross different Hardware}
              1 + \section{Evaluation of Parabix across different Hardware}
        2     2   \label{section:scalability}
        3     3   \subsection{Performance}

  • docs/HPCA2012/07-avx.tex

    r1410 r1411
       51    51   the version that only takes advantage of the AVX 3-operand mode is
       52    52   labeled ``128-bit avx,'' and the version uses the 256-bit
       53       - operations wherever possible is labelled ``256-bit avx.''  The
             53 + operations wherever possible is labeled ``256-bit avx.''  The
       54    54   instruction counts are divided into three classes: ``non-SIMD''
       55    55   operations are the general purpose instructions.  The ``bitwise SIMD''

  • docs/HPCA2012/08-arm.tex

    r1339 r1411
       33    33
       34    34   Migration of Parabix2 to the Android platform began with the
       35       - retargetting of a subset of the Parabix2 IDISA SIMD library for ARM
             35 + re-targeting of a subset of the Parabix2 IDISA SIMD library for ARM
       36    36   NEON.  This library code was cross-compiled for Android using the
       37    37   Android NDK. The Android NDK is a companion tool to the Android SDK

  • docs/HPCA2012/10-related.tex

    r1407 r1411
       15    15   of numerous multi-threaded and hardware-based approaches:
       16    16   Multithreaded XML techniques include preparsing the XML file to locate
       17       - key partitioning points \cite{ZhangPanChiu09} and speculative p-DFAs
       18       - \cite{ZhangPanChiu09}. Hardware methods include custom XML chips
       19       - \cite{Leventhal2009} and FPGA-based implementations
       20       - \cite{DaiNiZhu2010}.  Recently Cameron et
       21       - al.~\cite{CameronHerdyLin2008, cameron-EuroPar2011} accelerated XML
       22       - parsing using SSE instructions. Finally, other have explored the
       23       - design of custom hardware for bit parallel operations in network
             17 + key partitioning points~\cite{ParaDOM2009,LiWangLiuLi2009} and
             18 + speculative p-DFAs~\cite{ZhangPanChiu09}. Hardware methods include
             19 + custom XML chips \cite{Leventhal2009} and FPGA-based implementations
             20 + \cite{DaiNiZhu2010}.  Intel's SSE4 instructions targeted
             21 + XML parsers, but these have not seen widespread use because of portability
             22 + concerns and the programming challenges that accompany low level
             23 + instructions~\cite{sse4}. Recently, Cameron et
             24 + al.~\cite{CameronHerdyLin2008, cameron-EuroPar2011} designed an
             25 + accelerated XML parser using widely available SSE2
             26 + instructions. Finally, others have explored the design of custom
             27 + hardware for bit parallel operations for text search in network
       24    28   processors~\cite{tan-sherwood-isca-2005}.
             29 +
             30 +
             31 +
             32 + % To accelerate XML parsingmost of the recent work has
             33 + % focused on parallelization through the use of multicore parallelism
             34 + % for chip multiprocessors \cite{ZhangPanChiu09, },
             35 +
             36 +
             37 +
       25    38
       26    39

  • docs/HPCA2012/11-conclusions.tex

    r1379 r1411
       24    24   reduction in branches, 7$\times$---15$\times$ reduction in branch mispredictions,
       25    25   % ?\times$ reduction in LLC misses, and increase in data parallelism
       26       - processing upto 128 characters with a single operation. We used the
             26 + processing up to 128 characters with a single operation. We used the
       27    27   Parabix framework and XML parsers to study the features of the new 256
       28    28   bit AVX extension in Intel processors. We find that while the move to

  • docs/HPCA2012/main.tex

    r1398 r1411
      128   128   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      129   129   % ACM title header format
      130       - \title{\vspace{-30pt} Boosting the Efficiency of Text Processing on Commodity Processors: The Parabix Story
            130 + \title{\vspace{-30pt} Parabix : Boosting the Efficiency of Text
            131 +   Processing on \\ Commodity Processors
      131   132   %
      132   133   % \thanks{%
      ...   ...
      187   188   % tighten spacing:
      188   189   \let\oldthebibliography\thebibliography
      189       - \def\thebibliography#1{\oldthebibliography{#1}\parsep-5pt\itemsep0pt}
      190       - % \vspace{-\baselineskip}
            190 + \def\thebibliography#1{\oldthebibliography{#1}\parsep5pt\itemsep0pt}
      191   191   {
      192   192   \setstretch{1}
      193   193    \footnotesize
      194       - % \scriptsize
      195   194   \bibliographystyle{abbrv}
      196   195    \bibliography{reference}

  • docs/HPCA2012/reference.bib

    r1405 r1411
      563   563     year = {Aug 2009}
      564   564     }
            565 +
            566 + @misc{sse4,
            567 + author= {Zhai Lei},
            568 + title = {XML Parsing Accelerator with Intel Streaming SIMD Extensions 4},
            569 + howpublished = "{http://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/}"},
            570 + year = {2008}
            571 + }