Ignore:
Timestamp:
Nov 30, 2011, 11:30:44 AM (7 years ago)
Author:
ashriram
Message:

First pass final version [ashriram]

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/HPCA2012/01-intro.tex

    r1691 r1743  
    11\section{Introduction}
    22
    3 As a result of information expansion and diversification of the data format,
    4 the demands of high performance and energy efficient text processing are rapidly increasing.
    5 However, classical Dennard voltage scaling has reached its limits
    6 which gives the traditional byte-at-a-time processing methods little space
    7 for further improvement. An alternative is to increase energy efficiency
    8 by operating at more optimal core frequencies and achieve better performance
    9 with a larger number of cores. Unfortunately, given the limited levels of parallelism
    10 that can be found in applications~\cite{blake-isca-2010}, especailly text processing,
    11 in which, many applications, for example, XML parsing, are sequential by nature,
    12 it is not certain how many cores can be productively used in scaling our
    13 chips~\cite{esmaeilzadeh-isca-2011}. In a widely cited Berkeley study~\cite{Asanovic:EECS-2006-183},
    14 the infamous ``thirteenth dwarf'' (parsers/finite state machines) is considered to be the hardest
    15 application class to parallelize.
     3As a result of information expansion and diversification of the data
     4format, the demand for high performance and energy efficient text
     5processing is rapidly rising. A widely-used text-based data storage
     6format is XML. XML is a standard of the web consortium that provides a
     7common framework for encoding and communicating data.  XML is used in
     8applications ranging from Office Open XML in Microsoft Office to NDFD
     9XML of the NOAA National Weather Service, from KML in Google Earth to
     10Castor XML in the Martian Rovers. In a widely cited Berkeley
     11study~\cite{Asanovic:EECS-2006-183}, the ``thirteenth dwarf''
     12(parsers/finite state machines) which processes text is considered to be
     13the hardest application class to parallelize.
    1614
    17 A new technology, Parabix, was introduced to exploit the SIMD extensions on commodity processors
    18 to process hundreds of character positions in an input stream simultaneously~\cite{Cameron2008}.
    19 Parabix first transposes byte-oriented character data into parallel bit streams
    20 using sophisticated SIMD instructions that enable data elements to be packed into registers.
    21 With the bit streams, where each bit represents one character from the input data, the text can then
    22 be processed in parallel within the SIMD registers.
    23 This improves the overall cache behaviour of the application resulting in significantly
    24 fewer misses and better utilization.  Parabix also dramatically
    25 reduces branches in the parsing routines resulting in a more efficient
    26 pipeline and substantially improves register utilization which
    27 minimizes energy wasted on data transfers.
    2815
    29 We apply Parabix technology to the problem of XML parsing.  XML is a
    30 standard of the web consortium that provides a common framework for
    31 encoding and communicating data.  XML provides critical data storage
    32 for applications ranging from Office Open XML in Microsoft Office to
    33 NDFD XML of the NOAA National Weather Service, from KML in Google
    34 Earth to Castor XML in the Martian Rovers.  XML parsing efficiency is
    35 important for multiple application areas; in server workloads the key
    36 focus in on overall transactions per second, while in applications for
    37 network switches and cell phones, latency and energy are of paramount
    38 importance.  Conventional software-based XML parsers have many
    39 inefficiencies including considerable branch misprediction penalties
    40 due to complex input-dependent branching structures as well as poor
    41 use of caches and memory bandwidth due to byte-at-a-time
    42 processing.  XML ASIC chips have been around since early 2003, but typically lag behind CPUs in technology due to
    43 cost constraints~\cite{xmlchip}. They also focus mainly on speeding up the parser
    44 computation itself and are limited by the poor memory behaviour.
     16Given the limited levels of parallelism that can be found in text
     17processing, for example, XML parsing, is inherently sequential, it is
     18not certain how many cores can be productively used.  Conventional
     19software-based XML parsers have many inefficiencies including
     20considerable branch misprediction penalties due to complex
     21input-dependent branching structures as well as poor use of caches and
     22memory bandwidth due to byte-at-a-time processing. XML ASIC chips have
     23been around since early 2003, but typically lag behind CPUs in
     24technology due to cost constraints~\cite{xmlchip}. They also focus
     25mainly on speeding up the parser computation itself and are limited by
     26the poor memory behaviour.
     27
     28
     29; in server
     30workloads the key focus in on overall transactions per second, while
     31in applications for network switches and cell phones, latency and
     32energy are of paramount importance.
     33
     34
     35
     36
     37%
     38% Introduce Parabix.
     39%
     40
     41
     42
     43
     44%However, classical Dennard voltage scaling has reached its limits
     45%which gives the traditional byte-at-a-time processing methods little
     46%space for further improvement. An alternative is to increase energy
     47%efficiency by operating at more optimal core frequencies and achieve
     48%better performance with a larger number of cores.
     49%~\cite{blake-isca-2010}, in scaling our
     50%chips~\cite{esmaeilzadeh-isca-2011}
     51
     52A new technology, Parabix, was introduced to exploit the SIMD
     53extensions on commodity processors to process hundreds of character
     54positions in an input stream simultaneously~\cite{Cameron2008}.
     55Parabix first transposes byte-oriented character data into parallel
     56bit streams using sophisticated SIMD instructions that enable data
     57elements to be packed into registers.  With the bit streams, where
     58each bit represents one character from the input data, the text can
     59then be processed in parallel within the SIMD registers.  This
     60improves the overall cache behaviour of the application resulting in
     61significantly fewer misses and better utilization.  Parabix also
     62dramatically reduces branches in the parsing routines resulting in a
     63more efficient pipeline and substantially improves register
     64utilization which minimizes energy wasted on data transfers.
     65
     66We apply Parabix technology to the problem of XML parsing.
    4567Our focus is how much we can improve performance of
    4668the XML parser on commodity processors with Parabix technology.
    4769
    48 The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
    49 which applies a sequential bit scan method, has already shown a
    50 substantial improvement on performance. The latest version or the
    51 second generation of Parabix XML parser~\cite{Cameron2010} introduced
    52 a new idea, parallel bit scan, which provides us a more efficient
    53 scanning and better utilization of the resources.
     70%The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
     71%which applies a sequential bit scan method, has already shown a
     72%substantial improvement on performance. The latest version or the
     73%second generation of Parabix XML parser~\cite{Cameron2010} introduced
     74%a new idea, parallel bit scan, which provides us a more efficient
     75%scanning and better utilization of the resources.
    5476
    5577
Note: See TracChangeset for help on using the changeset viewer.