Timestamp:
Nov 17, 2011, 2:52:12 PM (8 years ago)
Author:
lindanl
Message:

Change Introduction

File:
1 edited

  • docs/HPCA2012/01-intro.tex

    r1652 r1691  
    11\section{Introduction}
    2 Classical Dennard voltage scaling enabled us to keep all of the transistors
    3 afforded by Moore's law active. Dennard scaling reached its limits in
    4 2005 and this has resulted in a rethink of the way general-purpose
    5 processors are built: frequencies have remained stagnant
    6 over the last 5 years with the capability to boost a core's frequency
    7 only if other cores on the chip are shut off. Chip makers strive to
    8 achieve energy efficient computing by operating at more optimal core
    9 frequencies and aim to increase performance with a larger number of
    10 cores. Unfortunately, given the limited levels of parallelism that can
    11 be found in applications~\cite{blake-isca-2010}, it is not certain how
    12 many cores can be productively used in scaling our
    13 chips~\cite{esmaeilzadeh-isca-2011}. This is because exploiting
    14 parallelism across multiple cores tends to require heavyweight
    15 threads that are difficult to manage and synchronize.
    162
    17 The desire to improve the overall efficiency of computing is pushing
    18 designers to explore customized hardware~\cite{venkatesh-asplos-2010,
    19   hameed-isca-2010} that accelerates specific parts of an application
    20 while reducing the overheads present in general-purpose
    21 processors. They seek to exploit the transistor bounty to provision
    22 many different accelerators and keep only the accelerators needed for
    23 an application active while switching off others on the chip to save
    24 power consumption. While promising, given the fast evolution of
    25 languages and software, it is hard to define a set of fixed-function
    26 hardware for commodity processors. Furthermore, the toolchain to
    27 create such customized hardware is itself a hard research
    28 challenge. We believe that software, applications, and runtime models
    29 themselves can be refactored to significantly improve the overall
    30 computing efficiency of commodity processors.
     3As a result of the expansion of information and the diversification of data formats,
     4the demand for high-performance, energy-efficient text processing is rapidly increasing.
     5However, classical Dennard voltage scaling has reached its limits,
     6which leaves traditional byte-at-a-time processing methods little room
     7for further improvement. An alternative is to increase energy efficiency
     8by operating at more optimal core frequencies and achieve better performance
     9with a larger number of cores. Unfortunately, given the limited levels of parallelism
     10that can be found in applications~\cite{blake-isca-2010}, especially in text processing,
     11where many applications, such as XML parsing, are sequential by nature,
     12it is not certain how many cores can be productively used in scaling our
     13chips~\cite{esmaeilzadeh-isca-2011}. In a widely cited Berkeley study~\cite{Asanovic:EECS-2006-183},
     14the infamous ``thirteenth dwarf'' (parsers/finite state machines) is considered to be the hardest
     15application class to parallelize.
    3116
    32 In this paper we tackle the infamous ``thirteenth dwarf''
    33 (parsers/finite state machines) that is considered to be the hardest
    34 application class to parallelize~\cite{Asanovic:EECS-2006-183}. We
    35 present Parabix, a novel execution framework and software runtime
    36 environment that can be used to dramatically improve the efficiency of
    37 text processing and parsing on commodity processors.  Parabix
    38 transposes byte-oriented character data into parallel bit streams and
    39 then exploits the SIMD extensions on commodity processors (SSE/AVX on
    40 x86, Neon on ARM) to process hundreds of character positions in an
    41 input stream simultaneously. To transform character-oriented data into
    42 bit streams Parabix exploits sophisticated SIMD instructions that
    43 enable data elements to be packed into registers. This improves the
    44 overall cache behaviour of the application resulting in significantly
     17A new technology, Parabix, was introduced to exploit the SIMD extensions on commodity processors
     18to process hundreds of character positions in an input stream simultaneously~\cite{Cameron2008}.
     19Parabix first transposes byte-oriented character data into parallel bit streams
     20using sophisticated SIMD instructions that enable data elements to be packed into registers.
     21With the bit streams, where each bit position corresponds to one character position in the input data, the text can then
     22be processed in parallel within the SIMD registers.
     23This improves the overall cache behaviour of the application resulting in significantly
    4524fewer misses and better utilization.  Parabix also dramatically
    4625reduces branches in the parsing routines resulting in a more efficient
     
    6746the XML parser on commodity processors with Parabix technology.
    6847
    69 
     48The first generation of the Parabix XML parser~\cite{CameronHerdyLin2008},
     49which applies a sequential bit scan method, has already shown a
     50substantial improvement in performance. The latest version, the
     51second generation of the Parabix XML parser~\cite{Cameron2010}, introduced
     52a new idea, parallel bit scan, which provides more efficient
     53scanning and better utilization of resources.
    7054
    7155
     
    7963
    8064
    81 
     65In this paper, we present the Parabix tool chain, a novel execution framework
     66and software runtime environment that can be used to dramatically improve
     67the efficiency of text processing and parsing on commodity processors.
    8268Figure~\ref{perf-energy} showcases the overall efficiency of our
    8369framework. The Parabix-XML parser improves the