Timestamp:
Aug 23, 2011, 1:02:30 AM (8 years ago)
Author:
ashriram
Message:

new abstract for new intro

File:
1 edited

  • docs/HPCA2012/00-abstract.tex

    r1302 r1348  
    1 XML is a set of rules for the encoding of documents in machine-readable form.
    2 The simplicity and generality of the rules make it widely used in web services and database
    3 systems.  Traditional XML parsers are built around a
    4 byte-at-a-time processing model where each character token
    5 of an XML document is examined in sequence.  Unfortunately, the byte-at-a-time
    6 sequential model is a performance barrier in more demanding applications,
    7 is energy-inefficient, and makes poor use of the wide SIMD registers
    8 and other parallelism features of modern processors.
     1In modern applications text files are employed widely. For example,
     2XML files provide data storage in human readable format and are widely
     3used in web services, database systems, and mobile phone SDKs.
     4Traditional text processing tools are built around a byte-at-a-time
     5processing model where each character token of a document is
     6examined. The byte-at-a-time model is highly challenging for commodity
     7processors. It includes many unpredictable input-dependent branches
     8which cause pipeline squashes and stalls. Furthermore, typical text
     9processing tools perform few operations per processed character and
     10experience a high cache miss rate when parsing the file. Overall,
     11parsing text in important domains like XML processing requires high
     12performance and hardware designers have adopted customized hardware
     13and ASIC solutions.
    914
    10 This paper assesses the energy and performance of a new approach
    11 to XML parsing, based on parallel bit stream technology, and as implemented on successive
    12 software generations of the Parabix XML parser.
    13 In Parabix, we first convert character streams into sets of parallel
    14 bit streams. We then exploit the SIMD operations prevalent on commodity-level hardware for performance.
    15 The first generation Parabix1 parser exploits the processor built-in $bitscan$ instructions
    16 over these streams to make multibyte moves but follows an otherwise sequential
    17 approach.  The second generation Parabix2 technology adds further
    18 parallelism by replacing much of the sequential
    19 bit scanning with a parallel scanning approach based on bit stream
    20 addition.  We evaluate Parabix1 and Parabix2
    21 against two widely used XML parsers, James Clark's Expat and Apache's Xerces, and
    22 across three generations of x86 machines, including the new Intel
    23 \SB{}.  We show that Parabix2's speedup is 2$\times$--7$\times$
    24 over Expat and Xerces.  In stark contrast to the energy expenditures necessary
    25 to realize performance gains through multicore parallelism, we also show
    26 that our Parabix parsers deliver energy savings in direct proportion
    27 to the gains in performance.  In addition, we assess the scalability advantages
    28 of SIMD processor improvements across Intel processor generations,
    29 culminating with an evaluation of the 256-bit AVX technology in
    30 \SB{} versus the now legacy 128-bit SSE technology.
     15% In this paper on commodity.
     16% We expose through a toolchain.
     17% We demonstrate what can be achieved with branches etc.
     18% We study various tradeoffs.
     19% Finally we show the benefits can be stacked
    3120
     21In this paper we enable text processing applications to effectively
     22use commodity processors. We introduce Parabix (Parallel Bitstream)
     23technology, a software runtime and execution model that allows applications
     24to exploit modern SIMD instruction extensions for high-performance
     25text processing. Parabix enables the application developer to write
     26constructs assuming unlimited SIMD data parallelism. Our runtime
     27translator generates code based on machine specifics (e.g., SIMD
     28register widths) to realize the programmer specifications.  The key
     29insight into efficient text processing in Parabix is the data
     30organization. It transposes the sequence of 8-bit characters into
     31sets of 8 parallel bit streams, which then enables us to operate on
     32multiple characters with a single bit-parallel SIMD operation. We
     33demonstrate the features and efficiency of Parabix with an XML parsing
     34application. We evaluate a Parabix-based XML parser against two widely
     35used XML parsers, Expat and Apache's Xerces, and across three
     36generations of x86 processors, including the new Intel \SB{}.  We show
     37that Parabix's speedup is 2$\times$--7$\times$ over Expat and
     38Xerces. We observe that Parabix overall makes efficient use of
     39intra-core parallel hardware on commodity processors and supports
     40significant gains in energy efficiency. Using Parabix, we assess the scalability
     41advantages of SIMD processor improvements across Intel processor
     42generations, culminating with a look at the latest 256-bit AVX
     43technology in \SB{} versus the now legacy 128-bit SSE technology. As
     44part of this study we also preview the Neon extensions on ARM
     45processors. Finally, we partition the XML program into pipeline stages
     46and demonstrate that thread-level parallelism exploits SIMD units
     47scattered across the different cores and improves performance
     48(2$\times$ on 4 cores) at the same energy levels as the single-thread
     49version.
     50
     51
     52
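The data organization described in the new abstract (transposing a byte sequence into 8 parallel bit streams, then operating on many characters with one bitwise operation) can be illustrated with a small sketch. This is not the Parabix implementation: the function names `transpose` and `match_byte` are hypothetical, and Python's arbitrary-width integers stand in for SIMD registers, so the sketch shows only the idea, not the performance.

```python
def transpose(data: bytes) -> list:
    """Return 8 bit streams; stream k holds bit k (0 = most significant) of every byte.

    Input position i is stored at bit i of each Python int (an assumed layout).
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list, target: int, length: int) -> int:
    """Mark every position whose byte equals target, using whole-stream bitwise ops."""
    mask = (1 << length) - 1          # restrict results to valid positions
    result = mask
    for k in range(8):
        want_one = (target >> (7 - k)) & 1
        # AND in the stream itself where the target bit is 1, its complement where 0.
        result &= streams[k] if want_one else (~streams[k] & mask)
    return result


text = b"<a><b/></a>"
streams = transpose(text)
lt = match_byte(streams, ord("<"), len(text))
positions = [i for i in range(len(text)) if (lt >> i) & 1]
print(positions)
```

Once the streams exist, `match_byte` needs only eight bitwise operations to classify every position, regardless of how many characters the streams cover. That is the property a SIMD implementation would exploit one register width at a time.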