Timestamp:
Aug 30, 2011, 10:47:59 AM
Author:
ashriram
Message:

Minor bug fixes up to 04

File:
1 edited

  • docs/HPCA2012/00-abstract.tex

--- docs/HPCA2012/00-abstract.tex (r1360)
+++ docs/HPCA2012/00-abstract.tex (r1393)
@@ -1,6 +1,6 @@
 In modern applications text files are employed widely. For example,
 XML files provide data storage in human readable format and are
-ubiquitous in applications ranging from database systems to mobile phone
-SDKs.  Traditional text processing tools are built around a
+ubiquitous in applications ranging from database systems to mobile
+phone SDKs.  Traditional text processing tools are built around a
 byte-at-a-time processing model where each character token of a
 document is examined. The byte-at-a-time model is highly challenging
@@ -8,7 +8,7 @@
 input-dependent branches which cause pipeline squashes and
 stalls. Furthermore, typical text processing tools perform few
-operations per processed character and experience high cache miss
+operations per character and experience high cache miss
 rates. Overall, parsing text in important domains like XML processing
-requires high performance motivating hardware designers to adopt ASIC
+requires high performance motivating the adoption of custom hardware
 solutions.
 
@@ -21,30 +21,28 @@
 In this paper, we enable text processing applications to effectively
 use commodity processors. We introduce Parabix (Parallel Bitstream)
-technology, a software toolkit and execution framework that allows
-applications to exploit modern SIMD instructions extensions for high
-performance text processing. Parabix enables the application developer
-to write constructs assuming unlimited SIMD data parallelism and
-Parabix's bitstream translator generates code based on machine specifics
-(e.g., SIMD register widths).  The key insight into efficient text
-processing in Parabix is the data organization. Parabix transposes the
-sequence of character bytes into sets of 8 parallel bit streams which
-then enables us to operate on multiple characters with single
-bit-parallel SIMD operators. We demonstrate the features and
-efficiency of Parabix with a XML parsing application. We evaluate a
-Parabix-based XML parser against two widely used XML parsers, Expat
-and Apache's Xerces, and across three generations of x86 processors,
-including the new Intel \SB{}.  We show that Parabix's speedup is
-2$\times$--7$\times$ over Expat and Xerces. We observe that Parabix
-overall makes efficient use of intra-core parallel hardware on
-commodity processors and supports significant gains in energy. Using
-Parabix, we assess the scalability advantages of SIMD processor
-improvements across Intel processor generations, culminating with a
-look at the latest 256-bit AVX technology in \SB{} versus the now
-legacy 128-bit SSE technology. We also examine Parabix on mobile platforms
-using ARM processors with Neon SIMD extensions.  Finally, we partition the XML
-program into pipeline stages and demonstrate that thread-level
-parallelism exploits SIMD units scattered across the different cores
-and improves performance (2$\times$ on 4 cores) at same energy levels
-as the single-thread version.
+technology, a software toolchain and execution framework that allows
+applications to exploit modern SIMD instructions for high performance
+text processing. Parabix enables the application developer to write
+constructs assuming unlimited SIMD data parallelism and Parabix's
+bitstream translator generates code based on machine specifics (e.g.,
+SIMD register widths).  The key insight into efficient text processing
+in Parabix is the data organization. Parabix transposes the sequence
+of character bytes into sets of 8 parallel bit streams which then
+enables us to operate on multiple characters with bit-parallel SIMD
+operations. We demonstrate the features and efficiency of Parabix with
+an XML parsing application. We evaluate the Parabix-based parser
+against two widely used XML parsers, Expat and Apache's
+Xerces. Parabix makes efficient use of intra-core SIMD hardware and
+demonstrates 2$\times$--7$\times$ speedup and 4$\times$ improvement in
+energy efficiency compared to the conventional parsers. We assess the
+scalability of SIMD implementations across three generations of x86
+processors including the new \SB{}. We compare the 256-bit AVX
+technology in Intel \SB{} versus the now legacy 128-bit SSE technology
+and analyze the benefits and challenges of using the AVX
+extensions.  Finally, we partition the XML program into pipeline stages
+and demonstrate that thread-level parallelism enables the application
+to exploits SIMD units scattered across the different cores and
+improves performance (2$\times$ on 4 cores) at same energy levels as
+the single-thread version for the XML application.
 
 
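
The abstract's central idea, transposing character bytes into 8 parallel bit streams so that one bitwise operation examines many characters at once, can be illustrated with a minimal Python sketch. This is not Parabix code or its API: the function names `transpose`, `match_byte`, and `positions` are hypothetical, and arbitrary-width Python integers stand in for SIMD registers.

```python
def transpose(data: bytes) -> list[int]:
    """Split a byte sequence into 8 bit streams.

    Stream k collects bit k (0 = least significant) of every byte;
    bit i of each stream corresponds to byte position i of the input.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def match_byte(streams: list[int], target: int, length: int) -> int:
    """Bit-parallel equality test over all positions at once.

    Bit i of the result is 1 exactly when input byte i equals `target`.
    Only 8 AND/NOT steps are needed, regardless of the input length.
    """
    mask = (1 << length) - 1          # restrict results to valid positions
    result = mask
    for k in range(8):
        if (target >> k) & 1:
            result &= streams[k]          # bit k must be set
        else:
            result &= ~streams[k] & mask  # bit k must be clear
    return result


def positions(marker: int) -> list[int]:
    """List the positions of the 1 bits in a marker stream."""
    return [i for i in range(marker.bit_length()) if (marker >> i) & 1]


text = b"<a><b/></a>"
streams = transpose(text)
open_angle = match_byte(streams, ord("<"), len(text))
print(positions(open_angle))  # [0, 3, 7]
```

The transposition here is a plain loop for readability; the point of the sketch is the second step, where a fixed number of bitwise operations classifies every character position simultaneously instead of branching on each byte.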