# Changeset 1743 for docs/HPCA2012/final_ieee/00-abstract.tex

Timestamp:
Nov 30, 2011, 11:30:44 AM
Message:

First pass final version [ashriram]

File:
1 edited

 r1733 % the single-thread version for the XML application. Modern applications employ text files widely for providing data storage in readable format for applications ranging from database systems to mobile phones. Traditional text processing tools are built around a byte-at-a-time sequential processing model, and introduce significant branch and cache miss penalties. Recently, researchers have explored a transposed representation of text, Parabix (Parallel Bit Stream), to improve the efficiency of text processing. Traditional text processing tools are built around a byte-at-a-time sequential processing model, which is hard to parallelize without special hardware. However, Parabix (Parallel Bit Stream) technology enables text processing applications to effectively use commodity processors. In this paper, we generalize Parabix into a software toolchain and execution framework that allows applications to exploit modern SIMD instructions for high-performance text processing. This toolchain enables the application developer to write constructs assuming unlimited SIMD data parallelism, and Parabix's bit stream translator generates code based on machine specifics (e.g., SIMD register widths). We demonstrate the features and efficiency of Parabix with an XML parsing application. We evaluate the Parabix-based parser against two widely used XML parsers, Expat and Apache's Xerces. Parabix makes efficient use of intra-core SIMD hardware and demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in energy efficiency compared to the conventional parsers. We assess the scalability of SIMD implementations across three generations of x86 processors including the new \SB{}. We compare the 256-bit AVX technology in Intel \SB{} versus the now-legacy 128-bit SSE technology and analyze the benefits and challenges of using the AVX extensions.
Finally, we partition the XML program into pipeline stages and demonstrate that thread-level parallelism enables the application to exploit SIMD units scattered across the different cores and improves performance (2$\times$ on 4 cores) at the same energy levels as the single-thread version for the XML application. In this paper, we explore a general programming framework based on Parabix and describe the software toolchain and execution framework that allows applications to exploit modern SIMD instructions for high-performance text processing. The toolchain enables the application developer to write constructs assuming unbounded character streams, and Parabix's code translator generates code based on machine specifics (e.g., SIMD register widths). We demonstrate the features and efficiency of Parabix with an XML parsing application. Parabix exploits intra-core SIMD hardware and demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in energy efficiency compared to two widely used conventional software parsers, Expat and Apache-Xerces. We study SIMD implementations across three generations of x86 processors including the new \SB{}. We compare the 256-bit AVX technology in Intel \SB{} versus the now-legacy 128-bit SSE technology and analyze the benefits and challenges of 3-operand instruction formats and wider SIMD hardware. Finally, we partition the XML program into pipeline stages and demonstrate that thread-level parallelism enables the application to exploit SIMD units scattered across the different cores and improves performance (2$\times$ on 4 cores) at the same energy levels as the single-thread version for the XML application.
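
The transposed representation mentioned in the abstract can be illustrated with a small sketch. This is not the Parabix toolchain or its generated code: the function names `transpose_to_bit_streams` and `match_byte` are hypothetical, and arbitrary-precision Python integers stand in for SIMD registers of unbounded width. Under those assumptions, the sketch turns a byte stream into eight parallel bit streams and then locates every `<` character with whole-stream bitwise operations instead of a per-byte branch.

```python
def transpose_to_bit_streams(data: bytes) -> list:
    """Return 8 integers. Bit i of stream k is set iff bit k of data[i]
    is set, where k = 0 is the most significant bit of the byte."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list, target: int, length: int) -> int:
    """Return an integer whose bit i is set iff data[i] == target.
    Only bitwise AND/NOT over whole streams is used; the loop runs over
    the 8 bit positions of the target, not over the input bytes."""
    mask = (1 << length) - 1  # limits the result to valid positions
    result = mask
    for k in range(8):
        if (target >> (7 - k)) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


text = b"<a><b/></a>"
streams = transpose_to_bit_streams(text)
lt_marks = match_byte(streams, ord("<"), len(text))
positions = [i for i in range(len(text)) if (lt_marks >> i) & 1]
print(positions)  # [0, 3, 7]
```

On real hardware the integers would be replaced by fixed-width SIMD registers (128-bit SSE or 256-bit AVX), so each bitwise operation classifies 128 or 256 input positions at once.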