source: docs/HPCA2012/00-abstract.tex @ 1393

Last change on this file since 1393 was 1393, checked in by ashriram, 8 years ago

Minor bug fixes up to 04

Text files are widely used in modern applications. For example,
XML files provide data storage in a human-readable format and are
ubiquitous in applications ranging from database systems to mobile
phone SDKs.  Traditional text processing tools are built around a
byte-at-a-time processing model in which each character token of a
document is examined. The byte-at-a-time model is highly challenging
for commodity processors. It includes many unpredictable,
input-dependent branches, which cause pipeline squashes and
stalls. Furthermore, typical text processing tools perform few
operations per character and experience high cache miss
rates. Overall, parsing text in important domains like XML processing
requires high performance, motivating the adoption of custom hardware.

% In this paper on commodity.
% We expose through a toolchain.
% We demonstrate what can be achieved with branches etc.
% We study various tradeoffs.
% Finally we show the benefits can be stacked

In this paper, we enable text processing applications to effectively
use commodity processors. We introduce Parabix (Parallel Bitstream)
technology, a software toolchain and execution framework that allows
applications to exploit modern SIMD instructions for high-performance
text processing. Parabix enables the application developer to write
constructs assuming unlimited SIMD data parallelism, and Parabix's
bitstream translator generates code based on machine specifics (e.g.,
SIMD register widths).  The key insight into efficient text processing
in Parabix is the data organization. Parabix transposes the sequence
of character bytes into a set of 8 parallel bit streams, which then
enables us to operate on multiple characters with bit-parallel SIMD
operations. We demonstrate the features and efficiency of Parabix with
an XML parsing application. We evaluate the Parabix-based parser
against two widely used XML parsers, Expat and Apache's
Xerces. Parabix makes efficient use of intra-core SIMD hardware and
demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
energy efficiency compared to the conventional parsers. We assess the
scalability of SIMD implementations across three generations of x86
processors including the new \SB{}. We compare the 256-bit AVX
technology in Intel \SB{} versus the now-legacy 128-bit SSE technology
and analyze the benefits and challenges of using the AVX
extensions.  Finally, we partition the XML program into pipeline stages
and demonstrate that thread-level parallelism enables the application
to exploit SIMD units scattered across the different cores and
improves performance (2$\times$ on 4 cores) at the same energy levels as
the single-thread version for the XML application.
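
The transposition into 8 parallel bit streams can be illustrated with a
minimal sketch. It is not the Parabix toolchain or its generated code. It
assumes Python's arbitrary-precision integers as a stand-in for
unbounded-width SIMD registers, and the helper names (transpose,
match_byte, positions) are hypothetical. The transposition loop here is
itself written byte-at-a-time for clarity; only the matching step shows
the bit-parallel style, in which one bitwise operation acts on every
character position at once.

```python
from typing import List


def transpose(data: bytes) -> List[int]:
    """Split a byte sequence into 8 bit streams.

    streams[k] collects bit k (0 = least significant) of every byte;
    bit i of each stream belongs to the byte at position i.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def match_byte(streams: List[int], length: int, target: int) -> int:
    """Return a bit stream with a 1 wherever the input byte equals target.

    Each loop iteration is a single bitwise AND over all positions,
    with no per-character branch on the input data.
    """
    all_ones = (1 << length) - 1
    result = all_ones
    for k in range(8):
        if (target >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & all_ones
    return result


def positions(stream: int) -> List[int]:
    """List the indices of the set bits, lowest position first."""
    out = []
    pos = 0
    while stream:
        if stream & 1:
            out.append(pos)
        stream >>= 1
        pos += 1
    return out


if __name__ == "__main__":
    text = b"<a><b/></a>"
    streams = transpose(text)
    left_angle = match_byte(streams, len(text), ord("<"))
    print(positions(left_angle))  # [0, 3, 7]
```

In this sketch the eight AND operations in match_byte locate every
left angle bracket in the input together, whereas a byte-at-a-time
scanner would test and branch on each character separately.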