source: docs/HPCA2012/final_ieee/00-abstract.tex @ 1743

Last change on this file since 1743 was 1743, checked in by ashriram, 8 years ago

First pass final version [ashriram]

% In modern applications text files are employed widely. For example,
% XML files provide data storage in human readable format and are
% ubiquitous in applications ranging from database systems to mobile
% phone SDKs. 
% Traditional text processing tools are built around a
% byte-at-a-time processing model where each character token of a
% document is examined. The byte-at-a-time model is highly challenging
% for commodity processors. It includes many unpredictable
% input-dependent branches which cause pipeline squashes and
% stalls. Furthermore, typical text processing tools perform few
% operations per character and experience high cache miss
% rates. Overall, parsing text in important domains like XML processing
% requires high performance motivating the adoption of custom hardware
% solutions.
%
% % In this paper on commodity.
% % We expose through a toolchain.
% % We demonstrate what can be achieved with branches etc.
% % We study various tradeoffs.
% % Finally we show the benefits can be stacked
%
% In this paper, we enable text processing applications to effectively
% use commodity processors. We introduce Parabix (Parallel Bit Stream)
% technology, a software toolchain and execution framework that allows
% applications to exploit modern SIMD instructions for high performance
% text processing. Parabix enables the application developer to write
% constructs assuming unlimited SIMD data parallelism and Parabix's
% bit stream translator generates code based on machine specifics (e.g.,
% SIMD register widths).  The key insight into efficient text processing
% in Parabix is the data organization. Parabix transposes the sequence
% of character bytes into sets of 8 parallel bit streams which then
% enables us to operate on multiple characters with bit-parallel SIMD
% operations. We demonstrate the features and efficiency of Parabix with
% an XML parsing application. We evaluate the Parabix-based parser
% against two widely used XML parsers, Expat and Apache's
% Xerces. Parabix makes efficient use of intra-core SIMD hardware and
% demonstrates 2$\times$--7$\times$ speedup and 4$\times$ improvement in
% energy efficiency compared to the conventional parsers. We assess the
% scalability of SIMD implementations across three generations of x86
% processors including the new \SB{}. We compare the 256-bit AVX
% technology in Intel \SB{} versus the now legacy 128-bit SSE technology
% and analyze the benefits and challenges of using the AVX
% extensions.  Finally, we partition the XML program into pipeline stages
% and demonstrate that thread-level parallelism enables the application
% to exploit SIMD units scattered across the different cores and
% improves performance (2$\times$ on 4 cores) at the same energy levels as
% the single-thread version for the XML application.

Modern applications, ranging from database systems to mobile phones,
widely employ text files to store data in a human-readable format.
Traditional text processing tools are built around a byte-at-a-time
sequential processing model and incur significant branch and cache
miss penalties. Recently, researchers have explored a transposed
representation of text, Parabix (Parallel Bit Stream), to improve the
efficiency of text processing.

In this paper, we explore a general programming framework based on
Parabix and describe the software toolchain and execution framework
that allow applications to exploit modern SIMD instructions for
high-performance text processing. The toolchain enables the
application developer to write constructs assuming unbounded character
streams, and Parabix's code translator generates code based on machine
specifics (e.g., SIMD register widths). We demonstrate the features
and efficiency of Parabix with an XML parsing application. Parabix
exploits intra-core SIMD hardware and demonstrates a
2$\times$--7$\times$ speedup and a 4$\times$ improvement in energy
efficiency compared to two widely used conventional software parsers,
Expat and Apache Xerces. We study SIMD implementations across three
generations of x86 processors, including the new \SB{}. We compare the
256-bit AVX technology in Intel \SB{} with the now-legacy 128-bit SSE
technology and analyze the benefits and challenges of 3-operand
instruction formats and wider SIMD hardware. Finally, we partition
the XML program into pipeline stages and demonstrate that thread-level
parallelism enables the application to exploit SIMD units scattered
across the different cores and improves performance (2$\times$ on 4
cores) at the same energy levels as the single-thread version.
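
The transposed representation mentioned above can be illustrated with a minimal sketch. It is not the Parabix implementation: the function names (`transpose`, `match_byte`, `positions`) are hypothetical, the bit numbering (bit 0 is the least significant bit of each byte) is an assumption, and an arbitrary-width Python integer stands in for a SIMD register. The sketch splits a byte sequence into 8 bit streams and then locates every occurrence of one character using only bitwise operations on whole streams, with no per-character branch in the matching step.

```python
def transpose(data):
    """Split bytes into 8 bit streams (assumed layout, for illustration).

    Stream k collects bit k (0 = least significant) of every input byte;
    bit i of each stream corresponds to input byte i.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams, value, length):
    """Return a stream whose bit i is 1 iff input byte i equals `value`.

    Each step is one bitwise operation over the whole stream, which is
    the role a SIMD instruction would play on a register-sized block.
    """
    mask = (1 << length) - 1          # keep only the `length` valid positions
    result = mask
    for k in range(8):
        if (value >> k) & 1:
            result &= streams[k]          # bit k must be 1
        else:
            result &= ~streams[k] & mask  # bit k must be 0
    return result


def positions(stream):
    """List the indices of the 1 bits in a stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


if __name__ == "__main__":
    text = b"<a><b/></a>"
    bit_streams = transpose(text)
    print(positions(match_byte(bit_streams, ord("<"), len(text))))
```

Here the transposition itself is written as a plain loop for clarity; a SIMD implementation would perform it with packed instructions on register-sized blocks, which this sketch does not attempt to model.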