source: docs/HPCA2012/00-abstract.tex @ 1726

Last change on this file since 1726 was 1726, checked in by lindanl, 7 years ago

shorten abstract for HPCA

% In modern applications text files are employed widely. For example,
% XML files provide data storage in human readable format and are
% ubiquitous in applications ranging from database systems to mobile
% phone SDKs. 
% Traditional text processing tools are built around a
% byte-at-a-time processing model where each character token of a
% document is examined. The byte-at-a-time model is highly challenging
% for commodity processors. It includes many unpredictable
% input-dependent branches which cause pipeline squashes and
% stalls. Furthermore, typical text processing tools perform few
% operations per character and experience high cache miss
% rates. Overall, parsing text in important domains like XML processing
% requires high performance, motivating the adoption of custom hardware
% solutions.
%
% % In this paper on commodity.
% % We expose through a toolchain.
% % We demonstrate what can be achieved with branches etc.
% % We study various tradeoffs.
% % Finally we show the benefits can be stacked
%
% In this paper, we enable text processing applications to effectively
% use commodity processors. We introduce Parabix (Parallel Bit Stream)
% technology, a software toolchain and execution framework that allows
% applications to exploit modern SIMD instructions for high-performance
% text processing. Parabix enables the application developer to write
% constructs assuming unlimited SIMD data parallelism, and Parabix's
% bit stream translator generates code based on machine specifics (e.g.,
% SIMD register widths).  The key insight into efficient text processing
% in Parabix is the data organization. Parabix transposes the sequence
% of character bytes into sets of 8 parallel bit streams, which then
% enable us to operate on multiple characters with bit-parallel SIMD
% operations. We demonstrate the features and efficiency of Parabix with
% an XML parsing application. We evaluate the Parabix-based parser
% against two widely used XML parsers, Expat and Apache's
% Xerces. Parabix makes efficient use of intra-core SIMD hardware and
% demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
% energy efficiency compared to the conventional parsers. We assess the
% scalability of SIMD implementations across three generations of x86
% processors including the new \SB{}. We compare the 256-bit AVX
% technology in Intel \SB{} versus the now-legacy 128-bit SSE technology
% and analyze the benefits and challenges of using the AVX
% extensions.  Finally, we partition the XML program into pipeline stages
% and demonstrate that thread-level parallelism enables the application
% to exploit SIMD units scattered across the different cores and
% improves performance (2$\times$ on 4 cores) at the same energy level as
% the single-thread version for the XML application.
Traditional text processing tools are built around a byte-at-a-time
sequential processing model, which is hard to parallelize without special hardware.
However, Parabix (Parallel Bit Stream) technology
enables text processing applications to effectively use commodity processors.
In this paper, we generalize Parabix into a software toolchain and execution
framework that allows applications to exploit modern SIMD instructions for
high-performance text processing. This toolchain enables the application developer
to write constructs assuming unlimited SIMD data parallelism, and Parabix's
bit stream translator generates code based on machine specifics (e.g.,
SIMD register widths). We demonstrate the features and efficiency of Parabix with
an XML parsing application. We evaluate the Parabix-based parser
against two widely used XML parsers, Expat and Apache's
Xerces. Parabix makes efficient use of intra-core SIMD hardware and
demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
energy efficiency compared to the conventional parsers. We assess the
scalability of SIMD implementations across three generations of x86
processors including the new \SB{}. We compare the 256-bit AVX
technology in Intel \SB{} versus the now-legacy 128-bit SSE technology
and analyze the benefits and challenges of using the AVX
extensions.  Finally, we partition the XML program into pipeline stages
and demonstrate that thread-level parallelism enables the application
to exploit SIMD units scattered across the different cores and
improves performance (2$\times$ on 4 cores) at the same energy level as
the single-thread version for the XML application.
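
The parallel bit stream idea can be illustrated with a small sketch. This is not Parabix code, and every function name below is hypothetical. It transposes a byte sequence into 8 bit streams (one per bit position of a byte) and then locates every occurrence of a given character using only bitwise operations on whole streams. Python's arbitrary-precision integers stand in for SIMD registers of unlimited width; a real translator would presumably split the streams into blocks that match the machine's register width.

```python
def transpose_to_bit_streams(data: bytes) -> list[int]:
    """Transpose bytes into 8 bit streams (hypothetical helper, not the Parabix API).

    Stream k collects bit k of every byte, with k = 0 the most significant bit.
    Bit i of each stream integer corresponds to byte i of the input.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], value: int, length: int) -> int:
    """Return a bit stream with bit i set where input byte i equals `value`.

    Each step is one AND (and possibly one NOT) applied to an entire stream,
    so all input positions are tested at once.
    """
    mask = (1 << length) - 1          # limits the NOT to valid positions
    result = mask
    for k in range(8):
        wanted = (value >> (7 - k)) & 1
        result &= streams[k] if wanted else (~streams[k] & mask)
    return result


text = b"<a><b/></a>"
streams = transpose_to_bit_streams(text)
lt_stream = match_byte(streams, ord("<"), len(text))
positions = [i for i in range(len(text)) if (lt_stream >> i) & 1]
print(positions)  # [0, 3, 7]
```

The per-byte loop in the transposition step is written for clarity only; the point of the sketch is that, once the data is in bit stream form, a character test costs a fixed number of bitwise operations regardless of how many characters one stream word covers.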