source: docs/HPCA2012/00-abstract.tex @ 4490

Last change on this file since 4490 was 1731, checked in by ashriram, 8 years ago

% In modern applications, text files are used widely. For example,
% XML files store data in a human-readable format and are
% ubiquitous in applications ranging from database systems to mobile
% phone SDKs.
% Traditional text processing tools are built around a
% byte-at-a-time processing model in which each character of a
% document is examined in turn. This model is a poor fit for
% commodity processors: it involves many unpredictable,
% input-dependent branches that cause pipeline squashes and
% stalls. Furthermore, typical text processing tools perform few
% operations per character and suffer high cache miss
% rates. Overall, parsing text in important domains such as XML processing
% requires high performance, which has motivated the adoption of custom hardware
% solutions.
%
% % In this paper on commodity.
% % We expose through a toolchain.
% % We demonstrate what can be achieved with branches etc.
% % We study various tradeoffs.
% % Finally we show the benefits can be stacked
%
% In this paper, we enable text processing applications to use
% commodity processors effectively. We introduce Parabix (Parallel Bit Stream)
% technology, a software toolchain and execution framework that allows
% applications to exploit modern SIMD instructions for high-performance
% text processing. Parabix lets the application developer write
% constructs that assume unlimited SIMD data parallelism, and Parabix's
% bit stream translator then generates code tailored to machine specifics (e.g.,
% SIMD register widths). The key to efficient text processing
% in Parabix is its data organization: Parabix transposes the sequence
% of character bytes into sets of 8 parallel bit streams, which
% lets us operate on many characters at once with bit-parallel SIMD
% operations. We demonstrate the features and efficiency of Parabix with
% an XML parsing application, evaluating the Parabix-based parser
% against two widely used XML parsers, Expat and Apache's
% Xerces. Parabix makes efficient use of intra-core SIMD hardware,
% achieving a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
% energy efficiency over the conventional parsers. We assess the
% scalability of SIMD implementations across three generations of x86
% processors, including the new \SB{}. We compare the 256-bit AVX
% technology in Intel \SB{} with the now-legacy 128-bit SSE technology
% and analyze the benefits and challenges of using the AVX
% extensions. Finally, we partition the XML program into pipeline stages
% and demonstrate that thread-level parallelism lets the application
% exploit the SIMD units distributed across cores, improving
% performance by 2$\times$ on 4 cores at the same energy level as
% the single-threaded version.
Traditional text processing tools are built around a sequential
byte-at-a-time processing model, which is hard to parallelize without
special-purpose hardware.
Parabix (Parallel Bit Stream) technology, however,
enables text processing applications to use commodity processors effectively.
In this paper, we generalize Parabix into a software toolchain and execution
framework that allows applications to exploit modern SIMD instructions for
high-performance text processing. With this toolchain, the application developer
writes constructs that assume unlimited SIMD data parallelism, and Parabix's
bit stream translator generates code tailored to machine specifics (e.g.,
SIMD register widths). We demonstrate the features and efficiency of Parabix with
an XML parsing application, evaluating the Parabix-based parser
against two widely used XML parsers, Expat and Apache's
Xerces. Parabix makes efficient use of intra-core SIMD hardware,
achieving a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
energy efficiency over the conventional parsers. We assess the
scalability of SIMD implementations across three generations of x86
processors, including the new \SB{}. We compare the 256-bit AVX
technology in Intel \SB{} with the now-legacy 128-bit SSE technology
and analyze the benefits and challenges of using the AVX
extensions. Finally, we partition the XML program into pipeline stages
and demonstrate that thread-level parallelism lets the application
exploit the SIMD units distributed across cores, improving
performance by 2$\times$ on 4 cores at the same energy level as
the single-threaded version.