source: docs/HPCA2012/final_ieee/00-abstract.tex @ 1752

Last change on this file since 1752 was 1752, checked in by cameron, 8 years ago

Updates to intro; abstract

File size: 4.8 KB
% In modern applications, text files are employed widely. For example,
% XML files provide data storage in a human-readable format and are
% ubiquitous in applications ranging from database systems to mobile
% phone SDKs.
% Traditional text processing tools are built around a
% byte-at-a-time processing model in which each character token of a
% document is examined. The byte-at-a-time model is highly challenging
% for commodity processors: it involves many unpredictable
% input-dependent branches, which cause pipeline squashes and
% stalls. Furthermore, typical text processing tools perform few
% operations per character and experience high cache miss
% rates. Overall, parsing text in important domains like XML processing
% requires high performance, motivating the adoption of custom hardware
% solutions.

% % In this paper on commodity.
% % We expose through a toolchain.
% % We demonstrate what can be achieved with branches etc.
% % We study various tradeoffs.
% % Finally we show the benefits can be stacked

% In this paper, we enable text processing applications to effectively
% use commodity processors. We introduce Parabix (Parallel Bit Stream)
% technology, a software toolchain and execution framework that allows
% applications to exploit modern SIMD instructions for high performance
% text processing. Parabix enables the application developer to write
% constructs assuming unlimited SIMD data parallelism, and Parabix's
% bit stream translator generates code based on machine specifics (e.g.,
% SIMD register widths). The key insight into efficient text processing
% in Parabix is the data organization. Parabix transposes the sequence
% of character bytes into sets of 8 parallel bit streams, which
% enables us to operate on multiple characters with bit-parallel SIMD
% operations. We demonstrate the features and efficiency of Parabix with
% an XML parsing application. We evaluate the Parabix-based parser
% against two widely used XML parsers, Expat and Apache's
% Xerces. Parabix makes efficient use of intra-core SIMD hardware and
% demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
% energy efficiency compared to the conventional parsers. We assess the
% scalability of SIMD implementations across three generations of x86
% processors, including the new \SB{}. We compare the 256-bit AVX
% technology in Intel \SB{} with the now-legacy 128-bit SSE technology
% and analyze the benefits and challenges of using the AVX
% extensions. Finally, we partition the XML program into pipeline stages
% and demonstrate that thread-level parallelism enables the application
% to exploit SIMD units scattered across the different cores and
% improves performance (2$\times$ on 4 cores) at the same energy levels as
% the single-thread version for the XML application.

Modern applications widely employ text files to store data in a
human-readable format, in domains ranging from database
systems to mobile phones. Traditional text processing tools are built
around a byte-at-a-time sequential processing model that incurs
significant branch misprediction and cache miss penalties. Recent work has
explored an alternative, transposed representation of text, Parabix (Parallel Bit
Streams), to accelerate scanning and parsing using SIMD facilities.
This paper further advocates and develops Parabix as a general framework
and toolkit, describing the software toolchain and run-time support
that allow applications to exploit modern SIMD instructions for high
performance text processing. The goal is to generalize the techniques
so that they apply across a wide variety of applications
and architectures. The toolchain enables the application developer
to write constructs assuming unbounded character streams, while
Parabix's code translator generates code tailored to machine
specifics (e.g., SIMD register widths).

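As a rough illustration of the transposed representation and of stream-level constructs, the following minimal Python sketch assumes that an arbitrary-precision integer models an unbounded bit stream (bit $i$ corresponds to text position $i$) and that basis stream $k$ holds bit $k$ of each byte (bit 0 least significant). These conventions and all function names (\texttt{transpose}, \texttt{char\_class}, \texttt{scan\_thru}, \texttt{scan\_thru\_blocked}, \texttt{positions}) are hypothetical choices for this sketch, not the Parabix toolchain's API. The sketch transposes bytes into eight basis streams, derives a character-class stream using bitwise logic alone, and advances markers through runs of a class with a single addition; \texttt{scan\_thru\_blocked} then recomputes the same scan in fixed-width blocks with an explicit carry, approximating what register-width-specific code generated from an unbounded-stream construct must do.

```python
from typing import List

# Hypothetical sketch: a Python int models an unbounded bit stream,
# where bit i of the int corresponds to byte position i of the text.


def transpose(text: bytes) -> List[int]:
    """Return 8 basis streams; bit i of basis[k] is bit k of text[i]."""
    # Naive per-bit loop for clarity; a SIMD implementation would
    # presumably use pack/shift operations on whole registers instead.
    basis = [0] * 8
    for i, byte in enumerate(text):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << i
    return basis


def char_class(basis: List[int], value: int, n: int) -> int:
    """Mark positions (among the first n) whose byte equals `value`.

    Combines whole streams with AND/NOT only; nothing branches per character.
    """
    mask = (1 << n) - 1  # keeps complemented streams finite and non-negative
    result = mask
    for k in range(8):
        if (value >> k) & 1:
            result &= basis[k]
        else:
            result &= ~basis[k] & mask
    return result


def scan_thru(markers: int, cls: int) -> int:
    """Move each marker past the run of `cls` positions starting at it."""
    # Addition carries a marker bit through a contiguous run of 1s in cls;
    # masking with ~cls keeps only the first position after each run.
    return (markers + cls) & ~cls


def scan_thru_blocked(markers: int, cls: int, n: int, width: int = 64) -> int:
    """Compute scan_thru over n positions in width-bit blocks.

    Each block is processed independently except for one carry bit passed
    forward, as fixed-register-width code must do. A carry out of the last
    block is dropped.
    """
    mask = (1 << width) - 1
    carry, out = 0, 0
    for start in range(0, n, width):
        m = (markers >> start) & mask
        c = (cls >> start) & mask
        total = m + c + carry
        carry = total >> width  # 0 or 1: overflow into the next block
        out |= ((total & mask) & ~c) << start
    return out


def positions(stream: int, n: int) -> List[int]:
    """List the set bit positions of a stream among the first n positions."""
    return [i for i in range(n) if (stream >> i) & 1]


if __name__ == "__main__":
    text = b"<aaab<ab"
    n = len(text)
    basis = transpose(text)
    lt = char_class(basis, ord("<"), n)    # '<' at positions 0 and 5
    markers = lt << 1                      # one position past each '<'
    a_run = char_class(basis, ord("a"), n)
    ends = scan_thru(markers, a_run)
    print(positions(ends, n))              # [4, 7]: first non-'a' after each '<'
    print(ends == scan_thru_blocked(markers, a_run, n, width=4))  # True
```

Running the script prints [4, 7] followed by True. Keeping the stream-level functions free of any register width, and confining \texttt{width} to the blocked variant, mirrors the separation between stream-level constructs and machine specifics described in this abstract.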
The general argument in support of Parabix technology is made through a detailed
performance and energy study of XML parsing across a range of processor architectures.
Parabix exploits intra-core SIMD hardware and demonstrates a
2$\times$--7$\times$ speedup and a 4$\times$ improvement in energy
efficiency compared to two widely used conventional software parsers,
Expat and Apache Xerces. SIMD implementations across three
generations of x86 processors, including the new \SB{}, are studied.
The 256-bit AVX technology in Intel \SB{} is compared with the
well-established 128-bit SSE technology to analyze the benefits and
challenges of 3-operand instruction formats and wider SIMD hardware.
Finally, the XML program is partitioned into pipeline stages to demonstrate
that thread-level parallelism enables the application to exploit SIMD units
scattered across the different cores, achieving improved performance
(2$\times$ on 4 cores) at the same energy level as the single-threaded
version of the XML application.