Text files are widely used in modern applications. For example,
XML files store data in a human-readable format and are common
in web services, database systems, and mobile phone SDKs.
Traditional text processing tools are built around a byte-at-a-time
processing model in which each character of a document is examined
in turn. This model is highly challenging for commodity processors.
It incurs many unpredictable, input-dependent branches, which cause
pipeline squashes and stalls. Furthermore, typical text processing
tools perform few operations per processed character and experience
high cache miss rates when parsing the file. Overall, important
domains such as XML processing demand high-performance text parsing,
which has motivated hardware designers to adopt customized hardware
and ASIC solutions.

% In this paper on commodity.
% We expose through a toolchain.
% We demonstrate what can be achieved with branches etc.
% We study various tradeoffs.
% Finally we show the benefits can be stacked

In this paper, we enable text processing applications to effectively
use commodity processors. We introduce Parabix (Parallel Bitstream)
technology, a software runtime and execution model that allows applications
to exploit modern SIMD instruction extensions for high-performance
text processing. Parabix enables the application developer to write
constructs assuming unlimited SIMD data parallelism. Our runtime
translator generates code based on machine specifics (e.g., SIMD
register widths) to realize the programmer's specifications.  The key
insight into efficient text processing in Parabix is the data
organization. Parabix transposes the sequence of 8-bit characters into
8 parallel bit streams, which lets us operate on multiple characters
with a single bit-parallel SIMD operation. We
demonstrate the features and efficiency of Parabix with an XML parsing
application. We evaluate a Parabix-based XML parser against two widely
used XML parsers, Expat and Apache's Xerces, across three
generations of x86 processors, including the new Intel \SB{}.  We show
that Parabix achieves a speedup of 2$\times$--7$\times$ over Expat and
Xerces. We observe that Parabix overall makes efficient use of
intra-core parallel hardware on commodity processors and supports
significant gains in energy efficiency. Using Parabix, we assess the
scalability advantages of SIMD processor improvements across Intel
processor generations, culminating with a look at the latest 256-bit AVX
technology in \SB{} versus the now-legacy 128-bit SSE technology. As
part of this study we also preview the Neon extensions on ARM
processors. Finally, we partition the XML program into pipeline stages
and demonstrate that thread-level parallelism exploits SIMD units
scattered across the different cores and improves performance
(2$\times$ on 4 cores) at the same energy levels as the single-thread
version.
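
To make the data organization concrete, the following is a minimal sketch of the transposition idea, not the Parabix implementation. It assumes that Python integers can stand in for arbitrarily wide SIMD registers, and it numbers bit 0 as the least significant bit of each byte, which may differ from the convention Parabix uses. The function names \texttt{transpose}, \texttt{match\_char}, and \texttt{positions} are hypothetical and chosen for illustration only.

```python
from typing import List


def transpose(data: bytes) -> List[int]:
    """Split a byte sequence into 8 parallel bit streams.

    Stream k collects bit k (0 = least significant, an assumption of this
    sketch) of every byte; bit i of a stream corresponds to byte i of data.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_char(streams: List[int], length: int, ch: int) -> int:
    """Return a bit stream with bit i set wherever byte i equals ch.

    Only whole-stream bitwise operations are used, so every input position
    is tested at once and no per-character branch is taken.
    """
    mask = (1 << length) - 1  # keep results within the input length
    result = mask
    for k in range(8):
        if (ch >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


def positions(stream: int) -> List[int]:
    """List the indices of the set bits in a stream (for inspection only)."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


text = b"<a><b/></a>"
basis = transpose(text)
open_angle = match_char(basis, len(text), ord("<"))
print(positions(open_angle))  # prints [0, 3, 7]
```

In this sketch the eight bitwise operations inside \texttt{match\_char} classify every character of the input together, whereas a byte-at-a-time loop would compare and branch once per character. On real hardware each stream would be processed in register-sized blocks (for example 128 or 256 bits) rather than as one unbounded integer.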