source: docs/HPCA2012/00-abstract.tex @ 1421

Text files are widely used in modern applications. For example,
XML files store data in a human-readable format and are
ubiquitous in applications ranging from database systems to mobile
phone SDKs.  Traditional text processing tools are built around a
byte-at-a-time processing model in which each character token of a
document is examined in turn. The byte-at-a-time model is highly
challenging for commodity processors: it involves many unpredictable
input-dependent branches, which cause pipeline squashes and
stalls. Furthermore, typical text processing tools perform few
operations per character and experience high cache miss
rates. Overall, parsing text in important domains such as XML processing
demands high performance, motivating the adoption of custom hardware
solutions.

% In this paper on commodity.
% We expose through a toolchain.
% We demonstrate what can be achieved with branches etc.
% We study various tradeoffs.
% Finally we show the benefits can be stacked

In this paper, we enable text processing applications to make effective
use of commodity processors. We introduce Parabix (Parallel Bit Stream)
technology, a software toolchain and execution framework that allows
applications to exploit modern SIMD instructions for high-performance
text processing. Parabix lets the application developer write
constructs assuming unlimited SIMD data parallelism; Parabix's
bit stream translator then generates code based on machine specifics (e.g.,
SIMD register widths).  The key insight behind efficient text processing
in Parabix is the data organization. Parabix transposes the sequence
of character bytes into a set of 8 parallel bit streams, which
enables us to operate on multiple characters at once with bit-parallel SIMD
operations. We demonstrate the features and efficiency of Parabix with
an XML parsing application. We evaluate the Parabix-based parser
against two widely used XML parsers, Expat and Apache's
Xerces. Parabix makes efficient use of intra-core SIMD hardware and
demonstrates a 2$\times$--7$\times$ speedup and a 4$\times$ improvement in
energy efficiency compared to the conventional parsers. We assess the
scalability of SIMD implementations across three generations of x86
processors, including the new \SB{}. We compare the 256-bit AVX
technology in Intel \SB{} with the now-legacy 128-bit SSE technology
and analyze the benefits and challenges of using the AVX
extensions.  Finally, we partition the XML program into pipeline stages
and demonstrate that thread-level parallelism enables the application
to exploit SIMD units scattered across the different cores and
improves performance (2$\times$ on 4 cores) at the same energy level as
the single-thread version of the XML application.
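
The transposition into 8 parallel bit streams can be illustrated with a
minimal sketch. It is not the Parabix implementation: it assumes Python's
arbitrary-precision integers as a stand-in for unbounded SIMD registers,
and the helper names \texttt{transpose}, \texttt{match\_byte}, and
\texttt{positions} are hypothetical, introduced only for illustration.

```python
def transpose(data: bytes) -> list[int]:
    """Split a byte sequence into 8 bit streams.

    streams[k] has bit i set when bit k (0 = least significant)
    of data[i] is set. Each stream is one unbounded Python int.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], n: int, value: int) -> int:
    """Return a bit stream marking every position whose byte equals value.

    Only bitwise AND/NOT on whole streams is used, so all n positions
    are tested together with no per-character branch.
    """
    mask = (1 << n) - 1  # keep results within the n input positions
    result = mask
    for k in range(8):
        if (value >> k) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


def positions(stream: int) -> list[int]:
    """List the indices of the set bits in a stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


text = b"<a><b/></a>"
basis = transpose(text)
left_angles = match_byte(basis, len(text), ord("<"))
print(positions(left_angles))  # [0, 3, 7]
```

In this sketch every position of the input is classified by the same eight
bitwise operations, which is the property that lets fixed-width SIMD
registers process many characters per instruction.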