source: docs/HPCA2012/final_ieee/01-intro.tex @ 1752

Last change on this file since 1752 was 1752, checked in by cameron, 8 years ago

Updates to intro; abstract

File size: 8.2 KB
Line 
1\section{Introduction}
2Modern applications ranging from web search to analytics are mainly
3data centric operating over large swaths of data. Information expansion
4and diversification of data has resulted in multiple textual storage
5formats.  Of these, XML is one of the most widely used standards, providing
6a common framework for encoding and communicating data. 
7It is used in applications ranging
8from Office Open XML in Microsoft Office to NDFD XML of the NOAA
9National Weather Service, from KML in Google Earth to Castor XML in
10the Martian Rovers. To enable these diverse applications we need high
11performance, scalable, and energy efficient processing techniques
12for textual data in general, and XML, in particular.
13
14%; in server
15%workloads the key focus in on overall transactions per second, while
16%in applications for network switches and cell phones, latency and
17%energy are of paramount importance.
18
19
20Unfortunately, given the limited levels of parallelism that can be
21found in traditional text processing, it is not clear how this important
22class of application
23can benefit from the growth in multicore processors. As a widely cited
24Berkeley study~\cite{Asanovic:EECS-2006-183} reports, text processing
25applications represented by the ``thirteenth
26dwarf'' (parsers/finite state machines) are
27considered to be the hardest application class to parallelize and
28process efficiently.  Conventional software-based text parsers have
29many inefficiencies including considerable branch misprediction
30penalties due to complex input-dependent branching structures as well
31as poor use of caches and memory bandwidth due to byte-at-a-time
32processing. ASIC chips that process XML textual data have been around
33since early 2003, but typically lag behind CPUs in technology due to
34cost constraints~\cite{xmlchip}
35%They also focus mainly on speeding up
36%the computational work alone, without addressing problems of poor memory
37%behaviour.
38
39%
40% Introduce Parabix.
41%
42
43Parallel bit stream (Parabix) technology is a promising new approach
44for high performance text processing taking advantage of the SIMD
45capabilities of commodity processors.   Based on the transposition
46of byte-oriented character data into parallel bit streams each
47with one bit per input byte,
48first-generation Parabix technology has been
49applied to accelerate UTF-8 to UTF-16 transcoding \cite{Cameron2008} as
50well as exact string matching in protein identification \cite{JMBE:31@99}.
51It has also been applied to the problem of XML parsing using
52a traditional recursive-descent parser accelerated with
53sequential bit scans.  Most recently, the foundation of
54second-generation Parabix technology and the toolchain
55described in this paper has been established with the
56introduction of a parallel scanning primitive to replace
57sequential bit scans \cite{cameron-EuroPar2011}.
58
59
60
61
62
63
64%The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
65%which applies a sequential bit scan method, has already shown a
66%substantial improvement on performance. The latest version or the
67%second generation of Parabix XML parser~\cite{Cameron2010} introduced
68%a new idea, parallel bit scan, which provides us a more efficient
69%scanning and better utilization of the resources.
70
71
72\begin{figure}
73\begin{center}
74\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
75\end{center}
76\caption{XML Parser Technology Energy vs. Performance}
77\label{perf-energy}
78\end{figure}
79
80
81In this paper, we generalize parallel bitstreams and develop the
82Parabix programming framework to help programmers build text
83processing appliations. The programmers specify the operations on
84unbounded character lists using bitstreams in a python environment,
85while our code generation and runtime translate them into low-level
86C++ routines.  The Parabix routines exploit the SIMD extensions on
87commodity processors (SSE/AVX on x86, Neon on ARM) to process hundreds
88of character positions in an input stream simultaneously dramatically
89improving the execution efficiency. We describe the overall Parabix
90tool chain, a novel execution framework and a software build environment
91that enables text processing applications to effectively exploit
92commodity multicores.
93
94We study in detail the performance of Parabix technology
95in application to the problem of XML parsing on multiple
96architectures.
97Figure~\ref{perf-energy} showcases the overall efficiency of our
98framework, dramatically improving both performance and
99energy efficiency. The Parabix-XML parser exploits
100the bitstream technology to dramatically reduce branches in the
101parsing routines resulting in a more efficient pipeline. It also
102substantially improves register utilization which minimizes energy
103wasted on cache misses and data transfers.\footnote{The actual energy consumption of the XML
104  ASIC chips is not published by the companies.} 
105
106We make the following contributions:
107%
108
1091) We outline the Parabix architecture, code-generation tool chain and
110runtime environment and describe how it may be used to produce
111efficient XML parser implementations on a variety of commodity
112processors.  While studied in the context of XML parsing, the Parabix
113framework can be widely applied to many problems in text processing
114and parsing.  We have released Parabix as open source and are
115interested in exploring the applications that can take advantage of
116our tool chain (\textit{http://parabix.costar.sfu.ca/}).
117
118
1192) We compare the Parabix XML parser against conventional parsers and
120assess the improvement in overall performance and energy efficiency on
121variety of hardware platforms.  We use Parabix to study and
122contrast SSE/AVX extensions across multiple generation of Intel
123processors and show that there are performance challenges when using
124newer generation SIMD extensions. We compare the ARM Neon extensions
125against the x86 SIMD extensions and comment on the latency of SIMD
126operations.
127
1283) Finally, we multithread the Parabix XML parser to enable the different
129stages in the parser to exploit SIMD units across all the cores.
130This further improves performance while maintaining the energy consumption
131comparable with the sequential version.
132
133
134The remainder of this paper is organized as follows.
135Section~\ref{section:background} presents background material on XML
136parsing and provides insight into the inefficiency of traditional
137parsers.  Section~\ref{section:parabix} describes the Parabix
138architecture, tool chain and runtime environment.
139Section~\ref{section:parser} describes the design of an XML parser
140based on the Parabix framework.  Section~\ref{section:baseline}
141presents a detailed performance analysis of Parabix on a
142\CITHREE\ system using hardware performance counters.
143Section~\ref{section:scalability} compares the performance and energy
144efficiency of 128-bit SIMD extensions across three generations of
145Intel processors and includes a comparison with the ARM Cortex-A8
146processor.  Section~\ref{section:avx} examines the Intel's new 256-bit
147AVX technology and comments on the benefits and challenges compared to
148the 128-bit SSE instructions.  Finally,
149Section~\ref{section:multithread} looks at multithreading to exploit
150the SIMD units scattered across multiple cores.
151
152
153
154
155
156
157%One area in which both servers and mobile devices devote considerable
158%computational effort into is in the processing of Extensible Markup
159%Language (XML) documents.  It was predicted that corporate servers
160%would see a ``growth in XML traffic\ldots from 15\% [of overall
161%network traffic] in 2004 to just under 48\% by 2008''
162%\cite{coyle2005}.  Further, ``from the point of view of server
163%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
164%computing workload'' \cite{leventhal2009}.  In other words, XML is the
165%quickly becoming the backbone of most server/server and client/server
166%%information exchanges.  Similarly, there is growing interest in the
167%use of mobile web services for personalization, context-awareness, and
168%content-adaptation of mobile web sites---most of which rely on XML
169%\cite{canali2009}.  Whether the end user realizes it or not, XML is
170%part of their daily life.
171
172%Why are XML parsers important ?
173%Talk about XML parsers and what they do in general.
174%Brief few lines about byte-at-time ?
175%What's new with Parabix style approach ?
176%Introduce Parabix1 and Parabix2 ?
177%Present overall quantiative improvements compared to other parsers.
178
179
180
181
Note: See TracBrowser for help on using the repository browser.