source: docs/HPCA2012/final_ieee/01-intro.tex @ 1747

Last change on this file since 1747 was 1747, checked in by cameron, 8 years ago

Abstract/title/intro cleanups.

File size: 8.3 KB
Line 
1\section{Introduction}
2Modern applications ranging from web search to analytics are mainly
3data centric operating large swathes of data. Information expansion
4and diversification of data has resulted in multiple textual storage
5formats. XML is a widely-used text-based data storage format. XML is a
6standard of the web consortium that provides a common framework for
7encoding and communicating data.  It is used in applications ranging
8from Office Open XML in Microsoft Office to NDFD XML of the NOAA
9National Weather Service, from KML in Google Earth to Castor XML in
10the Martian Rovers. To enable these diverse applications we need high
11performance, scalable, and energy efficient text processing stored in
12these XML documents.
13
14%; in server
15%workloads the key focus in on overall transactions per second, while
16%in applications for network switches and cell phones, latency and
17%energy are of paramount importance.
18
19
20Unfortunately, given the limited levels of parallelism that can be
21found in text processing, for example, XML parsing, is inherently
22sequential, it is not clear how this important class of application
23can benefit from the growth in multicore processors. As a widely cited
24Berkeley study~\cite{Asanovic:EECS-2006-183} reports, the ``thirteenth
25dwarf'' (parsers/finite state machines) which process text is
26considered to be the hardest application class to parallelize and
27process efficienctly.  Conventional software-based text parsers have
28many inefficiencies including considerable branch misprediction
29penalties due to complex input-dependent branching structures as well
30as poor use of caches and memory bandwidth due to byte-at-a-time
31processing. ASIC chips that process XML textual data have been around
32since early 2003, but typically lag behind CPUs in technology due to
33cost constraints~\cite{xmlchip}. They also focus mainly on speeding up
34the parser computation itself and are limited by the poor memory
35behaviour.
36
37
38
39%
40% Introduce Parabix.
41%
42
43
44
45
46%However, classical Dennard voltage scaling has reached its limits
47%which gives the traditional byte-at-a-time processing methods little
48%space for further improvement. An alternative is to increase energy
49%efficiency by operating at more optimal core frequencies and achieve
50%better performance with a larger number of cores.
51%~\cite{blake-isca-2010}, in scaling our
52%chips~\cite{esmaeilzadeh-isca-2011}
53
54Recently, we developed a novel representation of
55data~\cite{Cameron2008,CameronLin2009}, Parabix (Parallel bitstreams),
56to aid parsers and text processing tools. Parabix transposes
57byte-oriented character data into parallel bit streams, where each bit
58represents one character from the input data. We explored the use of
59Parabix representation in UTF-8 to UTF-16 conversion and in specific
60internal passes of an XML parser~\cite{cameron-EuroPar2011}.
61
62
63
64
65
66
67%The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
68%which applies a sequential bit scan method, has already shown a
69%substantial improvement on performance. The latest version or the
70%second generation of Parabix XML parser~\cite{Cameron2010} introduced
71%a new idea, parallel bit scan, which provides us a more efficient
72%scanning and better utilization of the resources.
73
74
75\begin{figure}
76\begin{center}
77\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
78\end{center}
79\caption{XML Parser Technology Energy vs. Performance}
80\label{perf-energy}
81\end{figure}
82
83
84
85
86
87
88
89
90
91
92In this paper, we generalize parallel bitstreams and develop the
93Parabix programming framework to help programmers build text
94processing appliations. The programmers specify the operations on
95unbounded character lists using bitstreams in a python environment,
96while our code generation and runtime translate them into low-level
97C++ routines.  The Parabix routines exploit the SIMD extensions on
98commodity processors (SSE/AVX on x86, Neon on ARM) to process hundreds
99of character positions in an input stream simultaneously dramatically
100improving the execution efficiency. We describe the overall Parabix
101tool chain, a novel execution framework and software build environment
102that enables text processing applications to effectively exploit
103commodity multicores.
104
105
106We apply Parabix technology to the problem of XML parsing.
107Figure~\ref{perf-energy} showcases the overall efficiency of our
108framework. The Parabix-XML parser improves the performance %by
109?$\times$ and energy efficiency
110%by ?$\times$ several-fold compared to widely-used software parsers,
111approaching the
112%?$cycles/input-byte$ performance of ASIC XML
113parsers~\cite{xmlchip,DaiNiZhu2010}. The Parabix-XML parser exploits
114the bitstream technology to dramatically reduce branches in the
115parsing routines resulting in a more efficient pipeline. It also
116substantially improves register utilization which minimizes energy
117wasted on cache misses and data transfers. We make the following contributions:
118
119
120\footnote{The actual energy consumption of the XML
121  ASIC chips is not published by the companies.}
122%
123
1241) We outline the Parabix architecture, code-generation tool chain and
125runtime environment and describe how it may be used to produce
126efficient XML parser implementations on a variety of commodity
127processors.  While studied in the context of XML parsing, the Parabix
128framework can be widely applied to many problems in text processing
129and parsing.  We have released Parabix completely open source and are
130interested in exploring the applications that can take advantage of
131our tool chain (\textit{http://parabix.costar.sfu.ca/}).
132
133
1342) We compare the Parabix XML parser against conventional parsers and
135assess the improvement in overall performance and energy efficiency on
136variety of hardware platforms.  We use Parabix to study and
137contrast SSE/AVX extensions across multiple generation of Intel
138processors and show that there are performance challenges when using
139newer generation SIMD extensions. We compare the ARM Neon extensions
140against the x86 SIMD extensions and comment on the latency of SIMD
141operations.
142
1433) Finally, we multithread the Parabix XML parser to enable the different
144stages in the parser to exploit SIMD units across all the cores.
145This further improves performance while maintaining the energy consumption
146comparable with the sequential version.
147
148
149The remainder of this paper is organized as follows.
150Section~\ref{section:background} presents background material on XML
151parsing and provides insight into the inefficiency of traditional
152parsers.  Section~\ref{section:parabix} describes the Parabix
153architecture, tool chain and runtime environment.
154Section~\ref{section:parser} describes the design of an XML parser
155based on the Parabix framework.  Section~\ref{section:baseline}
156presents a detailed performance analysis of Parabix on a
157\CITHREE\ system using hardware performance counters.
158Section~\ref{section:scalability} compares the performance and energy
159efficiency of 128-bit SIMD extensions across three generations of
160Intel processors and includes a comparison with the ARM Cortex-A8
161processor.  Section~\ref{section:avx} examines the Intel's new 256-bit
162AVX technology and comments on the benefits and challenges compared to
163the 128-bit SSE instructions.  Finally,
164Section~\ref{section:multithread} looks at multithreading to exploit
165the SIMD units scattered across multiple cores.
166
167
168
169
170
171
172%One area in which both servers and mobile devices devote considerable
173%computational effort into is in the processing of Extensible Markup
174%Language (XML) documents.  It was predicted that corporate servers
175%would see a ``growth in XML traffic\ldots from 15\% [of overall
176%network traffic] in 2004 to just under 48\% by 2008''
177%\cite{coyle2005}.  Further, ``from the point of view of server
178%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
179%computing workload'' \cite{leventhal2009}.  In other words, XML is the
180%quickly becoming the backbone of most server/server and client/server
181%%information exchanges.  Similarly, there is growing interest in the
182%use of mobile web services for personalization, context-awareness, and
183%content-adaptation of mobile web sites---most of which rely on XML
184%\cite{canali2009}.  Whether the end user realizes it or not, XML is
185%part of their daily life.
186
187%Why are XML parsers important ?
188%Talk about XML parsers and what they do in general.
189%Brief few lines about byte-at-time ?
190%What's new with Parabix style approach ?
191%Introduce Parabix1 and Parabix2 ?
192%Present overall quantiative improvements compared to other parsers.
193
194
195
196
Note: See TracBrowser for help on using the repository browser.