source: docs/HPCA2012/final_ieee/01-intro.tex @ 1768

Last change on this file since 1768 was 1768, checked in by ashriram, 8 years ago

Version sent to Martha

File size: 8.0 KB
Line 
1\section{Introduction}
2Modern applications ranging from web search to analytics are mainly
3data centric operating over large swaths of data. Information expansion
4and diversification of data has resulted in multiple textual storage
5formats.  Of these, XML is one of the most widely used standards, providing
6a common framework for encoding and communicating data. 
7It is used in applications ranging
8from Office Open XML in Microsoft Office to NDFD XML of the NOAA
9National Weather Service, from KML in Google Earth to Castor XML in
10the Martian Rovers. To enable these diverse applications we need high
11performance, scalable, and energy efficient processing techniques
12for textual data in general, and XML, in particular.
13
14%; in server
15%workloads the key focus in on overall transactions per second, while
16%in applications for network switches and cell phones, latency and
17%energy are of paramount importance.
18
19
20Unfortunately, given the limited levels of parallelism that can be
21found in traditional text processing, it is not clear how this important
22class of application
23can benefit from the growth in multicore processors. As a widely cited
24Berkeley study~\cite{Asanovic:EECS-2006-183} reports, text processing
25applications represented by the ``thirteenth
26dwarf'' (parsers/finite state machines) are
27considered to be the hardest application class to parallelize and
28process efficiently.  Conventional software-based text parsers have
29many inefficiencies including considerable branch misprediction
30penalties due to complex input-dependent branching structures as well
31as poor use of caches and memory bandwidth due to byte-at-a-time
32processing. ASIC chips that process XML textual data have been around
33since early 2003, but typically lag behind CPUs in technology due to
34cost constraints~\cite{xmlchip}
35%They also focus mainly on speeding up
36%the computational work alone, without addressing problems of poor memory
37%behaviour.
38
39%
40% Introduce Parabix.
41%
42
43Parallel bit stream (Parabix) technology is a promising new approach
44for high performance text processing. The key insight is based on the transposition of
45byte-oriented character data into parallel bit streams (each with one
46bit per input byte) which permits text processing to exploit SIMD
47operations on modern processors. Our earlier work on inductive doubling
48instructions~\cite{CameronLin2009} discusses effective techniques to
49transform the text into the Parabix representation.  We
50have used Parabix to accelerate UTF-8 to UTF-16 transcoding
51\cite{Cameron2008}, string matching in protein identification
52\cite{JMBE:31@99}, and specific parts of a traditional
53recursive-descent XML parser~\cite{cameron-EuroPar2011}.
54
55
56
57
58
59
60%The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
61%which applies a sequential bit scan method, has already shown a
62%substantial improvement on performance. The latest version or the
63%second generation of Parabix XML parser~\cite{Cameron2010} introduced
64%a new idea, parallel bit scan, which provides us a more efficient
65%scanning and better utilization of the resources.
66
67
68\begin{figure}
69\begin{center}
70\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
71\end{center}
72\caption{XML Parser Technology Energy vs. Performance}
73\label{perf-energy}
74\end{figure}
75
76
77In this paper, we generalize parallel bitstreams and develop the
78Parabix programming framework to help programmers build text
79processing appliations. The programmers specify the operations on
80unbounded character lists using bitstreams in a python environment,
81while our code generation and runtime translate them into low-level
82C++ routines.  The Parabix routines exploit the SIMD extensions on
83commodity processors (SSE/AVX on x86, Neon on ARM) to process hundreds
84of character positions in an input stream simultaneously dramatically
85improving the execution efficiency. We describe the overall Parabix
86tool chain, a novel execution framework and a software build environment
87that enables text processing applications to effectively exploit
88commodity multicores.
89
90We study in detail the performance of Parabix technology
91in application to the problem of XML parsing on multiple
92architectures.
93Figure~\ref{perf-energy} showcases the overall efficiency of our
94framework, dramatically improving both performance and
95energy efficiency. The Parabix-XML parser exploits
96the bitstream technology to dramatically reduce branches in the
97parsing routines resulting in a more efficient pipeline. It also
98substantially improves register utilization which minimizes energy
99wasted on cache misses and data transfers.\footnote{The actual energy consumption of the XML
100  ASIC chips is not published by the companies.} 
101
102We make the following contributions:
103%
104
1051) We outline the Parabix architecture, code-generation tool chain and
106runtime environment and describe how it may be used to produce
107efficient XML parser implementations on a variety of commodity
108processors.  While studied in the context of XML parsing, the Parabix
109framework can be widely applied to many problems in text processing
110and parsing.  We have released Parabix as open source and are
111interested in exploring the applications that can take advantage of
112our tool chain (\textit{http://parabix.costar.sfu.ca/}).
113
114
1152) We compare the Parabix XML parser against conventional parsers and
116assess the improvement in overall performance and energy efficiency on
117variety of hardware platforms.  We use Parabix to study and
118contrast SSE/AVX extensions across multiple generation of Intel
119processors and show that there are performance challenges when using
120newer generation SIMD extensions. We compare the ARM Neon extensions
121against the x86 SIMD extensions and comment on the latency of SIMD
122operations.
123
1243) Finally, we multithread the Parabix XML parser to enable the different
125stages in the parser to exploit SIMD units across all the cores.
126This further improves performance while maintaining the energy consumption
127comparable with the sequential version.
128
129
130The remainder of this paper is organized as follows.
131Section~\ref{section:background} presents background material on XML
132parsing and provides insight into the inefficiency of traditional
133parsers.  Section~\ref{section:parabix} describes the Parabix
134architecture, tool chain and runtime environment.
135Section~\ref{section:parser} describes the design of an XML parser
136based on the Parabix framework.  Section~\ref{section:baseline}
137presents a detailed performance analysis of Parabix on a
138\CITHREE\ system using hardware performance counters.
139Section~\ref{section:scalability} compares the performance and energy
140efficiency of 128-bit SIMD extensions across three generations of
141Intel processors and includes a comparison with the ARM Cortex-A8
142processor.  Section~\ref{section:avx} examines the Intel's new 256-bit
143AVX technology and comments on the benefits and challenges compared to
144the 128-bit SSE instructions.  Finally,
145Section~\ref{section:multithread} looks at multithreading to exploit
146the SIMD units scattered across multiple cores.
147
148
149
150
151
152
153%One area in which both servers and mobile devices devote considerable
154%computational effort into is in the processing of Extensible Markup
155%Language (XML) documents.  It was predicted that corporate servers
156%would see a ``growth in XML traffic\ldots from 15\% [of overall
157%network traffic] in 2004 to just under 48\% by 2008''
158%\cite{coyle2005}.  Further, ``from the point of view of server
159%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
160%computing workload'' \cite{leventhal2009}.  In other words, XML is the
161%quickly becoming the backbone of most server/server and client/server
162%%information exchanges.  Similarly, there is growing interest in the
163%use of mobile web services for personalization, context-awareness, and
164%content-adaptation of mobile web sites---most of which rely on XML
165%\cite{canali2009}.  Whether the end user realizes it or not, XML is
166%part of their daily life.
167
168%Why are XML parsers important ?
169%Talk about XML parsers and what they do in general.
170%Brief few lines about byte-at-time ?
171%What's new with Parabix style approach ?
172%Introduce Parabix1 and Parabix2 ?
173%Present overall quantiative improvements compared to other parsers.
174
175
176
177
Note: See TracBrowser for help on using the repository browser.