source: docs/HPCA2012/01-intro.tex @ 2270

Last change on this file since 2270 was 1743, checked in by ashriram, 8 years ago

First pass final version [ashriram]

File size: 7.9 KB
3As a result of information expansion and diversification of the data
4format, the demand for high performance and energy efficient text
5processing is rapidly rising. A widely-used text-based data storage
6format is XML. XML is a standard of the web consortium that provides a
7common framework for encoding and communicating data.  XML is used in
8applications ranging from Office Open XML in Microsoft Office to NDFD
9XML of the NOAA National Weather Service, from KML in Google Earth to
10Castor XML in the Martian Rovers. In a widely cited Berkeley
11study~\cite{Asanovic:EECS-2006-183}, the ``thirteenth dwarf''
12(parsers/finite state machines) which processes text is considered to be
13the hardest application class to parallelize.
16Given the limited levels of parallelism that can be found in text
17processing, for example, XML parsing, is inherently sequential, it is
18not certain how many cores can be productively used.  Conventional
19software-based XML parsers have many inefficiencies including
20considerable branch misprediction penalties due to complex
21input-dependent branching structures as well as poor use of caches and
22memory bandwidth due to byte-at-a-time processing. XML ASIC chips have
23been around since early 2003, but typically lag behind CPUs in
24technology due to cost constraints~\cite{xmlchip}. They also focus
25mainly on speeding up the parser computation itself and are limited by
26the poor memory behaviour.
29; in server
30workloads the key focus in on overall transactions per second, while
31in applications for network switches and cell phones, latency and
32energy are of paramount importance.
38% Introduce Parabix.
44%However, classical Dennard voltage scaling has reached its limits
45%which gives the traditional byte-at-a-time processing methods little
46%space for further improvement. An alternative is to increase energy
47%efficiency by operating at more optimal core frequencies and achieve
48%better performance with a larger number of cores.
49%~\cite{blake-isca-2010}, in scaling our
52A new technology, Parabix, was introduced to exploit the SIMD
53extensions on commodity processors to process hundreds of character
54positions in an input stream simultaneously~\cite{Cameron2008}.
55Parabix first transposes byte-oriented character data into parallel
56bit streams using sophisticated SIMD instructions that enable data
57elements to be packed into registers.  With the bit streams, where
58each bit represents one character from the input data, the text can
59then be processed in parallel within the SIMD registers.  This
60improves the overall cache behaviour of the application resulting in
61significantly fewer misses and better utilization.  Parabix also
62dramatically reduces branches in the parsing routines resulting in a
63more efficient pipeline and substantially improves register
64utilization which minimizes energy wasted on data transfers.
66We apply Parabix technology to the problem of XML parsing.
67Our focus is how much we can improve performance of
68the XML parser on commodity processors with Parabix technology.
70%The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
71%which applies a sequential bit scan method, has already shown a
72%substantial improvement on performance. The latest version or the
73%second generation of Parabix XML parser~\cite{Cameron2010} introduced
74%a new idea, parallel bit scan, which provides us a more efficient
75%scanning and better utilization of the resources.
82\caption{XML Parser Technology Energy vs. Performance}
87In this paper, We present Parabix tool chain, a novel execution framework
88and software runtime environment that can be used to dramatically improve
89the efficiency of text processing and parsing on commodity processors.
90Figure~\ref{perf-energy} showcases the overall efficiency of our
91framework. The Parabix-XML parser improves the
92performance %by ?$\times$
93and energy efficiency %by ?$\times$
94several-fold compared
95to widely-used software parsers, approaching the
97performance of ASIC XML
99\footnote{The actual energy consumption of the XML
100  ASIC chips is not published by the companies.}
102Overall we make the following contributions:
1041) We outline the Parabix architecture, tool chain and runtime
105environment and describe how it may be used to produce efficient
106XML parser implementations on a variety of commodity processors.
107While studied in the context of XML parsing, the Parabix framework
108can be widely applied to many problems in text processing and
109parsing.  We have released Parabix completely open source
110and are interested in exploring the applications that can take
111advantage of our tool chain (\textit{}).
1142) We compare the Parabix XML parser against conventional parsers and
115assess the improvement in overall performance and energy efficiency on
116variety of hardware platforms.  We are the first to compare and
117contrast SSE/AVX extensions across multiple generation of Intel
118processors and show that there are performance challenges when using
119newer generation SIMD extensions. We compare the ARM Neon extensions
120against the x86 SIMD extensions and comment on the latency of SIMD
1233) Finally, building on the SIMD parallelism of Parabix technology,
124we multithread the Parabix XML parser to enable the different
125stages in the parser to exploit SIMD units across all the cores.
126This further improves performance while maintaining the energy consumption
127constant with the sequential version.
130The remainder of this paper is organized as follows.
131Section~\ref{section:background} presents background material on XML
132parsing and provides insight into the inefficiency of traditional
133parsers.  Section~\ref{section:parabix} describes the Parabix
134architecture, tool chain and runtime environment.
135Section~\ref{section:parser} describes the design of an XML parser
136based on the Parabix framework. Section~\ref{section:methodology} describes the evaluation framework and Section~\ref{section:baseline}
137presents a detailed performance analysis of Parabix on a
138\CITHREE\ system using hardware performance counters.
139Section~\ref{section:scalability} compares the performance and energy
140efficiency of 128-bit SIMD extensions across three generations of
141Intel processors and includes a comparison with the ARM Cortex-A8
142processor.  Section~\ref{section:avx} examines the Intel's new 256-bit
143AVX technology and comments on the benefits and challenges compared to
144the 128-bit SSE instructions.  Finally,
145Section~\ref{section:multithread} looks at the multithreading of the
146Parabix XML parser which seeks to exploit the SIMD units scattered
147across multiple cores.
154%One area in which both servers and mobile devices devote considerable
155%computational effort into is in the processing of Extensible Markup
156%Language (XML) documents.  It was predicted that corporate servers
157%would see a ``growth in XML traffic\ldots from 15\% [of overall
158%network traffic] in 2004 to just under 48\% by 2008''
159%\cite{coyle2005}.  Further, ``from the point of view of server
160%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
161%computing workload'' \cite{leventhal2009}.  In other words, XML is the
162%quickly becoming the backbone of most server/server and client/server
163%%information exchanges.  Similarly, there is growing interest in the
164%use of mobile web services for personalization, context-awareness, and
165%content-adaptation of mobile web sites---most of which rely on XML
166%\cite{canali2009}.  Whether the end user realizes it or not, XML is
167%part of their daily life.
169%Why are XML parsers important ?
170%Talk about XML parsers and what they do in general.
171%Brief few lines about byte-at-time ?
172%What's new with Parabix style approach ?
173%Introduce Parabix1 and Parabix2 ?
174%Present overall quantiative improvements compared to other parsers.
Note: See TracBrowser for help on using the repository browser.