source: docs/HPCA2012/01-intro.tex @ 1739

Last change on this file since 1739 was 1691, checked in by lindanl, 8 years ago

Change Introduction

File size: 7.9 KB
3As a result of information expansion and diversification of the data format,
4the demands of high performance and energy efficient text processing are rapidly increasing.
5However, classical Dennard voltage scaling has reached its limits
6which gives the traditional byte-at-a-time processing methods little space
7for further improvement. An alternative is to increase energy efficiency
8by operating at more optimal core frequencies and achieve better performance
9with a larger number of cores. Unfortunately, given the limited levels of parallelism
10that can be found in applications~\cite{blake-isca-2010}, especailly text processing,
11in which, many applications, for example, XML parsing, are sequential by nature,
12it is not certain how many cores can be productively used in scaling our
13chips~\cite{esmaeilzadeh-isca-2011}. In a widely cited Berkeley study~\cite{Asanovic:EECS-2006-183},
14the infamous ``thirteenth dwarf'' (parsers/finite state machines) is considered to be the hardest
15application class to parallelize.
17A new technology, Parabix, was introduced to exploit the SIMD extensions on commodity processors
18to process hundreds of character positions in an input stream simultaneously~\cite{Cameron2008}.
19Parabix first transposes byte-oriented character data into parallel bit streams
20using sophisticated SIMD instructions that enable data elements to be packed into registers.
21With the bit streams, where each bit represents one character from the input data, the text can then
22be processed in parallel within the SIMD registers.
23This improves the overall cache behaviour of the application resulting in significantly
24fewer misses and better utilization.  Parabix also dramatically
25reduces branches in the parsing routines resulting in a more efficient
26pipeline and substantially improves register utilization which
27minimizes energy wasted on data transfers.
29We apply Parabix technology to the problem of XML parsing.  XML is a
30standard of the web consortium that provides a common framework for
31encoding and communicating data.  XML provides critical data storage
32for applications ranging from Office Open XML in Microsoft Office to
33NDFD XML of the NOAA National Weather Service, from KML in Google
34Earth to Castor XML in the Martian Rovers.  XML parsing efficiency is
35important for multiple application areas; in server workloads the key
36focus in on overall transactions per second, while in applications for
37network switches and cell phones, latency and energy are of paramount
38importance.  Conventional software-based XML parsers have many
39inefficiencies including considerable branch misprediction penalties
40due to complex input-dependent branching structures as well as poor
41use of caches and memory bandwidth due to byte-at-a-time
42processing.  XML ASIC chips have been around since early 2003, but typically lag behind CPUs in technology due to
43cost constraints~\cite{xmlchip}. They also focus mainly on speeding up the parser
44computation itself and are limited by the poor memory behaviour.
45Our focus is how much we can improve performance of
46the XML parser on commodity processors with Parabix technology.
48The first generation of Parabix XML parser~\cite{CameronHerdyLin2008},
49which applies a sequential bit scan method, has already shown a
50substantial improvement on performance. The latest version or the
51second generation of Parabix XML parser~\cite{Cameron2010} introduced
52a new idea, parallel bit scan, which provides us a more efficient
53scanning and better utilization of the resources.
60\caption{XML Parser Technology Energy vs. Performance}
65In this paper, We present Parabix tool chain, a novel execution framework
66and software runtime environment that can be used to dramatically improve
67the efficiency of text processing and parsing on commodity processors.
68Figure~\ref{perf-energy} showcases the overall efficiency of our
69framework. The Parabix-XML parser improves the
70performance %by ?$\times$
71and energy efficiency %by ?$\times$
72several-fold compared
73to widely-used software parsers, approaching the
75performance of ASIC XML
77\footnote{The actual energy consumption of the XML
78  ASIC chips is not published by the companies.}
80Overall we make the following contributions:
821) We outline the Parabix architecture, tool chain and runtime
83environment and describe how it may be used to produce efficient
84XML parser implementations on a variety of commodity processors.
85While studied in the context of XML parsing, the Parabix framework
86can be widely applied to many problems in text processing and
87parsing.  We have released Parabix completely open source
88and are interested in exploring the applications that can take
89advantage of our tool chain (\textit{}).
922) We compare the Parabix XML parser against conventional parsers and
93assess the improvement in overall performance and energy efficiency on
94variety of hardware platforms.  We are the first to compare and
95contrast SSE/AVX extensions across multiple generation of Intel
96processors and show that there are performance challenges when using
97newer generation SIMD extensions. We compare the ARM Neon extensions
98against the x86 SIMD extensions and comment on the latency of SIMD
1013) Finally, building on the SIMD parallelism of Parabix technology,
102we multithread the Parabix XML parser to enable the different
103stages in the parser to exploit SIMD units across all the cores.
104This further improves performance while maintaining the energy consumption
105constant with the sequential version.
108The remainder of this paper is organized as follows.
109Section~\ref{section:background} presents background material on XML
110parsing and provides insight into the inefficiency of traditional
111parsers.  Section~\ref{section:parabix} describes the Parabix
112architecture, tool chain and runtime environment.
113Section~\ref{section:parser} describes the design of an XML parser
114based on the Parabix framework. Section~\ref{section:methodology} describes the evaluation framework and Section~\ref{section:baseline}
115presents a detailed performance analysis of Parabix on a
116\CITHREE\ system using hardware performance counters.
117Section~\ref{section:scalability} compares the performance and energy
118efficiency of 128-bit SIMD extensions across three generations of
119Intel processors and includes a comparison with the ARM Cortex-A8
120processor.  Section~\ref{section:avx} examines the Intel's new 256-bit
121AVX technology and comments on the benefits and challenges compared to
122the 128-bit SSE instructions.  Finally,
123Section~\ref{section:multithread} looks at the multithreading of the
124Parabix XML parser which seeks to exploit the SIMD units scattered
125across multiple cores.
132%One area in which both servers and mobile devices devote considerable
133%computational effort into is in the processing of Extensible Markup
134%Language (XML) documents.  It was predicted that corporate servers
135%would see a ``growth in XML traffic\ldots from 15\% [of overall
136%network traffic] in 2004 to just under 48\% by 2008''
137%\cite{coyle2005}.  Further, ``from the point of view of server
138%efficiency[,] XML\ldots is the closest thing there is to a ubiquitous
139%computing workload'' \cite{leventhal2009}.  In other words, XML is the
140%quickly becoming the backbone of most server/server and client/server
141%%information exchanges.  Similarly, there is growing interest in the
142%use of mobile web services for personalization, context-awareness, and
143%content-adaptation of mobile web sites---most of which rely on XML
144%\cite{canali2009}.  Whether the end user realizes it or not, XML is
145%part of their daily life.
147%Why are XML parsers important ?
148%Talk about XML parsers and what they do in general.
149%Brief few lines about byte-at-time ?
150%What's new with Parabix style approach ?
151%Introduce Parabix1 and Parabix2 ?
152%Present overall quantiative improvements compared to other parsers.
Note: See TracBrowser for help on using the repository browser.