source: docs/HPCA2012/02-background.tex @ 1368

Last change on this file since 1368 was 1368, checked in by ksherdy, 8 years ago

large changes to section 2; minor edits to section 3

File size: 13.5 KB
Line 
1\section{Background}
2\label{section:background}
3
4% This section provides
5
6% Extensible Markup Language (XML) is a core technology standard
7% of the World Wide Web Consortium (W3C) that provides a common
8% framework for encoding and communicating structured information. 
9% In applications ranging from Office Open XML in
10% Microsoft Office to NDFD XML of the NOAA National Weather
11% Service, from KML in Google Earth to Castor XML in the Martian Rovers,
12% from ebXML for e-commerce data interchange to RSS for news feeds
13% from web sites everywhere, XML plays a ubiquitous role in providing
14% a common framework for data interoperability world-wide and beyond.
15% As XML 1.0 editor Tim Bray is quoted in the W3C celebration of XML at 10 years,
16% "there is essentially no computer in the world, desk-top, hand-held,
17% or back-room, that doesn't process XML sometimes."
18
19
20
21%
22% With this focus on performance however, relatively little attention
23% has been paid on reducing energy consumption in XML processing.  For example, in addressing
24% performance through multicore parallelism, one generally must
25% pay an energy price for performance gains because of the
26% increased processing required for synchronization.   
27% This focus on reduction of energy consumption is a key topic in this
28% paper. We study the energy and performance
29% characteristics of several XML parsers across three generations
30% of x86-64 processor technology.  The parsers we consider are
31% the widely used byte-at-a-time parsers Expat and Xerces, as well the
32% Parabix1 and Parabix2 parsers based on parallel bit stream technology. 
33% A compelling result is that the performance benefits of parallel bit stream technology
34% translate directly and proportionally to substantial energy savings.
35
36
37\subsection{XML}
38% In 1998, the W3C officially adopted XML as a standard for data interchange.
39% Today, XML can be found everywhere from mobile websites and cloud computing to server-side database engines.
40
41Extensible Markup Language (XML) is a core technology standard of the World Wide Web Consortium (W3C);
42it provides a common framework for encoding and communicating structured and semi-structured information. 
43XML can represent virtually any type of information (i.e., content) in a descriptive fashion.
44XML markup encodes a description of an XML document's storage layout and logical structure.
45Because XML is intended to be human-readable, XML markup tags are often verbose by design \cite{TR:XML}.
46For example, Figure \ref{fig:sample_xml} provides a standard product list encapsulated within an XML document.
47All content is highlighted in bold. Anything that is not content is considered markup.
48
49\begin{figure}[h]
50
51{\scriptsize
52\begin{center}
53\begin{tabbing}
54\hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \\
55{\ttfamily{\verb`<`Products\verb`>`}}\+ \\
56{\ttfamily{\verb`<`Product ID=\verb`"`}}{\bfseries\ttfamily{0001}}{\ttfamily{\verb`"`\verb`>`}}\+ \\
57{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{English}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Widget}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
58{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{French}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Bitoniau}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
59{\ttfamily{\verb`<`Company\verb`>`}}{\bfseries\ttfamily{ABC}}{\ttfamily{\verb`<`/Company\verb`>` }}\\
60{\ttfamily{\verb`<`Price\verb`>`}}{\bfseries\ttfamily{\$19.95}}{\ttfamily{\verb`<`/Price\verb`>` }}\- \\
61{\ttfamily{\verb`<`/Product\verb`>` }}\- \\
62{\ttfamily{\verb`<`/Products\verb`>` }}\\
63\end{tabbing}
64\end{center}}
65
66\caption{Sample XML Document}\label{fig:sample_xml}
67\end{figure}
68
69
70
71\subsection{XML Parsers}
72% However, textual data tends to consist of variable-length items in generally unpredictable patterns \cite{Cameron2010}.
73Traditional XML parsers process an XML document sequentially, a single byte-at-a-time,
74from the first to the last character in the source text.
75Each character is examined to distinguish between the XML-specific markup,
76such as an opening angle bracket `\verb`<`', and the content held within the document.
77The character that the parser is currently processing is commonly referred to its {\it cursor position}. As the
78parser moves its cursor through the source document, it alternates between markup scanning / validation and
79content processing operations.
80In other words,
81traditional XML parsers operate as finite-state machines that use byte comparisons to transition
82between data and metadata states. Each state transition indicates the context in which to interpret the
83subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in
84generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until
85deemed otherwise.
86
87A major disadvantage of the sequential byte-at-a-time approach to XML parsing is that processing an XML document
88requires at least one conditional branch per byte of source text.
89For example, Xerces-C, which is the most popular open-source C++ based XML parser and the foundation of the
90Apache XML project \cite{xerces},
91uses a series of nested switch statements and state-dependent flag tests to control the parsing logic of the program.
92Our analysis, which we detail in {\bf SECTION XXX}, found that Xerces-C requires between 6 - 13 branches per byte
93of XML to support this form of control structure (depending on the ratio of markup vs. content).
94Cache utilization is also significantly reduced due to the manner in which markup and content must be scanned and buffered
95for future use.
96For instance, Xerces-C incurs $\sim$100 L1 cache misses per 1000 bytes of XML.
97% Branch mispredictions are known
98% to signficantly degrade XML parsing performance in proportion to the markup density of the source document
99% \cite{CameronHerdyLin2008} (i.e., the ratio of markup vs. content).
100
101
102
103
104
105
106% Switch block logical
107% Branches
108% Cache utilization
109
110% \subsection {Parallel XML Parsing}
111%
112%
113% In general, parallel acceleration methods come in two forms: multithreaded approaches and
114% SIMD-based techniques. Multithreaded XML parsers take advantage of multiple cores via number of strategies.
115% Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09}
116% and speculative p-DFAs \cite{ZhangPanChiu09}.
117
118% To accelerate XML parsing performance, most of the recent
119% work has focused on parallelization through the use of multicore
120% parallelism for chip multiprocessors \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
121
122
123% SIMD XML parsers leverage the SIMD registers to overcome the
124% performance limitations of the sequential byte-at-a-time processing model and its inherently data dependent
125% branch misprediction rates.  Further, data parallel SIMD instructions allow the processor to perform the same
126% operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
127% fall under the SIMD classification. The Parabix parser versions studied are described in further detail in
128% Section \ref{section:parabix}.
129
130% while SIMD (Single Instruction Multiple Data) parallelism
131% has been of interest to Intel in designing new SIMD instructions\cite{XMLSSE42}
132% , as well as to the developers of parallel bit stream technology
133% \cite{CameronHerdyLin2008, Cameron2009, Cameron2010}.
134% Each of these approaches has shown considerable performance
135% benefits over traditional sequential parsing techniques that follow the
136% byte-at-a-time model.
137
138%\subsection {SIMD Operations}
139% Two such SIMD XML parsers, Parabix1 and Parabix2, utilizes parallel bit stream processing technology.
140% Extract section 2.2 and merge into 3.   Add a new subsection
141% in section 2 saying a bit about SIMD.   Say a bit about pure SIMD vertical
142% operations and then mention the pack operations that allow
143% us to implement transposition efficiently in parallel. 
144% Also note that the SIMD registers support bitwise logic across
145% their full width and that this is extensively used in our work.
146% \subsection{Parallel XML Parsing}
147%
148% Parallel XML processing generally comes in one of two forms: multithreading and SIMD. Multithreaded XML parsers take advantage of parallelism by first quickly preparsing the XML file to locate the key markup entities and determine the best workload distribution in which process the XML file using $n$-cores \cite{ZhangPanChiu09}. SIMD XML parsers leverage the SIMD registers to overcome the performance limitations of the sequential paradigm and inherently data dependent branch misprediction rates \cite{Cameron2010}. Two such SIMD XML parsers, Parabix1 and Parabix2, utilizes parallel bit stream processing technology. With this method, byte-oriented character data is first transposed to eight parallel bit streams, one for each bit position within the character code units (bytes). These bit streams are then loaded into SIMD registers of width $W$ (e.g., 64-bit, 128-bit, 256-bit, etc). This allows $W$ consecutive code units to be represented and processed at once. Bitwise logic and shift operations, bit scans, population counts and other bit-based operations are then used to carry out the work in parallel \cite{CameronLin2009}.
149%
150% \subsubsection{Parabix1}
151%
152% Our first generation parallel bit stream XML parser---Parabix1---uses employs a less conventional approach of SIMD technology to represent text in parallel bit streams. Bits of each stream are in one-to-one-correspondence with the bytes of a character stream. A transposition step first transforms sequential byte stream data into eight basis bit streams for the bits of each byte. Bitwise logical combinations of these basis bit streams can then be used to classify bytes in various ways, while the bit scan operations common to commodity processors can be used for fast sequential scanning. At a high level, Parabix1 processes source XML in a functionally equivalent manner as a traditional processor. That is, Parabix1 moves sequentially through the source document, maintaining a single cursor scanning position throughout the parse. However, this scanning operation itself is accelerated significantly which leads to dramatic performance improvements, since bit scan operations can perform up to general register width (32-bit, 64-bit) finite state transitions per clock cycle. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, CameronLin2009, Cameron2010}.
153%
154% \subsubsection{Parabix2}
155%
156% In our second generation XML parser---Parabix2---we address the replacement of sequential parsing using bit scan instructions with a parallel parsing method using bitstream addition. Unlike the single cursor approach of Parabix1 and conceptually of traditional sequential approach, in Parabix2 multiple cursors positions are processed in parallel. To deal with these parallel cursors, three additional categories of bit streams are introduced. Marker bit streams are used to represent positions of interest in the parsing of a source data stream \cite{Cameron2010}. The appearance of a 1 at a position in a marker bitstream could, for example, denote the starting position an XML tag in the data stream. In general, the set of bit positions in a marker bitstream may be considered to be the current parsing positions of multiple parses taking place in parallel throughout the source data stream. A further aspect of the parallel method is that conditional branch statements used to identify syntax error at each each parsing position are eliminated. Instead, error bit streams are used to identify the position of parsing or well-formedness errors during the parsing process. Error positions are gathered and processed in as a final post processing step. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple cursor parsing as well as significanlty reduces branch misprediction penalty.
157
158%
159% --- Not Sure ---
160%
161% The first two parsers employ traditional byte-at-a-time methods of
162% parsing; these parsers were selected based on their popularity in the
163% marketplace and the availability of source code for deeper analysis.
164% The Fluke i410 current clamp is a digital multimeter that reads the
165% magnetic field of a live electrical cable to determine the current
166% passing through it without affecting the underlying hardware.
167
168% \textbf{No Need to talk about resource usage until  later}
169
170% In order to determine how and which performance factors influence energy consumption,
171% we intend to use the Fluke i410 current clamp in conjunction with PMCs to compare the per parser invocation and per source XML byte energy % usage of three XML parsers:
172% Expat 2.0.1, Xerces-C++ 3.1.1 (SAX2), and Parabix2. All three parsers are C/C++ based, event-driven, stream-oriented XML parsers.
173
174% Think the person reading doesn't know much about XML parsers.
175%
176%
177% Need gory details on byte at a time parsers. Pictures.
178% Xerces. Explain overall data flow and control flow for these parsers.
179% Briefly highlight inefficiencies.
180%
181%
182%
183% Talk about the usage of SIMD instructions and how it might help. Lead
184% onto briefly describe the key technology behind parabix.
185
186%
187% --- Section Advice ---
188
189% 1. Think the person reading doesn't know much about XML parsers.
190%
191% 2. Need gory details on byte at a time parsers. Pictures.
192% Xerces. Explain overall dataflow and control flow for these parsers.
193% Briefly highlight inefficiencies.
194%
195% 3. Talk about the usage of SIMD instructions and how it might help. Lead
196% onto briefly describe the key technology behind parabix.
197%
Note: See TracBrowser for help on using the repository browser.