source: docs/HPCA2012/02-background.tex @ 1421

Last change on this file since 1421 was 1414, checked in by ksherdy, 8 years ago

replaced 6 --- 13 with 6 -- 13 and 1000 bytes for kilobyte (kB) for consistency

File size: 13.5 KB
Line 
1\section{Background}
2\label{section:background}
3
4% This section provides
5
6% Extensible Markup Language (XML) is a core technology standard
7% of the World Wide Web Consortium (W3C) that provides a common
8% framework for encoding and communicating structured information. 
9% In applications ranging from Office Open XML in
10% Microsoft Office to NDFD XML of the NOAA National Weather
11% Service, from KML in Google Earth to Castor XML in the Martian Rovers,
12% from ebXML for e-commerce data interchange to RSS for news feeds
13% from web sites everywhere, XML plays a ubiquitous role in providing
14% a common framework for data interoperability world-wide and beyond.
15% As XML 1.0 editor Tim Bray is quoted in the W3C celebration of XML at 10 years,
16% "there is essentially no computer in the world, desk-top, hand-held,
17% or back-room, that doesn't process XML sometimes."
18
19
20
21%
22% With this focus on performance however, relatively little attention
23% has been paid on reducing energy consumption in XML processing.  For example, in addressing
24% performance through multicore parallelism, one generally must
25% pay an energy price for performance gains because of the
26% increased processing required for synchronization.   
27% This focus on reduction of energy consumption is a key topic in this
28% paper. We study the energy and performance
29% characteristics of several XML parsers across three generations
30% of x86-64 processor technology.  The parsers we consider are
31% the widely used byte-at-a-time parsers Expat and Xerces, as well the
32% Parabix1 and Parabix2 parsers based on parallel bit stream technology. 
33% A compelling result is that the performance benefits of parallel bit stream technology
34% translate directly and proportionally to substantial energy savings.
35
36
37\subsection{XML}
38% In 1998, the W3C officially adopted XML as a standard for data interchange.
39% Today, XML can be found everywhere from mobile websites and cloud computing to server-side database engines.
40
41Extensible Markup Language (XML) is a core technology standard of the World Wide Web Consortium (W3C);
42it provides a common framework for encoding and communicating structured and semi-structured information. 
43XML can represent virtually any type of information (i.e., content) in a descriptive fashion.
44XML markup encodes a description of an XML document's storage layout and logical structure.
45Since XML is intended to be human-readable, markup tags are often verbose by design \cite{TR:XML}.
46For example, Figure \ref{fig:sample_xml} provides a standard product list encapsulated within an XML document.
47All content is highlighted in bold. Anything that is not content is considered markup.
48
49\begin{figure}[h]
50
51{\footnotesize
52\begin{center}
53\begin{tabbing}
54\hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \hspace*{1em} \= \\
55{\ttfamily{\verb`<`Products\verb`>`}}\+ \\
56{\ttfamily{\verb`<`Product ID=\verb`"`}}{\bfseries\ttfamily{0001}}{\ttfamily{\verb`"`\verb`>`}}\+ \\
57{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{English}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Widget}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
58{\ttfamily{\verb`<`ProductName Language=\verb`"`}}{\bfseries\ttfamily{French}}{\ttfamily{\verb`"`\verb`>`}}{\bfseries\ttfamily{Bitoniau}}{\ttfamily{\verb`<`/ProductName\verb`>` }}\\
59{\ttfamily{\verb`<`Company\verb`>`}}{\bfseries\ttfamily{ABC}}{\ttfamily{\verb`<`/Company\verb`>` }}\\
60{\ttfamily{\verb`<`Price\verb`>`}}{\bfseries\ttfamily{\$19.95}}{\ttfamily{\verb`<`/Price\verb`>` }}\- \\
61{\ttfamily{\verb`<`/Product\verb`>` }}\- \\
62{\ttfamily{\verb`<`/Products\verb`>` }}\\
63\end{tabbing}
64\end{center}}
65
66\caption{Sample XML Document}\label{fig:sample_xml}
67\end{figure}
68
69
70
71\subsection{XML Parsers}
72% However, textual data tends to consist of variable-length items in generally unpredictable patterns \cite{Cameron2010}.
73Traditional XML parsers process an XML document sequentially, a single
74byte-at-a-time, from the first to the last character in the source
75text.  Each character is examined to distinguish between the
76XML-specific markup, such as an left angle bracket `\verb`<`', and the
77content held within the document.  The character that the parser is
78currently interpreting is commonly referred to its {\it cursor}. As
79the parser moves its cursor through the document, it alternates
80between markup scanning, validation, and content processing
81operations.  In other words, traditional XML parsers operate as
82finite-state machines that use byte comparisons to transition between
83data and metadata states. Each state transition indicates the context
84for subsequent characters. Unfortunately, textual data tends to
85consist of variable-length items sequenced in generally unpredictable
86patterns; thus any character could be a state transition until deemed
87otherwise.
88
89A major disadvantage of the sequential byte-at-a-time approach to XML
90parsing is that processing an XML document requires at least one
91conditional branch per byte of source text.  For example, Xerces-C,
92which forms the foundation for widely deployed the Apache XML project
93\cite{xerces}, uses a series of nested switch statements and
94state-dependent flag tests to control the parsing logic of the
95program. Xerces's complex data dependent control flow requires between
966 -- 13 branches per byte of XML input, depending on the markup in
97the file (details in Section~\ref{section:XML-branches}).  Cache
98utilization is also significantly reduced due to the manner in which
99markup and content must be scanned and buffered for future use.  For
100instance, Xerces incurs $\sim$100 L1 cache misses per kilobyte (kB) of
101XML data. In general, while microarchitectural improvements may help the
102parser tide over some of these challenges (e.g., cache misses), the
103fundamental data and control flow in the parsers are ill suited for
104commodity processors and experience significant overhead.
105
106
107
108
109% Switch block logical
110% Branches
111% Cache utilization
112
113% \subsection {Parallel XML Parsing}
114%
115%
116% In general, parallel acceleration methods come in two forms: multithreaded approaches and
117% SIMD-based techniques. Multithreaded XML parsers take advantage of multiple cores via number of strategies.
118% Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09}
119% and speculative p-DFAs \cite{ZhangPanChiu09}.
120
121% To accelerate XML parsing performance, most of the recent
122% work has focused on parallelization through the use of multicore
123% parallelism for chip multiprocessors \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
124
125
126% SIMD XML parsers leverage the SIMD registers to overcome the
127% performance limitations of the sequential byte-at-a-time processing model and its inherently data dependent
128% branch misprediction rates.  Further, data parallel SIMD instructions allow the processor to perform the same
129% operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
130% fall under the SIMD classification. The Parabix parser versions studied are described in further detail in
131% Section \ref{section:parabix}.
132
133% while SIMD (Single Instruction Multiple Data) parallelism
134% has been of interest to Intel in designing new SIMD instructions\cite{XMLSSE42}
135% , as well as to the developers of parallel bit stream technology
136% \cite{CameronHerdyLin2008, Cameron2009, Cameron2010}.
137% Each of these approaches has shown considerable performance
138% benefits over traditional sequential parsing techniques that follow the
139% byte-at-a-time model.
140
141%\subsection {SIMD Operations}
142% Two such SIMD XML parsers, Parabix1 and Parabix2, utilizes parallel bit stream processing technology.
143% Extract section 2.2 and merge into 3.   Add a new subsection
144% in section 2 saying a bit about SIMD.   Say a bit about pure SIMD vertical
145% operations and then mention the pack operations that allow
146% us to implement transposition efficiently in parallel. 
147% Also note that the SIMD registers support bitwise logic across
148% their full width and that this is extensively used in our work.
149% \subsection{Parallel XML Parsing}
150%
151% Parallel XML processing generally comes in one of two forms: multithreading and SIMD. Multithreaded XML parsers take advantage of parallelism by first quickly preparsing the XML file to locate the key markup entities and determine the best workload distribution in which process the XML file using $n$-cores \cite{ZhangPanChiu09}. SIMD XML parsers leverage the SIMD registers to overcome the performance limitations of the sequential paradigm and inherently data dependent branch misprediction rates \cite{Cameron2010}. Two such SIMD XML parsers, Parabix1 and Parabix2, utilizes parallel bit stream processing technology. With this method, byte-oriented character data is first transposed to eight parallel bit streams, one for each bit position within the character code units (bytes). These bit streams are then loaded into SIMD registers of width $W$ (e.g., 64-bit, 128-bit, 256-bit, etc). This allows $W$ consecutive code units to be represented and processed at once. Bitwise logic and shift operations, bit scans, population counts and other bit-based operations are then used to carry out the work in parallel \cite{CameronLin2009}.
152%
153% \subsubsection{Parabix1}
154%
155% Our first generation parallel bit stream XML parser---Parabix1---uses employs a less conventional approach of SIMD technology to represent text in parallel bit streams. Bits of each stream are in one-to-one-correspondence with the bytes of a character stream. A transposition step first transforms sequential byte stream data into eight basis bit streams for the bits of each byte. Bitwise logical combinations of these basis bit streams can then be used to classify bytes in various ways, while the bit scan operations common to commodity processors can be used for fast sequential scanning. At a high level, Parabix1 processes source XML in a functionally equivalent manner as a traditional processor. That is, Parabix1 moves sequentially through the source document, maintaining a single cursor scanning position throughout the parse. However, this scanning operation itself is accelerated significantly which leads to dramatic performance improvements, since bit scan operations can perform up to general register width (32-bit, 64-bit) finite state transitions per clock cycle. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, CameronLin2009, Cameron2010}.
156%
157% \subsubsection{Parabix2}
158%
159% In our second generation XML parser---Parabix2---we address the replacement of sequential parsing using bit scan instructions with a parallel parsing method using bitstream addition. Unlike the single cursor approach of Parabix1 and conceptually of traditional sequential approach, in Parabix2 multiple cursors positions are processed in parallel. To deal with these parallel cursors, three additional categories of bit streams are introduced. Marker bit streams are used to represent positions of interest in the parsing of a source data stream \cite{Cameron2010}. The appearance of a 1 at a position in a marker bitstream could, for example, denote the starting position an XML tag in the data stream. In general, the set of bit positions in a marker bitstream may be considered to be the current parsing positions of multiple parses taking place in parallel throughout the source data stream. A further aspect of the parallel method is that conditional branch statements used to identify syntax error at each each parsing position are eliminated. Instead, error bit streams are used to identify the position of parsing or well-formedness errors during the parsing process. Error positions are gathered and processed in as a final post processing step. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple cursor parsing as well as significanlty reduces branch misprediction penalty.
160
161%
162% --- Not Sure ---
163%
164% The first two parsers employ traditional byte-at-a-time methods of
165% parsing; these parsers were selected based on their popularity in the
166% marketplace and the availability of source code for deeper analysis.
167% The Fluke i410 current clamp is a digital multimeter that reads the
168% magnetic field of a live electrical cable to determine the current
169% passing through it without affecting the underlying hardware.
170
171% \textbf{No Need to talk about resource usage until  later}
172
173% In order to determine how and which performance factors influence energy consumption,
174% we intend to use the Fluke i410 current clamp in conjunction with PMCs to compare the per parser invocation and per source XML byte energy % usage of three XML parsers:
175% Expat 2.0.1, Xerces-C++ 3.1.1 (SAX2), and Parabix2. All three parsers are C/C++ based, event-driven, stream-oriented XML parsers.
176
177% Think the person reading doesn't know much about XML parsers.
178%
179%
180% Need gory details on byte at a time parsers. Pictures.
181% Xerces. Explain overall data flow and control flow for these parsers.
182% Briefly highlight inefficiencies.
183%
184%
185%
186% Talk about the usage of SIMD instructions and how it might help. Lead
187% onto briefly describe the key technology behind parabix.
188
189%
190% --- Section Advice ---
191
192% 1. Think the person reading doesn't know much about XML parsers.
193%
194% 2. Need gory details on byte at a time parsers. Pictures.
195% Xerces. Explain overall dataflow and control flow for these parsers.
196% Briefly highlight inefficiencies.
197%
198% 3. Talk about the usage of SIMD instructions and how it might help. Lead
199% onto briefly describe the key technology behind parabix.
200%
Note: See TracBrowser for help on using the repository browser.