% Source: docs/Working/icXML/icxml-main.tex @ 2524
% Last change on this file since 2524 was 2524, checked in by lindanl, 6 years ago
%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[10pt,preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.
\usepackage{subfigure}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{CJKutf8}
\usepackage{morefloats}
\begin{document}

\conferenceinfo{EuroSys '13}{date, City.}
\copyrightyear{2013}
\copyrightdata{[to be supplied]}

% \titlebanner{banner above paper title}        % These are ignored unless
% \preprintfooter{short description of paper}   % 'preprint' option specified.

\def \icXML {icXML}
\def \icXMLp {icXML-p}
\def \PS {Parabix Subsystem}
\def \MP {Markup Processor}
\def \wrt {with respect to}

\title{\icXML{}:  Accelerating a Commercial XML Parser Using SIMD and Multicore Technologies}
%\subtitle{Subtitle Text, if any}
\authorinfo{Anonymous Hackers}{}{}

% \authorinfo{Nigel Medforth \and Dan Lin \and Kenneth S. Herdy \and Arrvindh Shriraman \and Robert D. Cameron }
%            {International Characters, Inc., and Simon Fraser University}
%            {\{nmedfort,lindanl,ksherdy,ashriram,cameron\}@cs.sfu.ca}

\maketitle

\begin{abstract}
\input{abstract.tex}
\end{abstract}

\category{D.1.3}{Programming Techniques}{Parallel programming}

\terms
Algorithms, Design, Measurement, Performance

\keywords
SIMD, parallel bitstream

\section{Introduction}

The parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes using both SIMD and
multicore parallelism.   Most work has investigated
data-parallel solutions on multicore
architectures, using various strategies to break input
documents into segments that can be allocated to different cores.
For example, one possibility for data
parallelization is to add a pre-parsing step that computes
a skeleton tree structure of an XML document \cite{GRID2006}.
The pre-parsing stage itself can be parallelized using
state machines \cite{E-SCIENCE2007, IPDPS2008}.
Methods without pre-parsing have used speculation \cite{HPCC2011} or post-processing that
combines the partial results \cite{ParaDOM2009}.
A hybrid technique that combines data and pipeline parallelism was proposed to
hide the latency of a ``job'' that has to be done sequentially \cite{ICWS2008}.

Fewer efforts have investigated SIMD parallelism, although this approach
has the potential advantage of improving single-core performance as well
as offering savings in energy consumption \cite{HPCA2012}.
Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension
and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
The Parabix framework uses generic SIMD extensions and bit-parallel methods to
process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
Parabix prototypes have also combined SIMD methods with thread-level parallelism to
achieve further acceleration on multicore systems \cite{HPCA2012}.

In this paper, we move beyond research prototypes to consider
the detailed integration of both SIMD and multicore parallelism into the
Xerces-C++ parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice.    The challenge of this work is
to parallelize the Xerces parser in such a way as to
preserve the existing APIs while offering worthwhile
end-to-end acceleration of XML processing.
To achieve the best results possible, we undertook
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible for parallelization.   Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing, parallel string comparison methods in symbol
resolution, and bit-parallel methods in namespace processing, as well as staged
processing with pipeline parallelism to take advantage of
multiple cores.

The remainder of this paper is organized as follows.
Section~\ref{background} discusses the structure of the Xerces and Parabix XML parsers and the fundamental
differences between the two parsing models.
Section~\ref{architecture} then presents the \icXML{} design, based on a Xerces architecture restructured to
incorporate SIMD parallelism using Parabix methods.
Section~\ref{multithread} moves on to consider the multithreading of the \icXML{} architecture
using the pipeline parallelism model.
Section~\ref{performance} analyzes the performance of both the single-threaded and
multithreaded versions of \icXML{} in comparison to the original Xerces,
demonstrating substantial end-to-end acceleration of
a GML-to-SVG translation application written against the Xerces API.
Section~\ref{conclusion} concludes the paper with a discussion of future work and the potential for
applying the techniques discussed herein in other application domains.

\section{Background}
\label{background}

\input{background-xerces}
\input{background-parabix}
\input{background-fundemental-differences.tex}

\section{Architecture}
\label{architecture}

\input{arch-overview.tex}

\input{arch-charactersetadapters.tex}

\input{parfilter.tex}

\input{arch-namespace.tex}

\input{arch-errorhandling.tex}

\section{Leveraging Pipeline Parallelism}
\label{multithread}

\input{multithread.tex}

\section{Performance}
\label{performance}
\input{performance.tex}

\section{Conclusion and Future Work}
\label{conclusion}
\input{conclusion.tex}

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}


\end{document}