% source: docs/Working/icXML/icxml-main.tex @ 2496
% Last change on this file since 2496 was 2496, checked in by nmedfort, 7 years ago
% temp checkin

%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[10pt,preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.

\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{CJKutf8}
\usepackage{morefloats}
\begin{document}

\conferenceinfo{EuroSys '13}{date, City.}
\copyrightyear{2013}
\copyrightdata{[to be supplied]}

\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.

\def \icXML {icXML}
\def \PS {Parabix Subsystem}
\def \MP {Markup Processor}

\title{\icXML{}: Accelerating a Commercial XML Parser Using SIMD and Multicore Technologies}
%\subtitle{Subtitle Text, if any}
\authorinfo{Anonymous Hackers}{}{}

% \authorinfo{Nigel Medforth \and Dan Lin \and Kenneth S. Herdy \and Arrvindh Shriraman \and Robert D. Cameron }
%            {International Characters, Inc., and Simon Fraser University}
%            {\{nmedfort,lindanl,ksherdy,ashriram,cameron\}@cs.sfu.ca}

\maketitle

\begin{abstract}
\input{abstract.tex}
\end{abstract}

\category{CR-number}{subcategory}{third-level}

\terms
term1, term2

\keywords
keyword1, keyword2

\section{Introduction}

Parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes using both SIMD and
multicore parallelism. Most work has investigated
data-parallel solutions on multicore
architectures, using various strategies to break input
documents into segments that can be allocated to different cores.
For example, one approach to data
parallelization adds a pre-parsing step that computes
a skeleton tree structure of an XML document \cite{GRID2006}.
The pre-parsing stage can itself be parallelized using
state machines \cite{E-SCIENCE2007, IPDPS2008}.
Methods without pre-parsing have used speculation \cite{HPCC2011} or post-processing that
combines the partial results \cite{ParaDOM2009}.
A hybrid method that combines data parallelism and pipeline parallelism has been proposed to
hide the latency of the ``job'' that must be done sequentially \cite{ICWS2008}.

Fewer efforts have investigated SIMD parallelism, although this approach
has the potential advantage of improving single-core performance while
also reducing energy consumption.
Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension
and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
The Parabix framework uses generic SIMD extensions and bit-parallel methods to
process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
Parabix prototypes have also combined SIMD methods with thread-level parallelism to
achieve further acceleration on multicore systems \cite{HPCA2012}.
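
To make the bit-parallel idea concrete, the following is a minimal sketch, not code from Parabix or \icXML{}, that uses Python integers as arbitrary-width bitstreams: bit~$i$ of each stream describes byte~$i$ of the input. The input is first transposed into eight \emph{basis} bitstreams, one per bit position of a byte; a character-class stream such as ``every \texttt{<}'' then follows from a handful of bitwise operations applied to the whole stream at once. The names \texttt{basis\_bits} and \texttt{char\_class} are hypothetical, and the transposition uses a plain loop for clarity where a SIMD implementation would use vector operations.

\begin{verbatim}
from typing import List

# Illustrative sketch only: Python ints act as
# arbitrary-width bitstreams; bit i of a stream
# describes byte i of the input.


def basis_bits(data: bytes) -> List[int]:
    """Transpose bytes into 8 basis bitstreams:
    basis[k] holds bit k of every input byte.
    A plain loop is used for clarity; a SIMD
    version would use vector operations."""
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << i
    return basis


def char_class(basis: List[int], ch: str,
               length: int) -> int:
    """Mark every position whose byte equals ch.
    For each bit k, AND in basis[k] if bit k of
    ch is 1, else its complement; only positions
    agreeing with ch on all 8 bits survive."""
    code = ord(ch)
    mask = (1 << length) - 1  # bound complements
    result = mask
    for k in range(8):
        if (code >> k) & 1:
            result &= basis[k]
        else:
            result &= ~basis[k] & mask
    return result


doc = b'<a x="1">t</a>'
lt = char_class(basis_bits(doc), "<", len(doc))
# Show position 0 first; '<' is at 0 and 10.
print(format(lt, "0{}b".format(len(doc)))[::-1])
# prints 10000000001000
\end{verbatim}

In a SIMD implementation, each bitwise operation would instead cover one register's worth of positions, for example 128 input positions per operation with 128-bit SSE registers.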

In this paper, we move beyond research prototypes to consider
the detailed integration of both SIMD and multicore parallelism into the
Xerces-C++ parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice. The challenge of this work is
to parallelize the Xerces parser in a way that
preserves the existing APIs while also offering worthwhile
end-to-end acceleration of XML processing.
To achieve the best possible results, we have undertaken
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible to parallelization. Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing; parallel string comparison methods in symbol
resolution; bit-parallel methods in namespace processing; and staged
processing with pipeline parallelism to take advantage of
multiple cores.
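
Bit-parallel tokenization can also exploit the carries of ordinary integer addition to advance many markers at once. The sketch below, again using Python integers as bitstreams and again not taken from \icXML{}, finds the end of every element name: starting from the position after each \texttt{<}, a single addition carries each marker through its run of name characters. The helpers \texttt{stream\_of}, \texttt{advance} and \texttt{scan\_thru} are hypothetical names, and treating only ASCII letters as name characters is a simplifying assumption, since XML names admit a much larger character set.

\begin{verbatim}
# Illustrative sketch only: Python ints act as
# arbitrary-width bitstreams; bit i of a stream
# describes byte i of the input.


def stream_of(data: bytes, pred) -> int:
    """Set bit i wherever pred(data[i]) holds
    (a stand-in for a SIMD character class)."""
    s = 0
    for i, b in enumerate(data):
        if pred(b):
            s |= 1 << i
    return s


def advance(stream: int) -> int:
    """Move every marker forward one position."""
    return stream << 1


def scan_thru(markers: int, cc: int) -> int:
    """Move each marker sitting on a run of cc
    bits to the first position after that run.
    One addition carries every marker at once;
    ANDing with ~cc clears leftover cc bits."""
    return (markers + cc) & ~cc


doc = b"<ab><cde/>"
lt = stream_of(doc, lambda b: b == ord("<"))
# Simplification: ASCII letters are the only
# name characters in this sketch.
name = stream_of(doc, lambda b: b < 128 and chr(b).isalpha())
starts = advance(lt) & name
ends = scan_thru(starts, name)
print([i for i in range(len(doc)) if (ends >> i) & 1])
# prints [3, 8]
\end{verbatim}

Within one machine word, the carry ripples through a name of any length in a single addition, for example from position~5 through positions~6 and~7 to position~8 above; across SIMD blocks, an implementation must also propagate the carry out of one block into the next.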

The remainder of this paper is organized as follows. Section~2 discusses
the structure of the Xerces and Parabix XML parsers and the fundamental
differences between the two parsing models. Section~3 then presents
the \icXML{} design, which restructures the Xerces architecture to
incorporate SIMD parallelism using Parabix methods. Section~4 presents a performance
study demonstrating substantial end-to-end acceleration of
a GML-to-SVG translation application written against the Xerces API.
Section~5 considers the multithreading of the \icXML{} architecture
using the pipeline parallelism model. Section~6 concludes the
paper with a discussion of future work and of the potential for
applying these techniques in other application domains.

\section{Background}
\label{background}

\input{background-xerces}
\input{background-parabix}
\input{background-fundemental-differences.tex}

\section{Architecture}

\input{arch-overview.tex}

\input{arch-charactersetadapters.tex}

\input{parfilter.tex}

\input{arch-namespace.tex}

\input{arch-errorhandling.tex}

\section{Performance}

\icXML{} vs.\ original Xerces:
\begin{itemize}
\item SAXCount
\item GML2SVG?
\item simulated performance on AVX2???
\end{itemize}

\input{multithread.tex}

\section{Research in Progress}
Parallel validation of datatypes and content models
with bitstreams.

\appendix
\section{Appendix Title}

This is the text of the appendix, if you need one.

\acks

Acknowledgments, if needed.

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}


\end{document}