% Source: docs/Working/icXML/icxml-main.tex @ 2471 (last change checked in by nmedfort)
%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.

\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{CJKutf8}
\begin{document}

\conferenceinfo{PPoPP '13}{date, City.}
\copyrightyear{2013}
\copyrightdata{[to be supplied]}

\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.

\title{ICXML:  Accelerating a Commercial XML Parser Using Parallel Technologies}
%\subtitle{Subtitle Text, if any}
\authorinfo{Anonymous Hackers}
{}
{}
% \authorinfo{Nigel Medforth \and Dan Lin \and Rob Cameron \and Arrvindh Shriraman}
%            {Simon Fraser University}
%            {\{nmedfort,lindanl,cameron,ashriram\}@cs.sfu.ca}

\maketitle

\def \icXML {icXML}
\def \PS {Parabix Subsystem}
\def \MP {Markup Processor}

\begin{abstract}
\input{abstract.tex}
\end{abstract}

\category{CR-number}{subcategory}{third-level}

\terms
term1, term2

\keywords
keyword1, keyword2

\section{Introduction}

Paragraph 1:

Parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes.
One approach to data-parallel parsing adds a pre-parsing step that
extracts a skeleton representing the tree structure of the XML document \cite{GRID2006}.
The pre-parsing stage can itself be parallelized using state machines \cite{E-SCIENCE2007, IPDPS2008}.
Methods without pre-parsing require either speculation \cite{HPCC2011} or post-processing that
combines the partial results \cite{ParaDOM2009}.
A hybrid method that combines data parallelism and pipeline parallelism has been proposed to
hide the latency of the work that must be done sequentially \cite{ICWS2008}.
Intel introduced new string processing instructions in the SSE 4.2 instruction set extension
and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
The Parabix XML parser exploits SIMD extensions to process hundreds of XML input characters
simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
Parabix can also be combined with thread-level parallelism to achieve further improvement
on multicore systems \cite{HPCA2012}.

Paragraph 2:
In this paper, we move beyond research prototypes to consider
the detailed integration of parallel methods into the Xerces-C++
parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice.  Surprisingly, our results show
that a speed-up of more than 100\% can be achieved in some
applications, in apparent defiance of simple calculations
based on Amdahl's law.  [Write text on these calculations
based on reported costs of XML tokenization  (30\%?), transcoding...]

Symbol table lookup: more than 15\%; computing the key: 3\% \cite{ZhaoBhuyan06}

Schema validation can double, triple or quadruple the parsing cost \cite{NicolaJohn03}.

Transcoding: about 50\% \cite{Perkins05}
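
As a rough sketch of the kind of Amdahl's-law calculation referred to above
(the symbols $p$ and $s$ are notation assumed here for illustration, and the
figure used is the tentative tokenization share noted above, not a measurement):
if a fraction $p$ of sequential parsing time is accelerated by a factor $s$,
the overall speed-up is
\begin{equation*}
S(p,s) \;=\; \frac{1}{(1-p) + p/s} \;<\; \frac{1}{1-p}.
\end{equation*}
Taking $p = 0.3$, even an unbounded $s$ gives $S < 1/0.7 \approx 1.43$,
that is, a speed-up of less than 43\%.  Under this simple model, a speed-up of
more than 100\% ($S > 2$) would require $p > 0.5$, so accelerating
tokenization alone could not account for it.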

Paragraph 3:
To achieve the best results possible, we have undertaken
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible for parallelization.  Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing, parallel string comparison methods in symbol
resolution, bit-parallel methods in namespace processing, as well as staged
processing with pipeline parallelism to take advantage of
multiple cores.

\section{Background}

\input{background-xerces}
\input{background-parabix}
\input{background-fundemental-differences.tex}

\section{Architecture}

\input{arch-overview.tex}

\input{arch-charactersetadapters.tex}

\input{parfilter.tex}

\input{arch-namespace.tex}

\input{arch-errorhandling.tex}

\section{Performance}

\icXML{} vs. Original Xerces

 -- SAXCount
 -- GML2SVG?

 -- simulated performance on AVX2???



\input{multithread.tex}

\section{}
Research in progress: parallel validation of datatypes and content models
  with bitstreams

\appendix
\section{Appendix Title}

This is the text of the appendix, if you need one.

\acks

Acknowledgments, if needed.

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}


\end{document}