source: docs/Working/PPoPP/icxml-main.tex @ 2302

Last change on this file since 2302 was 2302, checked in by lindanl, 7 years ago

Modification to icxml and xerces structure figures

%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.

\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{CJKutf8}
\begin{document}

\conferenceinfo{PPoPP '13}{date, City.}
\copyrightyear{2013}
\copyrightdata{[to be supplied]}

\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.

\title{ICXML:  Accelerating a Commercial XML Parser Using Parallel Technologies}
%\subtitle{Subtitle Text, if any}
\authorinfo{Anonymous Hackers}
{}
{}
% \authorinfo{Nigel Medforth \and Dan Lin \and Rob Cameron \and Arrvindh Shriraman}
%            {Simon Fraser University}
%            {\{nmedfort,lindanl,cameron,ashriram\}@cs.sfu.ca}

\maketitle

\begin{abstract}
\input{abstract.tex}
\end{abstract}

\category{CR-number}{subcategory}{third-level}

\terms
term1, term2

\keywords
keyword1, keyword2

\section{Introduction}

Paragraph 1:

Parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes.
One approach to data-parallel XML parsing adds a pre-parsing step that extracts a skeleton representing the tree structure of the XML document \cite{GRID2006}.
The pre-parsing stage can itself be parallelized using state machines \cite{E-SCIENCE2007, IPDPS2008}.
Methods without pre-parsing require either speculation \cite{HPCC2011} or post-processing that combines the partial results \cite{ParaDOM2009}.
A hybrid method that combines data parallelism and pipeline parallelism has been proposed to hide the latency of the ``job'' that must be done sequentially \cite{ICWS2008}.
Intel introduced new string processing instructions in the SSE 4.2 instruction set extension and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
The Parabix XML parser exploits SIMD extensions to process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
Parabix can also be combined with thread-level parallelism to achieve further improvement on multicore systems \cite{HPCA2012}.

Paragraph 2:
In this paper, we move beyond research prototypes to consider
the detailed integration of parallel methods into the Xerces-C++
parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice.  Surprisingly, our results show
that a speed-up of more than 100\% can be achieved in some
applications, in apparent defiance of simple calculations
based on Amdahl's law.  [Write text on these calculations
based on reported costs of XML tokenization  (30\%?), transcoding...]

Symbol table lookup: more than 15\%; key computation: 3\% \cite{ZhaoBhuyan06}.

Schema validation can double, triple, or quadruple the parsing cost \cite{NicolaJohn03}.

Transcoding: about 50\% \cite{Perkins05}.
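% Illustrative Amdahl's-law bound: a sketch under stated assumptions, not a measured result.

As a rough illustration of the Amdahl's-law reasoning mentioned above, suppose that a fraction $p$ of sequential parsing time is spent in a stage that parallel methods accelerate by a factor $s$; both symbols are introduced here only for this sketch. The overall speed-up is then
\begin{equation*}
S(p, s) = \frac{1}{(1 - p) + p/s},
\qquad
\lim_{s \to \infty} S(p, s) = \frac{1}{1 - p}.
\end{equation*}
Taking the tentative 30\% tokenization figure above as $p = 0.3$, which is an assumption rather than a measurement, gives
\begin{equation*}
S \le \frac{1}{1 - 0.3} \approx 1.43,
\end{equation*}
that is, at most about a 43\% improvement from accelerating tokenization alone. A speed-up of more than 100\% ($S > 2$) requires $p > 0.5$ even as $s \to \infty$, so under this simple model the accelerated stages would together have to account for more than half of the original running time.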
%

Paragraph 3:
To achieve the best results possible, we have undertaken
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible for parallelization.  Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing, parallel string comparison methods in symbol
resolution, bit-parallel methods in namespace processing, as well as staged
processing with pipeline parallelism to take advantage of
multiple cores.

\section{Background}

\input{background-parabix}
\input{background-xerces}

\section{Architecture}

\input{parfilter.tex}

\begin{figure}
\begin{center}
\includegraphics[width=0.15\textwidth]{plots/xerces.pdf}
\caption{}
\label{xerces_structure}
\end{center}
\end{figure}
\begin{figure}
\includegraphics[width=0.50\textwidth]{plots/icxml.pdf}
\caption{}
\label{icxml_structure}
\end{figure}

  - Philosophy:  Maximizing Bit Stream Processing

  - Character Set Adapters vs. Transcoding
  - Bitstreams 1: Charset Validation and Transcoding equations
  - Bitstreams 2: Parabix style parsing and validation

  - Bitstreams 3: Parallel filtering and normalization
          - LB normalization
          - reference compression $\rightarrow$ single code unit speculation
          - parallel string termination

  - Bitstreams 4: Symbol processing

  - From bit streams to doublebyte streams: the content buffer

  - Namespace Processing: A Bitset approach.

\section{Performance}

ICXML vs. Original Xerces

 -- SAXCount
 -- GML2SVG?

 -- simulated performance on AVX2???

\section{Leveraging SIMD Parallelism for Multicore: Pipeline Parallelism}

\section{Research in Progress: Parallel Validation of Datatypes and Content Models with Bitstreams}

\appendix
\section{Appendix Title}

This is the text of the appendix, if you need one.

\acks

Acknowledgments, if needed.

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}

\end{document}