%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.

\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{CJKutf8}
\begin{document}

\conferenceinfo{PPoPP '13}{date, City.} 
\copyrightyear{2013} 
\copyrightdata{[to be supplied]} 

\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.

\title{ICXML:  Accelerating a Commercial XML Parser Using Parallel Technologies}
%\subtitle{Subtitle Text, if any}
\authorinfo{Anonymous Hackers}
{}
{}
% \authorinfo{Nigel Medforth \and Dan Lin \and Rob Cameron \and Arrvindh Shriraman}
%            {Simon Fraser University}
%            {\{nmedfort,lindanl,cameron,ashriram\}@cs.sfu.ca}

\maketitle

\begin{abstract}
\input{abstract.tex}
\end{abstract}

\def \icXML {icXML}

\category{CR-number}{subcategory}{third-level}

\terms
term1, term2

\keywords
keyword1, keyword2

\section{Introduction}

Paragraph 1: 

Parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes.
One approach to data-parallel XML parsing adds a pre-parsing step that
extracts a skeleton representing the tree structure of the XML document \cite{GRID2006}.
The pre-parsing stage can itself be parallelized using state machines \cite{E-SCIENCE2007, IPDPS2008}.
Methods without pre-parsing require either speculation \cite{HPCC2011} or post-processing that
combines the partial results \cite{ParaDOM2009}.
A hybrid method that combines data parallelism and pipeline parallelism has been proposed to
hide the latency of the ``job'' that must be done sequentially \cite{ICWS2008}.
Intel introduced new string processing instructions in the SSE 4.2 instruction set extension
and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
The Parabix XML parser exploits SIMD extensions to process hundreds of XML input characters
simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
Parabix can also be combined with thread-level parallelism to achieve further improvement
on multicore systems \cite{HPCA2012}.

Paragraph 2:
In this paper, we move beyond research prototypes to consider
the detailed integration of parallel methods into the Xerces-C++
parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice.    Surprisingly, our results show
that a speed-up of more than 100\% can be achieved in some
applications, in apparent defiance of simple calculations
based on Amdahl's law.  [Write text on these calculations
based on reported costs of XML tokenization  (30\%?), transcoding...]

Symbol table lookup: more than 15\%; key computation: 3\% \cite{ZhaoBhuyan06}.

Schema validation can double, triple or quadruple the parsing cost \cite{NicolaJohn03}.

Transcoding:  about 50\% \cite{Perkins05}.
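
As a rough sketch of the Amdahl's-law calculation mentioned above (the symbols
$p$, $s$ and $S$ are introduced here for illustration only and are not taken
from the cited works), suppose that a fraction $p$ of total parsing time is
accelerated by a factor $s$ while the remainder is unchanged. Amdahl's law
then gives the overall speed-up
\begin{equation*}
  S(p, s) = \frac{1}{(1 - p) + p/s} .
\end{equation*}
% Assumption: p = 0.3 is the tentative tokenization share noted above, not a measured value.
Taking the tentative tokenization share $p = 0.3$ as an example, even an
unbounded acceleration of that component alone yields
\begin{equation*}
  \lim_{s \to \infty} S(0.3, s) = \frac{1}{0.7} \approx 1.43 ,
\end{equation*}
that is, a speed-up of at most about 43\%. Under this simple model, a speed-up
of more than 100\% ($S > 2$) requires $1 - p < 1/2$, so the accelerated
components must together account for more than half of the original
execution time.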

Paragraph 3:
To achieve the best results possible, we have undertaken
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible for parallelization.   Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing, parallel string comparison methods in symbol
resolution, bit-parallel methods in namespace processing, and staged
processing with pipeline parallelism to take advantage of
multiple cores.

\section{Background}

\input{background-xerces}
\input{background-parabix}
\input{background-fundemental-differences.tex}

\section{Architecture}

\input{arch-overview.tex}

\input{parfilter.tex}

\input{arch-namespace.tex}

\input{arch-errorhandling.tex}

\section{Performance}

\icXML{} vs. Original Xerces

 -- SAXCount
 -- GML2SVG?

 -- simulated performance on AVX2???



\input{multithread.tex}

\section{Research in Progress}
Parallel validation of datatypes and content models
  with bitstreams.

\appendix
\section{Appendix Title}

This is the text of the appendix, if you need one.

\acks

Acknowledgments, if needed.

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}


\end{document}