%-----------------------------------------------------------------------------
%
%               Template for sigplanconf LaTeX Class
%
% Name:         sigplanconf-template.tex
%
% Purpose:      A template for sigplanconf.cls, which is a LaTeX 2e class
%               file for SIGPLAN conference proceedings.
%
% Guide:        Refer to "Author's Guide to the ACM SIGPLAN Class,"
%               sigplanconf-guide.pdf
%
% Author:       Paul C. Anagnostopoulos
%               Windfall Software
%               978 371-2316
%               paul@windfall.com
%
% Created:      15 February 2005
%
%-----------------------------------------------------------------------------


\documentclass[preprint]{sigplanconf}

% The following \documentclass options may be useful:
%
% 10pt          To set in 10-point type instead of 9-point.
% 11pt          To set in 11-point type instead of 9-point.
% authoryear    To obtain author/year citation style instead of numeric.

\usepackage{amsmath}

\begin{document}

\conferenceinfo{WXYZ '05}{date, City.}
\copyrightyear{2005}
\copyrightdata{[to be supplied]}

\titlebanner{banner above paper title}        % These are ignored unless
\preprintfooter{short description of paper}   % 'preprint' option specified.

\title{ICXML:  Accelerating a Commercial XML Parser Using Parallel Technologies}
%\subtitle{Subtitle Text, if any}

% \authorinfo{Nigel Medforth \and Dan Lin \and Rob Cameron \and Arrvindh Shriraman}
%            {Simon Fraser University}
%            {\{nmedfort,lindanl,cameron,ashriram\}@cs.sfu.ca}

\maketitle

\begin{abstract}
This is the text of the abstract.
\end{abstract}

\category{CR-number}{subcategory}{third-level}

\terms
term1, term2

\keywords
keyword1, keyword2

\section{Introduction}

Paragraph 1:

Parallelization and acceleration of XML parsing is a widely
studied problem that has seen the development of a number
of interesting research prototypes.


[Review the previous literature on various parallelization methods.
     - Scarpazza XML tokenization, Parabix1 and 2, Intel SSE4.2 \cite{XMLSSE42},
   Kenneth Chiu's work, other data parallelism work (Balisage 08)...
]

Paragraph 2:
In this paper, we move beyond research prototypes to consider
the detailed integration of parallel methods into the Xerces-C++
parser of the Apache Software Foundation, an existing
standards-compliant open-source parser that is widely used
in commercial practice.  Surprisingly, our results show
that a speed-up of more than 100\% can be achieved in some
applications, in apparent defiance of simple calculations
based on Amdahl's law.  [Write text on these calculations
based on reported costs of XML tokenization (30\%?), transcoding...]
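
As a rough guide to such calculations, suppose (as a simplifying assumption,
with $p$, $s$ and $S$ introduced here only for illustration) that a fraction
$p$ of the sequential parse time is accelerated by a factor $s$ while the
remainder is unchanged.  Amdahl's law then gives the overall speed-up
\begin{equation}
S(p,s) = \frac{1}{(1-p) + p/s}, \qquad
\lim_{s\to\infty} S(p,s) = \frac{1}{1-p}.
\end{equation}
Taking the tentative 30\% tokenization figure as $p = 0.3$, the bound is
$1/0.7 \approx 1.43$, that is, a speed-up of at most about 43\% however
fast tokenization becomes.  A speed-up of more than 100\% means $S > 2$,
which requires $(1-p) + p/s < 1/2$ and hence $p > 1/2$: more than half of
the original run time would have to fall within the accelerated portion.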

Paragraph 3:
To achieve the best results possible, we have undertaken
a comprehensive restructuring of the Xerces-C++ parser,
seeking to expose as many critical aspects of XML parsing
as possible for parallelization.  Overall, we have
employed Parabix-style methods in transcoding, tokenization
and tag parsing, parallel string comparison methods in symbol
resolution, bit-parallel methods in namespace processing, as well as staged
processing with pipeline parallelism to take advantage of
multiple cores.

\section{Background}
\subsection{Parabix Technology}

\subsection{Xerces}

\section{Architecture}

  - Philosophy:  Maximizing Bit Stream Processing

  - Character Set Adapters vs. Transcoding
  - Bitstreams 1: Charset Validation and Transcoding equations
  - Bitstreams 2: Parabix style parsing and validation

  - Bitstreams 3: Parallel filtering and normalization
          - LB normalization
          - reference compression -> single code unit speculation
          - parallel string termination

  - Bitstreams 4: Symbol processing

  - From bit streams to doublebyte streams: the content buffer

  - Namespace Processing: A Bitset approach.

\section{Performance}

ICXML vs. Original Xerces

 -- SAXCount
 -- GML2SVG?

 -- simulated performance on AVX2???



\section{Leveraging SIMD Parallelism for Multicore: Pipeline Parallelism}

\section{Research in Progress: Parallel Validation of Datatypes and Content Models with Bitstreams}

\appendix
\section{Appendix Title}

This is the text of the appendix, if you need one.

\acks

Acknowledgments, if needed.

% We recommend abbrvnat bibliography style.

\bibliographystyle{abbrvnat}

% The bibliography should be embedded for final submission.

\bibliography{reference}


\end{document}