% Source: docs/Working/icXML/arch-namespace.tex, revision 2449, checked in by nmedfort.
% Commit message: progress on namespace section; started error handling
\subsection{Namespace Handling}
\label{section:arch:namespacehandling}

% Should we mention canonical bindings or speculation? it seems like more of an optimization than anything.

In XML, namespaces prevent naming conflicts when multiple vocabularies are used together.
They are especially important when a vocabulary has application-dependent meaning, such as when
XML or SVG documents are embedded within XHTML files.
Namespaces are identified by uniform resource identifiers (URIs), which are strings used to identify
specific names or resources.
On line 1 of Figure \ref{fig:namespace1}, the \verb|xmlns| attributes instruct the XML
processor to bind the prefix \verb|p| to the URI ``\verb|pub.net|'' and the default (empty)
prefix to ``\verb|book.org|''. Thus, to the XML processor, the \verb|title| on line 2 and
\verb|price| on line 4 read as \verb|"book.org":title| and \verb|"book.org":price|
respectively, whereas on lines 3 and 5, \verb|p:name| and \verb|price| are seen as
\verb|"pub.net":name| and \verb|"pub.net":price|. Even though the element name on lines 4 and 5
is \verb|price| in both cases, namespace scoping rules make them two uniquely named items,
because the current vocabulary is determined by the namespace(s) that are in scope.

\begin{figure}[h]
\begin{tabular}{l|l}
1. & \verb|<book xmlns:p="pub.net" xmlns="book.org">| \\
2. & \verb|  <title>BOOK NAME</title>| \\
3. & \verb|  <p:name>PUBLISHER NAME</p:name>| \\
4. & \verb|  <price>X</price>| \\
5. & \verb|  <price xmlns="pub.net">Y</price>| \\
6. & \verb|</book>| \\
\end{tabular}
\caption{XML Namespace Example}
\label{fig:namespace1}
\end{figure}

In Xerces, every URI is mapped to a unique URI ID number.
These IDs persist throughout the lifetime of the application.
Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs
in the document. Because a namespace declaration affects the entire element, it must be processed prior to
grammar validation. This is a costly process considering that a typical namespaced XML document comes
in only one of two forms:
(1) those that declare a set of namespaces up front and never change them, and
(2) those that repeatedly modify the namespace scope within the document in predictable patterns.

\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|c|c|c|}\hline
NSID & Prefix & URI & Prefix ID & URI ID \\ \hline\hline
0 & {\tt p} & {\tt pub.net} & 0 & 0 \\ \hline
1 & {\tt xmlns} & {\tt book.org} & 1 & 1 \\ \hline
2 & {\tt xmlns} & {\tt pub.net} & 1 & 0 \\ \hline
\end{tabular}
\caption{Namespace Binding Table Example}
\label{tbl:namespace1}
\end{center}
\end{table}

For that reason, ICXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
% speculation and
scope resolution operations with a single XOR operation, even if many alterations are performed.
% performance advantage figure?? average cycles/byte cost?
When a prefix is declared (e.g., \verb|xmlns:p="pub.net"|), a namespace binding is created that maps
the prefix, which is assigned a prefix ID during symbol resolution, to the URI.
Each unique URI is given a URI ID through a global URI pool, similar to Xerces.
Each unique namespace binding has a unique namespace ID (NSID), and every prefix has a bit vector marking every
NSID that has ever been associated with it within the document. For example, in Table \ref{tbl:namespace1}, the
prefix binding sets of \verb|p| and \verb|xmlns| would be \verb|001| and \verb|110| respectively, writing NSID 0 as the rightmost bit.
To resolve the in-scope namespace binding for each prefix, the system maintains a bit vector of the currently
visible namespaces. By ANDing the prefix bit vector with the currently visible namespaces, the in-scope
NSID can be found using a bit scan instruction. A namespace binding table, similar to Table \ref{tbl:namespace1},
provides the actual URI ID.
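As a worked illustration of this lookup, consider the default prefix \verb|xmlns| on line 4 of
Figure \ref{fig:namespace1}. This is only a sketch: the symbols $B_x$ (the binding vector of a prefix $x$) and $V$
(the vector of currently visible namespaces), the use of $\wedge$ for bitwise AND, and the convention of writing
NSID 0 as the rightmost bit are notation assumed for this example rather than taken from the ICXML implementation.
With NSIDs 0 and 1 in scope,
\[
\begin{array}{rcl}
B_{\mathtt{xmlns}} & = & \mathtt{110} \\
V & = & \mathtt{011} \\
B_{\mathtt{xmlns}} \wedge V & = & \mathtt{010} \\
\mathrm{bitscan}(\mathtt{010}) & = & 1
\end{array}
\]
so the in-scope binding is NSID 1, and row 1 of Table \ref{tbl:namespace1} supplies URI ID 1.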

% PrefixBindings = PrefixBindingTable[prefixID];
% VisiblePrefixBinding = PrefixBindings & CurrentlyVisibleNamespaces;
% NSid = bitscan(VisiblePrefixBinding);
% URIid = NameSpaceBindingTable[NSid].URIid;

To ensure that scoping rules are adhered to,
whenever a start tag is encountered, any modification to the currently visible namespaces is calculated and stored
within a stack of bit vectors denoting the locally modified namespace bindings. When an end tag is found, the
vector of currently visible namespaces is XORed with the vector at the top of the stack, which undoes that element's modifications.
% Speculation can be handled by probing the historical information within the stack but that goes beyond the scope of this paper.
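
The scope update can be traced through the nested \verb|price| element on line 5 of Figure \ref{fig:namespace1}.
This is a sketch under stated assumptions: $V$ denotes the vector of currently visible namespaces, $D$ the vector
pushed on the stack for the element (taken here to be the bitwise difference between the old and new values of $V$),
$\oplus$ is XOR, and NSID 0 is written as the rightmost bit; none of these names come from the ICXML implementation.
The start tag rebinds the default prefix from NSID 1 to NSID 2 of Table \ref{tbl:namespace1}, so
\[
\begin{array}{rclcl}
D & = & \mathtt{011} \oplus \mathtt{101} & = & \mathtt{110} \\
V_{\mathrm{inside}} & = & \mathtt{011} \oplus D & = & \mathtt{101} \\
V_{\mathrm{after}} & = & \mathtt{101} \oplus D & = & \mathtt{011}
\end{array}
\]
A single XOR at the end tag therefore restores the bindings that were visible before the element, regardless of how many bits $D$ changes.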