# Changeset 2449 for docs/Working/icXML/arch-namespace.tex

Timestamp:
Oct 13, 2012, 3:36:24 PM (7 years ago)
Message:

progress on namespace section; started error handling

File:
1 edited

\label{section:arch:namespacehandling}
% Xerces stack-oriented vs icXML's bit-field oriented approach
% Should we mention canonical bindings or speculation? it seems like more of an optimization than anything.

In XML, namespaces prevent naming conflicts when multiple vocabularies are used together.
They are especially important when a vocabulary has application-dependent meaning, such as when XML or SVG documents are embedded within XHTML files.
Namespaces are bound to uniform resource identifiers (URIs), which are strings used to identify specific names or resources.
On line 1 of Figure \ref{fig:namespace1}, the \verb|xmlns| attribute instructs the XML processor to bind the prefix \verb|p| to the URI ``\verb|pub.net|'' and the default (empty) prefix to ``\verb|book.org|''.
Thus, to the XML processor, \verb|title| on line 2 and \verb|price| on line 4 read as \verb|"book.org":title| and \verb|"book.org":price| respectively, whereas \verb|p:name| on line 3 and \verb|price| on line 5 are seen as \verb|"pub.net":name| and \verb|"pub.net":price|.
Even though the element name on lines 4 and 5 is \verb|price| in both cases, namespace scoping rules make them two uniquely named items, because the current vocabulary is determined by the namespace(s) that are in scope.

\begin{figure}[h]
\begin{tabular}{l|l}
1. & \verb|| \\
2. & \verb|  BOOK NAME| \\
3. & \verb|  PUBLISHER NAME| \\
4. & \verb|  X| \\
5. & \verb|  Y| \\
6. & \verb|| \\
\end{tabular}
\caption{XML Namespace Example}
\label{fig:namespace1}
\end{figure}

In Xerces, every URI is mapped to a unique URI ID number.
These IDs persist throughout the lifetime of the application.
Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs in the document.
Because a namespace declaration affects the entire element, it must be processed prior to grammar validation.
This is a costly process considering that typical namespaced XML documents come in only one of two forms:
(1) those that declare a set of namespaces up front and never change them, and
(2) those that repeatedly modify the namespace scope within the document in predictable patterns.

\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|c|c|c|}\hline
NSID & Prefix & URI & Prefix ID & URI ID \\ \hline\hline
0 & {\tt p} & {\tt pub.net} & 0 & 0 \\ \hline
1 & {\tt xmlns} & {\tt book.org} & 1 & 1 \\ \hline
2 & {\tt xmlns} & {\tt pub.net} & 1 & 0 \\ \hline
\end{tabular}
\caption{Namespace Binding Table Example}
\label{tbl:namespace1}
\end{center}
\end{table}

For that reason, icXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
% speculation and
scope resolution operations with a single XOR operation, even if many alterations are performed.
% performance advantage figure?? average cycles/byte cost?
When a prefix is declared (e.g., \verb|xmlns:p="pub.net"|), a namespace binding is created that maps the prefix to the URI; prefixes are assigned prefix IDs during symbol resolution.
Each unique URI is given a URI ID through a global URI pool, similar to Xerces.
Each unique namespace binding has a unique namespace ID (NSID), and every prefix has a bit vector marking every NSID that has ever been associated with it within the document.
For example, in Table \ref{tbl:namespace1}, the prefix binding sets of \verb|p| and \verb|xmlns| would be \verb|001| and \verb|110| respectively, with the rightmost bit corresponding to NSID 0.
To resolve the in-scope namespace binding for each prefix, the system maintains a bit vector of the currently visible namespaces.
By ANDing the prefix bit vector with the currently visible namespaces, the in-scope NSID can be found using a bit scan instruction.
A namespace binding table, similar to Table \ref{tbl:namespace1}, provides the actual URI ID.
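
The lookup described above (AND, bit scan, table lookup) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the icXML implementation: every type and function name is hypothetical, a 64-bit word stands in for the bit vector (so at most 64 bindings), bit $i$ is taken to correspond to NSID $i$, and a portable loop stands in for the hardware bit scan instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical; they are not taken from the icXML sources.
using BitVector = std::uint64_t;  // bit i <-> NSID i; caps this sketch at 64 bindings

struct NamespaceBinding {
    unsigned prefixId;
    unsigned uriId;
};

// Portable stand-in for a hardware bit-scan instruction:
// index of the lowest set bit, or -1 when no bit is set.
int bitScan(BitVector v) {
    if (v == 0) return -1;
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

struct NamespaceResolver {
    std::vector<BitVector> prefixBindings;       // indexed by prefix ID
    std::vector<NamespaceBinding> bindingTable;  // indexed by NSID

    // URI ID bound to prefixId under the given visibility vector, or -1 if none.
    int resolveUriId(unsigned prefixId, BitVector visible) const {
        assert(prefixId < prefixBindings.size());
        const BitVector inScope = prefixBindings[prefixId] & visible;
        const int nsid = bitScan(inScope);
        if (nsid < 0) return -1;
        assert(static_cast<std::size_t>(nsid) < bindingTable.size());
        return static_cast<int>(bindingTable[static_cast<std::size_t>(nsid)].uriId);
    }
};

// Mirrors the three rows of the namespace binding table example.
NamespaceResolver makeExampleResolver() {
    NamespaceResolver r;
    r.bindingTable.push_back(NamespaceBinding{0, 0});  // NSID 0: p     -> URI ID 0
    r.bindingTable.push_back(NamespaceBinding{1, 1});  // NSID 1: xmlns -> URI ID 1
    r.bindingTable.push_back(NamespaceBinding{1, 0});  // NSID 2: xmlns -> URI ID 0
    r.prefixBindings.push_back(0x1);  // prefix ID 0 (p):     NSID 0
    r.prefixBindings.push_back(0x6);  // prefix ID 1 (xmlns): NSIDs 1 and 2
    return r;
}
```

With NSIDs 0 and 1 visible (vector \verb|0x3|), \verb|resolveUriId(1, 0x3)| selects NSID 1 for the default prefix; with NSIDs 0 and 2 visible (\verb|0x5|), the same call selects NSID 2. The sketch relies on at most one binding per prefix being visible at any time, so the AND leaves at most one bit set.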
% PrefixBindings = PrefixBindingTable[prefixID];
% VisiblePrefixBinding = PrefixBindings & CurrentlyVisibleNamespaces;
% NSid = bitscan(VisiblePrefixBinding);
% URIid = NameSpaceBindingTable[NSid].URIid;

To ensure that scoping rules are adhered to, whenever a start tag is encountered, any modification to the currently visible namespaces is calculated and stored on a stack of bit vectors denoting the locally modified namespace bindings.
When an end tag is found, the vector of currently visible namespaces is XORed with the vector at the top of the stack, which restores the bindings that were visible before the matching start tag.
% Speculation can be handled by probing the historical information within the stack but that goes beyond the scope of this paper.
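
The start-tag and end-tag bookkeeping can be sketched in C++ as well. Again this is only an illustration under assumptions: the class and method names are hypothetical, a 64-bit word stands in for the bit vector, and the way a declaration updates the visible set (clear every bit in the prefix's history, then set the new NSID) is one plausible reading of the text rather than a documented icXML detail. What the sketch does take from the text is that each scope stores only the bits it changed, so that one XOR at the end tag undoes them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// All names below are hypothetical; they are not taken from the icXML sources.
using BitVector = std::uint64_t;  // bit i <-> NSID i; caps this sketch at 64 bindings

class NamespaceScopes {
public:
    // Start tag: open a scope with no local changes recorded yet.
    void startTag() { deltas_.push_back(0); }

    // Declaration on the current start tag: make `nsid` the visible binding
    // and hide whichever binding of the same prefix was visible before.
    // `prefixHistory` marks every NSID ever bound to that prefix (assumed
    // to include `nsid` itself).
    void declare(unsigned nsid, BitVector prefixHistory) {
        assert(!deltas_.empty() && nsid < 64);
        const BitVector updated =
            (visible_ & ~prefixHistory) | (BitVector{1} << nsid);
        deltas_.back() ^= visible_ ^ updated;  // accumulate the bits that flipped
        visible_ = updated;
    }

    // End tag: a single XOR undoes every change made by the matching start tag.
    void endTag() {
        assert(!deltas_.empty());
        visible_ ^= deltas_.back();
        deltas_.pop_back();
    }

    BitVector visible() const { return visible_; }

private:
    BitVector visible_ = 0;
    std::vector<BitVector> deltas_;  // one change vector per open element
};

// Replays a hypothetical document: an outer start tag declaring NSIDs 0 and 1,
// an inner start tag rebinding the default prefix to NSID 2, then both end
// tags. Returns the visibility vector observed after each step.
std::vector<BitVector> replayExample() {
    const BitVector pHistory = 0x1;        // prefix p:     NSID 0
    const BitVector defaultHistory = 0x6;  // prefix xmlns: NSIDs 1 and 2
    NamespaceScopes scopes;
    std::vector<BitVector> trace;

    scopes.startTag();
    scopes.declare(0, pHistory);
    scopes.declare(1, defaultHistory);
    trace.push_back(scopes.visible());  // outer element open

    scopes.startTag();
    scopes.declare(2, defaultHistory);
    trace.push_back(scopes.visible());  // inner element open

    scopes.endTag();
    trace.push_back(scopes.visible());  // back in the outer element

    scopes.endTag();
    trace.push_back(scopes.visible());  // document closed
    return trace;
}
```

Because successive changes within one start tag are XORed into the same stack entry, the entry always equals the difference between the visible set before and after the tag, no matter how many declarations the tag carries. In the replay, the inner tag flips NSIDs 1 and 2 (entry \verb|0x6|), and the end tag's XOR turns \verb|0x5| back into \verb|0x3|.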