Changeset 2449


Timestamp:
Oct 13, 2012, 3:36:24 PM (7 years ago)
Author:
nmedfort
Message:

progress on namespace section; started error handling

Location:
docs/Working/icXML
Files:
2 edited

  • docs/Working/icXML/arch-errorhandling.tex

    r2439 r2449  
    33
    44% Challenges / Line Col Tracker
     5
     6Xerces outputs error messages in one of two ways: through the programmer API and as thrown errors for fatal messages.
     7ICXML emits errors in a similar manner, but how it determines the line/column number of the error, which is a necessary
     8component of the error message, differs substantially.
     9Recall that in Figure \ref{fig:icxml-arch}, ICXML is divided into two sections: the Parabix subsystem and
     10the markup processor.
     11Within Parabix, all computations are performed in parallel, one block at a time. Errors are derived as artifacts of
     12bit stream equations, with a 1-bit marking the position of an error in a block.
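
As a rough illustration of the idea of a 1-bit marking an error position, the following C++ sketch locates the first error in a 64-position block and converts that offset into a line/column pair by scanning the raw data. All names (`firstErrorOffset`, `naiveLineCol`, `LineCol`) are hypothetical, the mapping of bit $i$ to byte $i$ of the block is an assumption, and the linear newline scan is only a naive baseline for comparison, not the technique ICXML uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 1-based position, as usually reported in parser error messages.
struct LineCol {
    std::size_t line;
    std::size_t column;
};

// Offset of the first error in a block, or -1 if the mask is empty.
// Assumption: bit i of the mask corresponds to position i of the block.
inline int firstErrorOffset(std::uint64_t errorMask) {
    if (errorMask == 0) return -1;
    int offset = 0;
    while ((errorMask & 1u) == 0) {  // portable stand-in for a bit scan
        errorMask >>= 1;
        ++offset;
    }
    return offset;
}

// Naive baseline: walk the raw bytes up to `offset`, counting newlines.
// Treats every byte as one column, so multi-byte characters and CRLF
// pairs are deliberately NOT handled here.
inline LineCol naiveLineCol(const std::string& data, std::size_t offset) {
    assert(offset <= data.size());
    LineCol pos{1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (data[i] == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

For the input `"<a>\n<b&>"` with only bit 6 set in the error mask, `firstErrorOffset` returns 6 and `naiveLineCol` reports line 2, column 3. The sketch ignores exactly the cases that make the real problem hard: multi-byte characters and CRLF line endings.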
     13
     14
     15
     16\begin{figure}[h]
     17{\bf TODO: An example of a skip mask, error mask, and the raw data and transcoded data for it.
     18Should a multi-byte character be used and/or some CRLFs to show the difficulties?}
     19\caption{}
     20\label{fig:error_mask}
     21\end{figure}
  • docs/Working/icXML/arch-namespace.tex

    r2439 r2449  
    22\label{section:arch:namespacehandling}
    33
    4 % Xerces stack-oriented vs icXML's bit-field oriented approach
     4% Should we mention canonical bindings or speculation? it seems like more of an optimization than anything.
    55
     6In XML, namespaces prevent naming conflicts when multiple vocabularies are used together.
     7This is especially important when a vocabulary has an application-dependent meaning, such as when
     8XML or SVG documents are embedded within XHTML files.
     9Namespaces are bound to uniform resource identifiers (URIs), which are strings used to identify
     10specific names or resources.
     11On line 1 of Figure \ref{fig:namespace1}, the \verb|xmlns| attributes instruct the XML
     12processor to bind the prefix \verb|p| to the URI ``\verb|pub.net|'' and the default (empty)
     13prefix to ``\verb|book.org|''. Thus, to the XML processor, the \verb|title| on line 2 and
     14\verb|price| on line 4 read as \verb|"book.org":title| and \verb|"book.org":price|
     15respectively, whereas on lines 3 and 5, \verb|p:name| and \verb|price| are seen as
     16\verb|"pub.net":name| and \verb|"pub.net":price|. Even though lines 4 and 5 share the actual element name
     17\verb|price|, due to namespace scoping rules they are viewed as two uniquely-named items
     18because the current vocabulary is determined by the namespace(s) that are in scope.
     19
     20\begin{figure}[h]
     21\begin{tabular}{l|l}
     221. & \verb|<book xmlns:p="pub.net" xmlns="book.org">| \\
     232. & \verb|  <title>BOOK NAME</title>| \\
     243. & \verb|  <p:name>PUBLISHER NAME</p:name>| \\
     254. & \verb|  <price>X</price>| \\
     265. & \verb|  <price xmlns="pub.net">Y</price>| \\
     276. & \verb|</book>| \\
     28\end{tabular}
     29\caption{XML Namespace Example}
     30\label{fig:namespace1}
     31\end{figure}
     32
     33
     34In Xerces, every URI is mapped to a unique URI ID number.
     35These IDs persist throughout the lifetime of the application.
     36Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs
     37in the document. Because a namespace declaration affects the entire element, it must be processed prior to
     38grammar validation. This is a costly process considering that a typical namespaced XML document only comes
     39in one of two forms:
     40(1) those that declare a set of namespaces upfront and never change them, and
     41(2) those that repeatedly modify the namespace scope within the document in predictable patterns.
     42
     43\begin{table}[h]
     44\begin{center}
     45\begin{tabular}{|c||c|c|c|c|}\hline
     46NSID & Prefix & URI & Prefix ID & URI ID \\ \hline\hline
     470 & {\tt p} & {\tt pub.net} & 0 & 0 \\ \hline
     481 & {\tt xmlns} & {\tt book.org} & 1 & 1 \\ \hline
     492 & {\tt xmlns} & {\tt pub.net} & 1 & 0 \\ \hline
     50\end{tabular}
     51\caption{Namespace Binding Table Example}
     52\label{tbl:namespace1}
     53\end{center}
     54\end{table}
     55
     56For that reason, ICXML contains an independent namespace stack and utilizes bit vectors to cheaply perform
     57% speculation and
     58scope resolution operations with a single XOR operation, even if many alterations are performed.
     59% performance advantage figure?? average cycles/byte cost?
     60When a prefix is declared (e.g., \verb|xmlns:p="pub.net"|), a namespace binding is created that maps
     61the prefix, which is assigned a prefix ID during symbol resolution, to the URI.
     62Each unique URI is given a URI ID through the use of a global URI pool, similar to Xerces.
     63Each unique namespace binding has a unique namespace ID (NSID), and every prefix has a bit vector marking every
     64NSID that has ever been associated with it within the document. For example, in Table \ref{tbl:namespace1}, the
     65prefix binding sets of \verb|p| and \verb|xmlns| would be \verb|001| and \verb|110| respectively (rightmost bit is NSID 0).
     66To resolve the in-scope namespace binding for each prefix, a bit vector of the currently visible namespaces is
     67maintained by the system. By ANDing the prefix bit vector with the currently visible namespaces, the in-scope
     68NSID can be found using a bit scan instruction. A namespace binding table, similar to Table \ref{tbl:namespace1},
     69provides the actual URI ID.
     70
     71% PrefixBindings = PrefixBindingTable[prefixID];
     72% VisiblePrefixBinding = PrefixBindings & CurrentlyVisibleNamespaces;
     73% NSid = bitscan(VisiblePrefixBinding);
     74% URIid = NameSpaceBindingTable[NSid].URIid;
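
The lookup described above can be sketched in C++ roughly as follows. This is a minimal illustration that follows the commented pseudocode, not ICXML's actual source: the type and function names (`NamespaceResolver`, `bitScan`, `makeExample`) are hypothetical, a single 64-bit word stands in for the bit vector, bit $i$ is assumed to correspond to NSID $i$, and at most one binding per prefix is assumed to be visible at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width bit vector: supports up to 64 NSIDs in this sketch.
using BitVector = std::uint64_t;

// One row of the namespace binding table (indexed by NSID).
struct NamespaceBinding {
    unsigned prefixId;
    unsigned uriId;
};

// Index of the lowest set bit; a portable stand-in for a bit scan instruction.
inline int bitScan(BitVector v) {
    assert(v != 0);
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

struct NamespaceResolver {
    std::vector<NamespaceBinding> bindingTable;  // indexed by NSID
    std::vector<BitVector> prefixBindings;       // indexed by prefix ID
    BitVector currentlyVisible = 0;              // bit i set: NSID i is in scope

    // URI ID bound to `prefixId` in the current scope, or -1 if unbound.
    int resolve(unsigned prefixId) const {
        BitVector visible = prefixBindings[prefixId] & currentlyVisible;
        if (visible == 0) return -1;
        int nsid = bitScan(visible);
        return static_cast<int>(bindingTable[nsid].uriId);
    }
};

// Builds the three bindings of the example table:
// NSID 0: p -> URI 0, NSID 1: xmlns -> URI 1, NSID 2: xmlns -> URI 0.
inline NamespaceResolver makeExample() {
    NamespaceResolver r;
    r.bindingTable = {{0, 0}, {1, 1}, {1, 0}};
    r.prefixBindings = {0x1, 0x6};  // p: {NSID 0}; xmlns: {NSID 1, NSID 2}
    return r;
}
```

With NSIDs 0 and 1 visible (`currentlyVisible = 0x3`), the default prefix resolves to URI ID 1. With NSIDs 0 and 2 visible (`0x5`), it resolves to URI ID 0, the same URI as prefix `p`.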
     75
     76To ensure that scoping rules are adhered to,
     77whenever a start tag is encountered, any modification to the currently visible namespaces is calculated and stored
     78within a stack of bit vectors denoting the locally modified namespace bindings. When an end tag is found, the
     79vector of currently visible namespaces is XORed with the vector at the top of the stack.
     80% Speculation can be handled by probing the historical information within the stack but that goes beyond the scope of this paper.
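
The scope maintenance described above might look like the following C++ sketch. It assumes that the vector stored for each start tag has a 1-bit for every NSID whose visibility changes within that element (newly visible or newly hidden), so that a single XOR both applies and undoes the change. The class name `ScopeTracker`, the 64-bit vector width, and the explicit pop on each end tag are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width bit vector: bit i set means NSID i is affected/visible.
using NsBits = std::uint64_t;

class ScopeTracker {
public:
    // Called at a start tag. `delta` marks every NSID whose visibility
    // changes in this element; it is 0 when the tag declares no namespaces.
    void startTag(NsBits delta) {
        visible_ ^= delta;          // apply the local modification
        deltas_.push_back(delta);   // remember it for the matching end tag
    }

    // Called at an end tag: XOR with the top of the stack restores the
    // enclosing scope, however many bindings the element altered.
    void endTag() {
        assert(!deltas_.empty());
        visible_ ^= deltas_.back();
        deltas_.pop_back();
    }

    NsBits visible() const { return visible_; }

private:
    NsBits visible_ = 0;            // currently visible namespaces
    std::vector<NsBits> deltas_;    // one entry per open element
};
```

For the namespace example, `<book>` would push `0x3` (NSIDs 0 and 1 become visible), and the `price` element that rebinds the default prefix would push `0x6` (NSID 1 hidden, NSID 2 shown), giving a visible set of `0x5` until its end tag restores `0x3`.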