Timestamp: Oct 17, 2012, 2:54:12 PM
Author: nmedfort
Message: some edits

File: docs/Working/icXML/arch-errorhandling.tex

\label{section:arch:errorhandling}

% XML errors are rare but they do happen, especially with untrustworthy data sources.
Xerces outputs error messages in two ways: through the programmer API and as thrown objects for fatal errors.
As Xerces parses a file, it uses context-dependent logic to assess whether the next character is legal;
if not, the current state determines the type and severity of the error.
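For contrast with ICXML's bit-parallel approach, the following toy sketch shows sequential, state-driven checking in the spirit of Xerces. It is not Xerces code. The states (\texttt{State}), the \texttt{Diagnostic} record, the function \texttt{checkSequential}, and the messages are all hypothetical, and the grammar is grossly simplified. Each character is examined once, in order, and the state in which an illegal character is seen selects the error that is reported.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy states; a real parser such as Xerces tracks far more context.
enum class State { Content, InTag, InAttrValue };

struct Diagnostic {
    bool found;            // true if an error was detected
    std::size_t position;  // byte offset of the offending character
    const char* message;   // error type, chosen by the state it occurred in
};

// Examine one character at a time; the current state decides whether the
// character is legal and, if not, which error to report.
Diagnostic checkSequential(const std::string& text) {
    State state = State::Content;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        switch (state) {
        case State::Content:
            if (c == '<') state = State::InTag;
            break;
        case State::InTag:
            if (c == '"') state = State::InAttrValue;  // grossly simplified
            else if (c == '>') state = State::Content;
            else if (c == '<') return {true, i, "'<' inside a tag"};
            break;
        case State::InAttrValue:
            if (c == '"') state = State::InTag;
            else if (c == '<') return {true, i, "'<' in attribute value"};
            break;
        }
    }
    return {false, text.size(), nullptr};
}
```

Every character passes through the \texttt{switch}, so the cost of error detection is spread across the whole parse. The bit-parallel approach instead tests a whole block of positions per operation.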
ICXML emits errors in a similar manner, but how it discovers them is substantially different.

Recall that in Figure \ref{fig:icxml-arch}, ICXML is divided into two sections: the \PS{} and
the \MP{}. Each section has its own system for producing error messages, geared towards the type
of processing handled by that module.

Within the \PS{}, all computations are performed in parallel, a block at a time.
Errors are derived as artifacts of bit stream calculations: a 1-bit marks the byte position of an error within a block,
and the type of error is determined by the equation that discovered it.
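The following is a minimal sketch of this idea. It makes several assumptions. First, it uses 64-bit blocks, whereas the real block width in the \PS{} depends on the SIMD register size. Second, it uses two hypothetical character-class streams: \texttt{lessThan}, which marks every \verb|<| byte, and \texttt{inAttrValue}, which marks every byte inside an attribute value. A single bitwise equation yields the error stream for one error type, and each 1-bit is then converted to an absolute position. The stream names, \texttt{ErrorKind}, and \texttt{errorsInBlock} are illustrative, not ICXML's.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical error type; each error equation implies exactly one type.
enum class ErrorKind { LessThanInAttrValue };

struct BlockError {
    ErrorKind kind;
    std::uint64_t position;  // absolute byte position in the source
};

// Bit i of each stream describes byte (blockBase + i) of the source.
std::vector<BlockError> errorsInBlock(std::uint64_t lessThan,
                                      std::uint64_t inAttrValue,
                                      std::uint64_t blockBase) {
    std::vector<BlockError> errors;
    // One bitwise equation computes the error stream for the whole block.
    std::uint64_t errorStream = lessThan & inAttrValue;
    while (errorStream != 0) {
        // Index of the lowest 1-bit (portable loop instead of a ctz intrinsic).
        unsigned bit = 0;
        while (((errorStream >> bit) & 1u) == 0) ++bit;
        errors.push_back({ErrorKind::LessThanInAttrValue, blockBase + bit});
        errorStream &= errorStream - 1;  // clear the lowest 1-bit
    }
    return errors;
}
```

The loop body runs once per error rather than once per byte. Because errors are rare, most blocks produce an all-zero error stream and are dismissed by a single comparison.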
     
\end{figure}

The \MP{} is a state-driven machine. As such, error detection within it is very similar to that of Xerces.
However, reporting the correct line/column is a much more difficult problem.
The \MP{} parses the content stream, which is a series of tagged UTF-16 strings.
Each string is normalized in accordance with the XML specification.
All symbol data and unnecessary whitespace are eliminated from the stream.
This means it is impossible to directly assess the current location using only the cursor position within the content stream.
To calculate the location, the \MP{} borrows three additional pieces of information from the \PS{}:
the line-feed stream, the skip mask, and a {\it deletion mask stream}, which is a bit stream denoting the (code-unit) position of every
datum that was suppressed from the source during the production of the content stream.
Armed with these, it is possible to calculate the actual line/column by applying the same tracking system as the \PS{},
advancing through the source only until the running sum of the negated deletion mask stream (i.e., the number of code units retained in the content stream) equals the cursor position.
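The sketch below illustrates this calculation under several assumptions:
\begin{itemize}
\item blocks are 64 bits wide;
\item a 1-bit in the deletion mask marks a suppressed code unit, and positions past the end of the source are also marked as deleted;
\item the cursor is a 0-based index into the content stream;
\item the line is one plus the number of line feeds before the position;
\item the column counts the code units since the last line feed, excluding those marked in the skip mask.
\end{itemize}
The last two points are assumptions about the \PS{} tracking system. The functions \texttt{sourcePosition} and \texttt{lineColumn} are hypothetical. Whole blocks are skipped using a population count, so only the block containing the cursor is scanned bit by bit.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LineCol {
    std::uint64_t line;    // 1-based
    std::uint64_t column;  // 1-based
};

static std::uint64_t countOnes(std::uint64_t x) { return std::bitset<64>(x).count(); }

// Mask with the low n bits set, for n in [0, 64].
static std::uint64_t lowBits(unsigned n) { return n >= 64 ? ~0ULL : (1ULL << n) - 1; }

// Map a 0-based content-stream cursor to its source position by summing the
// negated deletion mask (retained code units) until it reaches the cursor.
std::uint64_t sourcePosition(const std::vector<std::uint64_t>& deletion,
                             std::uint64_t cursor) {
    std::uint64_t remaining = cursor;  // retained code units still to pass
    for (std::size_t b = 0; b < deletion.size(); ++b) {
        const std::uint64_t kept = ~deletion[b];
        const std::uint64_t n = countOnes(kept);
        if (remaining < n) {
            // The cursor falls in this block: find the matching retained bit.
            for (unsigned i = 0; i < 64; ++i) {
                if ((kept >> i) & 1u) {
                    if (remaining == 0) return b * 64 + i;
                    --remaining;
                }
            }
        }
        remaining -= n;  // skip the whole block at once
    }
    assert(false && "cursor lies beyond the end of the content stream");
    return 0;
}

// Line/column of source position pos, under the assumed tracking rules.
LineCol lineColumn(const std::vector<std::uint64_t>& lineFeeds,
                   const std::vector<std::uint64_t>& skip, std::uint64_t pos) {
    std::uint64_t line = 1;
    std::uint64_t lineStart = 0;  // first position after the last line feed before pos
    for (std::uint64_t b = 0; b * 64 < pos; ++b) {
        const std::uint64_t left = pos - b * 64;
        const std::uint64_t lf = lineFeeds[b] & lowBits(left >= 64 ? 64u : unsigned(left));
        line += countOnes(lf);
        if (lf != 0) {
            unsigned hi = 63;  // highest 1-bit = last line feed in this block
            while (((lf >> hi) & 1u) == 0) --hi;
            lineStart = b * 64 + hi + 1;
        }
    }
    std::uint64_t column = 1;
    for (std::uint64_t i = lineStart; i < pos; ++i)
        if (((skip[i / 64] >> (i % 64)) & 1u) == 0) ++column;  // skip-masked units do not advance
    return {line, column};
}
```

The calculation is presumably needed only when an error is actually reported, so a straightforward scan like this one should not slow down the error-free parsing path.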