Timestamp:
Oct 19, 2012, 3:01:59 PM (7 years ago)
Author:
nmedfort
Message:

temp checkin

File:
1 edited

--- docs/Working/icXML/arch-errorhandling.tex (r2471)
+++ docs/Working/icXML/arch-errorhandling.tex (r2496)
@@ -4,11 +4,9 @@
 % XML errors are rare but they do happen, especially with untrustworthy data sources.
 Xerces outputs error messages in two ways: through the programmer API and as thrown objects for fatal errors.
-As Xerces parses a file, it uses context-dependant logic to assess whether the next character is legal or not;
+As Xerces parses a file, it uses context-dependant logic to assess whether the next character is legal;
 if not, the current state determines the type and severity of the error.
-ICXML emits errors in the similar manner---but how it discovers them is substantially different.
-
-Recall that in Figure \ref{fig:icxml-arch}, ICXML is divided into two sections: the \PS{} and
-the \MP{}. Each section has its own system for producing the error messages, geared towards the type
-of processing handled by the module.
+\icXML{} emits errors in the similar manner---but how it discovers them is substantially different.
+Recall that in Figure \ref{fig:icxml-arch}, \icXML{} is divided into two sections: the \PS{} and \MP{},
+each with its own system for detecting and producing error messages.
 
 Within the \PS{}, all computations are performed in parallel, a block at a time.
@@ -21,11 +19,11 @@
 (2) column position is counted in characters, not bytes or code units;
 thus multi-code-unit code-points and surrogate character pairs are all counted as a single column position.
-Exacerbating these problems is the fact that typical XML documents are error-free but the calculation of the
-line/column position is a constant overhead in Xerces that must be maintained in the case that one occurs.
-To reduce this overhead, ICXML pushes the bulk cost of the line/column calculation to the occurence of the error and
-performs the minimal amount of book-keeping necessary to facilitate the function.
-ICXML leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
+Note that typical XML documents are error-free but the calculation of the
+line/column position is a constant overhead in Xerces. % that must be maintained in the case that one occurs.
+To reduce this, \icXML{} pushes the bulk cost of the line/column calculation to the occurrence of the error and
+performs the minimal amount of book-keeping necessary to facilitate it.
+\icXML{} leverages the byproducts of the Character Set Adapter (CSA) module and amalgamates the information
 within the Line Column Tracker (LCT).
-One of the CSA's major responsibilities is transcoding an input text from some encoding format to near-output-ready UTF-16.
+One of the CSA's major responsibilities is transcoding an input text. % from some encoding format to near-output-ready UTF-16.
 During this process, white-space normalization rules are applied and multi-code-unit and surrogate characters are detected
 and validated.
@@ -44,5 +42,5 @@
 column number.
 
-\begin{figure}[h]
+\begin{figure}[ht]
 {\bf TODO: An example of a skip mask, error mask, and the raw data and transcoded data for it.
 Should a multi-byte character be used and/or some CRLFs to show the difficulties?}
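
The changed text describes deferring the line/column calculation until an error actually occurs, and counting a surrogate pair as a single column. A minimal C++ sketch of that idea follows. It is not taken from icXML or Xerces: `LineColumn` and `locate` are hypothetical names, and the sketch assumes the text has already been transcoded to UTF-16, that surrogate pairs have been validated, and that line breaks have been normalized to `\n`. It rescans the whole buffer and does not use the skip masks or the Line Column Tracker mentioned in the source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; not an icXML or Xerces structure.
struct LineColumn {
    std::size_t line;    // 1-based line number
    std::size_t column;  // 1-based column, counted in characters
};

// Hypothetical helper: called only when an error is reported, so the
// error-free path pays nothing for line/column tracking.
// Assumes validated UTF-16 with line breaks normalized to '\n'.
LineColumn locate(const std::u16string& text, std::size_t errorOffset) {
    LineColumn pos{1, 1};
    if (errorOffset > text.size()) {
        errorOffset = text.size();
    }
    for (std::size_t i = 0; i < errorOffset; ++i) {
        const char16_t unit = text[i];
        if (unit == u'\n') {
            // A line break starts a new line and resets the column.
            ++pos.line;
            pos.column = 1;
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // Low surrogate: its character was already counted when the
            // preceding high surrogate was seen.
            continue;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

For example, `locate(u"ab\nc\U0001F600d", 6)` gives the position of the `d`, with the surrogate pair before it occupying one column rather than two.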