Changeset 1396


Timestamp:
Aug 30, 2011, 4:52:44 PM
Author:
ksherdy
Message:

general edits to improve flow and define scope and name check streams

File:
1 edited

  • docs/HPCA2012/03b-research.tex

    r1393 r1396  
    1212This section describes the implementation of the Parabix XML parser.
    1313Figure \ref{parabix_arch} shows its overall structure set up for
    14 well-formedness checking.  The input file is
    15 processed using 11 functions organized into 7 modules.  In the first
    16 module, the Read\_Data function loads data blocks from an input file
    17 to data\_buffer.  The data is then transposed to eight parallel basis
    18 bitstreams (basis\_bits) in the Transposition module.  The eight
    19 bitstreams are used in the Classification function to generate all the
    20 XML lexical item streams (lex) as well as in the U8\_Validation module
    21 to validate UTF-8 characters.  The lexical item streams and scope
    22 streams (scope) that are generated in Gen\_Scope function are supplied
    23 to the parsing module, which consists three functions, Parse\_CtCDPI,
    24 Parse\_Ref and Parse\_tag.  These functions deal with the parsing of
    25 comments, CDATA sections, processing instructions, references and
    26 tags.  After this, information is gathered by Name\_Validation and
    27 Err\_Check functions, producing name check streams and error streams.
    28 These are then passed to the final module for Postprocessing.  All the
    29 possible errors that cannot be conveniently detected by bitstreams are
    30 checked in this last module.  The final output reports any
    31 well-formedness error detected and its position within the input file.
     14well-formedness checking. 
     15The input file is processed using 11 functions organized into 7 modules. 
     16In the first module, {\tt Read\_Data}, the input file is loaded into the
     17data\_buffer. The data is then transposed to eight parallel basis
     18bit streams (basis\_bits) in the {\tt Transposition} module. 
     19The basis\_bits are used by the {\tt U8\_Validation} module to validate
     20UTF-8 characters, and by the {\tt Classification} and {\tt Gen\_Scope} modules
     21to generate all the XML lexical item streams (lex) and scope streams (scope).
     22Scope streams are a simplified subset of lex streams in which the legal yet
     23insignificant cursors have been removed. Both the lex and scope streams
     24are supplied to the parsing module, which consists of three functions:
     25(1) {\tt Parse\_CtCDPI}, (2) {\tt Parse\_Ref} and (3) {\tt Parse\_tag};
     26these functions deal with the parsing of
     27(1) comments, CDATA sections, and processing instructions;
     28(2) references, and
     29(3) start tags, end tags, and empty tags as well as any related attributes.
     30Afterwards, the information is gathered by the {\tt Name\_Validation} and
     31{\tt Err\_Check} functions, producing name check streams and error streams.
     32Name check streams are weak error streams that verify each character used in a
     33name is valid according to the XML 1.0 specification.
     34These are then passed to the final {\tt Postprocessing} module.
     35Any error that cannot be conveniently detected in bit space is
     36checked here. The final output reports any
     37well-formedness error and its position within the input file.
    3238
    33 Within this structure, all functions in the four shaded modules
    34 consist entirely of parallel bit stream operations.  Of these, the
     39Using this structure, all of the functions in the four shaded modules
     40consist entirely of parallel bit stream operations. Of these, the
    3541Classification function consists of XML character class definitions
    3642that are generated using our character class compiler \textit{ccc}, while much of the U8\_Validation
    3743similarly consists of UTF-8 byte class definitions that are also
    3844generated by ccc.  The remainder of these functions are programmed
    39 using our unbounded bitstream language following the logical
     45using our unbounded bit stream language following the logical
    4046requirements of XML parsing.  All the functions in the four shaded
    4147modules are then compiled to low-level C/C++ code using our Pablo
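
The transposition and classification steps described in this hunk can be illustrated with a small sketch. This is not Parabix code and not output of ccc or Pablo: the function names `transpose_to_basis_bits` and `match_byte` are hypothetical, and Python integers stand in for unbounded bit streams (bit i of a stream corresponds to byte i of the input). It assumes the basis bit streams are numbered from the most significant bit of each byte.

```python
def transpose_to_basis_bits(data):
    """Return eight integers; bit i of basis[k] is bit k (MSB first) of data[i]."""
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            # k = 0 selects the most significant bit of the byte.
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << pos
    return basis


def match_byte(basis, value, length):
    """Bit stream with a 1 at every position whose input byte equals `value`."""
    mask = (1 << length) - 1      # limit the stream to the input length
    result = mask
    for k in range(8):
        want = (value >> (7 - k)) & 1
        # AND together each basis stream, or its complement, as the
        # corresponding bit of `value` requires.
        result &= basis[k] if want else (~basis[k] & mask)
    return result


data = b"<a>x</a>"
basis = transpose_to_basis_bits(data)
lt_stream = match_byte(basis, ord("<"), len(data))
```

Here `lt_stream` plays the role of a single lexical item stream: it is computed with bitwise operations over whole streams rather than with a byte-at-a-time loop, which is the property the shaded modules rely on.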