Changeset 2490


Timestamp: Oct 19, 2012, 10:44:29 AM
Author: cameron
Message: Section 1 and 2 clean-ups.
Location: docs/Working/icXML
Files: 5 edited

  • docs/Working/icXML/background-fundemental-differences.tex

    --- r2471
    +++ r2490
    @@ -18,12 +18,14 @@
     % Parallel: blocks/segments/buffers through layers
     Parabix-style XML parsers utilize a concept of layers:
    -as block of source text is transformed into a set of lexical bit streams,
    -it undergoes a series of operations that can be grouped together as a logical
    -layer, such as transposition, character classification, and the lexical analysis
    -phases. Each layer is pipeline parallel, as they require no speculation nor
    -pre-parsing stages\cite{HPCA2012}.
    -The disadvantage of this approach is that, taken individually, the resultant parallel
    -bit streams may out-of-order w.r.t. the source document and must be amalgamated and
    -iterated through to produce sequential output.
    +as each block of source text is transformed into a set of lexical bit streams,
    +it undergoes a series of operations that can be grouped together in logical
    +layers, such as transposition, character classification, and the lexical analysis
    +phases. Each layer is pipeline parallel, requiring neither speculation nor
    +pre-parsing\cite{HPCA2012}.
    +In adapting to the requirements of the Xerces sequential parsing API,
    +however, the resultant parallel
    +bit streams, taken individually, may be out-of-order with respect to the source
    +document.  They hence must be amalgamated and iterated through to produce
    +sequential output.
     % The end user should not be expected to work with out-of-order data ...
     
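The revised paragraph describes bit streams that are produced layer by layer (transposition, then character classification) and are then amalgamated and iterated to recover sequential order. The Python sketch below illustrates that flow under a few stated assumptions. Python's unbounded integers stand in for SIMD registers, and bit i of each stream corresponds to byte i of the input. The helpers `transpose`, `char_class` and `positions` are hypothetical names chosen for illustration. They are not functions from icXML, Parabix or Xerces.

```python
# Hypothetical sketch of Parabix-style layers using plain Python integers.
# Each bit stream is an int whose bit i corresponds to byte i of the input.

def transpose(data: bytes) -> list:
    """Transposition layer: split bytes into 8 basis bit streams.
    basis[k] has bit i set iff bit k (k = 0 is least significant) of data[i] is set."""
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << i
    return basis


def char_class(basis: list, ch: str, n: int) -> int:
    """Character-classification layer: mark every position holding the ASCII
    character `ch`, using only bitwise logic on the basis streams."""
    mask = (1 << n) - 1          # keep complements within the n input positions
    code = ord(ch)
    result = mask
    for k in range(8):
        # Match bit k of `code`: use the basis stream if the bit is 1,
        # otherwise its complement (restricted to the input length).
        result &= basis[k] if (code >> k) & 1 else ~basis[k] & mask
    return result


def positions(stream: int):
    """Yield the set bits of a stream in ascending (document) order."""
    while stream:
        low = stream & -stream           # isolate the lowest set bit
        yield low.bit_length() - 1       # its index is the byte position
        stream &= stream - 1             # clear that bit and continue


text = b'<a x="1">hi</a>'
basis = transpose(text)
n = len(text)
lt = char_class(basis, "<", n)
gt = char_class(basis, ">", n)

# Amalgamate two independently computed streams and walk them in source
# order to produce sequential output.
events = [("<" if (lt >> p) & 1 else ">", p) for p in positions(lt | gt)]
print(events)  # [('<', 0), ('>', 8), ('<', 11), ('>', 14)]
```

The classification step is data parallel across every position in the block. Only the final `positions` loop is sequential, and it visits each marker exactly once.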
  • docs/Working/icXML/background-parabix.tex

    --- r2483
    +++ r2490
    @@ -80,8 +80,7 @@
     opener (i.e., ``\verb:/:'') or not.  The remaining three
     lines show streams that can be computed in subsequent
    -parsing, namely streams marking the element names,
    +parsing (using the technique
    +of bitstream addition \cite{cameron-EuroPar2011}), namely streams marking the element names,
     attribute names and attribute values of tags.
    -
    -{\it Do we need to explain how those can be computed from the input text or do we simply refer them to prior papers?}
     
     Two intuitions may help explain how the Parabix approach can lead
    @@ -96,5 +95,6 @@
     is the scan complete at this position yet?  Rather than
     computing these individual decision-bits, an approach that computes
    -many of them in parallel (e.g., 128) should provide substantial benefit.
    +many of them in parallel (e.g., 128 bytes at a time using 128-bit registers)
    +should provide substantial benefit.
     
     Previous studies have shown Parabix approach improves many aspects of XML processing,
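The added citation concerns scanning with bitstream addition. The sketch below assumes the scan-through form (M + C) & ~C. Here M marks the scan start positions and C is a character-class stream. A single addition carries each marker past the run of class characters it sits on, which replaces one "is the scan complete here?" decision per byte.

The second function repeats the same computation 128 positions per step, to mirror the "128 bytes at a time using 128-bit registers" remark. It passes the addition's carry from one block to the next. All names (`stream_of`, `scan_thru`, `scan_thru_blocks`, `BLOCK`) are illustrative assumptions, and Python integers model the registers.

```python
# Sketch of scanning with bitstream addition, assuming ScanThru(M, C) = (M + C) & ~C.
# Bit i of each int corresponds to character i of the input.

def stream_of(text: str, pred) -> int:
    """Hypothetical helper: build a character-class stream from a predicate."""
    s = 0
    for i, c in enumerate(text):
        if pred(c):
            s |= 1 << i
    return s


def scan_thru(markers: int, cls: int) -> int:
    """Advance each marker through its run of `cls` positions with one addition:
    the carry ripples across the run and lands on the first non-class position."""
    return (markers + cls) & ~cls


BLOCK = 128  # bits per block: one 128-bit register covers 128 input bytes


def scan_thru_blocks(markers: int, cls: int, n: int) -> int:
    """Same result computed one 128-bit block at a time, carrying the
    addition's overflow into the next block as fixed-width code must."""
    mask = (1 << BLOCK) - 1
    out, carry = 0, 0
    for base in range(0, n, BLOCK):
        m = (markers >> base) & mask
        c = (cls >> base) & mask
        total = m + c + carry
        carry = total >> BLOCK               # carry out of this block
        out |= ((total & mask) & ~c) << base
    return out


text = "<root><leaf/></root>"
lt = stream_of(text, lambda c: c == "<")
letters = stream_of(text, str.isalpha)
starts = (lt << 1) & letters          # a name starts with a letter right after '<'
ends = scan_thru(starts, letters)     # first non-letter after each name

start_pos = [i for i in range(len(text)) if (starts >> i) & 1]
end_pos = [i for i in range(len(text)) if (ends >> i) & 1]
print([text[s:e] for s, e in zip(start_pos, end_pos)])  # ['root', 'leaf']
```

The end tag `</root>` is skipped because the character after its `<` is `/`, so no start marker is generated there.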
  • docs/Working/icXML/background-xerces.tex

    --- r2483
    +++ r2490
    @@ -23,6 +23,6 @@
     % Should we show a val-grind summary of a few files in a linechart form?
     
    -Xerces, like all traditional parsers, process XML documents sequentially a byte-at-a-time from the
    -first to the last byte of input data. Each byte passes through several processing layers and are
    +Xerces, like all traditional parsers, processes XML documents sequentially a byte-at-a-time from the
    +first to the last byte of input data. Each byte passes through several processing layers and is
     classified and eventually validated within the context of the document state.
     This introduces implicit dependencies between the various tasks within the application that make it
  • docs/Working/icXML/icxml-main.tex

    --- r2483
    +++ r2490
    @@ -74,5 +74,5 @@
     of interesting research prototypes using both SIMD and
     multicore parallelism.   Most works have investigated
    -strategies for data parallel solutions on multicore
    +data parallel solutions on multicore
     architectures using various strategies to break input
     documents into segments that can be allocated to different cores.
    @@ -102,5 +102,5 @@
     standards-compliant open-source parser that is widely used
     in commercial practice.    The challenge of this work is
    -to incorporate parallelize the Xerces parser in such a way as to
    +to parallelize the Xerces parser in such a way as to
     preserve the existing APIs as well as offering worthwhile
     end-to-end acceleration of XML processing.
    @@ -109,5 +109,5 @@
     seeking to expose as many critical aspects of XML parsing
     as possible for parallelization.   Overall, we have
    -employed parabix-style methods in transcoding, tokenization
    +employed Parabix-style methods in transcoding, tokenization
     and tag parsing,  parallel string comparison methods in symbol
     resolution, bit parallel methods in namespace processing, as well as staged
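The revised sentence lists parallel string comparison among the methods used in symbol resolution. The sketch below is a generic illustration only and is not icXML's symbol-resolution code. It compares two names eight bytes per step by packing them into 64-bit words, so one integer comparison checks a whole word instead of a single byte. `WORD`, `words` and `same_symbol` are hypothetical names. SIMD code using 128-bit registers could compare 16 bytes per instruction in the same spirit.

```python
# Illustrative only: compare names WORD bytes at a time rather than byte by byte.
WORD = 8  # bytes per comparison step (one 64-bit word); an assumption


def words(name: bytes) -> list:
    """Pack a name into little-endian 64-bit words, zero-padding the tail."""
    padded = name + b"\0" * (-len(name) % WORD)
    return [int.from_bytes(padded[i:i + WORD], "little")
            for i in range(0, len(padded), WORD)]


def same_symbol(a: bytes, b: bytes) -> bool:
    """Equal iff the lengths match and every packed word matches. The length
    check prevents zero padding from equating b"ab" with b"ab\\0"."""
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(words(a), words(b)))


print(same_symbol(b"xsi:schemaLocation", b"xsi:schemaLocation"))  # True
print(same_symbol(b"xsi:type", b"xsi:typo"))                      # False
```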
  • docs/Working/icXML/reference.bib

    --- r2300
    +++ r2490
    @@ -471,5 +471,5 @@
      author = {Blake, Geoffrey and Dreslinski, Ronald G. and Mudge, Trevor and Flautner, Kriszti\'{a}n},
      title = {Evolution of thread-level parallelism in desktop applications},
    - booktitle = {Proc. 37th Annual Int'l Symposium on Computer architecture},
    + booktitle = {Proc. 37th Annual Int'l Symposium on Computer Architecture},
      series = {ISCA '10},
      year = {2010}
    @@ -479,5 +479,5 @@
      author = {Esmaeilzadeh, Hadi and Blem, Emily and St. Amant, Renee and Sankaralingam, Karthikeyan and Burger, Doug},
      title = {Dark silicon and the end of multicore scaling},
    - booktitle = {Proc. 38th Annual Int'l Symposium on Computer architecture},
    + booktitle = {Proc. 38th Annual Int'l Symposium on Computer Architecture},
      series = {ISCA '11},
      year = {2011}
    @@ -487,5 +487,5 @@
      author = {Lu, Wei and Chiu, Kenneth and Pan, Yinfei},
      title = {A Parallel Approach to {XML} Parsing},
    - booktitle = {Proceedings of the 7th IEEE/ACM International Conference on Grid Computing},
    + booktitle = {Proceedings of the 7th {IEEE/ACM} International Conference on Grid Computing},
      series = {GRID '06},
      year = {2006},
    @@ -531,5 +531,5 @@
     author = {Dan Lin and Nigel Medforth and Kenneth S. Herdy and Arrvindh Shriraman and Rob Cameron},
     title = {Parabix: Boosting the efficiency of text processing on commodity processors},
    -booktitle ={High-Performance Computer Architecture, International Symposium on},
    +booktitle ={International Symposium on High-Performance Computer Architecture},
     year = {2012},
     pages = {1-12},
    @@ -540,5 +540,5 @@
     author = {Cheng-Han You and Sheng-De Wang},
     title = {A Data Parallel Approach to {XML} Parsing and Query},
    -booktitle ={10th IEEE International Conference on High Performance Computing and Communications},
    +booktitle ={10th {IEEE} International Conference on High Performance Computing and Communications},
     year = {2011},
     pages = {520-527},
    @@ -560,5 +560,5 @@
     author = {Yinfei Pan and Ying Zhang and Kenneth Chiu},
     title = {Hybrid Parallelism for {XML SAX} Parsing},
    -booktitle ={IEEE International Conference on Web Services},
    +booktitle ={{IEEE} International Conference on Web Services},
     year = {2008},
     pages = {505-512},
    @@ -590,2 +590,9 @@
      address = {Washington, DC, USA},
     }
    +
    +@book{HackersDelight,
    +author = {Warren, Jr., Henry S.},
    +title = {Hacker's Delight},
    +publisher = {Addison-Wesley},
    +year = {2002}
    +}