Changeset 1186 for docs

May 22, 2011, 7:56:57 PM (8 years ago)

Notes on parallel bitstream-based length sorting.

3 added
6 edited


  • docs/EuroPar2011/europar-cameron.aux

    r1185 r1186  
    4141\@writefile{toc}{\contentsline {section}{\numberline {7}Conclusion}{11}}
    42 \bibstyle{plain}
    43 \bibdata{xmlperf}
  • docs/EuroPar2011/europar-cameron.tex

    r1185 r1186  
    5252\author{Robert D. Cameron
    53 %\thanks{}
    5453\and Ehsan Amiri \and Kenneth S. Herdy \and Dan Lin \and Thomas C. Shermer \and Fred P. Popowich}
    57 \authorrunning{Cameron {\em et al}}
     56\authorrunning{Cameron, Amiri, Herdy, Lin, Shermer and Popowich}
    5957% the affiliations are given next; don't give your e-mail address
    6058% unless you accept that it will be published
    216213of character-class streams in this way is an overhead on parallel bit
    217214stream applications, in general.   However, using the SIMD
    218 capabilities of current commodity processors, these operations are quite
     215capabilities of current commodity processors, these operations are
    219216fast, with an amortized overhead of about 1 CPU cycle per byte for
    220217transposition and less than 1 CPU cycle per byte for all the character
    287284addition also produces some additional bits that are
    288285not involved in the scan operation.   
    289 However, these are easily removed as shown in the fifth row,
     286These are easily removed as shown in the fifth row,
    290287by applying bitwise logic to mask
    291288off any bits from the digit bitstream; these can never
    373370result of the marker advance operation $n(M_0)$ to
    374371produce the new marker bitstream $M_1$.  At this point,
    375 a hash mark is required, so the first error bitstream $E_0$ is
     372the grammar requires a hash mark, so the first error bitstream $E_0$ is
    376373formed using a bitwise ``and'' operation combined with negation,
    377374to indicate violations of this condition.
    530527Following the pattern shown here, the remaining syntactic
    531528features of XML markup can similarly be parsed with
    532 bitstream based methods.   One complexity is that the
     529bitstream based methods.   One complication is that the
    533530parsing of comments,
    534531CDATA sections and processing instructions must be
    536533within which ordinary XML markups are not parsed (i.e.,
    537534within each of these types of construct.   This is handled
    538 by first performance the parsing of these structures and
     535by first parsing these structures and
    539536then forming a {\em mask bitstream}, that is, a stream that
    540537identifies spans of text to be excluded from parsing
    604601It is too expensive to check every instance of an XML name
    605602against the full range of possible values.   However, it is
    606 possible and quite inexpensive to use parallel bitstream
     603possible and inexpensive to use parallel bitstream
    607604techniques to verify that any ASCII characters within a name
    608605are indeed legal name start characters or name characters.
    611608it useful to define a name scan character class to include all the legal ASCII characters
    612609for names as well as all non-ASCII characters. 
    613 A namecheck character class bitstream will then be defined to identify nonASCII
     610A namecheck character class bitstream will then be defined to identify non-ASCII
    614611characters found within namescans.   In most documents
    615612this bitstream will be all $0$s; even in documents with substantial
    774771the processor carry flags and add-with-carry instructions on
    77577264-bit general registers.  In both cases, the overall
    776 performance is quite impressive, with the increased
     773performance is impressive, with the increased
    777774parallelism of parallel bit scans clearly paying off in
    778775improved performance for dense markup.
    847 \bibliographystyle{plain}
    849 \bibliography{xmlperf}
     851Krste Asanovic, Ras Bodik, Bryan~Christopher Catanzaro, Joseph~James Gebis,
     852  Parry Husbands, Kurt Keutzer, David~A. Patterson, William~Lester Plishker,
     853  John Shalf, Samuel~Webb Williams, and Katherine~A. Yelick.
     854\newblock The landscape of parallel computing research: A view from {Berkeley}.
     855\newblock Technical Report UCB/EECS-2006-183, EECS Department, University of
     856  California, Berkeley, Dec 2006.
     859Tim Bray, Jean Paoli, C.~M. Sperberg-McQueen, Eve Maler, and François Yergeau.
     860\newblock Extensible markup language ({XML}) 1.0 (fifth edition).
     861\newblock W3C Recommendation, 2008.
     864Robert~D. Cameron.
     865\newblock {A Case Study in SIMD Text Processing with Parallel Bit Streams}.
     866\newblock In {\em {ACM Symposium on Principles and Practice of Parallel
     867  Programming (PPoPP)}}, {Salt Lake City, Utah}, 2008.
     870Robert~D. Cameron, Kenneth~S. Herdy, and Dan Lin.
     871\newblock High performance {XML} parsing using parallel bit stream technology.
     872\newblock In {\em {CASCON} '08: Proceedings of the 2008 conference of the
     873  center for advanced studies on collaborative research}, pages 222--235, New
     874  York, NY, USA, 2008. ACM.
     877Zefu Dai, Nick Ni, and Jianwen Zhu.
     878\newblock A 1 cycle-per-byte {XML} parsing accelerator.
     879\newblock In {\em FPGA '10: Proceedings of the 18th Annual ACM/SIGDA
     880  International Symposium on Field Programmable Gate Arrays}, pages 199--208,
     881  New York, NY, USA, 2010. ACM.
     884Kenneth~S. Herdy, David~S. Burggraf, and Robert~D. Cameron.
     885\newblock High performance {GML} to {SVG} transformation for the visual
     886  presentation of geographic data in web-based mapping systems.
     887\newblock In {\em Proceedings of {SVG} Open 2008}, August 2008.
     890M.~G. Kostoulas, M.~Matsa, N.~Mendelsohn, E.~Perkins, A.~Heifets, and
     891  M.~Mercaldi.
     892\newblock {{XML} Screamer: An Integrated Approach to High Performance {XML}
     893  Parsing, Validation and Deserialization}.
     894\newblock In {\em Proceedings of the 15th International Conference on World
     895  Wide Web (WWW '06)}, pages 93--102, 2006.
     898Michael Leventhal and Eric Lemoine.
     899\newblock The {XML} chip at 6 years.
     900\newblock In {\em International Symposium on Processing {XML} Efficiently:
     901  Overcoming Limits on Space, Time, or Bandwidth}, August 2009.
     904Bhavik Shah, Praveen Rao, Bongki Moon, and Mohan Rajagopalan.
     905\newblock A data parallel algorithm for {XML DOM} parsing.
     906\newblock In Zohra BellahsÚne, Ela Hunt, Michael Rys, and Rainer Unland,
     907  editors, {\em Database and {XML} Technologies}, volume 5679 of {\em Lecture
     908  Notes in Computer Science}, pages 75--90. Springer Berlin / Heidelberg, 2009.
     911Ying Zhang, Yinfei Pan, and Kenneth Chiu.
     912\newblock Speculative p-{DFA}s for parallel {XML} parsing.
     913\newblock In {\em High Performance Computing (HiPC), 2009 International
     914  Conference on}, pages 388--397, December 2009.
  • docs/PACT2011/03-research.tex

    r1104 r1186  
    102102$M_0 = \texttt{[<]}$            & \verb`1...........1......`\\
    103103$M_1 = advance(M_0)$            & \verb`.1...........1.....`\\
    104 $M_2 = scanto(M_1, N)$          & \verb`...1.............1.`\\
    105 $M_3 = scanto(M_2, N)$          & \verb`...1..............1`
     104$M_2 = scanthru(M_1, A)$                & \verb`...1.............1.`\\
  • docs/PACT2011/07-avx.tex

    r1120 r1186  
    6666base instruction count of the Parabix2 binary compiled in SSE-only
    6767mode is shown with the caption ``sse,'' the version obtained by
    68 simple recompilation with AVX-mode enabled is labeled ``avx 128-bit,''
     68simple recompilation with AVX-mode enabled is labeled ``128-bit avx,''
    6969and the version reimplemented to use 256-bit operations wherever
    70 possible is labelled ``avx 256-bit.''    The instruction counts
     70possible is labelled ``256-bit avx.''    The instruction counts
    7171are divided into three classes.  The ``non-SIMD'' operations
    7272are the general purpose instructions that use neither SSE nor
    8989reduction in bitwise logic was not achieved.
    91 The ``other SIMD'' class shows a substantial ``30\%-35\%'' reduction
     91The ``other SIMD'' class shows a substantial 30\%-35\% reduction
    9292with AVX 128-bit technology compared to SSE.  This reduction is
    9393due to eliminated copies or reloads when SIMD operations
  • docs/PACT2011/reference.bib

    r1115 r1186  
    260 author = "{Perkins, E. and Kostoulas, M. and Heifets, A. and Matsa, M. and Mendelsohn, N.}",
     260author = "{E. Perkins and M. Kostoulas and A. Heifets and M. Matsa and N. Mendelsohn}",
    261261title = "{Performance Analysis of {XML} APIs}",
    262262booktitle = {XML 2005}, address = {Atlanta, Georgia}, month=nov, year=2005}
Note: See TracChangeset for help on using the changeset viewer.