Version 17 (modified by cameron, 8 years ago) (diff)


Welcome to the Development Home of Parabix - World's Fastest XML Software

Parabix (Parallel bit streams for XML - TM) demonstrates a fundamentally new way to perform high-speed parsing of XML documents. Byte-oriented character stream data is first transformed into eight parallel bit streams, each bit stream comprising one bit per character code unit. Code units may be ASCII characters or UTF-8 bytes, for example, with one parallel bit stream defined for each of bit 0 through bit 7 of each code unit. Given such a representation, the 128-bit SIMD (single-instruction multiple-data) registers of the SSE (Intel architecture SIMD technology) or Altivec (Power PC architecture) may be used to process 128 code unit positions at a time.

Applications of parallel bit stream technology to various text/Unicode/XML processing applications is the ongoing research of Prof. Rob Cameron of the School of Computing Science at Simon Fraser University. See also the u8u16 project for application of parallel bit stream technology to high-speed UTF-8 to UTF-16 transcoding.

Parabix software as provided here as open source software under Open Software License 3.0. Patents pending. Commercial licensing is available through International Characters, Inc., an SFU spin-off company based on Prof. Cameron's research. Parabix is a trademark of International Characters.

Parabix 2

Our research program is presently moving on to the development of Parabix 2 - a second version of Parabix with four technical advances over our first version.

  1. Parallel Parsing with Bitstream Addition

This is a new technique which is demonstrated in our python prototype.

  1. Compiler Technology

We are developing compiler technology to automatically generate much of the low-level SIMD code for parallel bit stream processing. Although the first version of Parabix used character class compilation to produce bitlex.c, Parabix 2 will employ compilation techniques much more broadly.

  1. Array Set Models

Array set models are a technique for representing XML infomation items using sets of arrays. The primary goal is to enable the use of high-performance techniques in Java through bulk array transport across the JNI boundary.

  1. Length-Sorted Symbol Tables

By using bit-scan instructions to cheaply determine the length of names without touching each byte, name occurrences can first be presorted according to length. Separate loops or routines are then used for processing the name occurrences of each length; these routines can simply load and compare all the bytes of names without byte-at-a-time loops.

Cameron, Rob, Ken Herdy and Ehsan Amiri. Parallel Bit Stream Technology as a Foundation for XML Parsing Performance. Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:10.4242/BalisageVol4.Cameron01.

Robert D. Cameron, Kenneth S. Herdy and Dan Lin, High Performance XML Parsing Using Parallel Bit Stream Technology, Proceedings of CASCON 2008, Toronto, Ontario, October 27-30, 2008.

Kenneth S. Herdy, David S. Burggraf, and Robert D. Cameron, High Performance GML to SVG Transformation for the Visual Presentation of Geographic Data in Web-Based Mapping Systems, Proceedings of SVG Open, Nuremberg, Germany, August 26-28, 2008.

January 1, 2010: XmlStringValueExtraction

IDISA Project