Welcome to the Parabix Technology Home Page

Parabix technology is a high-performance programming framework for streaming text processing applications, leveraging both SIMD and multicore parallel processing features.

  • September, 2018: IUC 42 presentation on AVX-512/Parabix
  • June, 2018: AVX-512 support for Parabix/icgrep
  • June, 2018: u32u8 application
  • April, 2018: File glob parsing using Parabix methods
  • May 2, 2016: Check out the ParabixOS project!
  • November 18-20, 2015: Join us in Zhangjiajie, China for our presentation at ICA3PP 2015.
  • October 28-29, 2015: Join us at the 2015 LLVM Developers' Meeting for Parabix-LLVM discussion.
  • October 26-27, 2015: Join us at Unicode Conference 39 for our presentation of Unicode regular expression matching in icgrep.
  • September 2015: Look at our plans for additional Parabix regular expression facilities in the Parabix Regular Expression Road Map
  • February 2015: Check out icGrep 1.0 offering Gigabyte Per Second Performance!

Parabix Transform

The Parabix framework is based on the concept of parallel bit streams, a fundamentally new transform representation of text. Byte-oriented character stream data is first transformed into eight parallel bit streams, each comprising one bit per character code unit. Code units may be ASCII characters or UTF-8 bytes, for example, with one parallel bit stream defined for each of bit 0 through bit 7 of each code unit. Because each stream holds exactly one bit per code unit, the 128-bit SIMD (single-instruction multiple-data) registers of SSE (Intel architecture) or AltiVec (Power architecture) can process 128 code unit positions at a time.
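As a rough illustration of the transform, the following Python sketch transposes a byte string into eight bit streams using a plain scalar loop. It is not the SIMD implementation used by Parabix. The function names are hypothetical, and numbering bit 0 as the most significant bit of each code unit is an assumption of this sketch.

```python
from typing import List


def transpose_to_bit_streams(data: bytes) -> List[int]:
    """Return eight bit streams, one per bit of each code unit.

    Stream k collects bit k of every byte. Bits are numbered from the
    most significant (bit 0) to the least significant (bit 7); that
    numbering is an assumption of this sketch. Each stream is a Python
    int in which bit position i corresponds to code unit i of the input.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << pos
    return streams


def show(stream: int, length: int) -> str:
    """Render a stream in input order, marking 0 bits with periods."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(length))


text = b"a4"  # 'a' is 0x61 (01100001), '4' is 0x34 (00110100)
basis = transpose_to_bit_streams(text)
for k, s in enumerate(basis):
    print(f"bit {k}: {show(s, len(text))}")
```

Once the data is in this form, a single bitwise operation on one stream acts on every code unit position at once, which is what makes wide SIMD registers useful here.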

See the Parabix Transform page for details.

Alphabets, Character Classes, Unicode

The Parabix framework contains many facilities for working with character representations of various kinds.

A fundamental notion is the character class bitstream. This is a stream of bits in one-to-one correspondence with the input character code units, such that 1 bits indicate characters within the class and 0 bits indicate characters outside the class. We often use regular-expression notation to identify character classes, such as [abc] for the class containing the three lower-case letters "a", "b", and "c", and [0-9] for the class of decimal digits. The following example shows an input character stream and the corresponding character class bitstreams for [abc] and [0-9], respectively. We conventionally mark 0 bits with periods (".") to make the 1 bits stand out.

input:   This is just 1 abbreviated example of character stream input containing 24 instances of the [abc] class and 6 instances of the [0-9] class. 
[abc]:   ...............111....1......1........1.1.11........1........1...1.............1.1...........111..1.1...1.........1.1................1.1...
[0-9]:   .............1..........................................................11..................................1...................1.1........
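The display above can be reproduced with a minimal Python sketch that applies the definition directly, testing each character for class membership. The helper name char_class_stream is hypothetical, and this is not how the Parabix character class compilers compute the streams.

```python
def char_class_stream(text: str, members: str) -> str:
    """Return a display string with '1' where a character is in the class
    and '.' elsewhere, one mark per input character."""
    member_set = set(members)
    return "".join("1" if ch in member_set else "." for ch in text)


text = ("This is just 1 abbreviated example of character stream input "
        "containing 24 instances of the [abc] class and 6 instances of "
        "the [0-9] class.")

abc = char_class_stream(text, "abc")
digits = char_class_stream(text, "0123456789")

print("input:  ", text)
print("[abc]:  ", abc)
print("[0-9]:  ", digits)
print(abc.count("1"), "matches for [abc];", digits.count("1"), "for [0-9]")
```

Each output line has exactly one mark per input character, so the three lines stay aligned column by column.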

Read about the Parabix Character Class Compilers for more information.

(Note: this is an example of aligned streams display.)

Useful Debugging Options

See UsefulDebuggingOptions.

The Pablo Language and Compilers

The Pablo Language allows parallel bit stream programs to be conveniently written using primitives that manipulate arbitrary-length bitstreams. Pablo comes in two different forms: the Python Pablo language for direct generation of C++ programs from Pablo language source files, and the Pablo intermediate representation that can be translated to LLVM IR.

IDISA Run-Time Libraries

The IDISA project defines an abstraction for portable SIMD programming featuring support for operations at all power-of-2 field widths as well as transitions between those field-widths (inductive doubling architecture).
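To make the idea of field widths concrete, here is a small Python sketch that emulates addition on a 128-bit register treated as a vector of independent fields, for any power-of-2 field width. The name simd_add and its signature are hypothetical and chosen for illustration only; this is not the IDISA API.

```python
REGISTER_BITS = 128  # register size assumed for this sketch


def simd_add(field_width: int, a: int, b: int) -> int:
    """Add a and b as vectors of independent fields of field_width bits.

    Carries never cross a field boundary: each field wraps around
    modulo 2**field_width.
    """
    is_power_of_2 = field_width >= 1 and field_width & (field_width - 1) == 0
    if not is_power_of_2 or field_width > REGISTER_BITS:
        raise ValueError("field width must be a power of 2 up to 128")
    mask = (1 << field_width) - 1
    result = 0
    for shift in range(0, REGISTER_BITS, field_width):
        field_a = (a >> shift) & mask
        field_b = (b >> shift) & mask
        result |= ((field_a + field_b) & mask) << shift
    return result


a, b = 0x00FF, 0x0001
print(hex(simd_add(8, a, b)))   # low 8-bit field wraps around to 0
print(hex(simd_add(16, a, b)))  # carry propagates within the 16-bit field
print(hex(simd_add(1, a, b)))   # 1-bit fields: addition reduces to XOR
```

The same inputs give different results at different field widths, which is why an abstraction that is uniform across all power-of-2 widths is convenient.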

Parabix with LLVM

The Parabix-LLVM project is investigating the use of LLVM as a back-end for Parabix tools and applications.

ICXML: Incorporating Parabix Technology into the Xerces-C XML Parser

ICXML (TM) is a highly accelerated version of the widely renowned Xerces-C XML Parser. Developed by International Characters, Inc., in partnership with our SFU research lab, ICXML systematically incorporates Parabix and other acceleration technologies into Xerces. Speedups of 1.5X and more have been measured with various applications on a single core, while our experimental dual-core applications have seen acceleration of more than 2X over Xerces.

We are pleased that International Characters, Inc. is releasing ICXML as open source software hosted here under the Open Software License 3.0.

icGrep: Gigabyte Per Second Regular Expression Search

icGrep is the latest demonstration project for Parabix technology, offering a full-featured grep program with world-beating performance and broad support for Unicode regular expression matching.

Licensing

Parabix software is provided here as open source software under the Open Software License 3.0. Commercial licensing is available through International Characters, Inc., an SFU spin-off company based on Prof. Cameron's research. Parabix is a trademark of International Characters. International Characters, Inc., holds several patents on Parabix technology, but has dedicated those patents free for use in open-source software, teaching and research.

http://www.international-characters.com/

Last modified on Sep 11, 2018, 11:12:16 AM