Changes between Initial Version and Version 1 of ArraySet

Sep 23, 2008, 7:29:30 AM (11 years ago)



= Parabix ArraySet Model =

== Introduction and Rationale ==

The Parabix ArraySet Model is an array-oriented model for representing
information extracted from XML documents, including information satisfying
the full InfoSet requirements.  It may be contrasted with a more traditional
object-oriented model in which an XML document is directly represented as
a tree of nodes.

The ArraySet Model represents information using a number of arrays, each
of which holds information of a particular kind extracted from the document.
The array elements generally consist of simple numeric or character values,
extracted from the document in document order.
For example, the CT_pos array holds the document positions at
which XML comments (enclosed in <!-- and -->) occur.

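As an illustration of how such a position array might be filled, here is a minimal sketch in C. The function name `collect_comment_positions`, its parameters, and the choice to record the byte offset of each comment opener are assumptions made for this example; they are not taken from the Parabix code. The scan is also deliberately naive: it does not exclude `<!--` sequences that appear inside CDATA sections.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Parabix): record the byte offset of
   every "<!--" opener found in buf, in document order.
   At most max_pos offsets are written to ct_pos; the number written
   is returned. */
size_t collect_comment_positions(const char *buf, size_t len,
                                 size_t *ct_pos, size_t max_pos)
{
    size_t count = 0;
    for (size_t i = 0; i + 4 <= len && count < max_pos; i++) {
        if (memcmp(buf + i, "<!--", 4) == 0) {
            ct_pos[count++] = i;
            i += 3;  /* skip the remaining characters of the opener */
        }
    }
    return count;
}
```

The result is a flat array of plain integers that later passes can read sequentially, without following pointers into a node tree.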
The primary purpose of the ArraySet model is to support high-performance
XML processing in consideration of the software and hardware resources
typically available in commodity processing environments.

 1.  Prefetching.  Commodity processors commonly support hardware and/or software
 prefetching to ensure that data is available in a processor cache when it is needed.
 In general, prefetching is most effective in conjunction with the continuous sequential
 memory access patterns associated with array processing.

 2.  DMA.  Some processing environments provide Direct Memory Access (DMA) controllers
 for block data movement in parallel with computation.  For example, the Cell Broadband
 Engine uses DMA controllers to move data to and from the local stores of
 the synergistic processing units.  Arrays of contiguous data elements
 are well suited to bulk data movement using DMA.

 3.  SIMD.  Single Instruction Multiple Data (SIMD) capabilities of modern processor
 instruction sets allow simultaneous application of particular instructions to
 sets of elements from parallel arrays.  For effective use of SIMD capabilities,
 an SoA (Structure of Arrays) model is preferable to an AoS (Array of Structures)
 model.

 4.  Multicore processors.  Array-oriented processing can enable the effective
 distribution of work to the individual cores of a multicore system in two
 distinct ways.  First, provided that sequential dependencies can be minimized or
 eliminated, large arrays can be divided into separate segments to be processed
 in parallel on each core.  Second, pipeline parallelism can be used to
 implement efficient multipass processing with each pass
 consisting of a processing kernel with array-based input and array-based output.

 5.  Streaming Buffers for Large XML Documents.  In the event that an XML
 document is larger than can be reasonably represented entirely within processor
 memory, a buffer-based streaming model can be applied to work through a
 document using sliding windows over arrays of elements stored in document order.

 6.  JNI.  The Java Native Interface (JNI) allows communication between a Java
 runtime environment and native processing resources on a host machine, but can
 impose substantial overhead with each call.  In addition, data type
 conversion may be needed for all but the simplest data types.  Bulk transfer of arrays of
 simple types (e.g., integers) can minimize both the number of JNI invocations
 and the cost of data conversion.
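
The SoA layout of item 3 and the segment-wise processing of item 4 can be sketched together in C. Everything named here (`span_aos`, `span_soa`, `span_lengths`, and the `start`/`end` fields) is hypothetical and chosen only for illustration; the actual ArraySet arrays may be organized differently.

```c
#include <stddef.h>

/* AoS layout, shown for contrast: the fields of each item are
   interleaved in memory. */
struct span_aos {
    size_t start;
    size_t end;
};

/* SoA layout: one contiguous array per field, indexed in parallel.
   Field names are assumptions made for this sketch. */
struct span_soa {
    const size_t *start;
    const size_t *end;
    size_t count;
};

/* Array-in, array-out kernel over one segment [seg_begin, seg_end).
   Iterations are independent of one another, so disjoint segments
   could be handed to different cores. */
void span_lengths(const struct span_soa *s,
                  size_t seg_begin, size_t seg_end, size_t *out)
{
    for (size_t i = seg_begin; i < seg_end; i++) {
        out[i] = s->end[i] - s->start[i];
    }
}
```

Because each field sits in its own contiguous array and the loop body has no cross-iteration dependency, a loop of this shape is a plausible candidate for SIMD vectorization, and calling `span_lengths` on `[0, n/2)` and `[n/2, n)` gives the same result as a single pass over `[0, n)`.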