source: u8u16/trunk/README @ 5877

Last change on this file since 5877 was 5877, checked in by cameron, 17 months ago

Adding old u8u16 for Teradata

u8u16 is a high-speed UTF-8 to UTF-16 transcoding program
with an iconv-compatible interface.

Different versions of u8u16 can be created using the following
commands.

make u8u16_g4
  creates a version for the PowerPC G4 under Mac OS X, using
  the AltiVec SIMD capabilities

make u8u16_p4
  creates a version using the SSE SIMD capabilities
  of P4 or equivalent processors

make u8u16_p4_ideal
  creates a P4 version that simulates the best algorithms
  for an idealized SIMD processor implementing an inductive
  doubling architecture

make u8u16_mmx
  creates a version for Pentium or equivalent processors using
  MMX facilities; this runs, for example, on the AMD Geode

make iconv_u8u16
  creates an equivalent transcoding program that calls the OS-provided
  iconv routine, for comparison purposes
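For orientation, the kind of call the iconv comparison build relies on can be
sketched as follows. This is a minimal illustration of the standard POSIX iconv
interface, not the source of iconv_u8u16; the helper name utf8_to_utf16le and
the choice of "UTF-16LE" as the target encoding are assumptions made for the
example.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of u8u16): convert srclen bytes of UTF-8
   at src into UTF-16LE at dst, using the OS-provided iconv routine.
   Returns the number of bytes written to dst, or (size_t)-1 on failure,
   in which case errno holds the value set by iconv_open or iconv. */
size_t utf8_to_utf16le(const char *src, size_t srclen,
                       char *dst, size_t dstcap)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv advances these pointers and decrements the counts. */
    char *in = (char *)src;
    size_t inleft = srclen;
    char *out = dst;
    size_t outleft = dstcap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    int saved_errno = errno;      /* iconv_close may overwrite errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;      /* EILSEQ, EINVAL or E2BIG */
        return (size_t)-1;
    }
    return dstcap - outleft;
}
```

On failure, errno distinguishes an illegal sequence (EILSEQ) from an incomplete
one at the end of the input (EINVAL) and from a full output buffer (E2BIG).
Some platforms require linking with -liconv.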

All versions are compiled to measure and report a histogram of cycle counts
per 1000 UTF-8 code units processed.  This instrumentation may be removed
by deleting the flag -DBUFFER_PROFILING from the Makefile.
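To make the reported metric concrete, the following sketch shows one way a
histogram of cycles per 1000 code units could be accumulated. It is not the
BUFFER_PROFILING code from u8u16: the names cycle_hist, hist_record and
hist_print, the bin count and the bin width are all assumptions, and the cycle
count is taken as an argument so that the example does not depend on any
particular hardware counter.

```c
#include <stdint.h>
#include <stdio.h>

#define HIST_BINS 32    /* assumed number of bins */
#define BIN_WIDTH 500   /* assumed bin width, in cycles per 1000 code units */

typedef struct {
    unsigned long count[HIST_BINS];
} cycle_hist;

/* Record one measured buffer: `cycles` elapsed while processing
   `code_units` UTF-8 code units.  Returns the bin that was incremented,
   or -1 if the buffer was empty.  Rates beyond the last bin are clamped
   into it, so the last bin acts as an overflow bin. */
int hist_record(cycle_hist *h, uint64_t cycles, uint64_t code_units)
{
    if (code_units == 0)
        return -1;
    uint64_t per_thousand = cycles * 1000 / code_units;
    uint64_t bin = per_thousand / BIN_WIDTH;
    if (bin >= HIST_BINS)
        bin = HIST_BINS - 1;
    h->count[bin]++;
    return (int)bin;
}

/* Print the non-empty bins, one per line, labelled by their lower bound. */
void hist_print(const cycle_hist *h, FILE *fp)
{
    for (int i = 0; i < HIST_BINS; i++) {
        if (h->count[i] != 0)
            fprintf(fp, ">= %6d cycles/1000 units: %lu\n",
                    i * BIN_WIDTH, h->count[i]);
    }
}
```

A caller would read a cycle counter before and after transcoding each buffer
and pass the difference to hist_record, then call hist_print at exit.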

http://download.wikimedia.org/ is a good source of XML test data
for performance tests.  Depending on architecture and UTF-8 data
characteristics, the high-speed u8u16 transcoder has been found to
perform 3 to 15 times faster than iconv.

Correctness testing of a particular version may be carried out
by changing to the QA directory and executing the run_all script,
as in the following example.
  ./run_all ../u8u16_mmx

Correctness testing of iconv implementations has shown errors in
Linux and Mac OS X environments: some erroneous UTF-8 sequences
are incorrectly reported as "incomplete" when they occur at
the end of a file.
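The distinction at issue can be illustrated with a small classifier for the
final bytes of an input. This is a sketch based on the well-formed UTF-8 byte
ranges in the Unicode Standard, not code from u8u16 or its QA suite; the names
utf8_status and utf8_classify are hypothetical. The README does not list the
specific sequences that iconv misreported, so the E0 80 case mentioned below is
an assumed example of the general problem: a sequence that is already illegal
after two bytes and must not be treated as a truncated three-byte character.

```c
#include <stddef.h>

typedef enum { UTF8_COMPLETE, UTF8_INCOMPLETE, UTF8_ILLEGAL } utf8_status;

/* Hypothetical helper: classify the UTF-8 sequence starting at p, of which
   only n bytes are available (for example, the last n bytes of a file).
   UTF8_INCOMPLETE means every available byte is a legal prefix and more
   bytes are needed; UTF8_ILLEGAL means the bytes seen so far can never
   begin a well-formed sequence. */
utf8_status utf8_classify(const unsigned char *p, size_t n)
{
    if (n == 0)
        return UTF8_INCOMPLETE;

    unsigned char b0 = p[0];
    size_t len;
    unsigned char lo = 0x80, hi = 0xBF;   /* legal range for the 2nd byte */

    if (b0 <= 0x7F) {
        return UTF8_COMPLETE;             /* ASCII */
    } else if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;        /* exclude overlong forms */
        if (b0 == 0xED) hi = 0x9F;        /* exclude surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;        /* exclude overlong forms */
        if (b0 == 0xF4) hi = 0x8F;        /* exclude values above U+10FFFF */
    } else {
        return UTF8_ILLEGAL;              /* 80..C1 and F5..FF never lead */
    }

    for (size_t i = 1; i < len; i++) {
        if (i >= n)
            return UTF8_INCOMPLETE;       /* legal so far, but truncated */
        unsigned char min = (i == 1) ? lo : 0x80;
        unsigned char max = (i == 1) ? hi : 0xBF;
        if (p[i] < min || p[i] > max)
            return UTF8_ILLEGAL;
    }
    return UTF8_COMPLETE;
}
```

With this classifier, the trailing bytes E2 82 are reported as incomplete
(they are a legal prefix of E2 82 AC), whereas the trailing bytes E0 80 are
reported as illegal even though a third byte is missing.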

u8u16 is a demonstration program for the ongoing research work
of Prof. Rob Cameron of Simon Fraser University into high-speed
character processing using parallel bit streams.  International
Characters, Inc., an SFU spin-off company, makes it available as
open source software under the Open Software License 3.0.  Commercial
licenses are available as well.

The u8u16 program is written using the CWEB literate programming
system of Knuth and Levy.  See src/libu8u16.pdf for the program
documentation that results.