Alphabets, Character Classes and Unicode


In formal language theory, an alphabet is the set of all characters that may be used to form strings of the language.

For example, in strings that represent DNA sequences, the alphabet may be {A, C, G, T}, the set of four single-letter codes for the distinct nucleotides that make up DNA. In legacy computer applications, the alphabet for character strings may consist of the 127 non-null characters of the 7-bit ASCII code, with the null byte reserved as the string terminator. String processing in modern applications often uses Unicode as the alphabet for character strings, which admits the full set of characters that have been identified and registered from any of the world's languages and notation systems.
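The idea of a string being "over" an alphabet can be illustrated with a minimal Python sketch. The names DNA_ALPHABET, ASCII_NON_NULL and is_over_alphabet are hypothetical, chosen only for this illustration, and the sketch assumes an alphabet is modelled as a plain set of single characters.

```python
# Illustrative sketch only: alphabets modelled as sets of single characters.
# All names here are hypothetical and not taken from any particular library.

# The four single-letter nucleotide codes.
DNA_ALPHABET = frozenset("ACGT")

# The 127 non-null characters of 7-bit ASCII (code points 1 through 127).
ASCII_NON_NULL = frozenset(chr(code) for code in range(1, 128))


def is_over_alphabet(s, alphabet):
    """Return True if every character of s is a member of alphabet.

    The empty string is a string over every alphabet.
    """
    return all(ch in alphabet for ch in s)
```

For instance, is_over_alphabet("GATTACA", DNA_ALPHABET) is True, while is_over_alphabet("GATTAXA", DNA_ALPHABET) is False because X is not in the alphabet. A string containing a null character, such as "abc\0", is not a string over ASCII_NON_NULL. Unicode is far too large to enumerate conveniently as a set in this way, so membership in a Unicode alphabet would more naturally be tested by code point range.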