EPICS: An Efficient, Programmable and
Interchangeable Code System for WWW

Noritaka OSAWA and Toshitsugu YUBA

Graduate School of Information Systems
The University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo 182, Japan

{osawa,yuba}@is.uec.ac.jp

Abstract

This paper proposes and evaluates a character or symbol code system called EPICS for internationalization of the WWW. EPICS integrates a variable-length coding system using 16-bit units and a smart virtual machine that executes inputs as instructions and is dynamically customizable. EPICS enhances the interchangeability of data. The variable-length coding system provides a huge code space. This huge space can include not only standardized code sets but also user-specific codes. The smart virtual machine allows us to define and modify instructions during runtime. Customization makes it possible for a sender to express his intentions in data and for a receiver to process the data depending on his needs. This customization also enables one to send compressed data and decompression programs incrementally and efficiently without predefined decompression algorithms. The length of an English document encoded in EPICS is shorter than that in UCS-2. The length of a Japanese and English document in EPICS is shorter than that in UTF-8.

1. Introduction

Use of the World Wide Web (WWW) is becoming widespread. The WWW is used by people in many nations, and the number of WWW users is growing rapidly. Multilingual processing has therefore become more important. In addition to scientists and engineers, many other people use the WWW as a medium for exchanging information. Business users use the WWW not only on the Internet but also on intranets. On intranets, company-specific or personal symbols are needed in order to communicate efficiently, and it is desirable that those symbols can also be exchanged with people outside the intranet. This raises problems that remain to be solved.

Unicode[16] and ISO 10646[6] are expected to promote the handling of the many characters that have been standardized. However, we think that static character code sets like Unicode are not sufficient for internationalization of the WWW and for a multilingual WWW. Existing character code standards intentionally avoid the specific handling of private or personal characters or symbols; they specify only code regions for private characters. Thus existing standards do not promote the international circulation of data to support the humanities and interdisciplinary studies, which use user-specific symbols. However, more and more researchers in those fields are using the WWW. Since standardization of user-specific symbols is impractical, a new framework for processing and exchanging such symbols easily is needed. The framework should not require centralized registration. We chose a method that decreases the possibility of overlapping code points by using a huge code space.

This paper proposes a dynamic symbol (character) code system capable of handling general symbols in addition to currently used characters. It is called EPICS (Efficient, Programmable and Interchangeable Code System). EPICS is programmable and is also a universal symbol code system that enables us to exchange data efficiently and flexibly. Programmability of EPICS enables us to exchange compressed WWW data without a special decompression program. It will be shown that EPICS can be more efficient than UCS-2 in English text and can be as efficient as UTF-8 in text which includes Japanese and English. Not only characters in plain text but also tags in rich text can be included in EPICS. In this paper, a character and a symbol represent the same thing.

2. EPICS

EPICS is a symbol (or character) code system that integrates a variable-length (multi-byte) code system called EPIC (Extensible Process-Internal Code)[12], whose unit is 16 bits, and a smart virtual machine[14] called EpicVM.

EPIC was originally designed for use in an easy-to-use programming language that handles multilingual characters. When the interpreter system for that language was developed, 16-bit wide characters were not yet popular, so EPIC was designed for internal use. However, 16-bit wide characters are becoming popular because of the wide character type (wchar_t) in the C programming language [7] and because of Unicode. Although a symbol in EPICS is a multi-byte character, the encoding design of symbols allows EPICS to be used efficiently not only as a code for exchange but also for internal processing.

EpicVM is a smart virtual machine whose instructions are dynamically customizable. When we proposed PivotVM[14], we categorized it as a smart virtual machine. "Smart virtual machine" is a generic term and does not denote a specific virtual machine.

EPICS provides a framework where not only standardized character code sets but also symbols for research and user-specific symbols can be included without overlapping code points. Various types of symbol processing like sorting and searching can be done using a general software tool in the framework. For example, if one writes a text searching program for EPICS, the program can handle both standardized symbols and user-specific symbols. Special tools for ancient and user-specific symbols are not needed. EPICS reduces the work necessary for making software tools for symbol processing.

EPICS pays serious attention to both intentions of an information sender and requirements of a receiver. The sender can use arbitrary symbols and specify alternatives for these arbitrary symbols in EPICS. In other words, a sender can send his intentions to a receiver. The receiver can normalize data depending on his needs. The receiver may use alternative symbols that are specified by a sender, or may ignore the alternatives and map them to other symbols. We think normalization depends on the users' requirements. A single canonical mapping as in Unicode is not suitable in all situations.

EPICS allows a user to define a code sequence at a code point. When a symbol is inputted, a specified code sequence is invoked. For example, if a user specifies normalization of external user-specific symbols, the inputted external symbols are converted to normalized symbols. Not only mapping of 1 symbol to 1 symbol but also mapping of 1 symbol to a string is possible. This function accomplishes naturally the expansion of compressed data using dictionary-based coding like the LZ78 algorithm[17] if a routine that generates a string is specified at a code point. EpicVM can not only expand a symbol code to a string but also support more general programming because it is a virtual machine. By utilizing EpicVM, symbol images, font images and so on can be defined and transferred.

3. Efficient Variable-length Encoding

The unit of EPICS is a 16-bit wide character. Wide characters in the C programming language and in Unicode are becoming more and more popular, so processing 16-bit characters is no longer a problem.

We refer to a unit of 16 bits as an EPICU. The most significant bit of an EPICU is BIT 16 and the least significant bit is BIT 1. The two most significant bits of a unit indicate whether the unit is the head or the tail of a symbol. If BIT 16 of an EPICU is 0, the EPICU is the tail of a symbol. An EPICU whose BIT 15 is 0 is the head of a symbol. If both BIT 16 and BIT 15 of an EPICU are 0, the EPICU is a symbol by itself; if both are 1, the EPICU is an intermediate unit of a symbol of three or more units. This coding makes locating the boundaries of a symbol easy and efficient. We show the format of an EPICU in Table 1. Table 2 shows symbol formats composed of between 1 and 3 units. Figure 1 also shows extension methods of EPICS.

Table 1: Format in EPICU.
X represents either 0 or 1.

               MSB                                                         LSB
BIT position    16  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1
Tail EPICU       0   X   X   X   X   X   X   X   X   X   X   X   X   X   X   X
Head EPICU       X   0   X   X   X   X   X   X   X   X   X   X   X   X   X   X



Table 2: Formats of symbols composed of between 1 and 3 units.
X represents either 0 or 1. Each EPICU is represented in binary.

                 1st EPICU           2nd EPICU           3rd EPICU
1-EPICU symbol   00XXXXXX XXXXXXXX
2-EPICU symbol   10XXXXXX XXXXXXXX   01XXXXXX XXXXXXXX
3-EPICU symbol   10XXXXXX XXXXXXXX   11XXXXXX XXXXXXXX   01XXXXXX XXXXXXXX




Figure 1: Relationship between Most Significant Bits and Symbol Length.
If BIT 16 of an EPICU is 1, further units follow it. If BIT 16 is 0, there are no more units.

Locating the boundaries of a character is important in editor and viewer programs. In the multi-byte codes of ISO 2022, it may be impossible to tell whether a byte is the first or the last byte of a 2-byte code on the basis of that byte alone. In the worst case, the data must be scanned incrementally from a point whose status is already known. In EPICS, a head unit, an intermediate unit and a tail unit can each be easily distinguished on the basis of the unit alone.
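As an illustration of the boundary rule, the following Python sketch classifies a single EPICU from its two most significant bits and groups a unit sequence into symbols. The function names unit_role and split_symbols are hypothetical and chosen only for this example; the bit tests follow Table 1 and Table 2.

```python
def unit_role(unit):
    """Classify one 16-bit EPICU from its two most significant bits."""
    more_follow = bool(unit & 0x8000)    # BIT 16: 1 means further units follow
    has_preceding = bool(unit & 0x4000)  # BIT 15: 1 means this unit is not a head
    if not more_follow and not has_preceding:
        return "single"                  # head and tail at once: a 1-EPICU symbol
    if more_follow and not has_preceding:
        return "head"
    if more_follow and has_preceding:
        return "intermediate"
    return "tail"


def split_symbols(units):
    """Group a sequence of EPICUs into symbols (tuples of units)."""
    symbols, current = [], []
    for unit in units:
        current.append(unit)
        if not unit & 0x8000:            # BIT 16 is 0: this unit ends the symbol
            symbols.append(tuple(current))
            current = []
    if current:
        raise ValueError("input ends in the middle of a symbol")
    return symbols
```

For example, split_symbols([0x0041, 0x8001, 0x4000, 0x0042]) yields three symbols: (0x0041,), (0x8001, 0x4000) and (0x0042,). No state other than the current unit is needed to decide where a symbol ends.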

EPICS pays attention to string matching. Existing string matching algorithms can be naturally applied to data encoded in EPICS when a unit is 16 bits. Special handling depending on the length of a code is not needed. Pattern matching using regular expressions can also be applied easily where 16 bit data is one unit.
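The point can be illustrated with a plain unit-level search. The Python sketch below is a naive substring search over lists of 16-bit units; the name find_units is hypothetical. Our reading of why no special handling is needed is this: a pattern made of whole symbols starts with a head unit and ends with a tail unit, so a unit-by-unit match can neither begin nor end inside a symbol of the text.

```python
def find_units(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Both arguments are lists of 16-bit EPICUs; the search itself knows
    nothing about symbol lengths.
    """
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i
    return -1


# Two 2-EPICU symbols that share the same tail unit but differ in the head.
text = [0x0041, 0x8001, 0x4E00, 0x8002, 0x4E00, 0x0042]
pattern = [0x8002, 0x4E00]
position = find_units(text, pattern)
```

Here position is 3, the index of the head unit of the second 2-EPICU symbol. The shared tail unit 0x4E00 at index 2 does not produce a false match, because the pattern begins with a head unit.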

Some people who have made programs that handle ISO 2022 believe that the use of variable-length codes makes programming difficult. However, the main reason for the difficulty of handling ISO 2022 is not variable length but state management of ISO 2022 characters. Handling of ISO 2022 needs extra state management because a code point is multiplexed by different code sets. EPICS assigns different symbols to unique code points and thus does not require extra state management.

In the C++ programming language, a 'smart pointer' [15] helps C/C++ programmers write programs that handle EPICS in the usual way. A smart pointer makes it possible to use EPICU in C++ much like the 'char' type in C. In our experience of using variable-length codes and smart pointers to build multilingual programming (script) language systems[12][13], handling EPICS through smart pointers is as easy as handling fixed-length codes. In languages that do not allow pointer arithmetic, such as Java[3], programmers do not need to be aware of the length of a character code.

4. A Huge Code Space

Variable-length coding using 16-bit units makes a huge code space available. A huge code space makes overlapping of the code points of user-specific symbols less likely. Even without a registry that administers symbols, the possibility of overlapping code points can be made sufficiently low by using a sufficiently long code value and an appropriate hashing function that determines the prefix part of a code value.

We do not think that surrogate characters in Unicode expand the code space sufficiently. The roughly one million code points provided by surrogate pairs are too few to keep user-specific symbols both non-overlapping and interchangeable without explicit coordination.
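To give a rough feeling for the sizes involved, the following Python sketch compares the surrogate-pair space with the space of 5-EPICU symbols using the standard birthday approximation. It rests on assumptions that are not part of the design described here: code values are treated as if they were chosen uniformly at random (for example by a good hash), and the whole 14-bit payload of every unit is counted as available. The function names are hypothetical.

```python
import math

PAYLOAD_BITS_PER_EPICU = 14  # 16 bits minus the two head/tail marker bits


def code_points(n_units):
    """Number of bit patterns available to symbols of exactly n_units EPICUs."""
    return 2 ** (PAYLOAD_BITS_PER_EPICU * n_units)


def collision_probability(k, space):
    """Birthday approximation of the chance that k uniformly random picks
    from `space` values contain at least one collision."""
    return -math.expm1(-k * (k - 1) / (2 * space))


SURROGATE_SPACE = 1024 * 1024  # high surrogates times low surrogates
p_surrogates = collision_probability(1000, SURROGATE_SPACE)
p_five_units = collision_probability(1000, code_points(5))
```

Under these assumptions, 1000 independently chosen symbols collide with a probability of about 0.38 in the surrogate-pair space, while the same estimate for 5-EPICU symbols is below 1e-15. In practice only part of the 5-EPICU space would be usable for user-specific symbols, so the second figure is an optimistic bound rather than a prediction.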

The symbol code space of EPICS can be divided into subspaces. There are standardized character set subspaces, EpicVM subspaces, user-specific subspaces and temporary-use subspaces. Symbol code values composed of one or two EPICUs are used for standardized characters and EpicVM instructions. 3-EPICU symbols are reserved for future standardized characters. Symbol code values composed of 4 or more EPICUs can be utilized for user-specific or temporary symbols. However, we recommend the use of symbols of 5 or more EPICUs for user-specific symbols.

Following the Unicode standard, a character code value of Unicode is represented by U+nnnn, where nnnn is a four-digit number in hexadecimal notation. A symbol code value of EPICS is represented by "P+" followed by 4-digit hexadecimal numbers, one per EPICU, with dots as separators. For example, an EPICS symbol composed of 1 EPICU is represented by P+nnnn, and a 2-EPICU symbol is represented by P+mmmm.nnnn.
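The notation can be produced and parsed mechanically. The following Python sketch uses hypothetical helper names (format_symbol, parse_symbol) and treats a symbol as a tuple of 16-bit units.

```python
def format_symbol(units):
    """Render a symbol (sequence of 16-bit EPICUs) in P+ notation."""
    return "P+" + ".".join(f"{unit:04X}" for unit in units)


def parse_symbol(text):
    """Parse P+ notation back into a tuple of EPICUs."""
    if not text.startswith("P+"):
        raise ValueError("expected a 'P+' prefix")
    return tuple(int(part, 16) for part in text[2:].split("."))
```

For instance, format_symbol((0x8000, 0x7000)) gives "P+8000.7000", and parse_symbol("P+3120") gives (0x3120,).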

4.1 Standardized Character Set Subspaces

Some parts of EPICS are based on Unicode. The lower code values of Unicode are identical to code values of EPICS, except for the unified CJK (Chinese, Japanese and Korean) and miscellaneous characters. The relationship between Unicode and EPICS is shown in Table 3 and Figure 2. For example, the codes between U+0000 and U+2FFF correspond to the codes between P+0000 and P+2FFF respectively, and the code region between U+3000 and U+3FFF is mapped to the region between P+8000.7000 and P+8000.7FFF.

Table 3: Mapping Characters in Unicode to Symbols in EPICS.

Unicode range       EPICS range
U+0000 -> U+2FFF    P+0000 -> P+2FFF
U+3000 -> U+3FFF    P+8000.7000 -> P+8000.7FFF
U+4000 -> U+7FFF    P+8001.4000 -> P+8001.7FFF
U+8000 -> U+BFFF    P+8002.4000 -> P+8002.7FFF
U+C000 -> U+D7FF    P+8003.4000 -> P+8003.57FF
Surrogate Pairs     P+9800.4C00 -> P+9B00.4FFF
U+E000 -> U+FFFD    P+8003.6000 -> P+8003.7FFD




Figure 2: Relationship between Unicode and EPICS
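The non-surrogate rows of Table 3 are all consistent with one simple rule: codes up to U+2FFF are kept as 1-EPICU symbols, and any other code is split into its upper 2 bits (placed in a head unit) and its lower 14 bits (placed in a tail unit). The Python sketch below implements that rule. The bit arithmetic is inferred from the table rather than stated in the text, the function name is hypothetical, and surrogate pairs are deliberately left out.

```python
def unicode_to_epics(code):
    """Map a 16-bit Unicode code value to EPICUs following the ranges of Table 3."""
    if not 0 <= code <= 0xFFFD:
        raise ValueError("outside the ranges listed in Table 3")
    if 0xD800 <= code <= 0xDFFF:
        raise ValueError("surrogates are mapped as pairs and are not handled here")
    if code <= 0x2FFF:
        return (code,)                   # identical 1-EPICU code value
    head = 0x8000 | (code >> 14)         # marker bits 10, then the upper 2 bits
    tail = 0x4000 | (code & 0x3FFF)      # marker bits 01, then the lower 14 bits
    return (head, tail)
```

For example, unicode_to_epics(0x3000) returns (0x8000, 0x7000) and unicode_to_epics(0xFFFD) returns (0x8003, 0x7FFD), which are the endpoints given in Table 3.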

Character code sets registered at ECMA (European Computer Manufacturers' Association) based on ISO 2022[5] are also mapped into EPICS for compatibility. The value of a final character to designate a coded character set is added to P+8100, and the result is used as the prefix of a symbol. Examples are shown in Table 4. Although ISO 2022 based characters can be included in EPICS strings, we recommend the use of mapped versions of Unicode characters instead of mapped versions of ISO 2022 based characters unless special intentions are involved.

Table 4: Relationship between character sets based on ISO 2022 and prefixes in EPICS

Character Set    ISO 2022 Final Character    EPICS prefix
JIS X 0208       4/2                         P+8142
CNS 11643-1      4/7                         P+8147
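The prefix rule can be written down directly. In the Python sketch below, the final character is given in the column/row notation of ISO 2022 (so 4/2 is the byte 0x42); the function name is hypothetical. How the remaining units of such a symbol carry the character itself is not specified here, so the sketch computes only the prefix.

```python
def iso2022_prefix(column, row):
    """First EPICU for a character set whose ISO 2022 final character is column/row."""
    final_byte = (column << 4) | row   # e.g. 4/2 is the byte 0x42
    return 0x8100 + final_byte
```

With this rule, iso2022_prefix(4, 2) is 0x8142 and iso2022_prefix(4, 7) is 0x8147, matching the two rows of Table 4.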

4.2 EpicVM instructions

The code region between P+3000 and P+3FFF is used and reserved for EpicVM instructions and integer representation. EpicVM will be described in the next section.

The code region between P+3000 and P+3CFF is available for user-defined EpicVM instructions. A code point in other unused regions can also be used for a user-defined EpicVM instruction; however, unassigned code points for 1-EPICU symbols exist only in the above code region. The code region between P+3D00 and P+3DFF is used for exception handlers. The code region between P+3E00 and P+3EFF is used for predefined EpicVM instructions.

The code region between P+3F00 and P+3FFF represents the range of integers between -128 and 127. Integer representation can be extended to hold a larger value based on Table 5 and Table 6.

Table 5: Prefixes of EpicVM instructions.
X represents either 0 or 1.
Prefixes are represented in binary.

Type                            Prefix - 1st EPICU
Exception handlers              X011 1101 XXXX XXXX
EpicVM standard instructions    X011 1110 XXXX XXXX
Integer                         X011 1111 XXXX XXXX



Table 6: Extensions of Integer Representation.

Integer                                 EPICS range
8-bit signed integer (8 bits)           P+3F00 -> P+3FFF
22-bit signed integer (8+14 bits)       P+BF00.4000 -> P+BFFF.7FFF
36-bit signed integer (8+14+14 bits)    P+BF00.C000.4000 -> P+BFFF.FFFF.7FFF
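A possible encoder and decoder for this integer representation is sketched below in Python. Two details are assumptions made for the example, since only the code ranges are given: negative values are stored in two's complement, and the most significant bits are placed in the first unit. The function names are hypothetical.

```python
def encode_int(value):
    """Encode a signed integer in the shortest of the 1-, 2- or 3-EPICU forms."""
    for n_units, bits in ((1, 8), (2, 22), (3, 36)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            break
    else:
        raise ValueError("value does not fit in 36 bits")
    raw = value & ((1 << bits) - 1)            # two's complement (assumed)
    if n_units == 1:
        return (0x3F00 | raw,)
    units = [0xBF00 | (raw >> (bits - 8))]     # head: prefix plus the top 8 bits
    for i in range(n_units - 2, -1, -1):
        chunk = (raw >> (14 * i)) & 0x3FFF
        marker = 0xC000 if i > 0 else 0x4000   # intermediate unit or tail unit
        units.append(marker | chunk)
    return tuple(units)


def decode_int(units):
    """Inverse of encode_int."""
    bits = 8 + 14 * (len(units) - 1)
    raw = units[0] & 0xFF
    for unit in units[1:]:
        raw = (raw << 14) | (unit & 0x3FFF)
    if raw >= 1 << (bits - 1):                 # sign bit set
        raw -= 1 << bits
    return raw
```

For example, encode_int(12) returns (0x3F0C,), and encode_int(128), the smallest positive value that no longer fits in 8 bits, returns (0xBF00, 0x4080).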

5. EpicVM

EpicVM is a smart virtual machine and is also a stack-based virtual machine. It is a new type of virtual machine. EpicVM decodes an input symbol as an instruction and executes it. EpicVM allows one to define or modify its instructions at runtime using instructions that have already been defined. In contrast, conventional virtual machines such as the Smalltalk bytecode machine[2] and the Java virtual machine[8] have a fixed instruction set and do not allow one to change instructions dynamically.

The internal structures of EpicVM are shown in Figure 3. EpicVM has a small number of registers. They are an input code register, an output code register, a stack pointer, a frame pointer and a current offset pointer. EpicVM has a data stack that a program manipulates. A unit on the stack is a symbol whose length is variable. This is different from other usual stack-based machines.

Each code point has a maximum of 128 attributes. Each attribute can contain a symbol or a code sequence (a routine). Attribute 0 of a symbol is usually used to store a code sequence to be invoked when the symbol is inputted.


Figure 3: Structures of Code Points and EpicVM.

5.1 Instructions of EpicVM

EpicVM allows one to define a sequence of program codes at a code point. Jumps in the sequence are restricted to relative jumps. Absolute jumps cannot be made on EpicVM. The range of a relative jump must be within the defined sequence. If the target address of a jump is out of range, an exception is raised. An exception causes a corresponding exception handler to be invoked. An exception handler is defined at a fixed code point. A user can define the exception handler. Codes in a defined sequence may be instructions. In other words, instructions at a code point can call already defined instructions. This makes it possible to invoke instructions as functions or procedures without absolute jumps. When an instruction is invoked, registers are saved on a system stack. Saved values are restored to the registers when control returns from the instruction.

Most instructions of EpicVM are common to stack-based virtual machines such as the Smalltalk-80 bytecode machine[2] or the Java virtual machine[8]. However, instructions that define or modify an instruction or an attribute are specific to a smart virtual machine like EpicVM. Basic instructions include add, sub, compare, branch, push-in, push-sp, push-fp, put, get, define and so on. Add, sub and compare represent addition, subtraction and comparison of two values on the stack respectively. Branch is a relative-jump instruction. Push-in, push-sp and push-fp push the value of the input register, the stack pointer and the frame pointer onto the stack respectively. Put and get are instructions to put and get an attribute at a code point respectively. Define is an instruction to define a new instruction. The general format to define a new symbol or instruction is as follows.

define <symbol-code-value> <length-in-bytes> <code-string>

Let us define a string "EpicVM" at P+3120. The code sequence to define the string is shown in Table 7. When P+3120 is inputted after this definition, the code P+3120 is expanded to "EpicVM".

Table 7: Code sequence to define a string "EpicVM" at P+3120.

         1st     2nd     3rd       4th     5th     6th     7th     8th     9th
Meaning  define  P+3120  12 bytes  E       p       i       c       V       M
Code     P+3ED3  P+3120  P+3F0C    P+0045  P+0070  P+0069  P+0063  P+0056  P+004D
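The sequence in Table 7 can be generated programmatically. The Python sketch below builds such a sequence for an arbitrary short string. It is restricted to characters that are 1-EPICU symbols and to lengths that fit a 1-EPICU integer; the function name is hypothetical, and the code value P+3ED3 for define is simply taken from Table 7.

```python
DEFINE = 0x3ED3  # code of the define instruction as it appears in Table 7


def define_string(code_point, text):
    """Build the unit sequence that defines `text` at a 1-EPICU code point."""
    body = [ord(ch) for ch in text]
    if any(unit > 0x2FFF for unit in body):
        raise ValueError("sketch only handles characters in U+0000..U+2FFF")
    length_in_bytes = 2 * len(body)      # each 1-EPICU symbol occupies 2 bytes
    if length_in_bytes > 127:
        raise ValueError("sketch only handles lengths that fit a 1-EPICU integer")
    return [DEFINE, code_point, 0x3F00 | length_in_bytes] + body
```

Calling define_string(0x3120, "EpicVM") reproduces the nine code values of Table 7.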

5.2 A Default Handler

When an input symbol is not defined as an instruction, a default handler is invoked conceptually. A default handler is defined at a fixed code point (P+3DFF). In plain EpicVM, code sequences are not defined at code points except for EpicVM instructions and integer representations. The default handler simply passes the input symbol to the output. Conceptually the default handler contains the following code sequence.

push-in pop-out

The sequence pushes an input symbol onto the stack and pops the stack top to the output. In an actual implementation, the above code sequence does not need to be executed. If EpicVM knows that the default handler is unchanged and that no instruction sequence is defined at the code point of an input symbol, it may simply output the input symbol. In other words, the overhead of default processing of an input symbol is only the check of whether the symbol is defined or not. This overhead is very low because the check can be performed using hashing, that is, with an expected computational complexity of O(1). EpicVM therefore does not slow down the processing of ordinary symbols at a client.
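The interplay of define, invocation and the default handler can be sketched with a very small interpreter. The Python code below is a toy model and not EpicVM itself: it handles only 1-EPICU symbols and only the define instruction (using the code value P+3ED3 from Table 7), keeps definitions in a dictionary, and has no data stack, registers or attributes. It also assumes that the units of a defined sequence are fed back into the interpreter when the code point is invoked.

```python
DEFINE = 0x3ED3  # define instruction code taken from Table 7


def run(units):
    """Interpret a stream of 1-EPICU symbols and return the output units."""
    definitions = {}                     # code point -> defined code sequence
    output = []
    pending = list(reversed(units))      # units still to be read, next one last

    while pending:
        unit = pending.pop()
        if unit == DEFINE:
            target = pending.pop()
            length_in_bytes = pending.pop() & 0xFF
            body = [pending.pop() for _ in range(length_in_bytes // 2)]
            definitions[target] = body
        elif unit in definitions:        # hash lookup, expected O(1)
            pending.extend(reversed(definitions[unit]))
        else:
            output.append(unit)          # default handler: pass the symbol through
    return output


table7 = [0x3ED3, 0x3120, 0x3F0C, 0x45, 0x70, 0x69, 0x63, 0x56, 0x4D]
result = run(table7 + [0x3120, 0x21])
```

After the definition of Table 7 has been read, the input P+3120 followed by "!" produces the units of "EpicVM!". A symbol with no definition costs one dictionary lookup before it is copied to the output.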

5.3 Compression

Use of variable-length codes may increase the number of bytes per symbol. Under such conditions, data compression by defining codes in EPICS increases the density of data. A sender can choose an algorithm appropriate to the data contents if the sender sends a decompression program together with the compressed data. For example, a sender can send a decompression program such as one for LZ78[17] at the head of the data and follow it with compressed data.

It is also possible for a sender to send program fragments and compressed data that uses the defined codes gradually, and for a receiver to expand the compressed data gradually. This method requires code definitions to be sent explicitly, and its compression ratio may be worse than that of LZ78 when a decompression program is already installed on the receiver side. However, with this method one can choose an algorithm suitable for the data. One does not need to send a program at the head of a transmission, but it is necessary to send a code definition just before the code is invoked. This method reduces the latency of recovering symbols from compressed data on the stream-type communication that protocols on the WWW usually use. Moreover, when a transmission is aborted, this method reduces the transfer of unused parts of a decompression program.
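The incremental scheme can be illustrated with a toy word-level compressor. The Python sketch below is not the prototype discussed in this section; it is a deliberately simple stand-in that defines a code for a repeated word immediately before the first use of that code and uses the bare code afterwards. It assumes 1-EPICU characters only, takes the define code P+3ED3 from Table 7, allocates codes from the user-defined region P+3000 to P+3CFF, and uses hypothetical function names.

```python
import re
from collections import Counter

DEFINE = 0x3ED3                      # define instruction code from Table 7
USER_CODES = range(0x3000, 0x3D00)   # region for user-defined instructions


def compress(text):
    """Emit units for text, defining a code for a repeated word just before first use."""
    if any(ord(ch) > 0x2FFF for ch in text):
        raise ValueError("sketch only handles characters in U+0000..U+2FFF")
    tokens = re.findall(r"\w+|\W", text)
    counts = Counter(tokens)
    free_codes = iter(USER_CODES)
    codes, out = {}, []
    for tok in tokens:
        if tok not in codes:
            literal = [ord(ch) for ch in tok]
            # Defining costs 4 extra units once; every later use saves len(tok) - 1.
            pays_off = (counts[tok] - 1) * (len(tok) - 1) > 4
            code = next(free_codes, None) if pays_off and len(tok) <= 63 else None
            if code is None:
                out += literal           # send the word as plain symbols
                continue
            codes[tok] = code
            out += [DEFINE, code, 0x3F00 | (2 * len(tok))] + literal
        out.append(codes[tok])           # invoke the (now defined) code
    return out


def expand(units):
    """Receiver side: apply definitions as they arrive and return the text."""
    definitions, out = {}, []
    stream = iter(units)
    for unit in stream:
        if unit == DEFINE:
            code = next(stream)
            n_units = (next(stream) & 0xFF) // 2
            definitions[code] = [next(stream) for _ in range(n_units)]
        elif unit in definitions:
            out += definitions[unit]
        else:
            out.append(unit)
    return "".join(chr(u) for u in out)
```

Because every definition travels directly in front of its first use, the receiver can start producing output as soon as the first units arrive, and definitions for words that occur only in an untransmitted tail are never sent.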

We have made a prototype program which compresses data written in EPICS, and produces compressed data and incremental decompression routines in EPICS. A draft (epics.txt) of this paper written in English and HTML, and a manuscript about PivotVM[14] (pivot-vm-j.txt) that includes Japanese and English are used as sample texts. Table 8 shows the length of compressed text in EPICS and the length of the text in other formats. The length of epics.txt encoded in EPICS is shorter than the length of the text encoded in UCS-2. The length of pivot-vm-j.txt in EPICS is shorter than the length in UTF-8. Although EPICS supports a huge code space, EPICS is efficient. It is possible to exchange encoded data efficiently without special decompression programs. Our compression program is a prototype. We think the compression ratio could be improved if the compression program is better tuned.

Table 8: Comparison of length of data in various code systems and encoding.
The unit is a byte.

Format            ASCII   EUC-JP    UCS-2    UTF-8    EPICS
epics.txt         35979        -    71958    35979    45894
pivot-vm-j.txt        -    16334    18128    23604    23178
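The relative sizes implied by Table 8 can be computed directly from its figures. The short Python sketch below does only that arithmetic; the dictionary and function names are hypothetical.

```python
SIZES = {  # bytes, copied from Table 8
    "epics.txt": {"ASCII": 35979, "UCS-2": 71958, "UTF-8": 35979, "EPICS": 45894},
    "pivot-vm-j.txt": {"EUC-JP": 16334, "UCS-2": 18128, "UTF-8": 23604, "EPICS": 23178},
}


def relative_size(document, baseline):
    """Size of the compressed EPICS form as a fraction of the size in `baseline`."""
    sizes = SIZES[document]
    return sizes["EPICS"] / sizes[baseline]
```

For epics.txt the EPICS form is about 0.64 of the UCS-2 size, and for pivot-vm-j.txt it is about 0.98 of the UTF-8 size. The same figures show that the EPICS form remains larger than the ASCII form of epics.txt and larger than the EUC-JP and UCS-2 forms of pivot-vm-j.txt.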

6. Discussions

6.1 Ancient Symbols

It is difficult to standardize ancient characters that are not used in daily life but are still being studied. Examples of ancient characters are hieroglyphs in Egypt and pictographs in China. If researchers have different opinions about the identities of symbols, standardization is impossible or at least difficult. If most researchers are able to agree with each other in the future, ancient symbols will be standardized. However, researchers cannot wait for full standardization. EPICS allows researchers who have different opinions about the identification of symbols to assign symbols to different code points and proceed with their studies. Once standardization has been completed, an EpicVM in EPICS can be customized to map old code points to standardized ones. Data encoded in EPICS needs neither special conversion software nor special searching software.

6.2 Combining Characters

Unicode uses combining characters. This is partly because the code space size of Unicode is insufficient. If all combinations are defined, they do not fit the 16-bit code space. Therefore Unicode uses an incomplete repertoire of composite characters. EPICS can have an infinite code space size although a practical limit should be imposed. Every combination of combining characters can be assigned to a different code point in EPICS. When composite characters are used, locating boundaries of characters becomes simple. Moreover rendering of a composite character is more systematic than rendering of combined characters.

6.3 Internationalization of tags

Documents usually consist not only of characters but also of language tags and formatting tags. All of them should be internationalized. Standard Generalized Markup Language (SGML) [4] and HyperText Markup Language (HTML) [1] are examples of the use of tags or markup in rich text. However, tags in HTML are based on English words. We do not think that the internationalization of SGML and HTML is sufficient.

We can assign a symbol code value in EPICS to every tag which is composed of characters in markup languages because the code space of EPICS is huge. Internationalization of tags can be accomplished using EpicVM which maps the symbols to character strings in a user's native language. Tags in binary representation and a special transformation program could accomplish the same internationalization as EPICS. However, EPICS can accomplish internationalization of tags in the same framework as that of the characters. Since the code space of Unicode is not huge, it is difficult to treat a lot of tags in various markup languages and software as characters in Unicode. Therefore, string processing such as string matching needs special handling of tags, and thus it complicates a document processing system for internationalization. EPICS simplifies the processing system.

7. Related Work

Unicode is important as a base for multilingual processing. However, the encoding of Unicode is static and inflexible. A character set based on ISO 2022 is specified by each nation's government. Therefore, separation between languages and characters is insufficient. Thus an identical character can have different code points in different code sets. ISO 2022 uses a small code space and switches code sets mapped to the space. This complicates state management of characters. ISO 2022 is unsuitable for the internal processing of characters. ISO 2022 was standardized when computer resources were limited. Designation and invocation are main controls. We do not need to restrict control capabilities of a character code system because computational power and memory capacity have been enhanced recently. We think a simple and smart virtual machine should be included in a character code system standard.

Arena i18n [9][10] uses fixed-length internal codes. The unit is 4 bytes. 4-byte fixed-length codes are easy to handle within a program. However, when a code is exported outside the program, that encoding is inefficient and code conversion is probably needed.

Internal codes of Mule (MULtilingual Enhancement to GNU Emacs) [11] are mainly based on ISO 2022. Thus separation between languages and characters is insufficient. The length of a code is variable and the unit is a byte. A character code in a character set is prefixed with information identifying the character set. This type of encoding does not allow one to apply existing matching algorithms for fixed-width encoding to data simply using a byte as a unit because the existing matching algorithms do not recognize boundaries of a variable-length character properly.

8. Concluding Remarks

In this paper, we have presented a new symbol code system called EPICS. We think that EPICS promotes efficient internationalization and multilingualism of the WWW without imposing fixed character sets on people. Moreover, EPICS makes compressed data transfer possible without installing special decompression programs at clients. EPICS is derived from a unique combination of a variable-length coding system and a smart virtual machine, EpicVM.

In EPICS, a variable-length coding system makes it possible to include the various characters needed for internationalization efficiently. The huge size of the code space of EPICS allows one to use and to exchange user-specific symbols with little possibility of overlapping code points even if no coordination is performed. EpicVM allows one to send not only static characters but also dynamic programs. This programmability enables one to send compressed data with a decompression program incrementally and efficiently. Compression reduces the amount of network traffic and the storage overhead on the WWW.

References

  1. Raggett, Dave, HTML 3.2 Reference Specification, W3C, http://www.w3.org/pub/WWW/TR/PR-html32-961105.
  2. Goldberg, Adele and David Robson, Smalltalk-80: the language and its implementation, Addison-Wesley, 1983.
  3. Gosling, James, Bill Joy and Guy Steele, The Java(TM) Language Specification, Addison-Wesley, 1996.
  4. ISO, Standard Generalized Markup Language, ISO 8879:1986, 1986.
  5. ISO, Information processing - ISO 7-bit and 8-bit coded character sets - Code extension techniques, ISO 2022:1986, 1986.
  6. ISO/IEC, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane, ISO/IEC 10646-1:1993(E), 1993.
  7. ISO/IEC, Information Technology - Portable Operating System Interface (POSIX) - Part 1: System Application Program Interface (API) [C Language], ISO/IEC 9945-1:1996(E), (ANSI/IEEE Std 1003.1), 1996.
  8. Lindholm, Tim and Frank Yellin, The Java(TM) Virtual Machine Specification, Addison-Wesley, 1996.
  9. Mukaigawa, Shinichi and Noritoshi Demizu, "Design and Implementation of I18N WWW Browser - i18n Arena," http://www.wg.omron.co.jp/~shin/Arena-jwwwc-95/ (in Japanese).
  10. Mukaigawa, Shinichi, "Proposal of Internationalized WWW Browser Arena i18n," Chapter 5 in Multilingual Environments, K. Nishikimi et al, Prentice-Hall Japan, 1996 (in Japanese).
  11. Nishikimi, Kimiko, Kenichi Handa and Satoru Tomura, "Mule: MULtilingual Enhancement to GNU Emacs," Proc. of INET'93, 1993.
  12. Osawa, Noritaka and Norio Kimura, "A Programming Language 'Kinari'," Technical report of TRON technical meeting, Vol.5, No.2, pp.39-50, 1993 (in Japanese).
  13. Osawa, Noritaka and Toshitsugu Yuba, "A Parallel Language for Discrete Event Simulation: 'Moegi'," Technical Report of IEICE, COMP94-22, pp.25-32, 1995 (in Japanese).
  14. Osawa, Noritaka and Toshitsugu Yuba, "A Dynamically Customizable Virtual Machine used as a Substratum in Heterogeneous Distributed Environments: PivotVM," Proc. of Computer Systems Symp. 1996, pp.81-86, 1996 (in Japanese).
  15. Stroustrup, Bjarne, The C++ Programming Language, Addison-Wesley, 1991.
  16. The Unicode Consortium, The Unicode Standard, Version 2.0, Addison-Wesley Developers Press, 1996.
  17. Ziv, J. and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, Vol. IT-24, No.5, pp.530-536, Sep. 1978.



