UTF-8

Description Glossary RFCs Publications Obsolete RFCs

Description:

Type: Variable-length character encoding with a one-byte encoding unit (one to four bytes per character).

ISO/IEC 10646-1 defines a multi-byte character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. Multi-byte characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few so-called UCS transformation formats (UTF), each with different characteristics.

UTF-8 has a one-byte encoding unit. It uses all bits of a byte but preserves the full US-ASCII range: US-ASCII characters are encoded in one byte with their normal US-ASCII value, and any byte with such a value can only stand for a US-ASCII character, never for part of another character.
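The US-ASCII property described above can be checked with a short Python sketch. It relies only on the standard `str.encode` method; the helper names `utf8_hex` and `ascii_positions` are illustrative and not part of any specification.

```python
def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def ascii_positions(data):
    """Return the indexes of bytes below 0x80.

    In valid UTF-8, each such byte is a complete US-ASCII character;
    every byte of a multi-byte sequence has its high bit set.
    """
    return [i for i, b in enumerate(data) if b < 0x80]


if __name__ == "__main__":
    # "A" is US-ASCII; U+00E9 and U+20AC need two and three bytes.
    for ch in ("A", "\u00e9", "\u20ac"):
        print(f"U+{ord(ch):04X}", utf8_hex(ch))

    mixed = "a\u00e9 b".encode("utf-8")
    print(ascii_positions(mixed))
```

Because no byte below 0x80 ever appears inside a multi-byte sequence, software that scans for US-ASCII delimiters such as a space or a slash can process UTF-8 data without decoding it first.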


Glossary:


RFCs:

[RFC 3629] UTF-8, a transformation format of ISO 10646.


Publications:


Obsolete RFCs:

[RFC 2044] UTF-8, a transformation format of Unicode and ISO 10646.

[RFC 2279] UTF-8, a transformation format of ISO 10646.

