UTF-8 and ISO-8859-1 Overview
The ISO-8859-1 encoding standard provides a single-byte representation of characters in the Latin-1 language group. The Latin-1 language group includes:
- Afrikaans
- Finnish
- Italian
- Basque
- French
- Norwegian
- Catalan
- Galician
- Portuguese
- Danish
- English
- Scottish
- Dutch
- German
- Spanish
- Icelandic
- Swedish
- Faeroese
- Irish
For a complete list of characters supported in the ISO-8859-1 character set, refer to: http://unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
The following table lists examples of ISO-8859-1 (single-byte) character encodings.
| Hex Value | Display Character | Description |
|---|---|---|
| 30 | 0 | The number zero |
| 41 | A | Capital letter A |
| A5 | ¥ | YEN SIGN |
| DF | ß | LATIN SMALL LETTER SHARP S |
| BF | ¿ | INVERTED QUESTION MARK |
| E6 | æ | LATIN SMALL LETTER AE |
| FF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS |
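As a rough illustration of the single-byte mapping, the following Python sketch decodes a few hex values with the standard `latin-1` codec (Python's name for ISO-8859-1) and looks up each character's Unicode name. The `samples` list and the other variable names are illustrative assumptions, not part of any product API.

```python
import unicodedata

# Illustrative hex values; each one is a single byte in ISO-8859-1.
samples = [0x30, 0x41, 0xA5, 0xBF, 0xDF, 0xE6, 0xFF]

decoded = []
for value in samples:
    # "latin-1" is Python's built-in codec name for ISO-8859-1.
    char = bytes([value]).decode("latin-1")
    decoded.append(char)
    # In ISO-8859-1 the byte value equals the Unicode code point.
    print(f"{value:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")
```

Because every byte value maps to the code point with the same number, decoding ISO-8859-1 never needs more than one byte of input per character.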
The UTF-8 encoding standard provides a multi-byte representation of characters in all language groups (over 650 languages). Depending on the character, the encoding uses one, two, three, or four bytes.
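How many bytes a character needs depends on its Unicode code point. The sketch below shows the standard ranges, assuming valid code points only (surrogates are not handled). The helper name `utf8_length` and the sample characters are hypothetical and chosen only for illustration.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one Unicode code point."""
    if code_point < 0x80:        # U+0000..U+007F
        return 1
    if code_point < 0x800:       # U+0080..U+07FF
        return 2
    if code_point < 0x10000:     # U+0800..U+FFFF
        return 3
    return 4                     # U+10000..U+10FFFF

# Cross-check the ranges against Python's own UTF-8 encoder.
# Sample characters (illustrative): A, sharp s, euro sign, an emoji.
for char in ["A", "\u00df", "\u20ac", "\U0001F600"]:
    assert utf8_length(ord(char)) == len(char.encode("utf-8"))
```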
The following table shows examples of characters encoded in UTF-8. In this table:
- The first two characters have encodings that are identical to ISO-8859-1.
- The next five characters are valid ISO-8859-1 characters but are encoded with two bytes in UTF-8 versus one byte in ISO-8859-1.
- The remaining four characters are encoded with two bytes in UTF-8 and are not part of the ISO-8859-1 character set.
For a complete list of UTF-8 character encodings, refer to: http://unicode.org/.
| Hex Value | Display Character | Description |
|---|---|---|
| 30 | 0 | The number zero |
| 41 | A | Capital letter A |
| C2 A5 | ¥ | YEN SIGN |
| C2 BF | ¿ | INVERTED QUESTION MARK |
| C3 9F | ß | LATIN SMALL LETTER SHARP S |
| C3 A6 | æ | LATIN SMALL LETTER AE |
| C3 BF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS |
| D1 88 | ш | CYRILLIC SMALL LETTER SHA |
| D1 89 | щ | CYRILLIC SMALL LETTER SHCHA |
| D1 8A | ъ | CYRILLIC SMALL LETTER HARD SIGN |
| D1 8B | ы | CYRILLIC SMALL LETTER YERU |
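To compare the two encodings side by side, a minimal Python sketch such as the following can encode one character both ways. The helper names `hex_bytes` and `compare` are hypothetical. A result of `None` means the character has no ISO-8859-1 representation.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'C3 9F'."""
    return " ".join(f"{b:02X}" for b in data)

def compare(char: str) -> tuple:
    """Return (UTF-8 hex, ISO-8859-1 hex or None) for one character."""
    utf8 = hex_bytes(char.encode("utf-8"))
    try:
        latin1 = hex_bytes(char.encode("latin-1"))
    except UnicodeEncodeError:
        # The character is outside the ISO-8859-1 character set.
        latin1 = None
    return utf8, latin1

# Digit zero, capital A, yen sign, sharp s, Cyrillic sha.
for char in ["0", "A", "\u00a5", "\u00df", "\u0448"]:
    print(f"U+{ord(char):04X}", compare(char))
```

The three groups in the table show up directly: identical single bytes for the first two characters, two bytes versus one for the Latin-1 characters above 7F, and no ISO-8859-1 encoding at all for the Cyrillic letters.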