Representations of BOMs by encoding
The following table shows the byte sequences that represent the BOM in various character encodings.
| Encoding | Representation (hex) | Representation (dec) |
|---|---|---|
| UTF-8 | EF BB BF† | 239 187 191 |
| UTF-16 (BE) | FE FF | 254 255 |
| UTF-16 (LE) | FF FE | 255 254 |
| UTF-32 (BE) | 00 00 FE FF | 0 0 254 255 |
| UTF-32 (LE) | FF FE 00 00 | 255 254 0 0 |
| UTF-7 | 2B 2F 76, followed by one of 38, 39, 2B, or 2F† | 43 47 118, followed by one of 56, 57, 43, or 47 |
| UTF-1 | F7 64 4C | 247 100 76 |
| UTF-EBCDIC | DD 73 66 73 | 221 115 102 115 |
| SCSU | 0E FE FF† | 14 254 255 |
| BOCU-1 | FB EE 28, optionally followed by FF† | 251 238 40, optionally followed by 255 |
In UTF-8, this sequence is not really a byte order mark. It identifies the text as UTF-8 but indicates nothing about byte order, because UTF-8 has no byte order issues.
In UTF-7, the fourth byte of the BOM, before encoding as base64, is 001111xx in binary, and xx depends on the next character (the first character after the BOM). Hence, technically, the fourth byte is not purely a part of the BOM, but also contains information about the next (non-BOM) character. For xx=00, 01, 10, 11, this byte is, respectively, 38, 39, 2B, or 2F when encoded as base64. If no following character is encoded, 38 is used for the fourth byte and the following byte is 2D.
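The mapping from xx to the fourth byte can be checked with a short Python sketch. It assumes the standard base64 alphabet (A–Z, a–z, 0–9, `+`, `/`); the names `B64_ALPHABET` and `utf7_bom_fourth_byte` are hypothetical and chosen only for this illustration.

```python
import string

# Standard base64 alphabet; index i holds the character for the 6-bit value i.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def utf7_bom_fourth_byte(xx: int) -> int:
    """Return the ASCII code of the fourth BOM byte for the 2-bit value xx."""
    if not 0 <= xx <= 0b11:
        raise ValueError("xx must be a 2-bit value (0 to 3)")
    # The last four bits of U+FEFF are 1111; xx supplies the two low bits
    # of the 6-bit base64 group.
    sextet = 0b111100 | xx
    return ord(B64_ALPHABET[sextet])


fourth_bytes = [utf7_bom_fourth_byte(xx) for xx in range(4)]
print([f"{b:02X}" for b in fourth_bytes])  # ['38', '39', '2B', '2F']
```

The 6-bit values 60 to 63 select the characters `8`, `9`, `+`, and `/`, whose ASCII codes are the four bytes listed in the table.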
SCSU allows other encodings of U+FEFF; the form shown is the signature recommended in UTR #6.
For BOCU-1, a signature changes the state of the decoder. Octet 0xFF resets the decoder to its initial state.
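As an illustration of how the table might be used, the following Python sketch compares the start of a byte string against the listed signatures. The names `BOM_TABLE` and `detect_bom` are hypothetical. The sketch assumes that longer signatures should be tried first, so that FF FE 00 00 is reported as UTF-32 (LE) rather than UTF-16 (LE); that input could also be UTF-16 (LE) text beginning with U+0000, so a match is a hint about the encoding rather than proof.

```python
from typing import Optional

# Signatures taken from the table above. The four UTF-7 variants differ
# only in the fourth byte (38, 39, 2B, or 2F).
_SIGNATURES = [
    ("UTF-8", "EF BB BF"),
    ("UTF-16 (BE)", "FE FF"),
    ("UTF-16 (LE)", "FF FE"),
    ("UTF-32 (BE)", "00 00 FE FF"),
    ("UTF-32 (LE)", "FF FE 00 00"),
    ("UTF-7", "2B 2F 76 38"),
    ("UTF-7", "2B 2F 76 39"),
    ("UTF-7", "2B 2F 76 2B"),
    ("UTF-7", "2B 2F 76 2F"),
    ("UTF-1", "F7 64 4C"),
    ("UTF-EBCDIC", "DD 73 66 73"),
    ("SCSU", "0E FE FF"),
    ("BOCU-1", "FB EE 28"),
]

# Assumption of this sketch: try the longest signatures first, so that
# "FF FE 00 00" is matched before its two-byte prefix "FF FE".
BOM_TABLE = sorted(
    ((bytes.fromhex(hex_bytes), name) for name, hex_bytes in _SIGNATURES),
    key=lambda entry: len(entry[0]),
    reverse=True,
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding name whose signature starts `data`, or None."""
    for signature, name in BOM_TABLE:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfhello"))  # UTF-8
print(detect_bom(b"\xff\xfeh\x00"))      # UTF-16 (LE)
print(detect_bom(b"hello"))              # None
```

Data without a recognized signature returns `None`; in that case the encoding has to be determined by other means.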