Representations of BOMs by encoding

The following table shows how the byte order mark (BOM), the character U+FEFF, is represented in various character encodings.

Encoding       Representation (hex)                           Representation (dec)
UTF-8          EF BB BF †                                     239 187 191
UTF-16 (BE)    FE FF                                          254 255
UTF-16 (LE)    FF FE                                          255 254
UTF-32 (BE)    00 00 FE FF                                    0 0 254 255
UTF-32 (LE)    FF FE 00 00                                    255 254 0 0
UTF-7          2B 2F 76, then one of [38 | 39 | 2B | 2F] †    43 47 118, then one of [56 | 57 | 43 | 47]
UTF-1          F7 64 4C                                       247 100 76
UTF-EBCDIC     DD 73 66 73                                    221 115 102 115
SCSU           0E FE FF †                                     14 254 255
BOCU-1         FB EE 28, optionally followed by FF †          251 238 40, optionally followed by 255

† See the note on this encoding below.
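
The table translates directly into a prefix check. The sketch below is a minimal illustration built on one assumption: the helper name detect_bom is hypothetical and is not a library API. For UTF-8, UTF-16, and UTF-32 it uses the BOM constants from Python's standard codecs module. For the other encodings it uses the literal bytes from the table. The main design point is ordering. A longer signature must be tested before a shorter one that shares its prefix, because the UTF-32 (LE) BOM FF FE 00 00 begins with the UTF-16 (LE) BOM FF FE.

```python
import codecs

# Hypothetical helper: detect_bom() is an illustration, not a library API.
# Each entry is (encoding name, signature bytes) taken from the table.
# Longer signatures come before shorter ones that share a prefix:
# the UTF-32 (LE) BOM FF FE 00 00 begins with the UTF-16 (LE) BOM FF FE.
SIGNATURES = [
    ("UTF-32 (BE)", codecs.BOM_UTF32_BE),          # 00 00 FE FF
    ("UTF-32 (LE)", codecs.BOM_UTF32_LE),          # FF FE 00 00
    ("UTF-EBCDIC", bytes([0xDD, 0x73, 0x66, 0x73])),
    ("UTF-8", codecs.BOM_UTF8),                    # EF BB BF
    ("UTF-1", bytes([0xF7, 0x64, 0x4C])),
    ("SCSU", bytes([0x0E, 0xFE, 0xFF])),
    ("BOCU-1", bytes([0xFB, 0xEE, 0x28])),         # optional FF not required
    ("UTF-16 (BE)", codecs.BOM_UTF16_BE),          # FE FF
    ("UTF-16 (LE)", codecs.BOM_UTF16_LE),          # FF FE
]

# UTF-7: 2B 2F 76, then one of 38, 39, 2B, 2F.
UTF7_PREFIX = bytes([0x2B, 0x2F, 0x76])
UTF7_FOURTH = {0x38, 0x39, 0x2B, 0x2F}


def detect_bom(data: bytes):
    """Return the name of the encoding whose BOM starts data, or None."""
    if data.startswith(UTF7_PREFIX) and len(data) >= 4 and data[3] in UTF7_FOURTH:
        return "UTF-7"
    for name, signature in SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

Because of this ordering, UTF-16 (LE) text whose first character after the BOM is U+0000 cannot be told apart from UTF-32 (LE). A prefix check alone cannot resolve that case. The decimal column holds the same bytes written in base 10. For example, list(codecs.BOM_UTF8) is [239, 187, 191].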

In UTF-8, the sequence EF BB BF is not really a byte order mark. UTF-8 code units are single bytes, so there is no byte order to indicate. The sequence only serves as a signature that identifies the text as UTF-8.
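
Python's standard library treats EF BB BF as a signature rather than a byte order mark. It provides a separate "utf-8-sig" codec for that purpose. The short example below assumes only that codec and codecs.BOM_UTF8. It shows that the signature is a fixed three-byte prefix, and that a decoder either strips it or passes it through as U+FEFF.

```python
import codecs

text = "h\u00e9llo"

plain = text.encode("utf-8")       # no signature
signed = text.encode("utf-8-sig")  # EF BB BF prepended as a signature

# The signature is the same three bytes on every platform: UTF-8 code units
# are single bytes, so there is no byte order to record.
assert signed == codecs.BOM_UTF8 + plain

# "utf-8-sig" removes a leading signature when decoding...
assert signed.decode("utf-8-sig") == text
# ...and also accepts input that has none.
assert plain.decode("utf-8-sig") == text
# Plain "utf-8" keeps the signature as the character U+FEFF.
assert signed.decode("utf-8") == "\ufeff" + text
```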

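In UTF-7, the first byte of the BOM, 2B (+), opens a run of modified base64. U+FEFF then supplies 16 bits. The first 12 bits become the base64 digits 2F (/) and 76 (v). The remaining 4 bits become the high bits of the fourth byte, whose 6-bit value before base64 encoding is 1111xx in binary (001111xx when written as a byte). Here xx is the first two bits of the next character, the first character after the BOM. The fourth byte is therefore not purely part of the BOM, because it also carries information about the next (non-BOM) character. For xx = 00, 01, 10, 11, the fourth byte is, respectively, 38, 39, 2B, or 2F (the base64 digits 8, 9, +, /). If no following character is encoded, xx is 00, so the fourth byte is 38, and the following byte is 2D (-), which ends the base64 run.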
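
The bit arithmetic in the note above can be checked directly. The function below is a hypothetical illustration, and utf7_bom_fourth_byte is not a library API. It computes the fourth byte from the first UTF-16 code unit of the following character. It then cross-checks the result against Python's built-in "utf-7" codec, assuming that codec does not add a BOM on its own and that it encodes U+FEFF and the sample characters in base64.

```python
# Standard base64 alphabet; UTF-7's modified base64 uses the same 64 digits.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def utf7_bom_fourth_byte(next_char=None) -> int:
    """Return the fourth byte of a UTF-7 BOM, given the character after it.

    Hypothetical helper. U+FEFF = 1111111011111111. Its first 12 bits give
    the digits '/' (2F) and 'v' (76); its last 4 bits are the top of the
    fourth digit, 1111xx. xx is the top two bits of the next UTF-16 code
    unit, or 00 when no further character is base64-encoded (next_char=None).
    """
    xx = 0
    if next_char is not None:
        first_unit = next_char.encode("utf-16-be")[:2]
        xx = first_unit[0] >> 6  # top two bits of the 16-bit code unit
    return ord(B64[0b111100 | xx])


# Cross-check against Python's built-in "utf-7" codec.
for following in [None, "\u00e9", "\u4e2d", "\uac00", "\uff21"]:
    encoded = ("\ufeff" + (following or "")).encode("utf-7")
    assert encoded[:3] == b"+/v"
    assert encoded[3] == utf7_bom_fourth_byte(following)
```

The sample characters cover all four values of xx: U+00E9 gives 00 (38), U+4E2D gives 01 (39), U+AC00 gives 10 (2B), and U+FF21 gives 11 (2F).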

SCSU allows other encodings of U+FEFF. The form shown in the table is the signature recommended in UTR #6.
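
The recommended form decodes to U+FEFF because of one assumption this sketch relies on, taken from UTR #6: in SCSU's initial single-byte mode, the tag byte 0E is SQU ("quote Unicode"). SQU makes the decoder read the next two bytes as one big-endian UTF-16 code unit. The checker below is hypothetical and not a library API. It recognizes only that recommended form and ignores the other possible encodings of U+FEFF.

```python
SQU = 0x0E  # SCSU "Quote Unicode" tag in single-byte mode (UTR #6)


def is_recommended_scsu_signature(data: bytes) -> bool:
    """True if data starts with the UTR #6 recommended signature 0E FE FF.

    Hypothetical helper. SQU quotes the next two bytes as one big-endian
    UTF-16 code unit, so 0E FE FF decodes to U+FEFF. Other SCSU encodings
    of U+FEFF exist; this sketch deliberately does not recognize them.
    """
    if len(data) < 3 or data[0] != SQU:
        return False
    quoted = (data[1] << 8) | data[2]
    return quoted == 0xFEFF
```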

In BOCU-1, the signature changes the state of the decoder. The optional octet FF that may follow it resets the decoder to its initial state.
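
A practical consequence of this note is that removing the three signature bytes FB EE 28 is not the same as starting over from the decoder's initial state. Only the variant with the trailing FF leaves the decoder reset. The hypothetical helper below (bocu1_signature_info is not a library API) reports the signature length and whether the reset octet is present. A caller could then decide whether decoding can restart from the initial state.

```python
BOCU1_SIGNATURE = bytes([0xFB, 0xEE, 0x28])
BOCU1_RESET = 0xFF


def bocu1_signature_info(data: bytes):
    """Return (signature_length, state_reset) for a leading BOCU-1 signature.

    Hypothetical helper. FB EE 28 alone changes the decoder state, so the
    following text cannot be decoded from the initial state; an optional
    FF after it resets the decoder. Returns None if there is no signature.
    """
    if not data.startswith(BOCU1_SIGNATURE):
        return None
    if data[3:4] == bytes([BOCU1_RESET]):
        return (4, True)
    return (3, False)
```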