Byte order mark definition
A byte order mark (BOM) is the Unicode character U+FEFF, encoded as a short sequence of bytes at the very beginning of a text file to indicate the file’s encoding and byte order. BOMs are used mainly with Unicode encodings such as UTF-8 and UTF-16. They help software decode and display the text correctly.
Many modern programming languages and text editors can usually detect the encoding of a text file without relying on a byte order mark. But BOMs are still common in various contexts to ensure compatibility and prevent encoding issues. For example, they’re useful for files that may be shared between different platforms and software.
See also: encoding
Functions of a byte order mark
- Character encoding identification. BOMs indicate the character encoding of a text file. They signal to software how to interpret the bytes in the file and how to decode the characters. This matters most for UTF-16, which comes in big-endian and little-endian variants. UTF-8 has no byte-order variants, so its BOM serves only as a signature that identifies the encoding.
- Byte order indication. In UTF-16, which can be either big-endian or little-endian, the BOM indicates the byte order. It tells software whether the most significant byte (MSB) or the least significant byte (LSB) comes first in each 16-bit code unit. Using the wrong byte order leads to an incorrect interpretation of the text, because every code unit is read with its two bytes swapped.
- Compatibility. BOMs are used to ensure that text files are compatible across different systems and software. They help interpret the text correctly regardless of the default encoding or byte order of the receiving system.
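The points above can be illustrated with a minimal Python sketch. It encodes the BOM character U+FEFF in each encoding and then decodes a little-endian UTF-16 string with the wrong byte order. The variable names are illustrative only; the codecs used are from Python's standard library.

```python
# The BOM is the single character U+FEFF; its bytes depend on the encoding.
BOM_CHAR = "\ufeff"

utf8_bom = BOM_CHAR.encode("utf-8")         # b'\xef\xbb\xbf'
utf16be_bom = BOM_CHAR.encode("utf-16-be")  # b'\xfe\xff'
utf16le_bom = BOM_CHAR.encode("utf-16-le")  # b'\xff\xfe'

# "Hi" in little-endian UTF-16: each code unit stores its low byte first.
data = "Hi".encode("utf-16-le")             # b'H\x00i\x00'

# Decoding with the correct byte order recovers the text.
right = data.decode("utf-16-le")            # 'Hi'

# Decoding with the wrong byte order swaps the bytes of every code unit,
# so 0x0048 ('H') is read as 0x4800 and 0x0069 ('i') as 0x6900.
wrong = data.decode("utf-16-be")            # '\u4800\u6900'

print(utf8_bom, utf16be_bom, utf16le_bom)
print(repr(right), repr(wrong))
```

Note that the wrong decoding does not raise an error here. It silently yields two unrelated characters, which is why an explicit byte order indication is helpful.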
How a byte order mark works
- When you open a text file, the software examines its initial bytes. If it encounters a byte order mark, it uses the information encoded in it to determine the text’s character encoding and byte order.
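- The BOM identifies the character encoding, such as UTF-8, UTF-16LE, or UTF-16BE, through its byte pattern: the same character, U+FEFF, is stored as EF BB BF in UTF-8, FF FE in UTF-16LE, and FE FF in UTF-16BE. These details are essential because they define how characters are represented in binary form.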
- Once the software finds the BOM and gets the correct character encoding and byte order, it reads and processes the text accordingly. This ensures that characters are displayed accurately.
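The steps above can be sketched in Python as follows. This is a simplified illustration, not how any particular editor or library does it: the function name `decode_with_bom` and the UTF-8 fallback are assumptions, and only the three encodings named above are handled (UTF-32 BOMs, for instance, are not checked).

```python
import codecs

# Known BOM byte sequences (constants from Python's standard codecs module)
# paired with the encoding each one signals.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_with_bom(data: bytes, default: str = "utf-8"):
    """Return (text, encoding) for raw file bytes.

    Hypothetical helper: examines the initial bytes for a BOM, picks the
    matching encoding, strips the BOM, and decodes the rest. If no BOM is
    found, it falls back to `default` (an assumption of this sketch).
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Skip the BOM so it is not shown as part of the text.
            return data[len(bom):].decode(encoding), encoding
    return data.decode(default), default


# Usage: a UTF-16LE payload with its BOM in front.
raw = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
text, encoding = decode_with_bom(raw)
print(text, encoding)
```

For files that are known to be UTF-8, Python's built-in `utf-8-sig` codec handles the same case directly: it removes a leading BOM when decoding if one is present.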