Deserializing error with UTF-8 BOM (Byte Order Mark) Content #1115

zenoxs · 2024-03-05T08:58:06Z

Deserializing Panic with UTF-8 BOM (Byte Order Mark) Content

I encounter an issue when attempting to deserialize a UTF-8 string that begins with a Byte Order Mark (BOM). The deserializer returns the following error: Error("expected value", line: 1, column: 1).

How to Reproduce

To reproduce the issue, encode a JSON file in UTF-8 with BOM and use from_reader or from_str for deserialization.
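For illustration, here is a minimal standard-library sketch of what such input looks like at the byte and character level. It does not call the deserializer; the helper names `first_char` and `sample_with_bom` are hypothetical and exist only for this example. It shows that the first character of the text is U+FEFF rather than the start of a JSON value, which is presumably why the error points at line 1, column 1.

```rust
/// The byte order mark code point, U+FEFF; encoded in UTF-8 it is EF BB BF.
const BOM: char = '\u{feff}';

/// Returns the first character a parser would see in `input`.
fn first_char(input: &str) -> Option<char> {
    input.chars().next()
}

/// Builds an example document: a BOM followed by a small JSON object.
/// (Hypothetical sample data, not taken from the report.)
fn sample_with_bom() -> String {
    format!("{}{}", BOM, r#"{"key": 1}"#)
}
```

Note that U+FEFF is not classified as whitespace by `char::is_whitespace`, so calling `trim_start()` on the text before deserializing does not remove it.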

Workaround

As a temporary workaround, I check whether the file content begins with the three-byte UTF-8 BOM and remove it if present:

use std::fs;

fn main() {
    // Specify the path to your file
    let file_path = "path/to/your/file_with_bom.json";

    // Read the file to a Vec<u8>
    let mut data = fs::read(file_path).unwrap();

    // UTF-8 BOM is three bytes: EF BB BF
    if data.starts_with(&[0xEF, 0xBB, 0xBF]) {
        // Remove the first three bytes (the BOM) in place, without reallocating
        data.drain(..3);
    }

    // Proceed with deserialization...
}
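The same workaround can be written without copying the buffer. The following is a sketch using only the standard library; the function names `strip_bom_bytes` and `strip_bom_str` are hypothetical helpers, not part of any deserializer's API.

```rust
/// UTF-8 encoding of the byte order mark U+FEFF.
const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

/// Returns `data` without a leading UTF-8 BOM, borrowing instead of copying.
fn strip_bom_bytes(data: &[u8]) -> &[u8] {
    data.strip_prefix(UTF8_BOM).unwrap_or(data)
}

/// Same idea for text that has already been decoded into a `&str`.
fn strip_bom_str(text: &str) -> &str {
    text.strip_prefix('\u{feff}').unwrap_or(text)
}
```

The returned slice can then be passed to from_str, or to a byte-slice entry point if the deserializer offers one (for example `from_slice`, assuming serde_json is the library in use).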


valaphee · 2024-05-12T12:19:56Z

One way would be to handle it in Rust itself (rust-lang/rfcs#2428); at least IETF RFC 3629 doesn't forbid it (even though I'm personally against it, as it is a protocol detail).

But your file is theoretically not compliant with IETF RFC 7159 (even though this is also not strictly forbidden in the aforementioned RFC, as it's a protocol detail):

Implementations MUST NOT add a byte order mark to the beginning of a
JSON text. In the interests of interoperability, implementations
that parse JSON texts MAY ignore the presence of a byte order mark
rather than treating it as an error.

Either way, it is entirely valid for a parser to ignore the BOM and still be conformant.
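Until a parser does this itself, a caller using from_reader can apply the same leniency on the stream before handing it over. This is a minimal sketch, assuming the input is wrapped in something that implements `std::io::BufRead` (such as `BufReader<File>`); `skip_utf8_bom` is a hypothetical helper name. It also assumes the first `fill_buf` call yields at least three bytes when the input has that many, which holds for typical buffered files but is not guaranteed for every reader.

```rust
use std::io::{self, BufRead};

/// Consumes a leading UTF-8 BOM (EF BB BF) from `reader` if one is present.
/// Returns `Ok(true)` when a BOM was skipped, `Ok(false)` otherwise.
fn skip_utf8_bom<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    // Peek at the buffered bytes without consuming them.
    let has_bom = reader.fill_buf()?.starts_with(&[0xEF, 0xBB, 0xBF]);
    if has_bom {
        // Advance past the three BOM bytes only.
        reader.consume(3);
    }
    Ok(has_bom)
}
```

After the call, the reader is positioned at the first byte of the JSON text and can be passed to from_reader unchanged.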

dtolnay changed the title ~~Deserializing Panic with UTF-8 BOM (Byte Order Mark) Content~~ Deserializing error with UTF-8 BOM (Byte Order Mark) Content Mar 5, 2024

mistydemeo mentioned this issue Mar 14, 2024

fix: parse JSON with UTF-8 BOMs axodotdev/axoasset#87

Merged
