I just tried reading a JSON file that I wrote using PowerShell and got the following failure:
```
(Error: (Corrupted_Format (File_Like.Value (File foo.json)) 'Parse error in parsing JSON: Unexpected character (\'\' (code 65279 / 0xfeff)): expected a valid value (JSON String, Number, Array, Object or token \'null\', \'true\' or \'false\') at position [line: 1, column: 2].' (Invalid_JSON.Error 'Unexpected character (\'\' (code 65279 / 0xfeff)): expected a valid value (JSON String, Number, Array, Object or token \'null\', \'true\' or \'false\') at position [line: 1, column: 2]')))
```
The 0xfeff character (a byte-order mark, BOM) seems to be the problem.
I imagine we need to strip the BOM (and any leading whitespace) before parsing.
A minimal repro for the bug:
```
'\ufeff{}'.parse_json
```
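For comparison, the same failure mode can be reproduced outside Enso. A small Python sketch (standard library only; illustrative, not the Enso implementation) showing that a leading BOM breaks JSON parsing and that stripping it first fixes the input:

```python
import json

raw = "\ufeff{}"  # JSON text that still carries a leading BOM (U+FEFF)

# Python's JSON parser also rejects a leading BOM in a str input.
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("rejected:", err)

# Stripping the BOM before parsing makes the input valid again.
cleaned = raw.lstrip("\ufeff")
print(json.loads(cleaned))  # -> {}
```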
I think Enso should be able to handle files encoded with BOM.
We may want to revise how our CSV parser handles such cases too.
Changes
- By default, we should read the BOM when reading text in from a stream.
- Our default "charset" should be smarter and use the BOM to determine which Unicode charset to use.
- If the default "charset" encounters an invalid character, we should fall back to Windows-1252.
- This should apply to reading plain text, delimited tables, JSON, and XML.
- The error reporting should only report the first few failing indexes (e.g. 3) plus a count of the total failures.
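The detection behaviour proposed above could be sketched as follows. This is a hedged Python illustration with assumed names (`decode_with_default_charset` is not an Enso API): check the leading bytes for a known BOM, decode with the matching Unicode charset, otherwise try UTF-8 and fall back to Windows-1252 on invalid input. Note that the UTF-32 BOMs must be checked before the UTF-16 ones, since `FF FE 00 00` starts with `FF FE`.

```python
import codecs

# BOM -> charset table, longest/most-specific BOMs first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_with_default_charset(data: bytes) -> str:
    # 1. If a BOM is present, it decides the charset and is stripped.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # 2. No BOM: assume UTF-8 first.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Fall back to Windows-1252; 'replace' covers the few
        #    byte values cp1252 leaves undefined.
        return data.decode("cp1252", errors="replace")
```

With this, a PowerShell-written `foo.json` starting with the UTF-8 BOM bytes `EF BB BF` decodes to plain `{}` instead of a string that still carries U+FEFF.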
Radosław Waśko reports a new STANDUP for yesterday (2024-05-22):
Progress: Fixed the failing test and got the union PR merged. Introduced Encoding.Default and set it as the default for relevant read operations. Added tests for BOM handling and detection heuristics. It should be finished by 2024-05-28.
Next Day: Continue on the same task: work on BOM detection, investigate the Datalinks issue, and types work.