Encode \uXXXX unescapes as ASCII_8BIT #484

Closed
wants to merge 1 commit into from


davishmcclurg
This fixes an incompatible encoding issue when the source string mixes
`\uXXXX` escape sequences with unescaped UTF-8 characters:

```
>> JSON::Pure::Parser.new('"\u00e9 é"').parse
/Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:208:in `rescue in parse_string': Caught Encoding::CompatibilityError at '': incompatible character encodings: UTF-8 and ASCII-8BIT (JSON::ParserError)
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:169:in `parse_string'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:231:in `parse_value'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:123:in `parse'
	from (irb):2:in `<main>'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `load'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `<main>'
/Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:172:in `gsub': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:172:in `parse_string'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:231:in `parse_value'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/json-2.6.0/lib/json/pure/parser.rb:123:in `parse'
	from (irb):2:in `<main>'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `load'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `<main>'
```

It looks like `gsub` raises this error when the receiver has been
force-encoded to ASCII-8BIT and still contains non-ASCII bytes, and the
replacement is a UTF-8 string with non-ASCII characters (as is done
in [`convert_encoding`][0] and [`parse_string`][1]):

```
>> "x é".encode('utf-8').force_encoding('ascii-8bit').gsub('x', 'é')
(irb):11:in `gsub': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
	from (irb):11:in `<main>'
	from /Users/dharsha/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `load'
	from /Users/dharsha/.rbenv/versions/3.0.2/bin/irb:23:in `<main>'
```

To fix the issue, this adds another `force_encoding` to make the
replacement string encoding match the source one. I believe the behavior
should be the same as `convert_encoding`.
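
As a rough illustration of the idea (not the actual diff), the sketch below reproduces the failure outside the parser and shows how tagging the replacement as ASCII-8BIT avoids it. The method names `binary_source`, `naive_replace`, and `matched_replace` are hypothetical and do not exist in `json/pure`; `String#b` is used here as a shorthand for a copy with `force_encoding('ascii-8bit')` applied.

```ruby
# Hypothetical helpers for illustration only; none of these names exist in json/pure.

E_ACUTE = "\u00e9" # "é" as a UTF-8 string

# Simulates the parser's working string: UTF-8 bytes tagged as ASCII-8BIT.
def binary_source
  "x \u00e9".b
end

# Unfixed path: the replacement is UTF-8 while the receiver is binary
# and contains non-ASCII bytes, so gsub cannot combine the two.
def naive_replace(source)
  source.gsub('x', E_ACUTE)
end

# Fixed path: tag the replacement as ASCII-8BIT so both sides match.
# The bytes are unchanged; only the encoding label differs.
def matched_replace(source)
  source.gsub('x', E_ACUTE.b)
end

begin
  naive_replace(binary_source)
rescue Encoding::CompatibilityError => e
  puts "naive: #{e.class}"
end

# The result is still binary; relabel it as UTF-8 once all replacements are done.
puts matched_replace(binary_source).force_encoding(Encoding::UTF_8)
```

The result of `matched_replace` is still tagged ASCII-8BIT, so a final `force_encoding` back to UTF-8 is assumed to happen once at the end.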

[0]: https://github.com/flori/json/blob/442318643a5a42ca0fe3805b84a5e9dba5b34f73/lib/json/pure/parser.rb#L135-L147
[1]: https://github.com/flori/json/blob/442318643a5a42ca0fe3805b84a5e9dba5b34f73/lib/json/pure/parser.rb#L182
@davishmcclurg
Author

Oops, duplicate of #483

davishmcclurg added a commit to davishmcclurg/json_schemer that referenced this pull request Jun 7, 2023
TruffleRuby's version of json/pure has a bug that's triggered by these
files. It's been fixed but the fix isn't in TruffleRuby yet:

- flori/json#483
- flori/json#484

This skips the files when the error is raised and drops them before
checking fixtures.