Text encoding is changed when passing a utf-8 string to content #73

Closed
konstin opened this issue Sep 4, 2020 · 7 comments · Fixed by #78

Comments

@konstin

konstin commented Sep 4, 2020

I'd expect UTF-8 characters (such as German umlauts) to be returned the way I passed them to content; however, this is not the case:

```python
import httpx
import respx

with respx.mock() as respx_mock:
    text_in = "Gehäusegröße"
    respx_mock.get("https://example.com", content=text_in)
    text_out = httpx.get("https://example.com").text
    print(text_in)
    print(text_out)
    print(text_in == text_out)
```

Output:

```
Gehäusegröße
GehäusegröÃ�e
False
```

Versions: respx 0.12.1 and httpx 0.14.3

@lundberg
Owner

lundberg commented Sep 4, 2020

Interesting! Will investigate, thanks.

Debugging help appreciated ;-)

@lundberg
Owner

I've now narrowed it down to the Content-Type header, @konstin.

If you set the content type it should work. It would be nice though to default to utf-8.

I already had TODOs in both a test and the content handling that mention this "problem" ;-)

Here's your example, modified, that should work:

```python
import httpx
import respx

with respx.mock() as respx_mock:
    text_in = "Gehäusegröße"
    respx_mock.get(
        "https://example.com",
        content=text_in,
        content_type="text/plain; charset=utf-8",  # <-- added: declare the charset explicitly
    )
    text_out = httpx.get("https://example.com").text
    print(text_in)
    print(text_out)
    print(text_in == text_out)
```

I need to dig into HTTPX to see how it handles this by default. Happy to get input or feedback on how best to solve this.
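
To illustrate why declaring the charset matters, here is a minimal standard-library sketch of a client choosing a codec from the Content-Type header. The helper names `declared_charset` and `decode_body` are hypothetical, and the Latin-1 fallback is only an assumption for illustration; the thread does not state what HTTPX 0.14.3 actually does when no charset is declared.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type, fallback="latin-1"):
    # ASSUMPTION: the fallback codec is a stand-in for whatever a real client
    # guesses when the header carries no charset parameter.
    charset = declared_charset(content_type) or fallback
    return body.decode(charset)


text_in = "Gehäusegröße"
body = text_in.encode("utf-8")  # the bytes that travel over the wire

with_charset = decode_body(body, "text/plain; charset=utf-8")
without_charset = decode_body(body, "text/plain")

print(with_charset == text_in)     # the declared charset round-trips the text
print(without_charset == text_in)  # the guessed codec does not
```

With the charset declared, the bytes decode back to the original string; without it, the result depends entirely on the codec the client guesses.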

@lundberg
Owner

Worth mentioning: RESPX sets a default Content-Type header if none is specified.

This might be the problem or the place to fix this issue.

When I change this default header to text/plain; charset=utf-8, a test with your input "Gehäusegröße" works without specifying content_type when mocking, @konstin.

But if I remove the default Content-Type header completely, your test input still works, while the existing test input "äpple" unfortunately fails.

@konstin
Author

konstin commented Sep 11, 2020

Thank you for investigating. Setting text/plain; charset=utf-8 works!

I'd still like a UTF-8 default, though. If I understand the new docs correctly, encode/httpx#1269 might solve this without any extra work.

@lundberg
Owner

Awesome if this could be solved in HTTPX!

If/when that HTTPX PR is merged, we'll need to decide whether it's enough to just remove the mentioned RESPX TODOs and the encoding code along with them.

@lundberg
Owner

Update...

With HTTPX 0.15.0 and the recent RESPX alignment in master with httpcore 0.11.0, this now seems to work without specifying a charset or content type, @konstin!

I've extended the content test with your example in #78 to clarify.

Still not sure what to do with our default Content-Type header, or with the hard-coded UTF-8 content encoding in the response template.

@lundberg
Owner

About the content encoding: I think we should introduce an encoding option to ResponseTemplate to align with HTTPX's response flexibility.

From the HTTPX chardet PR:

"Users can override this behaviour if required with an explicit response.encoding = ..."

It could be set to utf-8 by default, but be overridable when adding patterns to RESPX.
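
A rough sketch of what such an option could look like, assuming a simplified template that only holds content and an encoding. `ResponseTemplateSketch` and `encoded_content` are hypothetical names for illustration and do not reflect RESPX's actual ResponseTemplate.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ResponseTemplateSketch:
    """Hypothetical stand-in, not RESPX's real ResponseTemplate."""

    content: Union[str, bytes]
    encoding: str = "utf-8"  # default, overridable per mocked pattern

    def encoded_content(self) -> bytes:
        # Bytes are passed through untouched; only str content is encoded.
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode(self.encoding)


default = ResponseTemplateSketch("Gehäusegröße")
latin1 = ResponseTemplateSketch("Gehäusegröße", encoding="iso-8859-1")

print(len(default.encoded_content()))  # UTF-8 uses two bytes per umlaut
print(len(latin1.encoded_content()))   # ISO-8859-1 uses one byte per character
```

Keeping the default at utf-8 would preserve the current behaviour, while the override covers tests that need to mock a server responding in another encoding.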
