feature - avoid utf-8 decoding for text frames #1376

Open
toppk opened this issue Jun 26, 2023 · 5 comments

Comments

@toppk

toppk commented Jun 26, 2023

Just because it is supposed to be in UTF-8 doesn't mean I prefer it in that form. Specifically, my use case is giving the data to orjson and passing it around as an orjson.Fragment().

Here is the documentation for that use case.

https://github.com/ijl/orjson#deserialize
https://github.com/ijl/orjson#fragment
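
To make the use case concrete, here is a rough sketch of the round trip the current API forces. The helper name `wrap_payload` and the envelope layout are made up for illustration. The sketch uses orjson.Fragment when orjson (a version that provides Fragment) is installed and otherwise falls back to splicing the bytes by hand, which has the same effect for this simple envelope.

```python
import json

try:
    import orjson  # third-party; optional for this sketch
except ImportError:
    orjson = None


def wrap_payload(raw: bytes) -> bytes:
    """Embed already-serialized JSON bytes in an envelope without re-parsing them.

    Hypothetical helper; the envelope layout is an assumption for illustration.
    """
    if orjson is not None and hasattr(orjson, "Fragment"):
        # Fragment asks orjson to copy the bytes into the output as-is.
        return orjson.dumps({"source": "ws", "data": orjson.Fragment(raw)})
    # Fallback without orjson: splice the pre-serialized bytes in by hand.
    return b'{"source":"ws","data":' + raw + b"}"


# What recv() returns today for a Text frame: an already-decoded str.
received_text = '{"id": 1, "tags": ["a", "b"]}'

# To pass the payload along as bytes, the str must be encoded again,
# undoing the UTF-8 decoding that was just performed on receipt.
raw = received_text.encode("utf-8")

envelope = wrap_payload(raw)
print(json.loads(envelope))
```

If recv() could hand back the frame payload as bytes, the `.encode("utf-8")` step (and the decode that preceded it) would disappear.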

Looking at the websockets code, if such a capability were to be implemented, it seems like we'd want to add a flag to WebSocketCommonProtocol() and then use it to force binary around the point where it decides whether to decode the data or not, located here:

https://github.com/python-websockets/websockets/blob/main/src/websockets/legacy/protocol.py#L1053
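
A minimal sketch of the kind of decision being described, assuming a protocol-level flag. The function `message_from_frame` and the parameter `force_binary` are hypothetical names for illustration and are not taken from the library's code; only the opcode values come from RFC 6455.

```python
OP_TEXT = 0x1    # Text frame opcode (RFC 6455)
OP_BINARY = 0x2  # Binary frame opcode (RFC 6455)


def message_from_frame(opcode: int, payload: bytes, force_binary: bool = False):
    """Return the value handed to the application for one complete message.

    Hypothetical stand-in for the decode decision, not the library's code.
    """
    if opcode == OP_TEXT and not force_binary:
        # Default behavior: Text frames are decoded to str.
        return payload.decode("utf-8")
    # Binary frames, or Text frames when the flag is set, stay as bytes.
    return payload


payload = "héllo".encode("utf-8")
decoded = message_from_frame(OP_TEXT, payload)
raw = message_from_frame(OP_TEXT, payload, force_binary=True)
```

With the flag set, the application receives the same bytes that were on the wire and no UTF-8 decoding takes place.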

I'd be happy to whip up a patch in case you would consider this feature request.

@aaugustin
Member

I understand your use case and, indeed, you cannot do this with the current API.

For receiving frames, it would mean an API like websocket.recv(decode_text_frames=False). (Naming TBC.) Can you confirm that it's what you want? (Then, you get bytes in all cases, so you cannot tell if it was a Text or Binary frame in the first place; but you don't really care anyway.)

This raises the question of providing a symmetrical API for sending bytes (assumed to be valid UTF-8) as a Text frame. You didn't ask for this but I'd like to keep consistency between both sides.

@toppk
Author

toppk commented Jun 26, 2023

That would work quite well. I guess I misunderstood the code, because it looked to me as if the recv() method is decoupled from where the actual processing of inbound data happens (read_message()). The solution you propose would certainly be more flexible.
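
One way to picture the per-call variant is to defer decoding until recv() is called, so each caller can choose. This is only a toy sketch: the class `DeferredDecodeReceiver` and its `feed` method are invented for illustration, and `decode_text_frames` is the tentative name from the comment above, not an existing parameter.

```python
import asyncio

OP_TEXT = 0x1  # Text frame opcode (RFC 6455)


class DeferredDecodeReceiver:
    """Toy receiver; hypothetical, not the websockets implementation."""

    def __init__(self) -> None:
        self._messages: asyncio.Queue = asyncio.Queue()

    def feed(self, opcode: int, payload: bytes) -> None:
        # The read side stores the opcode and raw bytes; nothing is decoded yet.
        self._messages.put_nowait((opcode, payload))

    async def recv(self, decode_text_frames: bool = True):
        opcode, payload = await self._messages.get()
        if opcode == OP_TEXT and decode_text_frames:
            return payload.decode("utf-8")
        return payload


async def demo():
    receiver = DeferredDecodeReceiver()
    receiver.feed(OP_TEXT, b'{"id": 1}')
    receiver.feed(OP_TEXT, b'{"id": 2}')
    first = await receiver.recv()                           # decoded to str
    second = await receiver.recv(decode_text_frames=False)  # left as bytes
    return first, second


first, second = asyncio.run(demo())
```

Because the decision is made per call rather than per connection, the same connection can mix decoded and raw reads.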

@toppk
Author

toppk commented Jun 28, 2023

Thinking about the send side, I think it really is less important. There aren't too many servers that are strict in what they accept, especially when they are expecting text. If we implement it for send, the effect is the same (skip encode, skip decode), but the names of the options will be different, e.g. decode_text_frames=False for recv() and send_as_text=True for send().
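
For the send side, a sketch of what the option could mean, assuming the tentative name `send_as_text` from this comment. The helper `frame_for_send` is hypothetical; it only chooses an opcode and payload and does not reflect how the library builds frames.

```python
OP_TEXT = 0x1    # Text frame opcode (RFC 6455)
OP_BINARY = 0x2  # Binary frame opcode (RFC 6455)


def frame_for_send(message, send_as_text: bool = False):
    """Pick the opcode and payload for an outgoing message. Hypothetical helper."""
    if isinstance(message, str):
        # Current behavior for str: encode to UTF-8 and send as Text.
        return OP_TEXT, message.encode("utf-8")
    if send_as_text:
        # The caller vouches that the bytes are valid UTF-8, so the
        # encode step is skipped and no validation is performed here.
        return OP_TEXT, bytes(message)
    # Current behavior for bytes-like input: send as Binary.
    return OP_BINARY, bytes(message)


serialized = b'{"id": 1}'  # e.g. output of a JSON serializer that returns bytes
as_binary = frame_for_send(serialized)
as_text = frame_for_send(serialized, send_as_text=True)
```

The payload bytes are identical in both cases; only the opcode seen by the peer differs.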

@aaugustin
Member

Yes, we need to pick the names for both sides carefully and, ideally, consistently.

raw_utf8 is a name that could work for both sides. I'm not sure it's the best name we can find, though.

If we have two names, I'd like some symmetry e.g. using the words decode and encode.

@carlos-sarmiento

I'm finding myself in the same position, trying to send data encoded with orjson as a text frame even when it is provided to websockets in binary form.

Any chance this gets added?
