
fix: set { stream: true } when calling decoder.decode on multiple chunks #11409

Merged
3 commits merged on Jan 8, 2024

Conversation

mustafa0x
Contributor

@mustafa0x mustafa0x commented Dec 20, 2023

Fixes #11044

The bug

  1. Visit this page: https://svkit-bug.slk.is
  2. Click "go"

� will appear. Refreshing the page fixes it.

The reason

https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode#stream

A boolean flag indicating whether additional data will follow in subsequent calls to decode().
Set to true if processing the data in chunks, and false for the final chunk or if the data is not chunked.
It defaults to false.
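
To illustrate the MDN passage above, here is a minimal sketch (not the SvelteKit code itself) that splits the two UTF-8 bytes of "é" across two chunks and decodes them with and without the stream option. The helper name decodeChunks and the sample string are assumptions made for this example.

```javascript
// "é" is two bytes in UTF-8 (0xC3 0xA9), so "café" is five bytes.
const bytes = new TextEncoder().encode('café');

// Simulate an inconvenient chunk boundary: the split falls between the two bytes of "é".
const chunks = [bytes.slice(0, 4), bytes.slice(4)];

// Hypothetical helper: decode every chunk with one decoder and the given options.
function decodeChunks(chunks, options) {
  const decoder = new TextDecoder();
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, options);
  }
  return text;
}

// Without { stream: true } each call is treated as complete input,
// so both halves of the split character are replaced with U+FFFD.
const broken = decodeChunks(chunks);

// With { stream: true } the decoder holds the incomplete byte until the next call.
const fixed = decodeChunks(chunks, { stream: true });

console.log(JSON.stringify(broken)); // contains U+FFFD replacement characters
console.log(fixed); // café
```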

Please don't delete this checklist! Before submitting the PR, please make sure you do the following:

  • It's really useful if your PR references an issue where it is discussed ahead of time. In many cases, features are absent for a reason. For large changes, please create an RFC: https://github.com/sveltejs/rfcs
  • This message body should clearly illustrate what problems it solves.
  • Ideally, include a test that fails without this PR but passes with it.

Tests

  • Run the tests with pnpm test and lint the project with pnpm lint and pnpm check

Changesets

  • If your PR makes a change that should be noted in one or more packages' changelogs, generate a changeset by running pnpm changeset and following the prompts. Changesets that add features should be minor and those that fix bugs should be patch. Please prefix changeset messages with feat:, fix:, or chore:.

changeset-bot bot commented Dec 20, 2023

🦋 Changeset detected

Latest commit: 57cdb28

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name           Type
@sveltejs/kit  Patch

@benmccann
Member

The linked MDN docs state that stream: false should be called on the final chunk, but we're never calling it here

https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode#options

@mustafa0x
Contributor Author

Thanks for the feedback @benmccann!

I'm unable to glean from the code what condition to check to know whether the loop is on the last chunk.

@benmccann benmccann changed the title from "client.load_data TextDecoder: add {stream: true} (fixes #11044)" to "fix: set { stream: true } when calling decoder.decode on multiple chunks" on Dec 21, 2023
@dummdidumm
Member

I also don't see how we would know that it's the last chunk before receiving it with the empty value, at which point we don't need to call decode. Are there any docs on what happens if you don't do it? A memory leak?

@mustafa0x
Contributor Author

Perhaps add decode('', {stream: false})? I.e., call decode one final time, when value is empty.
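
A rough sketch of what such a final flush does, under the assumption that it is written as decode() with no arguments (stream then defaults to false, and decode() expects a buffer rather than a string). The variable names are made up for the example. With well-formed input nothing is left to flush; with input cut off mid-character the flush emits a replacement character.

```javascript
// "€" is three bytes in UTF-8 (0xE2 0x82 0xAC).
const euro = new TextEncoder().encode('€');

// Case 1: well-formed input split across two chunks.
const decoder = new TextDecoder();
let wellFormed = decoder.decode(euro.slice(0, 2), { stream: true }); // nothing decodable yet
wellFormed += decoder.decode(euro.slice(2), { stream: true }); // character completed here
const leftover = decoder.decode(); // final flush: no pending bytes remain

// Case 2: the input ends in the middle of a character.
const truncatedDecoder = new TextDecoder();
let truncated = truncatedDecoder.decode(euro.slice(0, 2), { stream: true });
truncated += truncatedDecoder.decode(); // flush reports the incomplete sequence as U+FFFD

console.log(wellFormed); // €
console.log(JSON.stringify(leftover)); // ""
console.log(truncated.endsWith('\uFFFD')); // true
```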

Tangential,

  1. Until this PR is merged, should the docs warn users to add data-sveltekit-reload for sites that use multibyte characters?
  2. I find it strange that this hasn't been noticed before. Seems like all SvelteKit users have been speakers of Latin-based languages?

@Rich-Harris
Member

I think it's fine to just always use stream: true — AFAIK all it does is keep the incomplete byte sequence in memory so that it can prepend it to the next chunk. Assuming the response is well-formed, no state will be left over after the final chunk, and in any case the entire decoder will get garbage collected once we're done.

I asked ChatGPT and it basically concurs, FWIW.

I find it strange that this hasn't been noticed before. Seems like all SvelteKit users have been speakers of Latin-based languages?

It looks like there are other necessary conditions besides multi-byte characters — the body needs to be large enough that the string is chunked, and the server needs to chunk it in an inconvenient way (in the repro this happens non-deterministically, and I can't reproduce it locally at all which suggests it may be a quirk of Caddy. It would make sense to me that most servers would strive to avoid splitting characters like this, and if my admittedly incomplete understanding of string encoding is correct, all it takes is to ensure that each chunk length is a multiple of 4 bytes). So I'm not particularly surprised that this didn't surface sooner!
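
As a quick check of the chunk-boundary conditions described above, the sketch below splits a short UTF-8 string into fixed-size chunks and decodes it both ways. The helper splitEvery and the sample string are assumptions for illustration. UTF-8 characters occupy one to four bytes with no alignment, so in this sample even a 4-byte chunk size lands in the middle of a character, which suggests that a multiple-of-4 chunk length would not be enough by itself.

```javascript
// Hypothetical helper: cut a byte array into chunks of a fixed size.
function splitEvery(bytes, size) {
  const chunks = [];
  for (let i = 0; i < bytes.length; i += size) {
    chunks.push(bytes.subarray(i, i + size));
  }
  return chunks;
}

// "abc€" encodes to 6 bytes: 61 62 63 E2 82 AC.
const sample = new TextEncoder().encode('abc€');

// A 4-byte chunk size puts the boundary after E2, inside the euro sign.
const fourByteChunks = splitEvery(sample, 4);

// Decoding each chunk independently corrupts the split character.
const perChunk = fourByteChunks
  .map((chunk) => new TextDecoder().decode(chunk))
  .join('');

// One decoder with { stream: true } carries the partial bytes across the boundary.
const streamingDecoder = new TextDecoder();
const streamed = fourByteChunks
  .map((chunk) => streamingDecoder.decode(chunk, { stream: true }))
  .join('');

console.log(perChunk.includes('\uFFFD')); // true
console.log(streamed); // abc€
```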

@mustafa0x
Contributor Author

Makes sense, thanks Rich!

in the repro this happens non-deterministically, and I can't reproduce it locally at all which suggests it may be a quirk of Caddy

Same here, as mentioned in #11044 (comment)

I am unable to reproduce this bug locally, even when using pnpm run preview. I suspect the web server (Caddy in my case) may be affecting things, with how it passes/buffers bytes from SvelteKit to the client.

@Rich-Harris Rich-Harris merged commit f8d3757 into sveltejs:main Jan 8, 2024
@github-actions github-actions bot mentioned this pull request Jan 8, 2024
Successfully merging this pull request may close these issues.

U+FFFD (REPLACEMENT CHARACTER) in text