
fix(tokenizer): Drop chunks after emitting tokens #432

Merged. 8 commits were merged on Mar 4, 2022.

fb55 (Collaborator) commented Mar 3, 2022

Fixes #292, #357

We would sometimes drop chunks before the rewriting stream had processed the corresponding tokens. This led to #292 (text content longer than 65536 characters being truncated by the rewriter). Now, buffers are dropped only after the pending events have been emitted.
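The ordering problem can be illustrated with a toy model. The sketch below assumes a single long text node split across input chunks and a buffer that forgets everything before a given position. `ChunkedBuffer` and `runText` are hypothetical names for illustration only, not parse5's classes. Passing `emitBeforeDrop = false` mimics the old order (drop first), and `true` mimics the new order (emit the pending text, then drop).

```typescript
// Toy model, not parse5's real implementation: ChunkedBuffer and runText
// are hypothetical names used only for this illustration.
class ChunkedBuffer {
  private html = "";
  private offset = 0; // absolute input position of html[0]

  write(chunk: string): void {
    this.html += chunk;
  }

  // Absolute position just past the last buffered character.
  get end(): number {
    return this.offset + this.html.length;
  }

  // Raw source for [start, end); anything already dropped is silently lost.
  raw(start: number, end: number): string {
    return this.html.slice(Math.max(start - this.offset, 0), end - this.offset);
  }

  // Discard everything before absolute position `pos`.
  dropUntil(pos: number): void {
    if (pos > this.offset) {
      this.html = this.html.slice(pos - this.offset);
      this.offset = pos;
    }
  }
}

// Treats the whole input as one long text node split across chunks and
// returns the raw text of each text event that would be emitted.
function runText(chunks: string[], emitBeforeDrop: boolean): string[] {
  const buf = new ChunkedBuffer();
  const events: string[] = [];
  let tokenStart = 0; // start of the still-pending text token

  for (const chunk of chunks) {
    if (emitBeforeDrop && buf.end > tokenStart) {
      // New order: emit the pending text while its source is still buffered...
      events.push(buf.raw(tokenStart, buf.end));
      tokenStart = buf.end;
    }
    // ...then drop the consumed input before reading the next chunk.
    // Without the emit above, this discards text the open token still needs.
    buf.dropUntil(buf.end);
    buf.write(chunk);
  }

  // End of input: emit whatever text is still pending.
  if (buf.end > tokenStart) events.push(buf.raw(tokenStart, buf.end));
  return events;
}

const chunks = ["aaa", "bbb", "ccc"];
console.log(runText(chunks, false).join("")); // ccc (earlier chunks were lost)
console.log(runText(chunks, true)); // [ 'aaa', 'bbb', 'ccc' ]
```

In the old order, only the last chunk survives, which is the shape of the truncation reported in #292. In the new order, no text is lost, but it arrives as several text events.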

BREAKING CHANGE: The SAX parser and rewriting stream now also emit a text event whenever the underlying buffer is about to be dropped right after it. As a result, one run of text may arrive as several consecutive text events.
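With this change, a consumer that assumed one text event per run of text may want to merge consecutive text events itself. Below is a minimal sketch of such merging. The `SaxEvent` shape and `mergeAdjacentText` are hypothetical and exist only for this example. parse5's real token types differ.

```typescript
// Hypothetical event shape for illustration; parse5's real token types differ.
type SaxEvent =
  | { kind: "text"; text: string }
  | { kind: "startTag"; tagName: string }
  | { kind: "endTag"; tagName: string };

// Merges runs of consecutive text events into a single text event.
function mergeAdjacentText(events: SaxEvent[]): SaxEvent[] {
  const out: SaxEvent[] = [];
  for (const ev of events) {
    const last = out[out.length - 1];
    if (ev.kind === "text" && last !== undefined && last.kind === "text") {
      // Extend the previous text event instead of starting a new one.
      out[out.length - 1] = { kind: "text", text: last.text + ev.text };
    } else {
      out.push(ev);
    }
  }
  return out;
}

const merged = mergeAdjacentText([
  { kind: "startTag", tagName: "p" },
  { kind: "text", text: "aaa" },
  { kind: "text", text: "bbb" },
  { kind: "endTag", tagName: "p" },
]);
console.log(merged.length); // 3
```

In a streaming consumer, the same idea means buffering text until the next non-text event, or the end of input, before acting on it.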

@qnighy I've adopted your test from #357 — hope that's okay! I'd also love it if you could make sure this works for you.

fb55 and others added 4 commits March 3, 2022 11:55
Fixes #357

Co-Authored-By: Masaki Hara <41755+qnighy@users.noreply.github.com>
Otherwise we might miss some data
And use it to emit text blocks from sax parser as soon as we might drop the buffer.
@@ -536,6 +542,7 @@ export class Tokenizer {
if (this.currentCharacterToken.type !== type) {
this.currentLocation = this.getCurrentLocation(0);
this._emitCurrentCharacterToken(this.currentLocation);
fb55 (Collaborator, Author) commented Mar 3, 2022

We can't drop the parsed chunk in _emitCurrentCharacterToken: it is also called from prepareToken, so dropping there could remove the buffer holding the token that immediately follows the character token.
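The hazard described in this comment can be reproduced in a hypothetical miniature tokenizer. The method names echo the comment, but the class, its fields, and the `(start, consumed)` parameters are assumptions made for illustration and are not parse5's `Tokenizer`. The sketch assumes that by the time `prepareToken` runs, the tokenizer has already consumed past the start of the new tag. Dropping consumed input inside the character-token emitter would then cut off the beginning of that tag.

```typescript
// Hypothetical miniature; the method names echo the review comment, but this
// is not parse5's Tokenizer. Positions are absolute offsets into the input.
class MiniTokenizer {
  private buf = ""; // retained input
  private offset = 0; // absolute position of buf[0]
  private charStart = 0; // start of the pending character (text) token
  private tokenStart = 0; // start of the tag token being built
  private readonly dropInEmit: boolean;
  readonly events: string[] = [];

  constructor(dropInEmit: boolean) {
    this.dropInEmit = dropInEmit;
  }

  write(chunk: string): void {
    this.buf += chunk;
  }

  private raw(start: number, end: number): string {
    return this.buf.slice(Math.max(start - this.offset, 0), end - this.offset);
  }

  private dropUntil(pos: number): void {
    if (pos > this.offset) {
      this.buf = this.buf.slice(pos - this.offset);
      this.offset = pos;
    }
  }

  private emitCurrentCharacterToken(end: number, consumed: number): void {
    if (end > this.charStart) this.events.push(`text:${this.raw(this.charStart, end)}`);
    // Unsafe variant: drops everything consumed so far, including the start
    // of the token that prepareToken is about to open.
    if (this.dropInEmit) this.dropUntil(consumed);
  }

  // The tokenizer has read up to `consumed` and found a tag starting at `start`.
  prepareToken(start: number, consumed: number): void {
    this.emitCurrentCharacterToken(start, consumed);
    this.tokenStart = start;
  }

  emitTag(end: number): void {
    this.events.push(`tag:${this.raw(this.tokenStart, end)}`);
    this.charStart = end;
    this.dropUntil(end); // Safe: no token is pending after this emit.
  }
}

for (const dropInEmit of [false, true]) {
  const t = new MiniTokenizer(dropInEmit);
  t.write("hi<b>");
  t.prepareToken(2, 3); // "hi<" consumed; the tag starts at offset 2
  t.emitTag(5);
  console.log(dropInEmit, t.events);
}
```

With the drop deferred until after the tag is emitted, the events are `text:hi` and `tag:<b>`. With the drop inside the emitter, the tag's raw text comes out as `b>`, because the `<` was discarded before the tag token was emitted.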

wooorm (Collaborator) left a comment

Not my area of expertise in this code base, but if the test was broken before and works now, 👍

fb55 merged commit 790c756 into master on Mar 4, 2022
fb55 deleted the fix/drop-chunk branch on March 6, 2022 at 13:00
qnighy commented Mar 26, 2022

Thanks a lot for the fix!

Development

Successfully merging this pull request may close these issues.

Rewriter: text content longer than 65536 chars is truncated
3 participants