How are the various "enum"s encoded? #36

Open
dead-claudia opened this issue May 24, 2018 · 3 comments

dead-claudia commented May 24, 2018

They're listed as strings in the spec, but it would seem highly inefficient to encode them that way. Are they in fact encoded as strings? (If not, you could encode them as LEB128 integers.)

Yoric (Collaborator) commented May 25, 2018

We have several binary encodings and we're still experimenting and tweaking them to improve compression and parse speed. The one with which we've been measuring parse speed does encode them as strings, which are themselves encoded as indices in the string table, as LEB32 integers. The tokenizer itself is optimized to perform LEB32 lookups instead of string lookups, so that's still quite fast and reasonably easy to compress.

We're also experimenting with encoding them as special interfaces in a variant of the format that uses predictions on interfaces to improve compression, and this does seem to reduce the file size measurably. We haven't checked the impact on decompression speed yet.
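
For readers unfamiliar with the scheme described in the first paragraph above, here is a minimal sketch of unsigned LEB128 (variable-length, base-128) encoding applied to a string-table index. The index value, the function name, and the choice of Rust are illustrative assumptions, not the project's actual encoder:

```rust
// Sketch only: encode an unsigned 32-bit value as LEB128, i.e. 7 bits per
// byte, least-significant group first, high bit set on all but the last byte.
fn write_leb128_u32(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let mut byte = (value & 0x7f) as u8; // low 7 bits of the remaining value
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // continuation bit: more bytes follow
        }
        out.push(byte);
        if value == 0 {
            break;
        }
    }
}

fn main() {
    // Hypothetical example: the enum's string sits at index 300 of the
    // string table, so the enum value is written as the two bytes 0xAC 0x02.
    let mut buf = Vec::new();
    write_leb128_u32(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]);
}
```

Under this kind of scheme, any index below 128 fits in a single byte, so frequently used string-table entries stay compact and remain easy for a downstream compressor to handle.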

dead-claudia (Author)

@Yoric LEB32? (You mean 32-bit little-endian integers?)

Yoric (Collaborator) commented May 26, 2018

Indeed, that's what I meant.
