How are the various "enum"s encoded? #36

Open
dead-claudia opened this issue May 24, 2018 · 3 comments

dead-claudia commented May 24, 2018

They're listed as strings in the spec, but it would seem highly inefficient to encode them that way. Are they in fact encoded as strings? (If not, you could encode them as LEB128 integers.)

Yoric (Collaborator) commented May 25, 2018

We have several binary encodings and we're still experimenting and tweaking them to improve compression and parse speed. The one with which we've been measuring parse speed does encode them as strings, which are themselves encoded as indices in the string table, as LEB32 integers. The tokenizer itself is optimized to perform LEB32 lookups instead of string lookups, so that's still quite fast and reasonably easy to compress.

We're also experimenting with encoding them as special interfaces in a variant of the format that uses predictions on interfaces to improve compression, and this does seem to reduce the file size measurably. We haven't checked the impact on decompression speed yet.
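
For readers unfamiliar with the scheme described in the first paragraph above, here is a minimal sketch of unsigned LEB128 (variable-length, base-128) encoding applied to a string-table index. The index value, the function name, and the choice of Rust are illustrative assumptions, not the project's actual encoder:

```rust
// Sketch only: encode an unsigned 32-bit value as LEB128, i.e. 7 bits per
// byte, least-significant group first, high bit set on all but the last byte.
fn write_leb128_u32(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let mut byte = (value & 0x7f) as u8; // low 7 bits of the remaining value
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // continuation bit: more bytes follow
        }
        out.push(byte);
        if value == 0 {
            break;
        }
    }
}

fn main() {
    // Hypothetical example: the enum's string sits at index 300 of the
    // string table, so the enum value is written as the two bytes 0xAC 0x02.
    let mut buf = Vec::new();
    write_leb128_u32(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]);
}
```

Under this kind of scheme, any index below 128 fits in a single byte, so frequently used string-table entries stay compact and remain easy for a downstream compressor to handle.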

dead-claudia (Author)

@Yoric LEB32? (You mean 32-bit little-endian integers?)

Yoric (Collaborator) commented May 26, 2018

Indeed, that's what I meant.
