Migrate to CBOR as the de facto serialization format #253

Open
marceline-cramer opened this issue Nov 29, 2023 · 9 comments
Labels
complex (High difficulty to accomplish), enhancement (New feature or request), guest (Deals with guest side code), host (Deals with host side code)

@marceline-cramer
Collaborator

marceline-cramer commented Nov 29, 2023

We've been using JSON, which bloats message size, is inefficient at encoding and decoding floats and integers, and requires converting blobs to base64 strings in order to get reasonably fast serialization. This kinda sucks!

A new message format that alleviates these problems should have the following properties:

  • Binary format. Efficient data representation, especially for raw byte arrays.
  • Schema-less. Can be made human-readable without the need for an external schema definition. Can be directly imported into a scripting environment as an object type.
  • Simple. Easy to work with.

CBOR satisfies all of these goals. No more base64! No more processes bottlenecked by ser/de!
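To make the base64 point concrete: in CBOR (RFC 8949) a blob is a byte string, major type 2, framed by a head of 1 to 9 bytes that carries the length, followed by the raw bytes with no escaping. The following is a minimal std-only Rust sketch, not ciborium's API; `write_head`, `cbor_bytes`, and `json_base64_len` are hypothetical helpers, and the JSON figure assumes padded base64 wrapped in two quote characters.

```rust
/// Write a CBOR head: 3-bit major type plus its argument, in the shortest form.
fn write_head(out: &mut Vec<u8>, major: u8, arg: u64) {
    let mt = major << 5;
    if arg < 24 {
        out.push(mt | arg as u8); // argument fits in the initial byte
    } else if arg <= u8::MAX as u64 {
        out.push(mt | 24);
        out.push(arg as u8);
    } else if arg <= u16::MAX as u64 {
        out.push(mt | 25);
        out.extend_from_slice(&(arg as u16).to_be_bytes());
    } else if arg <= u32::MAX as u64 {
        out.push(mt | 26);
        out.extend_from_slice(&(arg as u32).to_be_bytes());
    } else {
        out.push(mt | 27);
        out.extend_from_slice(&arg.to_be_bytes());
    }
}

/// Encode a blob as a CBOR byte string (major type 2): head + raw bytes, no escaping.
fn cbor_bytes(blob: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(blob.len() + 9);
    write_head(&mut out, 2, blob.len() as u64);
    out.extend_from_slice(blob);
    out
}

/// Size of the same blob as a padded base64 JSON string, including both quotes.
fn json_base64_len(n: usize) -> usize {
    4 * ((n + 2) / 3) + 2
}

fn main() {
    let small = [0u8, 1, 2, 3];
    let enc = cbor_bytes(&small);
    assert_eq!(enc, vec![0x44, 0, 1, 2, 3]); // 0x44 = major type 2, length 4
    println!("4-byte blob: CBOR {} bytes, JSON base64 {} bytes", enc.len(), json_base64_len(small.len()));

    let big = vec![0xABu8; 1 << 20]; // 1 MiB blob
    println!("1 MiB blob: CBOR {} bytes, JSON base64 {} bytes", cbor_bytes(&big).len(), json_base64_len(big.len()));
}
```

For the 1 MiB case, CBOR adds a 5-byte head, while the base64 JSON string comes out roughly a third larger than the blob, before counting the encode/decode work base64 itself costs.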

To-do:

  • guest: make send/recv serialized by default and add send_raw/recv_raw for bare messages #278
  • add ciborium as a workspace dep
  • remove serde_with from schema and DON'T base64-encode anything
  • add ciborium to runtime and use it to serialize events in PubSub, deserialize messages in SinkProcess, and serialize responses in RequestResponseProcess.
  • swap out serde_json for ciborium in init
  • swap out serde_json for ciborium in run_wasm example
  • remove serde_json dep from workspace, schema, ctl, init, wasm, server, fs, and runtime.
  • swap out serde_json for ciborium in guest
  • add ciborium to kindling workspace deps
  • update all of the kindling crates that use serde_json to ciborium
  • remove serde_json from kindling

Then, after #197:

  • update ByteVec with custom Deserialize and Serialize implementations that read and write the serde bytes data type directly
  • rework JsonAssetLoader into CborAssetLoader
@marceline-cramer marceline-cramer added enhancement New feature or request host Deals with host side code labels Nov 29, 2023
@squeaktoy

There's also CBOR, which is inspired by MessagePack but is also standardized by the IETF, and afaik has a few features that MessagePack doesn't support, like differentiating between text strings and binary strings.
There's also FlatBuffers or rkyv, which are zero-copy, so they're very fast and sound fitting for games, but they're probably not that similar to JSON.
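The text/binary distinction lives in the first byte of every CBOR data item: its top three bits are the major type, 2 for a byte string and 3 for a UTF-8 text string, and its low five bits carry the length or say where to find it. A minimal std-only sketch, using a hypothetical `decode_short_string` helper (not ciborium's API) that only handles definite lengths up to 255, shows how a decoder can hand a scripting host either raw bytes or a string without guessing:

```rust
/// What a scripting host would receive: raw bytes or a UTF-8 string.
#[derive(Debug, PartialEq)]
enum CborStr {
    Bytes(Vec<u8>),
    Text(String),
}

/// Decode a definite-length byte or text string whose length is stored in the
/// initial byte (0..=23) or in one following byte (additional info 24).
/// Returns None for anything else; a full decoder handles every head form.
fn decode_short_string(input: &[u8]) -> Option<CborStr> {
    let (&initial, rest) = input.split_first()?;
    let major = initial >> 5; // top 3 bits
    let info = initial & 0x1f; // low 5 bits
    let (len, payload) = match info {
        0..=23 => (info as usize, rest),
        24 => {
            let (&n, tail) = rest.split_first()?;
            (n as usize, tail)
        }
        _ => return None,
    };
    let body = payload.get(..len)?;
    match major {
        2 => Some(CborStr::Bytes(body.to_vec())),
        // Text strings must be valid UTF-8; reject them otherwise.
        3 => String::from_utf8(body.to_vec()).ok().map(CborStr::Text),
        _ => None,
    }
}

fn main() {
    // Same payload ("hi!"), different major types:
    // 0x43 = major type 2, length 3; 0x63 = major type 3, length 3.
    println!("{:?}", decode_short_string(&[0x43, 0x68, 0x69, 0x21]));
    println!("{:?}", decode_short_string(&[0x63, 0x68, 0x69, 0x21]));
}
```

In this sketch, a text string that fails UTF-8 validation is rejected rather than silently treated as bytes, which is what makes the distinction useful for scripting bindings.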

@marceline-cramer
Collaborator Author

Yeah, differentiating text and binary strings does sound like it'd come in handy, especially for scripting compatibility. I'll take a look at CBOR.

I've used FlatBuffers and looked at rkyv before, but they don't fit our criteria for being schemaless. Both of these will definitely come in handy if we ever need very application-specific encoding formats where minimizing bandwidth is of utmost importance.

Thank you for your contribution! It's always good to see new faces. :)

@marceline-cramer
Collaborator Author

CBOR looks fantastic!

ciborium has serde support and a Value type that should integrate with future scripting languages really easily.

I was really concerned about how CBOR might encode very large integers, but this makes it seem like there isn't a huge difference.

Yeah, let's use CBOR instead.
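On the large-integer concern: CBOR integers use the same head as every other item, so a non-negative value (major type 0) or a negative value (major type 1, stored as -1 - n) takes 1, 2, 3, 5, or 9 bytes depending on magnitude, and anything outside [-2^64, 2^64 - 1] has to go through the bignum tags 2 and 3. A std-only sketch of a shortest-form encoder, using a hypothetical `cbor_int` helper that returns None where a bignum would be needed:

```rust
/// Encode an integer in [-(2^64), 2^64 - 1] as a CBOR integer, shortest form.
/// Returns None outside that range (a real encoder would use bignum tags 2/3).
fn cbor_int(value: i128) -> Option<Vec<u8>> {
    let (major, arg) = if value >= 0 {
        (0u8, u64::try_from(value).ok()?)
    } else {
        // Major type 1 stores -1 - value as an unsigned argument.
        (1u8, u64::try_from(-1 - value).ok()?)
    };
    let mt = major << 5;
    let mut out = Vec::with_capacity(9);
    match arg {
        0..=23 => out.push(mt | arg as u8),
        24..=0xff => {
            out.push(mt | 24);
            out.push(arg as u8);
        }
        0x100..=0xffff => {
            out.push(mt | 25);
            out.extend_from_slice(&(arg as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(mt | 26);
            out.extend_from_slice(&(arg as u32).to_be_bytes());
        }
        _ => {
            out.push(mt | 27);
            out.extend_from_slice(&arg.to_be_bytes());
        }
    }
    Some(out)
}

fn main() {
    for v in [0i128, 23, 24, 1000, -1, -500, u32::MAX as i128, u64::MAX as i128, -(1i128 << 64)] {
        let cbor = cbor_int(v).expect("in range");
        // Compare against the same number written as JSON decimal text.
        println!("{v}: CBOR {} bytes, JSON {} bytes", cbor.len(), v.to_string().len());
    }
}
```

The loop prints each CBOR size next to the length of the same number as JSON text, which gives a rough sense of the difference for integers; floats are a separate case (major type 7) and aren't covered here.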

@marceline-cramer marceline-cramer changed the title Migrate to MessagePack as the de facto serialization format Migrate to CBOR as the de facto serialization format Nov 30, 2023
@squeaktoy

squeaktoy commented Nov 30, 2023 via email

@marceline-cramer
Collaborator Author

To answer your question both ways...

I decided on CBOR over MessagePack because:

  • ciborium has a Value API, which is much easier to work with compared to rmp's streaming encode/decode API
  • CBOR is an IETF standard (RFC 8949)

I decided so quickly because indecision can really paralyze big, sweeping design decisions like this one. Even if there's some big mistake and we have to make all the same changes to the repository again with MessagePack or some other protocol instead, that's better than never attempting the migration at all, or spending a lot of developer time studying and angsting over small differences between protocols. Since Hearth is such a small project right now, we have the benefit of really high agility in these kinds of decisions, and we should take advantage of that.

@squeaktoy

squeaktoy commented Nov 30, 2023 via email

@marceline-cramer marceline-cramer added the guest Deals with guest side code label Dec 7, 2023
@marceline-cramer
Collaborator Author

Could also ditch serde entirely and use this to reduce wasm binary sizes: https://docs.rs/minicbor/latest/minicbor/

@Meister1593
Contributor

When #278 closes, I would like to try to dabble in this; it sounds fun and interesting.

@marceline-cramer
Collaborator Author

> When #278 closes, I would like to try to dabble in this; it sounds fun and interesting.

That would be fantastic, thank you so much! I figure we should save minicbor for later down the line, if it turns out to be necessary.

Projects
None yet
Development

No branches or pull requests

3 participants