Migrate to CBOR as the de facto serialization format #253

marceline-cramer · 2023-11-29T19:37:43Z

We've been using JSON, which bloats message size, encodes and decodes floats and integers inefficiently, and requires converting blobs to base64 strings to get reasonably fast serialization. This kinda sucks!

A new message format that alleviates these problems should have the following properties:

Binary format. Efficient data representation, especially for raw byte arrays.
Schema-less. Can be made human-readable without the need for an external schema definition. Can be directly imported into a scripting environment as an object type.
Simple. Easy to work with.

CBOR satisfies all of these goals. No more base64! No more processes bottlenecked by ser/de!
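To make the base64 point concrete, here is a minimal, hand-rolled sketch of how CBOR frames a raw byte array (major type 2 in RFC 8949) next to the size of the same blob as a padded base64 JSON string. This is not how ciborium is implemented or called. The function names `cbor_head`, `cbor_byte_string`, and `json_base64_len` are made up for illustration, and only the standard library is used.

```rust
/// Encode a CBOR head: a major type (0..=7) plus an unsigned argument.
/// Hypothetical helper written from the RFC 8949 encoding rules.
fn cbor_head(major: u8, arg: u64) -> Vec<u8> {
    let m = major << 5;
    match arg {
        // Arguments up to 23 fit directly in the initial byte.
        0..=23 => vec![m | arg as u8],
        // Otherwise the initial byte announces a 1-, 2-, 4-, or 8-byte
        // big-endian argument that follows it.
        24..=0xff => vec![m | 24, arg as u8],
        0x100..=0xffff => {
            let mut v = vec![m | 25];
            v.extend_from_slice(&(arg as u16).to_be_bytes());
            v
        }
        0x1_0000..=0xffff_ffff => {
            let mut v = vec![m | 26];
            v.extend_from_slice(&(arg as u32).to_be_bytes());
            v
        }
        _ => {
            let mut v = vec![m | 27];
            v.extend_from_slice(&arg.to_be_bytes());
            v
        }
    }
}

/// Encode a blob as a CBOR byte string: length head, then the raw bytes.
fn cbor_byte_string(data: &[u8]) -> Vec<u8> {
    let mut out = cbor_head(2, data.len() as u64);
    out.extend_from_slice(data);
    out
}

/// Length of the same blob as a JSON string holding padded base64:
/// 4 characters per 3 input bytes (rounded up), plus two quote characters.
fn json_base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4 + 2
}
```

For a 1024-byte blob, `cbor_byte_string` produces 1027 bytes (a 3-byte head plus the payload, copied as-is), while `json_base64_len(1024)` is 1370 and the payload additionally has to be transcoded on both ends.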

To-do:

guest: make send/recv serialized by default and add send_raw/recv_raw for bare messages #278
add ciborium as a workspace dep
remove serde_with from schema and DON'T base64-encode anything
add ciborium to runtime and use it to serialize events in PubSub, deserialize messages in SinkProcess, and serialize responses in RequestResponseProcess.
swap out serde_json for ciborium in init
swap out serde_json for ciborium in run_wasm example
remove serde_json dep from workspace, schema, ctl, init, wasm, server, fs, and runtime.
swap out serde_json for ciborium in guest
add ciborium to kindling workspace deps
update all of the kindling crates that use serde_json to ciborium
remove serde_json from kindling

Then, after #197:

update ByteVec with custom Deserialize and Serialize implementations that read and write the serde bytes data type directly
rework JsonAssetLoader into CborAssetLoader

squeaktoy · 2023-11-29T23:05:31Z

There's also CBOR, which is inspired by MessagePack but is also standardized at the IETF and has a few features that MessagePack doesn't support, like differentiating between text strings and binary strings, afaik.
There's also flatbuffers or rkyv which are zero-copy so they're very fast and that sounds fitting for games, but they're probably not that similar to JSON.

marceline-cramer · 2023-11-30T01:31:16Z

Yeah, differentiating text and binary strings does sound like it'd come in handy, especially for scripting compatibility. I'll take a look at CBOR.

I've used Flatbuffers and looked at rkyv before, but they don't fit our criterion of being schema-less. Both of these will definitely come in handy if we ever need very application-specific encoding formats where minimizing bandwidth is of utmost importance.

Thank you for your contribution! It's always good to see new faces. :)

marceline-cramer · 2023-11-30T01:43:47Z

CBOR looks fantastic!

ciborium has serde support and a Value type that should integrate with future scripting languages really easily.

I was really concerned about how CBOR might encode very large integers, but this makes it seem like there isn't a huge difference.
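As a rough illustration of how CBOR sizes unsigned integers, the sketch below computes the encoded length under the RFC 8949 rules and compares it with the decimal text JSON would use. It does not reproduce the comparison linked above, and `cbor_uint_len` and `json_uint_len` are hypothetical helper names, not part of ciborium.

```rust
/// Encoded size in bytes of an unsigned integer in CBOR (major type 0),
/// using the shortest form: values up to 23 live in the initial byte,
/// larger values add a 1-, 2-, 4-, or 8-byte big-endian argument.
fn cbor_uint_len(n: u64) -> usize {
    match n {
        0..=23 => 1,
        24..=0xff => 2,
        0x100..=0xffff => 3,
        0x1_0000..=0xffff_ffff => 5,
        _ => 9,
    }
}

/// Size in bytes of the same integer written as JSON decimal text.
fn json_uint_len(n: u64) -> usize {
    n.to_string().len()
}
```

Even `u64::MAX` costs 9 bytes in CBOR against 20 decimal digits in JSON, so the largest integers never take more than one head byte plus the fixed-width value.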

Yeah, let's use CBOR instead.

squeaktoy · 2023-11-30T16:34:42Z

What made you decide on CBOR over MessagePack so quickly? Apparently the text string/binary string issue is misinformation on my part according to this article: https://diziet.dreamwidth.org/2020/07/14/

marceline-cramer · 2023-11-30T17:59:31Z

To answer your question both ways...

The reason why I decided on CBOR over MessagePack is:

ciborium has a Value API, which is much easier to work with compared to rmp's streaming encode/decode API
CBOR is an IETF standard (RFC 8949)

The reason why I decided so quickly was because indecision can really paralyze these kinds of big sweeping design decisions, and even if there's some big mistake and we have to make all the same changes to the repository again with MessagePack instead or some other protocol, that's better than never attempting to migrate at all, or spending a lot of developer time studying and angsting over small differences between protocols. Since Hearth is such a small project right now, we have the benefit of really high agility in these kinds of decisions, so we should be taking advantage of that.

squeaktoy · 2023-11-30T18:27:08Z

That sounds pretty inspiring. I kind of resonate with the paralysis that comes from studying different standards and solutions. I often spend a lot of time investigating before I work on my projects and I'm a pretty slow developer, perhaps because of that :/

marceline-cramer · 2023-12-20T07:34:20Z

Could also ditch serde entirely and use this to reduce wasm binary sizes: https://docs.rs/minicbor/latest/minicbor/

Meister1593 · 2024-01-09T01:35:45Z

When #278 closes I would like to try to dab into this, sounds fun and interesting

marceline-cramer · 2024-01-09T05:09:40Z

When #278 closes I would like to try to dab into this, sounds fun and interesting

That would be fantastic, thank you so much! I figure that we should save potentially using minicbor for later down the line, if necessary.

marceline-cramer added enhancement New feature or request host Deals with host side code labels Nov 29, 2023

marceline-cramer changed the title ~~Migrate to MessagePack as the de facto serialization format~~ Migrate to CBOR as the de facto serialization format Nov 30, 2023

marceline-cramer added the guest Deals with guest side code label Dec 7, 2023

marceline-cramer added the complex High difficulty to accomplish label Jan 3, 2024

marceline-cramer mentioned this issue Jan 3, 2024

guest: make send/recv serialized by default and add send_raw/recv_raw for bare messages #278

Closed
