Future specification on binary format #27

MMZK1526 · 2023-07-14T17:13:18Z

Is there any plan to add to the specification how to convert a typeid to a binary format?

In my other personal project utilising typeid, I will need to serialise the ids. So far I'm implementing my own serialisation only for that specific project, but if there will be a formal specification, I can include that in the Haskell implementation as well.

loreto · 2023-07-14T17:17:44Z

A similar question came up here: jetify-com/typeid-go#5

So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

MMZK1526 · 2023-07-14T17:24:04Z

> So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

I'm still experimenting with it. Currently, I have an 8-bit length indicator of the prefix followed by the raw ASCII of the prefix, then followed by the normal encoding of the UUID.

It's not the most compact way of doing so, e.g. the length only needs 6 bits and each letter only 5 bits. I think I'm happy with what I'm doing now for my particular use case (since I don't need to squeeze every inch of space), but it may not be very suitable as a standard way defined in a spec.
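The layout described above (one length byte, the raw ASCII prefix, then the UUID) can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and "the normal encoding of the UUID" is assumed here to mean the 16 raw bytes in big-endian order.

```python
import uuid
from typing import Tuple


def encode_length_prefixed(prefix: str, uid: uuid.UUID) -> bytes:
    """Hypothetical layout: [1-byte prefix length][ASCII prefix][16 UUID bytes]."""
    raw = prefix.encode("ascii")
    if len(raw) > 255:
        raise ValueError("prefix does not fit in an 8-bit length field")
    # uid.bytes is the 16-byte big-endian form (an assumption about the
    # "normal encoding" mentioned in the comment).
    return bytes([len(raw)]) + raw + uid.bytes


def decode_length_prefixed(data: bytes) -> Tuple[str, uuid.UUID]:
    """Inverse of encode_length_prefixed."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if len(data) != 1 + n + 16:
        raise ValueError("unexpected length")
    prefix = data[1:1 + n].decode("ascii")
    return prefix, uuid.UUID(bytes=data[1 + n:])
```

With a 4-letter prefix this costs 1 + 4 + 16 = 21 bytes, which is where the "not the most compact" remark comes from.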

Another possibility (if we use 5 bits to encode each letter) is to drop the length field and instead fuse a separator indicator into the last letter, since there are normally 32 - 26 = 6 unused 5-bit values.
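One way to read the 5-bit idea is sketched below. It is an interpretation, not the scheme from the comment: instead of literally fusing the separator into the last letter, it reserves one of the spare 5-bit codes (0) as a terminator, maps `a`..`z` to 1..26, and pads to a byte boundary. The function names and the choice of code 0 are assumptions.

```python
def pack_prefix(prefix: str) -> bytes:
    """Pack a lowercase a-z prefix at 5 bits per letter, ending with code 0."""
    codes = []
    for ch in prefix:
        if not ("a" <= ch <= "z"):
            raise ValueError("prefix must be lowercase a-z")
        codes.append(ord(ch) - ord("a") + 1)  # a..z -> 1..26
    codes.append(0)  # reserved terminator code (assumption)

    acc = 0
    for code in codes:
        acc = (acc << 5) | code
    nbits = 5 * len(codes)
    pad = (-nbits) % 8  # zero bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack_prefix(data: bytes) -> str:
    """Read 5-bit codes from the front until the terminator is found."""
    acc = int.from_bytes(data, "big")
    letters = []
    for shift in range(len(data) * 8 - 5, -1, -5):
        code = (acc >> shift) & 0x1F
        if code == 0:
            return "".join(letters)
        if code > 26:
            raise ValueError("invalid 5-bit code")
        letters.append(chr(code - 1 + ord("a")))
    raise ValueError("terminator not found")
```

Under these assumptions `"user"` packs into 4 bytes (25 bits plus padding) instead of the 5 bytes used by the length-prefixed form.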

loreto · 2023-07-14T17:39:06Z

For the spec I think we need to answer what we're trying to optimize for. Things running through my mind include:

- How important is size? Do we want the absolute minimal encoding, or are we willing to trade it off for something else? (say speed)
- How important is performance? Should we try to keep the encoding 8-bit aligned?
- Sortable. TypeIDs promise to be k-sortable, and in some applications like DBs their sorting order is important. Do we want the binary representation to guarantee the same sorting order as the string representation? (if not, some implementations might end up sorting differently depending on which of the representations they use for sorting)

Do you have any thoughts on these?
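To make the sortability concern concrete, here is a small sketch showing that a length-prefixed binary layout (one length byte, ASCII prefix, 16 UUID bytes; an assumed layout, not anything in the spec) can order two ids differently from their string forms. The helper names are hypothetical, and the nil UUID is used so the suffix is simply 26 zeros.

```python
import uuid

NIL = uuid.UUID(int=0)
NIL_SUFFIX = "0" * 26  # string suffix of the nil UUID


def string_form(prefix: str) -> str:
    # String representation: prefix, underscore, 26-character suffix.
    return f"{prefix}_{NIL_SUFFIX}"


def length_prefixed(prefix: str) -> bytes:
    # Assumed binary layout: [length byte][ASCII prefix][16 UUID bytes].
    raw = prefix.encode("ascii")
    return bytes([len(raw)]) + raw + NIL.bytes


prefixes = ["b", "ab"]
by_string = sorted(prefixes, key=string_form)      # ['ab', 'b']
by_binary = sorted(prefixes, key=length_prefixed)  # ['b', 'ab']
```

The string forms compare on their first character (`a` before `b`), while the binary forms compare on the length byte first (1 before 2), so the two orders disagree.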

loreto · 2023-07-14T17:42:44Z

Tagging people who have implemented typeid libraries in other languages: @cbuctok @sloanelybutsurely @fxlae @softprops @faustbrian @akhundMurad @broothie @conradludgate @johnnynotsolucky @Frizlab @ongteckwu @tensorush

Do you have a need for a binary encoding specification? If so, what properties do you think are important for your use cases?

conradludgate · 2023-07-14T17:47:44Z

For a binary encoding, I would expect to have an already typed binary schema. In that case, I'd personally use a UUID big endian 16 byte encoding rather than create anything bespoke. Since my binary schema would already be typed, I would forfeit the type prefix.

For a nontyped binary format like CBOR, I could imagine a custom encoding though. CBOR has no byte alignment properties, so I would perhaps encode the prefix string and the 16 bytes as a CBOR array.
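As a rough illustration of the CBOR idea, the sketch below hand-encodes a two-element CBOR array holding the prefix as a text string and the UUID as a 16-byte byte string. It uses only the standard library; the function names are hypothetical, and the two-element array shape is one possible reading of the suggestion rather than an agreed format.

```python
import uuid


def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte (plus a 1-byte argument when n is 24..255)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    raise ValueError("lengths above 255 are not needed for this sketch")


def encode_cbor_pair(prefix: str, uid: uuid.UUID) -> bytes:
    """Encode [prefix, uuid_bytes] as a CBOR array of length 2."""
    text = prefix.encode("utf-8")
    return (
        cbor_head(4, 2)            # major type 4: array with 2 items
        + cbor_head(3, len(text))  # major type 3: text string
        + text
        + cbor_head(2, 16)         # major type 2: byte string of 16 bytes
        + uid.bytes                # big-endian UUID bytes
    )
```

For the prefix `user` this produces 1 + 1 + 4 + 1 + 16 = 23 bytes, and any generic CBOR decoder should read it back as a string and a byte string without needing TypeID-specific code.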

loreto · 2023-07-14T17:53:59Z

I wonder if we're better off not defining a binary encoding as part of the spec and leaving it up to the use case. The examples @conradludgate gives make me think the ideal encoding is use-case dependent. If you can already guarantee the type in your binary format, you can elide the prefix entirely and reintroduce it when decoding the binary representation. If you want to encode the type, you might be better off using the representation suggested by the format you're using (e.g. CBOR might represent a string + vector one way, while protobufs and JSONB might do it a different way).
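The "elide the prefix and reintroduce it when decoding" approach could look something like the sketch below, for a schema where the column or field already fixes the type. The helper names are hypothetical; the suffix handling assumes the 26-character lowercase Crockford base32 form of the 128-bit UUID used by the string representation.

```python
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"  # lowercase Crockford base32


def suffix_to_bytes(suffix: str) -> bytes:
    """Decode a 26-character suffix into 16 big-endian bytes."""
    if len(suffix) != 26:
        raise ValueError("suffix must be 26 characters")
    value = 0
    for ch in suffix:
        value = (value << 5) | ALPHABET.index(ch)  # ValueError on bad chars
    if value >> 128:
        raise ValueError("suffix overflows 128 bits")
    return value.to_bytes(16, "big")


def bytes_to_suffix(raw: bytes) -> str:
    """Encode 16 bytes as a 26-character suffix."""
    if len(raw) != 16:
        raise ValueError("expected 16 bytes")
    value = int.from_bytes(raw, "big")
    return "".join(ALPHABET[(value >> s) & 0x1F] for s in range(125, -1, -5))


def to_binary(typeid: str, expected_prefix: str) -> bytes:
    """Drop the prefix (after checking it) and keep only the UUID bytes."""
    prefix, _, suffix = typeid.rpartition("_")
    if prefix != expected_prefix:
        raise ValueError("unexpected prefix")
    return suffix_to_bytes(suffix)


def from_binary(raw: bytes, prefix: str) -> str:
    """Reintroduce the prefix known from the surrounding schema."""
    suffix = bytes_to_suffix(raw)
    return f"{prefix}_{suffix}" if prefix else suffix
```

Stored this way each id takes exactly 16 bytes, and byte-wise comparison follows the UUID's big-endian order.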

Frizlab · 2023-07-14T23:29:59Z

I created the lib just "because I could" and am not using it, so I'd be happy with whatever binary encoding specs you guys will come up with 🙂

akhundMurad · 2023-07-15T08:06:47Z

IMO, it would be better to define several possible encoding options for a variety of use cases.
