Future specification on binary format #27

Open
MMZK1526 opened this issue Jul 14, 2023 · 8 comments

Comments

@MMZK1526
Contributor

Is there any plan to add to the specification how to convert a typeid to a binary format?

In another personal project that uses typeid, I need to serialise the ids. So far I'm implementing my own serialisation just for that project, but if there is going to be a formal specification, I can include it in the Haskell implementation as well.

@loreto
Contributor

loreto commented Jul 14, 2023

A similar question came up here: jetify-com/typeid-go#5

So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

@MMZK1526
Contributor Author

MMZK1526 commented Jul 14, 2023

> So definitely open to having the spec define a formal binary representation. Did you already have a particular binary representation in mind?

I'm still experimenting with it. Currently, I have an 8-bit length indicator of the prefix followed by the raw ASCII of the prefix, then followed by the normal encoding of the UUID.

It's not the most compact way of doing so, e.g. the length only needs 6 bits and each letter only 5 bits. I think I'm happy with what I'm doing now for my particular use case (since I don't need to squeeze every inch of space), but it may not be very suitable as a standard way defined in a spec.
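For illustration, the layout described above (one length byte, the ASCII prefix, then the UUID) might look like the following minimal Python sketch. The function names are hypothetical, and it assumes that "the normal encoding of the UUID" means the 16 raw big-endian bytes; it is not taken from any existing typeid library.

```python
import uuid
from typing import Tuple


def encode_length_prefixed(prefix: str, uid: uuid.UUID) -> bytes:
    """One length byte, the ASCII prefix, then the 16 big-endian UUID bytes."""
    raw = prefix.encode("ascii")
    if len(raw) > 63:
        # The 6-bit figure mentioned above implies prefixes of at most 63 characters.
        raise ValueError("prefix longer than 63 characters")
    return bytes([len(raw)]) + raw + uid.bytes


def decode_length_prefixed(data: bytes) -> Tuple[str, uuid.UUID]:
    """Inverse of encode_length_prefixed."""
    n = data[0]
    if len(data) != 1 + n + 16:
        raise ValueError("unexpected length")
    return data[1:1 + n].decode("ascii"), uuid.UUID(bytes=data[1 + n:])
```

With a four-letter prefix such as `user`, this produces 1 + 4 + 16 = 21 bytes.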

Another possibility (if we use 5 bits to encode each letter) is to stop encoding the length and instead fuse a separator indicator with the last letter, since 5 bits give 32 values and the letters only use 26 of them, leaving 32 - 26 = 6 unused values.
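One way to read the 5-bit idea is sketched below in Python. It is a simplified variant rather than the exact scheme proposed: instead of fusing the separator into the last letter, it spends one of the spare 5-bit values as a standalone terminator. The names, the choice of value 31, the zero padding to a byte boundary, and the restriction to lowercase a-z prefixes are all assumptions made for the example.

```python
import uuid
from typing import Tuple

TERMINATOR = 31  # assumed: one of the spare 5-bit values (26..31) ends the prefix


def encode_packed(prefix: str, uid: uuid.UUID) -> bytes:
    """Pack each prefix letter into 5 bits, add a terminator, then the UUID bytes."""
    codes = [ord(c) - ord("a") for c in prefix]
    if any(not 0 <= c < 26 for c in codes):
        raise ValueError("prefix must be lowercase a-z")
    codes.append(TERMINATOR)
    bits = 0
    for c in codes:
        bits = (bits << 5) | c
    nbits = 5 * len(codes)
    pad = -nbits % 8  # zero bits so that the UUID starts on a byte boundary
    packed = (bits << pad).to_bytes((nbits + pad) // 8, "big")
    return packed + uid.bytes


def decode_packed(data: bytes) -> Tuple[str, uuid.UUID]:
    """Inverse of encode_packed: read 5-bit groups until the terminator."""
    head, tail = data[:-16], data[-16:]
    bits = int.from_bytes(head, "big")
    letters = []
    for shift in range(8 * len(head) - 5, -1, -5):
        code = (bits >> shift) & 31
        if code == TERMINATOR:
            break
        letters.append(chr(code + ord("a")))
    else:
        raise ValueError("no terminator found")
    return "".join(letters), uuid.UUID(bytes=tail)
```

For the prefix `user` this takes 4 bytes for the prefix part (25 bits plus 7 padding bits), so 20 bytes in total, one byte less than the length-prefixed layout. The saving grows with longer prefixes.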

@loreto
Contributor

loreto commented Jul 14, 2023

For the spec I think we need to answer what we're trying to optimize for. Things running through my mind include:

  • How important is size? Do we want the absolute minimal encoding, or are we willing to trade it off for something else (say, speed)?
  • How important is performance? Should we try to keep the encoding 8-bit aligned?
  • How important is sortability? TypeIDs promise to be k-sortable, and in some applications like DBs their sorting order is important. Do we want the binary representation to guarantee the same sorting order as the string representation? (If not, some implementations might end up sorting differently depending on which of the representations they use for sorting.)
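To make the sortability question concrete, here is a small Python sketch showing that a length-prefixed binary layout (one length byte, ASCII prefix, 16 UUID bytes) can order two ids differently from their string form. The helper names are hypothetical; both ids use the all-zero UUID, whose base32 suffix is 26 zeros, so only the prefixes differ.

```python
def string_form(prefix: str, suffix: str) -> str:
    """Usual textual TypeID: prefix, underscore, base32 suffix."""
    return f"{prefix}_{suffix}"


def length_prefixed_form(prefix: str, uuid_bytes: bytes) -> bytes:
    """Assumed binary layout: length byte, ASCII prefix, raw UUID bytes."""
    raw = prefix.encode("ascii")
    return bytes([len(raw)]) + raw + uuid_bytes


NIL_SUFFIX = "0" * 26  # base32 suffix of the all-zero UUID
NIL_BYTES = bytes(16)  # the same UUID as raw bytes

prefixes = ["b", "ab"]
by_string = sorted(prefixes, key=lambda p: string_form(p, NIL_SUFFIX))
by_binary = sorted(prefixes, key=lambda p: length_prefixed_form(p, NIL_BYTES))

print(by_string)  # ['ab', 'b']: "ab_..." sorts before "b_..."
print(by_binary)  # ['b', 'ab']: the length byte 0x01 sorts before 0x02
```

A layout that wants to match the string order would need to avoid a leading length field, for example by using a terminator after the prefix instead.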

Do you have any thoughts on these?

@loreto
Contributor

loreto commented Jul 14, 2023

Tagging people who have implemented typeid libraries in other languages: @cbuctok @sloanelybutsurely @fxlae @softprops @faustbrian @akhundMurad @broothie @conradludgate @johnnynotsolucky @Frizlab @ongteckwu @tensorush

Do you have a need for a binary encoding specification? If so, what properties do you think are important for your use cases?

@conradludgate
Contributor

For a binary encoding, I would expect to already have a typed binary schema. In that case, I'd personally use the big-endian, 16-byte UUID encoding rather than create anything bespoke. Since my binary schema would already be typed, I would forfeit the type prefix.

For an untyped binary format like CBOR, I could imagine a custom encoding though. CBOR has no byte alignment properties, so I would perhaps encode the prefix string and the 16 bytes as a CBOR array.
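As a rough illustration of the CBOR idea, the Python sketch below hand-encodes a two-element CBOR array holding the prefix as a text string and the UUID as a 16-byte byte string. The function name is hypothetical and the header bytes are written out by hand following RFC 8949 rather than through a CBOR library; a real project would more likely use an existing encoder.

```python
import uuid


def cbor_typeid(prefix: str, uid: uuid.UUID) -> bytes:
    """Encode [prefix, uuid_bytes] as a CBOR array of a text and a byte string."""
    text = prefix.encode("utf-8")
    if len(text) < 24:
        text_head = bytes([0x60 | len(text)])  # major type 3, length in the initial byte
    elif len(text) < 256:
        text_head = bytes([0x78, len(text)])   # major type 3, one-byte length follows
    else:
        raise ValueError("prefix too long for this sketch")
    return (
        b"\x82"                # major type 4: array of 2 items
        + text_head + text     # the prefix as a text string
        + b"\x50" + uid.bytes  # major type 2: byte string of length 16
    )
```

For the prefix `user` the result is 1 + 5 + 17 = 23 bytes.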

@loreto
Contributor

loreto commented Jul 14, 2023

I wonder if we're better off not defining a binary encoding as part of the spec and leaving it up to the use case. The examples @conradludgate gives make me think the ideal encoding is use-case dependent. If you can already guarantee the type in your binary format, you can completely elide the prefix and re-introduce it when decoding the binary representation. If you want to encode the type, you might be better off using the representation suggested by the format you're using (e.g. CBOR might represent a string + vector one way, while protobufs and jsonb might do it a different way).
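The "elide the prefix and re-introduce it when decoding" approach could be sketched in Python as follows. The function names are hypothetical and the prefix is assumed to come from the surrounding schema (for example, the column or field the value is stored in). The suffix is rendered as 26 characters of lowercase Crockford base32 over the 128-bit UUID value.

```python
import uuid

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"  # Crockford base32, lowercase


def suffix_from_uuid(uid: uuid.UUID) -> str:
    """Render the 128-bit UUID as 26 base32 characters, most significant first."""
    n = uid.int
    return "".join(ALPHABET[(n >> (5 * i)) & 31] for i in reversed(range(26)))


def to_binary(uid: uuid.UUID) -> bytes:
    """Store only the 16 big-endian UUID bytes; the type lives in the schema."""
    return uid.bytes


def from_binary(prefix: str, raw: bytes) -> str:
    """Re-attach the schema-supplied prefix when turning the bytes back into text."""
    suffix = suffix_from_uuid(uuid.UUID(bytes=raw))
    return f"{prefix}_{suffix}" if prefix else suffix
```

Here the stored value is always exactly 16 bytes, and the byte order of the stored values matches the order of the UUIDs themselves.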

@Frizlab
Contributor

Frizlab commented Jul 14, 2023

I created the lib just "because I could" and am not using it, so I'd be happy with whatever binary encoding specs you guys will come up with 🙂

@akhundMurad
Contributor

IMO, it would be better to define several possible encoding options for a variety of use cases.


5 participants