Provide support for UUID type (a.k.a. GUID) #2224

jtattermusch · 2016-10-06T07:01:55Z

Filing on behalf of a customer:
Protobuf lacks Uuid (Guid in .NET) support out of the box. It would have been nice to have a Well-Known Type (like we do with Timestamp to represent Date and Times) since Uuids are pretty common, particularly in distributed systems.

jtattermusch · 2016-10-06T07:03:12Z

CC @jskeet who was involved in the discussion.

DanFTRX · 2018-08-02T19:47:50Z

Is this still on the roadmap?

xfxyjwf · 2018-08-02T20:37:35Z

No, this is not on our roadmap.

listepo · 2019-08-26T23:00:29Z

Any news?

mihaimyh · 2019-11-01T09:28:15Z

> No, this is not on our roadmap.

Why not?

jtattermusch · 2019-11-01T12:36:14Z

Regardless of whether this is on the roadmap or not, I can see two possible designs:

Option 1

// Message representing a version 4 universally unique identifier. See
// rfc/4122#section-4.4 for additional information.
message UUID {
  // The two uint64 fields below should be populated with the most and least
  // significant 64 bits of a version 4 UUID
  // (e.g., https://docs.oracle.com/javase/8/docs/api/java/util/UUID.html).
  uint64 most_significant_uuid_bits = 1;
  uint64 least_significant_uuid_bits = 2;
}

Option 2

message UUID {
  string value = 1;
}
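
As a rough illustration of what each option would carry, the sketch below uses Python's standard `uuid` module. The helper names `to_option1`, `from_option1` and `to_option2` are hypothetical and not part of any protobuf API. Splitting into most and least significant 64 bits is an assumption that follows the `java.util.UUID` convention referenced in Option 1.

```python
import uuid

MASK64 = (1 << 64) - 1

def to_option1(u):
    """Return (most_significant_uuid_bits, least_significant_uuid_bits)."""
    return (u.int >> 64) & MASK64, u.int & MASK64

def from_option1(msb, lsb):
    """Rebuild the UUID from the two unsigned 64-bit halves."""
    return uuid.UUID(int=(msb << 64) | lsb)

def to_option2(u):
    """Return the canonical 36-character text form used by Option 2."""
    return str(u)

example = uuid.UUID("12345678-1234-5678-1234-567812345678")
msb, lsb = to_option1(example)
text = to_option2(example)
```

Option 1 carries 16 bytes of payload, while Option 2 carries 36 characters of text.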

kucint · 2020-01-03T12:32:35Z

The UUID represented by two uint64 values will have a problem with endianness: see the how-do-i-represent-a-uuid-in-a-protobuf-message discussion.

gmabey · 2020-02-19T20:49:08Z

Doesn't RFC 4122 section 4.1.2 present a solution to the problem identified by @kucint?

onesteveo · 2020-09-03T00:35:45Z

bump

bill-poole · 2021-01-07T11:40:52Z

I think @gmabey is correct in that RFC 4122 section 4.1.2 presents a solution to allow a UUID to be encoded in binary (as opposed to text) and allow the endianness to be handled by the protobuf encoding layer (as opposed to at the application layer). This approach would have a proto-spec like below.

// A UUID, encoded in accordance with section 4.1.2 of RFC 4122.
message Uuid {
	// The low field of the timestamp (32 bits).
	fixed32 time_low = 1;

	// The middle field of the timestamp (16 bits).
	uint32 time_mid = 2;

	// The high field of the timestamp multiplexed with the version number (16 bits).
	uint32 time_hi_and_version = 3;

	// The high field of the clock sequence multiplexed with the variant (8 bits).
	uint32 clock_seq_hi_and_reserved = 4;

	// The low field of the clock sequence (8 bits).
	uint32 clock_seq_low = 5;

	// The spatially unique node identifier (48 bits).
	uint64 node = 6;
}

This would be encoded from a System.Guid in .NET/C# as follows.

Span<byte> bytes = stackalloc byte[16];
guid.TryWriteBytes(bytes);
TimeLow = BinaryPrimitives.ReadUInt32LittleEndian(bytes.Slice(0, 4));
TimeMid = BinaryPrimitives.ReadUInt16LittleEndian(bytes.Slice(4, 2));
TimeHiAndVersion = BinaryPrimitives.ReadUInt16LittleEndian(bytes.Slice(6, 2));
ClockSeqHiAndReserved = bytes[8];
ClockSeqLow = bytes[9];
Node = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8, 8)) & 0x0000FFFFFFFFFFFF;

... and decoded as follows.

checked
{
	Span<byte> bytes = stackalloc byte[16];
	BinaryPrimitives.WriteUInt32LittleEndian(bytes.Slice(0, 4), TimeLow);
	BinaryPrimitives.WriteUInt16LittleEndian(bytes.Slice(4, 2), (ushort)TimeMid);
	BinaryPrimitives.WriteUInt16LittleEndian(bytes.Slice(6, 2), (ushort)TimeHiAndVersion);
	BinaryPrimitives.WriteUInt64BigEndian(bytes.Slice(8), Node);
	bytes[8] = (byte)ClockSeqHiAndReserved;
	bytes[9] = (byte)ClockSeqLow;
	return new Guid(bytes);
}
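
For readers outside .NET, the same six-field split can be sketched with Python's standard `uuid` module, whose `UUID.fields` attribute returns the six RFC 4122 section 4.1.2 integers in order. The function names and dictionary keys below are illustrative assumptions that mirror the proto field names above; they are not generated protobuf code.

```python
import uuid

FIELD_NAMES = (
    "time_low",
    "time_mid",
    "time_hi_and_version",
    "clock_seq_hi_and_reserved",
    "clock_seq_low",
    "node",
)

def to_rfc4122_fields(u):
    """Split a UUID into the six unsigned integers of RFC 4122 section 4.1.2."""
    return dict(zip(FIELD_NAMES, u.fields))

def from_rfc4122_fields(fields):
    """Rebuild the UUID; uuid.UUID range-checks each of the six values."""
    return uuid.UUID(fields=tuple(fields[name] for name in FIELD_NAMES))

example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
split = to_rfc4122_fields(example)
```

Because every field is an ordinary unsigned integer, byte order is handled by the integer encoding rather than by application code.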

I'm keen to get people's thoughts and feedback on this approach.

bill-poole · 2021-01-11T16:41:49Z

I've just done some benchmarking of string-encoded vs little endian byte array-encoded vs RFC 4122-encoded UUIDs in .NET 5 and the results are below.

Method	Mean	Error	StdDev
ConvertToStringUuid	52.498 ns	0.2467 ns	0.2308 ns
ConvertToLittleEndianBinaryUuid	64.575 ns	0.2746 ns	0.2293 ns
ConvertToRfc4122Uuid	10.849 ns	0.0662 ns	0.0620 ns
SerialiseStringUuid	54.187 ns	0.4128 ns	0.5511 ns
SerialiseLittleEndianByteArrayUuid	25.091 ns	0.1539 ns	0.1364 ns
SerialiseRfc4122Uuid	61.701 ns	0.4367 ns	0.4085 ns
DeserialiseStringUuid	224.519 ns	1.2474 ns	1.1058 ns
DeserialiseLittleEndianByteArrayUuid	217.149 ns	0.9792 ns	0.8177 ns
DeserialiseRfc4122Uuid	150.113 ns	0.4288 ns	0.3801 ns
ConvertFromStringUuid	82.985 ns	0.2628 ns	0.2459 ns
ConvertFromLittleEndianByteArrayUuid	2.475 ns	0.0123 ns	0.0115 ns
ConvertFromRfc4122Uuid	9.682 ns	0.0210 ns	0.0186 ns

The aggregate performance of the three approaches is:

Convert and serialise StringUuid: 106.685 ns
Convert and serialise LittleEndianBinaryUuid: 89.666 ns
Convert and serialise Rfc4122Uuid: 72.55 ns
Deserialise and convert StringUuid: 307.504 ns
Deserialise and convert LittleEndianBinaryUuid: 219.624 ns
Deserialise and convert Rfc4122Uuid: 159.795 ns

So the RFC 4122-based representation is fastest in both serialisation and deserialisation.

However, the StringUuid serialises to 38 bytes, the LittleEndianBinaryUuid to 18 bytes and the Rfc4122Uuid 27 bytes - according to the CalculateSize() method on each message type. So, while the RFC 4122-based encoding is faster, it is 50% larger than the little endian binary encoding on the wire.
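
The sizes above can be reproduced by hand from the protobuf wire format. The sketch below is a back-of-the-envelope calculation, not a call into any protobuf library. It assumes field numbers 1 to 15 (one-byte tags), the six-field message with `time_low` as `fixed32` and the rest as varints, and that no field is zero (a proto3 encoder would omit zero-valued fields). All function names are hypothetical.

```python
import uuid

TAG = 1  # field numbers 1-15 fit in a one-byte tag

def varint_size(value):
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

def string_uuid_size(u):
    text = str(u).encode("utf-8")  # 36 ASCII characters
    return TAG + varint_size(len(text)) + len(text)

def bytes_uuid_size(u):
    return TAG + varint_size(len(u.bytes)) + len(u.bytes)

def rfc4122_uuid_size(u):
    # Assumes every field is non-zero, so every field is written.
    _time_low, *varint_fields = u.fields
    fixed32_part = TAG + 4
    return fixed32_part + sum(TAG + varint_size(v) for v in varint_fields)

largest = uuid.UUID("ffffffff-ffff-4fff-bfff-ffffffffffff")
smallest = uuid.UUID("00000001-0001-4001-8001-000000000001")
```

Under these assumptions the six-field size depends on the values, because varints grow with magnitude: `largest` needs 27 bytes, while `smallest` needs only 18.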

Note also ByteString.UnsafeWrap (see #7645) will improve the ConvertToLittleEndianBinaryUuid performance when it is available.

tdhintz · 2021-01-12T22:04:14Z

Neither string nor byte array is a good solution from a security perspective because they can be abused in certain kinds of DoS or fuzzing attacks. I like the idea of a specific implementation.

gmabey · 2021-01-13T15:47:52Z

@tdhintz Are you referring to something like

message WellKnownUUID {
    uint32 w1 = 1;
    uint32 w2 = 2;
    uint32 w3 = 3;
    uint32 w4 = 4;
}

There certainly isn't much variability to that structure!
Perhaps @billpoole-mi would be kind enough to benchmark that approach?

bill-poole · 2021-01-14T05:10:44Z

@gmabey you need to have the 6 UUID elements defined as per RFC 4122 for this approach to work because those 6 elements are all defined as unsigned integers and therefore by defining the message this way, we avoid any endianness issues.

For example, I'm assuming the w2 element in your WellKnownUUID message would correspond to the time_mid and time_hi_and_version RFC 4122 fields, but it isn't specified whether the high 16 bits are the time_hi_and_version or the low 16 bits.

It would of course be possible to specify how to read the two 16-bit values from w2 as part of the documentation of the WellKnownUUID message such that the converters to/from this type on each platform do so correctly. But if you're willing to move the responsibility for this into the converters, you might as well go all the way with it and define the message with two ulong fields.

You would then specify the first ulong value is time_low in the high 32 bits of the high 64-bit field, then time_mid in the high 16 bits of the low 32 bits of the high 64-bit field and time_hi_and_version in the low 16 bits of the high 64-bit field. You'd apply similar logic to the low 64-bit field.

This would likely result in a smaller message size (i.e. with 2 fields instead of 6), but carries the inconvenience of the converters having to deal with picking the 6 fields defined by RFC 4122 from the 2 64-bit ulong fields.

In the end, this is effectively defining the message as a 16-byte binary buffer and leaving it up to the converters to properly read/write the 6 values defined by RFC 4122 from/to the buffer.

tdhintz · 2021-01-14T08:17:15Z

@gmabey Yes, avoid use of arrays and strings (which really are just a specialized array).

bill-poole · 2021-01-15T04:20:55Z

I've now tested structuring the UUID message as two 64-bit fixed integers. The proto spec is below.

// A UUID, encoded in accordance with section 4.1.2 of RFC 4122.
message Uuid {
	// The high 64 bits of the UUID - MSB -> LSB: time_low (32 bits) | time_mid (16 bits) | time_hi_and_version (16 bits).
	fixed64 high64 = 1;

	// The low 64 bits of the UUID - MSB -> LSB: clock_seq_hi_and_reserved (8 bits) | clock_seq_low (8 bits) | node (48 bits).
	fixed64 low64 = 2;
}

This is encoded from a System.Guid in .NET as follows.

Span<byte> bytes = stackalloc byte[16];
guid.TryWriteBytes(bytes);

// MSB -> LSB: time_low (32 bits) | time_mid (16 bits) | time_hi_and_version (16 bits).
High64 = ((ulong)BinaryPrimitives.ReadUInt32LittleEndian(bytes.Slice(0, 4)) << 32) // time_low
	| ((ulong)BinaryPrimitives.ReadUInt16LittleEndian(bytes.Slice(4, 2)) << 16) // time_mid
	| BinaryPrimitives.ReadUInt16LittleEndian(bytes.Slice(6, 2)); // time_hi_and_version

// MSB -> LSB: clock_seq_hi_and_reserved (8 bits) | clock_seq_low (8 bits) | node (48 bits).
Low64 = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8, 8));

It is converted back to a System.Guid as follows.

Span<byte> bytes = stackalloc byte[16];
BinaryPrimitives.WriteUInt32LittleEndian(bytes.Slice(0, 4), (uint)(High64 >> 32));
BinaryPrimitives.WriteUInt16LittleEndian(bytes.Slice(4, 2), (ushort)((High64 >> 16) & 0xFFFF));
BinaryPrimitives.WriteUInt16LittleEndian(bytes.Slice(6, 2), (ushort)(High64 & 0xFFFF));
BinaryPrimitives.WriteUInt64BigEndian(bytes.Slice(8, 8), Low64);
return new Guid(bytes);

The Uuid message size is 18 bytes (as opposed to 27 bytes when defining the Uuid message with the 6 individual fields defined by RFC 4122).
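
To make the 18-byte figure concrete, here is a minimal sketch that hand-encodes the two-field message using only Python's standard library. It is not generated protobuf code. The function names are hypothetical, the decoder accepts only the exact two-field layout in field order, and both values are assumed to be non-zero (a proto3 encoder would omit a zero field).

```python
import struct
import uuid

MASK64 = (1 << 64) - 1

def uuid_to_high_low(u):
    """High64 = time_low | time_mid | time_hi_and_version; Low64 = clock_seq | node."""
    return (u.int >> 64) & MASK64, u.int & MASK64

def encode_uuid_message(u):
    high64, low64 = uuid_to_high_low(u)
    # Tag byte = (field_number << 3) | wire_type; wire type 1 is a 64-bit value.
    # fixed64 payloads are written little-endian on the wire.
    return b"\x09" + struct.pack("<Q", high64) + b"\x11" + struct.pack("<Q", low64)

def decode_uuid_message(data):
    if len(data) != 18 or data[0] != 0x09 or data[9] != 0x11:
        raise ValueError("unexpected Uuid encoding")
    (high64,) = struct.unpack("<Q", data[1:9])
    (low64,) = struct.unpack("<Q", data[10:18])
    return uuid.UUID(int=(high64 << 64) | low64)

example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
encoded = encode_uuid_message(example)
```

Each field costs one tag byte plus eight payload bytes, giving 2 × 9 = 18 bytes.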

The conversion/serialisation/deserialisation benchmarks are below.

Method	Mean	Error	StdDev
ConvertToUuid	8.825 ns	0.0665 ns	0.0622 ns
SerialiseUuid	20.942 ns	0.0891 ns	0.0790 ns
DeserialiseUuid	96.178 ns	0.3735 ns	0.3494 ns
ConvertFromUuid	9.520 ns	0.0578 ns	0.0541 ns

Convert & serialise is 29.767 ns and deserialise & convert is 105.698 ns.

So this approach is much faster and more efficient on the wire than defining the Uuid message with the 6 fields defined by RFC 4122.

singhbaljit · 2021-04-30T19:39:56Z

For UUIDv4, MSB is both positive and negative, while the LSB is always negative. So, shouldn't it be sfixed64?

bill-poole · 2021-05-02T04:06:06Z

fixed64 and sfixed64 are the same on the wire. The only difference is how their bits are interpreted by the sending/receiving endpoints.

In this case, the bits are interpreted by breaking the 64 bits into the components defined by section 4.1.2 of RFC 4122.

That is, the 64 bits are never interpreted as a positive 64-bit integer nor a negative 64-bit integer. Therefore, it is fine to encode it either way.

However, since the sign is semantically irrelevant, I think it’s better to encode as fixed64. It also makes the code that writes the UUID to the message simpler in .NET.
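
The point that the two types share a wire representation can be checked with a small sketch using Python's `struct` module. It assumes that both types are written as eight little-endian bytes, which is how the protobuf encoding documentation describes 64-bit fixed-width values. The helper names are hypothetical.

```python
import struct

def encode_fixed64(value):
    """Eight little-endian bytes for an unsigned 64-bit value."""
    return struct.pack("<Q", value)

def encode_sfixed64(value):
    """Eight little-endian bytes for a signed (two's complement) 64-bit value."""
    return struct.pack("<q", value)

def as_signed(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value

low64 = 0x8899AABBCCDDEEFF  # top bit set, as for any RFC 4122 variant UUID
same_bytes = encode_fixed64(low64) == encode_sfixed64(as_signed(low64))
```

Here `as_signed(low64)` is negative, yet the encoded bytes are identical; only the receiver's interpretation differs.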

AtosNicoS · 2021-07-29T11:41:02Z

Are there any plans for implementing this? I guess we have a lot of good examples and speed tests available, so it could be easily integrated.

perezd · 2021-07-29T16:00:19Z

We have no plans at this time to integrate this.

fowles · 2022-04-18T17:48:03Z

Most folks I have seen simply use a string for this.

I don't really see a path forward for this. The cost of adding this as a specific well known type is quite high as compared to having a third_party simply package their preferred proto with some helper functions.

gmabey · 2022-04-18T18:19:38Z

I certainly just use a string (without curly braces) for this myself. The benefits that seem to be implied were such an effort to be undertaken are: speed, security (I guess), and interoperability. The argument that a third_party function could implement this could be used against most of the data types currently supported by WellKnownTypes -- since you could serialize a date time to string and have a third_party function deserialize it.

Please do reply to this message if you dispute any of these points:

1. UUID is well known.
2. UUID is well defined.
3. UUID is very common. (in my world haha)
4. Protobuf messages would be "more well defined" (sigh, I wish there was a better way to say that) if a first-class (hrmm, I guess WellKnownTypes are second-class citizens) data type existed that standardized serialization/deserialization and conversion to/from platform specific classes (like in python).
5. Point (4.) would make including UUIDs (as a member of a message) less error prone.

fowles · 2022-04-18T18:58:40Z

I would dispute points (3) and (4).

Point 3: Some quick searches in code search indicate that absl::Time is about 100x more common than our UUID class within Google's C++ codebase.

Point 4: Getting something cross language tends to nail down a bunch of painful corners in ways that are not helpful. The WellKnownType for time actually causes frequent impedance mismatches with language bindings that have slightly different concepts of time.

gmabey · 2022-04-18T19:11:10Z

@fowles I don't dispute your rebuttal to Point 3, but I didn't define "very" -- wahoo! :-D

Regarding your Point 4 rebuttal -- do you see any "painful corners" associated with UUIDs? Or, are you just complaining about corner cases of Google.Protobuf.WellKnownTypes.Timestamp? (if so, wrong thread ;-)

fowles · 2022-04-18T19:13:16Z

I don't know UUIDs particularly well. It is possible they are simple enough that they won't hit such impedance mismatches. Regardless, point (3) alone is enough for me to continue to feel confident that this doesn't rise to the bar where we want to add it to the core of protobuf.

lalomartins · 2022-04-18T19:26:15Z

> Point 3: Some quick searches in code search indicate that absl::Time is about 100x more common than our UUID class within Google's C++ codebase.

Is protobuf meant for Google's use only? 🤔

Quick check on the node.js world: protobufjs has 7,194,982 weekly downloads, UUID has 59,615,024 weekly downloads (and it's only one implementation). UUID is a standard, rfc4122, and its increasing adoption has been doing wonders to increase interoperability and reliability in various areas.

Realistically speaking, for a team starting a new project, the fact that protobuf has no UUID support is more likely to result in the team not using protobuf, than not using UUID.

As for point 4, there would be no impedance mismatches, since it is a standard. Yes, I've been to that place, having code that uses milliseconds since epoch talking to code that uses seconds since epoch, but UUID is UUID, that's the whole point of its existence.

fowles · 2022-04-18T19:32:18Z

protobuf is intended for public use, but Google maintains full ownership of it and its evolution. The flip side is that Google also provides the vast majority of the maintenance cost of it.

As you note, UUIDs are a standard and libraries exist in most languages to parse them to and from strings. I would advise any group that wants to encode them in protobufs to use a bytes field. If you are starting a new project and are unwilling to accept that trade off, that is a totally reasonable choice for you to make.

perezd · 2022-04-18T20:16:17Z

> Realistically speaking, for a team starting a new project, the fact that protobuf has no UUID support is more likely to result in the team not using protobuf, than not using UUID.

This isn't mutually exclusive. UUIDs can be freely represented as bytes or encoded as hex values and written to strings (where the majority of observed use cases show up).

Adding an explicit UUID type I guess provides... validation as the primary feature request? I dunno what else you really need an explicit type here for. JSON doesn't have a UUID type (hell, at least protobuf has bytes which JSON does not) and nobody stops using JSON for lack of "support".

Furthermore, consider the JSON/protobuf interop requirements... for JSON, they'll just end up as a string again, so what have we really done here?

kibblewhite · 2022-04-20T12:48:16Z

> Furthermore, consider the JSON/protobuf interop requirements... for JSON, they'll just end up as a string again, so what have we really done here?

Not using JSON in our project, but also you are right in saying that JSON provides no UUID support, but I would like to add that JSON doesn't add any support for data types like date/times/etc...?

From the sound of fowles's comment, it seems like a cost thing, as Google provides the vast majority of the maintenance costs? Could it just need a financial push in that direction?

Anyways, looking forwards to seeing how this might (or not) resolve in the future. I'll continue to use a string or bytes field with the variable name prepended with Guid for now.
Thanks to everyone for the input, it's been insightful.

bill-poole · 2022-04-20T13:21:20Z

> JSON doesn't have a UUID type (hell, at least protobuf has bytes which JSON does not) and nobody stops using JSON for lack of "support".

I don't think the lack of UUID support in JSON is reasonable justification for the lack of support in Protobuf. JSON is string (UTF-8) encoded, while Protobuf is binary-encoded. Therefore, the performance penalty of encoding UUIDs to/from their hex-encoded string representations in JSON is expected and therefore acceptable.

Conversely, Protobuf is binary encoded and therefore there is an expectation that the performance penalties/overheads of encoding/decoding through strings are avoided. For example, integers are sent/received in binary representation in Protobuf, rather than encoded/decoded as UTF-8 strings. Why is that? JSON encodes integers as strings, so why not Protobuf? The reason is performance and efficiency.

JSON and Protobuf are different encodings with different goals and performance characteristics. If that were not the case (i.e., if JSON and Protobuf were completely interchangeable), then why have Protobuf at all? Why doesn't everyone just use JSON instead of Protobuf?

There is an opportunity to encode/decode UUIDs in binary form. In fact UUIDs are really just 128-bit integers. Why should 64-bit integers be encoded in binary but 128-bit integers encoded as hex-encoded strings?

Byte strings and custom UUID message types are both heap-allocated in the code generated by protoc. Messages must be encoded/decoded as these intermediate heap-allocated objects, and then serialised/deserialised. If Protobuf had a well-known type for UUID, then these intermediate heap-allocated objects would no longer be required, and messages could use a "primitive" 128-bit type, which would save the heap allocation and the translation through the intermediate format.

i.e., support for a UUID well-known type would substantially increase serialisation/deserialisation performance of UUID fields. And isn't that a key reason for using Protobuf over JSON? Performance?

mprimeaux · 2022-04-20T15:51:52Z

I do agree with @bill-poole in that validation and performance are the primary drivers for our teams in having native UUID type support in protobuf.

Offering a comparison to JSON as a reason for not providing native UUID support in protobuf, I feel, has conflated this conversation a bit.

perezd · 2022-04-20T20:15:54Z

> There is an opportunity to encode/decode UUIDs in binary form. In fact UUIDs are really just 128-bit integers. Why should 64-bit integers be encoded in binary but 128-bit integers encoded as hex-encoded strings?

FWIW, I just did a scan of Google's internal protos and all fields named "uuid" I've observed are encoded as string or bytes. If this has been good enough for all of Google, I am really wondering if the performance wins we're claiming here are a red herring?

Further, what's preventing folks from making message types that encode this as a pair of sfixed64 numbers? I think this would also mitigate the allocation concerns, no?

mprimeaux · 2022-04-20T20:36:21Z

Again, I think this is conflating the discussion.

It’s not about “is this good enough for Google and therefore good enough for the broader community” but more a question of efficiency in terms of (IMHO) network and memory serialization/deserialization.

Think “durable storage technologies” and why they have native support for UUID types. Optimization.

While I am sure that Google is very conscious of optimization, their search infrastructure is less resource limited than many companies.

Contrast this to other domains where we are even more attentive to resource constraints. In particular edge IoT, AR/VR for telemedicine, public transportation, defense, etc.

My kind request is to not focus on what Google does but to focus on the broader scientific benefit.

bill-poole · 2022-04-21T05:00:46Z

I agree with @mprimeaux.

> FWIW, I just did a scan of Google's internal protos and all fields named "uuid" I've observed are encoded as string or bytes. If this has been good enough for all of Google, I am really wondering if the performance wins we're claiming here are a red herring?

I guess it depends on how much of a performance penalty would be deemed by Google to be sufficient to warrant doing something about it. How much slower is too much slower? I did benchmarking a while back for encoding UUIDs as strings and byte arrays in Protobuf and posted the results earlier in this thread (see #2224 (comment)).

> Further, what's preventing folks from making message types that encode this as a pair of sfixed64 numbers? I think this would also mitigate the allocation concerns, no?

Nothing, which is what I did and turned out to be the fastest option available without defining a well-known type that can be serialised/deserialised directly from/to a "primitive" UUID type (see #2224 (comment)). The point is that it requires converting through an intermediate heap-allocated type (with the two fixed64 integers), which is much slower than it would otherwise be to serialise/deserialise directly from/to a "primitive" UUID type.

ghost · 2022-05-10T10:43:49Z

I code in .NET Framework 4.8, where Guid does not have the method guid.TryWriteBytes(bytes). What is the solution to this problem?

bill-poole · 2022-05-13T09:16:36Z

One way would be to use the Guid.ToByteArray method instead of the Guid.TryWriteBytes(Span<byte>) method. However, that will heap-allocate an array each time you invoke it, which will create more GC pressure.

You could instead try defining your own DecodedGuid struct, which is decorated with [StructLayout(LayoutKind.Explicit)] and has:

a Guid field decorated with [FieldOffset(0)]; and
two ulong fields Low64 and High64 decorated with [FieldOffset(0)] and [FieldOffset(8)] respectively.

When you instantiate a DecodedGuid struct, the Guid constructor parameter will then be written to the Guid field, and the low and high 64-bit unsigned integer components can be read from the Low64 and High64 fields.

If this works (and I haven't confirmed that it does/will), then it will be faster than using the Guid.ToByteArray method because the Guid contents will be copied into a stack-allocated DecodedGuid, rather than a heap-allocated array.

You should then be able to populate the High64 and Low64 fields of the Uuid Protobuf instance as per #2224 (comment) from the High64 and Low64 fields of the DecodedGuid struct.

JamesOldfield · 2022-07-14T07:15:07Z

One thing I haven’t seen spelt out here is the relative sizes of the different options, including the overhead of having the data in a child message which would be needed for a well-known type. For comparison (and fun) I also included a hypothetical native 128-bit fixed width type (which could be added as there are still 2 unused wire type numbers left, 6 and 7!).

What	Size	Calculation
Native 128-bit field	17	1 tag + 16 payload
Native 128-bit (in message)	19	1 tag + 1 size + (1 tag + 16 payload)
Bytes field	18	1 tag + 1 size + 16 payload
Bytes (in message)	20	1 tag + 1 size + (1 tag + 1 size + 16 payload)
String field	38	1 tag + 1 size + 36 payload
String (in message)	40	1 tag + 1 size + (1 tag + 1 size + 36 payload)
Two fixed64 (in message)	20	1 tag + 1 size + 2 × (1 tag + 8 payload)

By the way, JSON was brought up almost as an argument against well known type for UUID. But actually I see it as the strongest reason in favour of UUID well-known type. The trade off is between:

string: nice JSON string representation but huge binary encoding (38 bytes) – given the great pains protobuf goes to for binary compactness (varints, zigzag encoding, and packed tags), it doesn't make sense to use something so wasteful
bytes: compact binary encoding (18 bytes) but bonkers JSON representation (e.g. UUID "12345678-1234-5678-1234-567812345678" becomes "EjRWeBI0VngSNFZ4EjRWeA==")
WKT based on bytes: decent length binary encoding (20 bytes) and nice JSON string representation
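
The JSON form of a bytes field can be reproduced with Python's standard library, assuming the protobuf JSON mapping's rule that bytes fields are rendered as standard base64 strings. The function names below are hypothetical.

```python
import base64
import uuid

def json_value_for_bytes_field(u):
    """Base64 text a bytes field holding the 16 raw UUID bytes would show in JSON."""
    return base64.b64encode(u.bytes).decode("ascii")

def json_value_for_string_field(u):
    """Canonical text form a string field would show in JSON."""
    return str(u)

example = uuid.UUID("12345678-1234-5678-1234-567812345678")
as_bytes_json = json_value_for_bytes_field(example)
as_string_json = json_value_for_string_field(example)
```

The base64 form is shorter (24 characters against 36) but cannot be recognised as a UUID by eye.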

I haven’t included the message with separate RFC 4122 fields as it’s misguided in my view (and, with all those varints, would be a nightmare for me to compute the size). I also dispute the snippet above where various bits of two 64-bits numbers are sliced up with different endianness. Both of those fail to recognise that a UUID is simply a sequence of 16 bytes and nothing more. There is no possible endianness issue with that. It could have been generated as multiple fields by RFC 4122, in which case care must be taken with endianness when converting those fields to or from the byte sequence, but that’s not the serialisation layer’s problem.

bill-poole · 2022-07-14T15:39:21Z

> I haven’t included the message with separate RFC 4122 fields as it’s misguided in my view

I agree and found there was a significant performance penalty for doing so.

> Both of those fail to recognise that a UUID is simply a sequence of 16 bytes and nothing more. There is no possible endianness issue with that.

There can actually be endianness issues with UUIDs. Microsoft frameworks (e.g. .NET) tend to represent UUIDs in little endian format in memory, whereas RFC 4122 recommends big endian binary representation for network transmission. I think big endian representation is therefore the correct/best representation for Protobuf, but it means that Microsoft frameworks like .NET need to convert between little endian and big endian (which can be done with a SIMD shuffle instruction).
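
The two byte orders can be compared directly in Python, whose `uuid.UUID` exposes both: `bytes` is the RFC 4122 big-endian order, and `bytes_le` stores the first three fields little-endian, which is the layout .NET's `Guid.ToByteArray` produces. The sketch below is illustrative only; `guid_bytes_to_rfc` is a hypothetical helper, and a plain byte shuffle stands in for the SIMD instruction mentioned above.

```python
import uuid

def guid_bytes_to_rfc(b):
    """Reverse the first three fields (4 + 2 + 2 bytes); the last 8 bytes are unchanged."""
    return b[3::-1] + b[5:3:-1] + b[7:5:-1] + b[8:]

example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
rfc_order = example.bytes      # 00 11 22 33 | 44 55 | 66 77 | 88 99 aa bb cc dd ee ff
guid_order = example.bytes_le  # 33 22 11 00 | 55 44 | 77 66 | 88 99 aa bb cc dd ee ff
converted = guid_bytes_to_rfc(guid_order)
```

The shuffle is its own inverse, so the same helper converts in either direction.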

Note that I posted the results of performance testing on .NET for bytes versus two-fixed64 representations earlier in this issue.

My performance testing for little endian bytes representation on .NET here:

Convert and serialise: 89.666 ns
Deserialise and convert: 219.624 ns

My performance testing for big endian two-fixed64 representation on .NET here:

Convert and serialise: 29.767 ns
Deserialise and convert: 105.698 ns

Based on the above results, the bytes representation is much slower than the two-fixed64 representation.

I imagine that a WKT based on a native 128-bit field would be the simplest and most performant representation. However, in the absence of a native 128-bit type, I think a two-fixed64 WKT is best due to its performance advantage over the bytes representation.

JamesOldfield · 2022-07-15T13:30:24Z

@bill-poole

But those tests are for C# / .NET, as you said. For UUID to be a WKT it has to make sense for all languages, and actually C# is one of the less used languages for protobuf (and certainly not why I'm here). People picking apart nanosecond-level performance are more likely to be using C++.

Also, those tests assume that every protobuf UUID field will be converted to the language's native UUID type when deserialised, but I think this would be a small minority usage. Much of the time it would just be used directly as a byte array, regardless of language. Obviously, having the data already in bytes format is most convenient for this. The two comments above about Google's code seem to support this. None of the comments here, except yours, have focused on conversion to C#'s Guid type, especially performance of it. I do agree conversion methods should exist (in all languages where they make sense), but they shouldn't be the focus of the discussion.

Using two int64 members would be super confusing - you've basically invented your own new representation for UUIDs, and the existing selection is already confusing enough!

It sounds like I'm backtracking in my support for a UUID well known type - why don't I just use bytes if that's what I want? But, like I said in my previous comment, a well known type is still useful because it allows you to effectively communicate that this field is a UUID (rather than just code comment saying so, or your own custom UUID message) and it gives you the standard JSON string representation.

bill-poole · 2022-07-15T15:20:54Z

@JamesOldfield, I provided the performance testing results for .NET because those were the results I had previously posted on this topic that I thought were relevant to what you said. I would be very interested to see how the performance compares between the various options in C++ and other platforms. I very much encourage that testing to be done.

I hypothesise that a similar performance difference between the options will be seen across multiple platforms. If it turns out that bytes WKTs are really slow in .NET for some reason (compared to other platforms), then I expect that would provide strong motivation for the .NET implementation to improve its performance in this area.

At the very least, I don't think it would be prudent to assume that a bytes representation is faster or as fast as a two fixed64 field representation in C++ (or any other platform) without doing the requisite performance testing.

> Using two int64 members would be super confusing - you've basically invented your own new representation for UUIDs

I don't think that's true. RFC 4122 specifies the 128-bit layout, and every 128-bit value comprises a high 64-bit and low 64-bit value. i.e., the only complexity a two fixed64 field representation introduces is the concept of a 128-bit value being decomposed into a high 64-bit value and a low 64-bit value.

minesworld · 2023-02-18T03:28:26Z

Thanks for this thread of discussion providing real-world solutions for people solving real-world problems in the here and now. At least I don't have to wait for the governance of protobuf to get down to earth... BTW: it looks like I will use the 2 x fixed64 solution ( #2224 ), whose principles I had been leaning towards before going deeper into the protobuf rabbit hole of official definitions... Thanks to bill-poole for providing that and showing its performance.

AbdulRehman-z · 2023-10-09T16:41:12Z

What.... an open issue since 2k16

mprimeaux · 2023-10-10T23:01:31Z

We use v4 UUIDs quite a bit in our AI / ML "workloads" and literally every bit counts "over the wire and on disk", which I think @bill-poole addressed in part in his experiments above.

There's been spirited conversation with a diversity of positions, which has made for a healthy set of discussions. My sincere hope is we find a way forward to have this supported as an intrinsic type.

BoysheO · 2023-11-13T08:05:27Z

What the fxxk? This issue continues in 2023?? Google must support UUID/GUID! We all want an official solution, not endless discussions!

alikleit · 2024-01-10T10:49:19Z

ah open since 2016, hope 2024 gives this a kick somewhere...

AbdulRehman-z · 2024-01-13T10:59:06Z

> ah open since 2016, hope 2024 gives this a kick somewhere...

In our Dreams

zs-dima · 2024-01-13T11:02:12Z

> ah open since 2016, hope 2024 gives this a kick somewhere...

Waiting for the anniversary

