Support map values and nested values for attributes #376

tigrannajaryan · 2019-12-06T16:24:45Z

After #368 gets merged we will have support for array values.

If we add support for maps and nesting, it will allow arbitrary nested data structures to be represented in attribute values if needed.

This will apply to span and resource attributes.

jmacd · 2019-12-06T17:56:28Z

Can we explicitly state that this applies to resources too? I believe that span attributes and resources (which are only specified in the proto, currently) are specified with the same structure.

Oberon00 · 2019-12-12T12:20:17Z

Are there any use cases for arbitrary nesting? I think (multi)maps would be useful to store, e.g., HTTP headers, but what would be the rationale for arbitrary nesting?

shengxil · 2020-05-05T05:06:48Z

Arbitrarily nested maps can represent classified values, e.g. {"http" : {"url":...,"method":...}} or {"sql" : {"query":...,"engine":...}}. They can also host vendor-specific data like {"aws": {"account_id":...}} in situations where Resource isn't the right place, e.g. client-side metrics.

jmacd · 2020-05-05T05:27:30Z

In #579, Tigran's example seems to contain a use-case. The resource of "application B" is a set of key-value attributes.

## Summary

This adds support for arrays and maps to attribute values, including support for nested values.

This is a breaking protocol change.

Resolves: open-telemetry/opentelemetry-specification#376
Resolves: open-telemetry#106

## Motivation

There are several reasons for this change:

- The API defines that attribute values [may contain arrays of values](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#set-attributes). However, the protocol has no way of representing array values. We need to add such a capability.
- We intend to support the Log data type in the protocol, which also requires array values (it is a Log Data Model requirement). In addition, the Log data type requires support for key-value lists (maps) as attribute values, including nested values.
- There are long-standing requests to support nested values, arrays and maps for attributes: open-telemetry/opentelemetry-specification#376 open-telemetry/opentelemetry-specification#596

This change introduces AnyValue. AnyValue can represent arbitrary numeric, boolean, string, array or map values and allows nesting. AnyValue can represent any data that can be represented in JSON.

AttributeKeyValue now uses AnyValue to store the "value" part.

Note: below, "Current" refers to the state of the "master" branch before this PR/commit is merged. "Proposed" refers to the schema suggested in this PR/commit.
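To make the JSON equivalence concrete, here is a minimal Go sketch of a oneof-style value and a converter from decoded JSON. It is only an illustration: the Go type names (`AnyValue`, `StringValue`, `KVListValue` and so on) and the helpers `fromJSON` and `parse` are hypothetical and are not the code generated from the proposed schema. The sample values in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// AnyValue is a hypothetical stand-in for the proposed message: exactly one
// of the concrete types below is held at a time, similar to a ProtoBuf oneof.
type AnyValue interface{ isAnyValue() }

type StringValue string
type BoolValue bool
type IntValue int64
type DoubleValue float64
type ArrayValue []AnyValue
type KVListValue []KeyValue

// KeyValue pairs a key with an arbitrarily nested value.
type KeyValue struct {
	Key   string
	Value AnyValue
}

func (StringValue) isAnyValue() {}
func (BoolValue) isAnyValue()   {}
func (IntValue) isAnyValue()    {}
func (DoubleValue) isAnyValue() {}
func (ArrayValue) isAnyValue()  {}
func (KVListValue) isAnyValue() {}

// fromJSON converts a value produced by encoding/json (with UseNumber
// enabled) into an AnyValue. JSON null maps to a nil AnyValue (no value set).
func fromJSON(v interface{}) AnyValue {
	switch x := v.(type) {
	case string:
		return StringValue(x)
	case bool:
		return BoolValue(x)
	case json.Number:
		// Prefer an integer when the literal parses as one.
		if i, err := x.Int64(); err == nil {
			return IntValue(i)
		}
		f, _ := x.Float64() // out-of-range literals become +/-Inf in this sketch
		return DoubleValue(f)
	case []interface{}:
		arr := make(ArrayValue, 0, len(x))
		for _, e := range x {
			arr = append(arr, fromJSON(e))
		}
		return arr
	case map[string]interface{}:
		// Sort keys so the resulting key-value list is deterministic.
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		kvs := make(KVListValue, 0, len(keys))
		for _, k := range keys {
			kvs = append(kvs, KeyValue{Key: k, Value: fromJSON(x[k])})
		}
		return kvs
	default:
		return nil
	}
}

// parse decodes a JSON document into an AnyValue.
func parse(s string) (AnyValue, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber()
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	return fromJSON(v), nil
}

func main() {
	v, err := parse(`{"http": {"method": "GET", "status": 200}, "tags": ["a", true, 1.5]}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", v)
}
```

Maps become ordered key-value lists here, which matches the "key-value lists (maps)" wording above; sorting the keys is a choice made only to keep the sketch deterministic.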
## Performance

This change has a negative impact on performance when using the canonical Go ProtoBuf compiler (compared to the current OTLP state):

```
BenchmarkEncode/Current/Trace/Attribs-8              813    1479588 ns/op
BenchmarkEncode/Proposed/Trace/Attribs-8             417    2873476 ns/op
BenchmarkEncode/OpenCensus/Trace/Attribs-8           162    7354799 ns/op
BenchmarkDecode/Current/Trace/Attribs-8              460    2646059 ns/op    1867627 B/op    36201 allocs/op
BenchmarkDecode/Proposed/Trace/Attribs-8             246    4827671 ns/op    2171734 B/op    56209 allocs/op
BenchmarkDecode/OpenCensus/Trace/Attribs-8           154    7560952 ns/op    2775949 B/op    76166 allocs/op
```

However, I do not think this is important for most applications. Serialization CPU and memory usage is going to be a tiny portion of consumed resources for most applications, except certain specialized ones.

For perspective, I am also showing OpenCensus in the benchmark to make it clear that we are still significantly faster than it, despite becoming slower compared to the current state.

More importantly, performance-critical applications can use the Gogo ProtoBuf compiler (Collector does use it), which _gains_ performance due to this change:

```
BenchmarkEncode/Current(Gogo)/Trace/Attribs-8       1645     705385 ns/op
BenchmarkEncode/Proposed(Gogo)/Trace/Attribs-8      1555     698771 ns/op
BenchmarkDecode/Current(Gogo)/Trace/Attribs-8        537    2241570 ns/op    2139634 B/op    36201 allocs/op
BenchmarkDecode/Proposed(Gogo)/Trace/Attribs-8       600    2053120 ns/op    1323287 B/op    46205 allocs/op
```

With the Gogo compiler, the proposed approach uses about 40% less memory when decoding than the current schema.

After considering all tradeoffs and alternates (see below) I believe this proposal is the best overall approach for OTLP. It is idiomatic ProtoBuf, is easy to read and understand, is future-proof with respect to adding new attribute types, has enough flexibility to represent simple and complex attribute values for all telemetry types, and can be made fast by custom code generation, using the Gogo ProtoBuf compiler, for applications where it matters.
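For readers who want to check the relative differences, the following small Go sketch computes percentage changes from the figures in the benchmark tables above. The helper name `relChange` is only illustrative; the numbers are copied from the tables.

```go
package main

import "fmt"

// relChange returns the percentage change going from before to after.
// A negative result means the "after" measurement is lower (cheaper).
func relChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Canonical Go ProtoBuf compiler, Current -> Proposed.
	fmt.Printf("canonical encode time: %+.1f%%\n", relChange(1479588, 2873476))
	fmt.Printf("canonical decode time: %+.1f%%\n", relChange(2646059, 4827671))

	// Gogo ProtoBuf compiler, Current -> Proposed.
	fmt.Printf("Gogo encode time:      %+.1f%%\n", relChange(705385, 698771))
	fmt.Printf("Gogo decode time:      %+.1f%%\n", relChange(2241570, 2053120))
	fmt.Printf("Gogo decode bytes:     %+.1f%%\n", relChange(2139634, 1323287))
	fmt.Printf("Gogo decode allocs:    %+.1f%%\n", relChange(36201, 46205))
}
```

With these inputs the Gogo decode memory drops by roughly 38%, which is the source of the "about 40%" figure, while the number of allocations per operation goes up.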
Note: all performance measurements are done for the Go implementation only (although it is expected that other languages should exhibit somewhat similar behavior).

## Alternates Considered

I also designed and benchmarked several alternate schemas, see below.

### Adding array value to AttributeKeyValue

This is the simplest approach. It doubles down on the current OTLP protocol approach and simply adds an "array_values" field to AttributeKeyValue, e.g.:

```proto
message AttributeKeyValue {
  // all existing fields here.

  // A list of values. "key" field of each element in the list is ignored.
  repeated AttributeKeyValue array_values = 7;
}
```

This eliminates the need to have a separate AnyValue message and has lower CPU usage because it requires fewer indirections and fewer memory allocations per value. However, this is semantically incorrect, since the elements of the array must actually be values, not key-value pairs, which this schema violates. It also uses more memory when decoding than the proposed approach:

```
BenchmarkEncode/Proposed/Trace/Attribs-8             400    2869055 ns/op
BenchmarkEncode/MoreFieldsinAKV/Trace/Attribs-8      754    1540978 ns/op
BenchmarkDecode/Proposed/Trace/Attribs-8             250    4790010 ns/op    2171741 B/op    56209 allocs/op
BenchmarkDecode/MoreFieldsinAKV/Trace/Attribs-8      420    2806918 ns/op    2347827 B/op    36201 allocs/op
```

It will become even worse memory-wise if in the future we need to add more data types to attributes. This approach is not scalable for future needs and is semantically wrong.

### Fat AnyValue instead of oneof
In this approach AnyValue contains all possible value fields (similarly to how AttributeKeyValue is currently defined):

```proto
message AnyValue {
  ValueType type = 1;
  bool bool_value = 2;
  string string_value = 3;
  int64 int_value = 4;
  double double_value = 5;
  repeated AnyValue list_values = 6;
  repeated AttributeKeyValue kvlist_values = 7;
}

message AttributeKeyValue {
  string key = 1;
  AnyValue value = 2;
}
```

This simplifies the schema; however, it results in a significantly bigger AnyValue in memory. In the vast majority of cases attribute values are strings. Integer and boolean values are also used, although significantly less frequently than strings. Floating point numbers, arrays and maps are likely going to be vanishingly rare in attributes. If we keep all these value types in AnyValue, we will pay the cost for all these fields although almost always only the string value would be present.

Here are benchmarks comparing the proposed schema and the schema with fat AnyValue, using string and integer attributes in spans:

```
BenchmarkEncode/Proposed/Trace/Attribs-8             415    2894513 ns/op     456866 B/op    10005 allocs/op
BenchmarkEncode/FatAnyValue/Trace/Attribs-8          646    1885003 ns/op     385024 B/op        1 allocs/op
BenchmarkDecode/Proposed/Trace/Attribs-8             247    4872270 ns/op    2171746 B/op    56209 allocs/op
BenchmarkDecode/FatAnyValue/Trace/Attribs-8          343    3423494 ns/op    2988081 B/op    46205 allocs/op
```

Decoding memory usage with this approach is much higher (2988081 vs. 2171746 B/op), and it will also become worse as we add more types.
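The in-memory size argument can be illustrated with a rough Go sketch that compares a struct carrying every value field with a oneof-style holder that stores a single interface field. The struct definitions below are simplified assumptions and are not the code generated by either ProtoBuf compiler (generated structs carry additional bookkeeping fields), so the absolute numbers are only indicative.

```go
package main

import (
	"fmt"
	"unsafe"
)

// FatAnyValue mirrors the "fat" schema: every value field is always present,
// even though usually only one of them is used.
type FatAnyValue struct {
	Type         int32
	BoolValue    bool
	StringValue  string
	IntValue     int64
	DoubleValue  float64
	ListValues   []*FatAnyValue
	KvlistValues []*FatKeyValue
}

// FatKeyValue is the key-value pair used by the fat schema.
type FatKeyValue struct {
	Key   string
	Value *FatAnyValue
}

// isValue marks the possible payloads of the oneof-style value.
type isValue interface{ isValue() }

// OneofAnyValue holds exactly one payload behind an interface.
type OneofAnyValue struct {
	Value isValue
}

// StringValue is the payload used for the most common case, a string.
type StringValue struct{ V string }

func (*StringValue) isValue() {}

// fatSize is the size of one fat value, regardless of which field is used.
func fatSize() uintptr { return unsafe.Sizeof(FatAnyValue{}) }

// oneofStringSize is the size of the holder plus a separately allocated
// string payload.
func oneofStringSize() uintptr {
	return unsafe.Sizeof(OneofAnyValue{}) + unsafe.Sizeof(StringValue{})
}

func main() {
	fmt.Println("fat AnyValue bytes:           ", fatSize())
	fmt.Println("oneof AnyValue + string bytes:", oneofStringSize())
}
```

The oneof-style layout pays for a separate payload object per value, which is consistent with the higher allocation count of the proposed schema in the decode benchmark, but each string attribute occupies fewer bytes than a fat value.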
### AnyValue plus ExoticValue

This is based on the fat AnyValue approach, but rarely used value types are moved to a separate ExoticValue message that may be referenced from AnyValue if needed:

```proto
message AnyValue {
  ValueType type = 1;
  bool bool_value = 2;
  string string_value = 3;
  int64 int_value = 4;
  ExoticValue exotic_value = 5;
}

message ExoticValue {
  double double_value = 1;
  repeated AnyValue array_values = 2;
  repeated AttributeKeyValue kvlist_values = 3;
}

message AttributeKeyValue {
  string key = 1;
  AnyValue value = 2;
}
```

While this improves performance (in particular, it lowers memory usage for the most frequently used types of attributes), it is awkward and sacrifices too much readability and usability for small performance gains. Also, for the rare cases it is slow and uses even more memory, so its edge-case behavior is not desirable.

### Using different schema for log data type

I also considered using different message definitions for LogRecord and Span attributes. This would allow us to eliminate some of the requirements that we do not yet formally have for Span attributes (particularly the need to have maps of nested values). However, this does not help much in terms of performance, makes Span and LogRecord attributes non-interchangeable, and significantly increases the bloat of code in applications that need to work with both Spans and Log records.

## Summary

This adds support for arrays and maps to attribute values, including support for nested values.

This is a breaking protocol change.

Resolves: open-telemetry/opentelemetry-specification#376
Resolves: open-telemetry#106

## Motivation

There are several reasons for this change:

- The API defines that attribute values [may contain arrays of values](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#set-attributes). However the protocol has no way of representing array values. We need to add such capability.
- We intend to support Log data type in the protocol, which also requires array values (it is a Log Data Model requirement). In addition, Log data type requires support of key-value lists (maps) as attribute values, including nested values.
- There are long-standing requests to support nested values, arrays and maps for attributes: open-telemetry/opentelemetry-specification#376 open-telemetry/opentelemetry-specification#596

This change introduces AnyValue. AnyValue can represent arbitrary numeric, boolean, string, array or map values and allows nesting. AnyValue can represent any data that can be represented in JSON.

AttributeKeyValue now uses AnyValue to store the "value" part.

Note: below "Current" refers to the state of the "master" branch before this PR/commit is merged. "Proposed" refers to the schema suggested in this PR/commit.
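To make the AnyValue idea concrete, here is a minimal Python sketch that wraps a JSON-like value in a tagged structure, one tag per value kind. The function name `to_any_value` and the dict layout are assumptions made for illustration; only the field names (`bool_value`, `string_value`, `int_value`, `double_value`, `array_values`, `kvlist_values`) are borrowed from the schemas quoted in this description, and this is not the generated ProtoBuf API.

```python
from typing import Any


def to_any_value(value: Any) -> dict:
    """Wrap a JSON-like Python value in a tagged, AnyValue-style dict.

    Illustrative only: exactly one tag is set per value, which mimics
    the "one of several value kinds" shape described in the text.
    """
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_value": value}
    if isinstance(value, str):
        return {"string_value": value}
    if isinstance(value, int):
        return {"int_value": value}
    if isinstance(value, float):
        return {"double_value": value}
    if isinstance(value, (list, tuple)):
        return {"array_values": [to_any_value(v) for v in value]}
    if isinstance(value, dict):
        # A map becomes a list of key/value pairs, each value wrapped again.
        return {
            "kvlist_values": [
                {"key": str(k), "value": to_any_value(v)}
                for k, v in value.items()
            ]
        }
    raise TypeError(f"unsupported attribute value type: {type(value).__name__}")


# Hypothetical attribute payload with one nested map and one array.
attrs = to_any_value({"http": {"method": "GET", "status_code": 200}, "tags": ["a", "b"]})
```

The sketch rejects `None`, because none of the schemas quoted here has a field for a null value.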
## Performance

This change has a negative impact on performance when using the canonical Go ProtoBuf compiler (compared to the current OTLP state):

```
BenchmarkEncode/Current/Trace/Attribs-8       813   1479588 ns/op
BenchmarkEncode/Proposed/Trace/Attribs-8      417   2873476 ns/op
BenchmarkEncode/OpenCensus/Trace/Attribs-8    162   7354799 ns/op
BenchmarkDecode/Current/Trace/Attribs-8       460   2646059 ns/op   1867627 B/op   36201 allocs/op
BenchmarkDecode/Proposed/Trace/Attribs-8      246   4827671 ns/op   2171734 B/op   56209 allocs/op
BenchmarkDecode/OpenCensus/Trace/Attribs-8    154   7560952 ns/op   2775949 B/op   76166 allocs/op
```

However, I do not think this is important for most applications. Serialization CPU and memory usage is going to be a tiny portion of consumed resources for most applications, except certain specialized ones.

For perspective, I am also showing OpenCensus in the benchmark to make it clear that we are still significantly faster than it despite becoming slower compared to the current state.

More importantly, performance critical applications can use the Gogo ProtoBuf compiler (the Collector does use it), which _gains_ performance due to this change:

```
BenchmarkEncode/Current(Gogo)/Trace/Attribs-8     1645    705385 ns/op
BenchmarkEncode/Proposed(Gogo)/Trace/Attribs-8    1555    698771 ns/op
BenchmarkDecode/Current(Gogo)/Trace/Attribs-8      537   2241570 ns/op   2139634 B/op   36201 allocs/op
BenchmarkDecode/Proposed(Gogo)/Trace/Attribs-8     600   2053120 ns/op   1323287 B/op   46205 allocs/op
```

With the Gogo compiler the proposed approach uses 40% less memory than the current schema.

After considering all tradeoffs and alternates (see below) I believe this proposal is the best overall approach for OTLP. It is idiomatic ProtoBuf, easy to read and understand, is future-proof to adding new attribute types, has enough flexibility to represent simple and complex attribute values for all telemetry types and can be made fast by custom code generation for applications where it matters using the Gogo ProtoBuf compiler.
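The headline figures can be recomputed from the benchmark output above. This short Python sketch only does arithmetic on numbers copied from those tables; the variable and function names are made up for illustration.

```python
# Decode results for the Gogo compiler, copied from the benchmark output (B/op).
current_gogo_bytes = 2139634   # BenchmarkDecode/Current(Gogo)
proposed_gogo_bytes = 1323287  # BenchmarkDecode/Proposed(Gogo)

# Encode results for the canonical Go ProtoBuf compiler (ns/op).
current_encode_ns = 1479588    # BenchmarkEncode/Current
proposed_encode_ns = 2873476   # BenchmarkEncode/Proposed


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


gogo_memory_saving = percent_reduction(current_gogo_bytes, proposed_gogo_bytes)
encode_slowdown = proposed_encode_ns / current_encode_ns

print(f"Gogo decode memory saving: {gogo_memory_saving:.1f}%")
print(f"Canonical encode slowdown: {encode_slowdown:.2f}x")
```

On these inputs the decode memory saving with Gogo comes out at roughly 38%, which the text rounds to 40%, and encoding with the canonical compiler takes a little under twice as long per operation.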
Note: all performance measurements are done for the Go implementation only (although it is expected that other languages should exhibit somewhat similar behavior).

## Alternates Considered

I also designed and benchmarked several alternate schemas, see below.

### Adding array value to AttributeKeyValue

This is the simplest approach. It doubles down on the current OTLP protocol approach and simply adds an "array_values" field to AttributeKeyValue, e.g.:

```proto
message AttributeKeyValue {
  // all existing fields here.

  // A list of values. "key" field of each element in the list is ignored.
  repeated AttributeKeyValue array_values = 7;
}
```

This eliminates the need to have a separate AnyValue message and has lower CPU usage because it requires fewer indirections and fewer memory allocations per value. However, this is semantically incorrect since the elements of the array must actually be values, not key-value pairs, which this schema violates. It also uses more memory than the proposed approach:

```
BenchmarkEncode/Proposed/Trace/Attribs-8           400   2869055 ns/op
BenchmarkEncode/MoreFieldsinAKV/Trace/Attribs-8    754   1540978 ns/op
BenchmarkDecode/Proposed/Trace/Attribs-8           250   4790010 ns/op   2171741 B/op   56209 allocs/op
BenchmarkDecode/MoreFieldsinAKV/Trace/Attribs-8    420   2806918 ns/op   2347827 B/op   36201 allocs/op
```

It will become even worse memory-wise if in the future we need to add more data types to attributes. This approach is not scalable for future needs and is semantically wrong.

### Fat AnyValue instead of oneof.

In this approach AnyValue contains all possible field values (similar to how AttributeKeyValue currently does):

```proto
message AnyValue {
  ValueType type = 1;
  bool bool_value = 2;
  string string_value = 3;
  int64 int_value = 4;
  double double_value = 5;
  repeated AnyValue list_values = 6;
  repeated AttributeKeyValue kvlist_values = 7;
}

message AttributeKeyValue {
  string key = 1;
  AnyValue value = 2;
}
```

This results in a significantly bigger AnyValue in memory.
In the vast majority of cases attribute values of produced telemetry are strings (see e.g. semantic conventions for proof). Integer and boolean values are also used, although significantly less frequently than strings. Floating point numbers, arrays and maps are likely going to be vanishingly rare in the attributes. If we keep all these value types in AnyValue we will pay the cost for all these fields although almost always only the string value would be present.

Here are benchmarks comparing the proposed schema and the schema with fat AnyValue, using string and integer attributes in spans:

```
BenchmarkEncode/Proposed/Trace/Attribs-8       415   2894513 ns/op    456866 B/op   10005 allocs/op
BenchmarkEncode/FatAnyValue/Trace/Attribs-8    646   1885003 ns/op    385024 B/op       1 allocs/op
BenchmarkDecode/Proposed/Trace/Attribs-8       247   4872270 ns/op   2171746 B/op   56209 allocs/op
BenchmarkDecode/FatAnyValue/Trace/Attribs-8    343   3423494 ns/op   2988081 B/op   46205 allocs/op
```

Memory usage with this approach is much higher and it will also become worse as we add more types.

### AnyValue plus ExoticValue

This is based on the fat AnyValue approach, but rarely used value types are moved to a separate ExoticValue message that may be referenced from AnyValue if needed:

```proto
message AnyValue {
  ValueType type = 1;
  bool bool_value = 2;
  string string_value = 3;
  int64 int_value = 4;
  ExoticValue exotic_value = 5;
}

message ExoticValue {
  double double_value = 1;
  repeated AnyValue array_values = 2;
  repeated AttributeKeyValue kvlist_values = 3;
}

message AttributeKeyValue {
  string key = 1;
  AnyValue value = 2;
}
```

While this improves the performance (particularly lowers memory usage for the most frequently used types of attributes), it is awkward and sacrifices too much readability and usability for small performance gains. Also, for the rare cases it is slow and uses even more memory, so its edge case behavior is not desirable.
### Using different schema for log data type

I also considered using a different message definition for LogRecord attributes and Spans. This would make it possible to eliminate some of the requirements that we do not yet formally have for Span attributes (particularly the need to have maps of nested values). However, this does not help much in terms of performance, makes Span and LogRecord attributes non-interchangeable and significantly increases the bloat of code in applications that need to work with both Spans and Log records.

mwear · 2020-06-12T21:23:24Z

Semantically, I agree that the dotted string notation is equivalent to a map, although I'd like to point out that, at least from the tracing client perspective, dotted strings are a slightly more efficient representation.

Consider the following representations for the key-value pair 'http.method': 'GET'.

Dotted string representation
{ 'http.method': 'GET' }
To represent this we need 1 map and 2 strings; 3 total objects.

Map representation
{ 'http': { 'method': 'GET' } }
This requires 2 maps, 3 strings; 5 total objects.
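The object counts for the two representations can be reproduced with a small Python sketch. The helper `count_objects` is hypothetical and counts logical maps and strings only; it says nothing about what a particular runtime actually allocates (interning, small-object optimizations and so on are ignored).

```python
def count_objects(value) -> int:
    """Count the maps and strings that make up a nested attribute structure."""
    if isinstance(value, dict):
        # One object for the map itself, plus everything held in keys and values.
        return 1 + sum(count_objects(k) + count_objects(v) for k, v in value.items())
    if isinstance(value, str):
        return 1
    raise TypeError(f"unexpected value type: {type(value).__name__}")


dotted = {"http.method": "GET"}
nested = {"http": {"method": "GET"}}

print(count_objects(dotted), count_objects(nested))
```

With a single attribute the nested form needs two more objects than the dotted form. The gap per attribute changes once several attributes share one prefix, since the outer key and the inner map are then shared.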

Furthermore, most tracing backends do not support nested attributes (as far as I know), and will need to flatten them into dotted strings. This is something that will have to be done either in the tracing clients during export, or by the backends on ingest.
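A minimal sketch of what such flattening could look like, in Python. The function name `flatten`, the `.` separator default and the handling of edge cases are assumptions for illustration; no flattening rule is defined anywhere in this thread.

```python
def flatten(attributes: dict, separator: str = ".", prefix: str = "") -> dict:
    """Flatten nested maps into a single-level dict with dotted keys."""
    flat = {}
    for key, value in attributes.items():
        full_key = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the accumulated key path.
            flat.update(flatten(value, separator, full_key))
        else:
            # Leaf values (strings, numbers, booleans, arrays) are kept as-is.
            flat[full_key] = value
    return flat


nested = {
    "http": {"method": "GET", "url": "/x"},
    "aws": {"account_id": "123"},
    "retries": 2,
}
flat = flatten(nested)
```

Two things this sketch leaves open: an empty nested map disappears entirely, and a key that already contains a dot (for example `"a.b"` next to `{"a": {"b": ...}}`) collides with a flattened path, in which case the later entry silently wins.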

While I recognize that this does have some advantages in regards to semantics and for the data that can be represented, it does introduce complexity into tracing clients and backends. I'm not saying we shouldn't pursue this proposal, but we should discuss what the actual benefits are, and whether the added complexity is worth the tradeoff.

mwear · 2020-06-15T16:16:03Z

I'd also like to add that with the dotted-string notation, tracing clients can reduce the runtime string allocations to 0 for attribute keys by introducing constants for semantic conventions (and any other commonly used keys). We would lose this ability by changing to nested maps.
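A rough Python sketch of the difference in shape. The constant names and both helper functions are hypothetical, and Python's allocation behavior is not the same as that of any particular tracing client, so this only illustrates the idea: with dotted keys the key strings can be shared constants, while a nested layout builds a fresh inner map per call.

```python
# Hypothetical constants for semantic-convention keys, created once at import time.
HTTP_METHOD = "http.method"
HTTP_URL = "http.url"


def set_http_attributes(attributes: dict, method: str, url: str) -> None:
    """Dotted keys: the key objects are the shared constants above."""
    attributes[HTTP_METHOD] = method
    attributes[HTTP_URL] = url


def set_http_attributes_nested(attributes: dict, method: str, url: str) -> None:
    """Nested layout: a new inner map is created on every call."""
    attributes["http"] = {"method": method, "url": url}


span_a, span_b = {}, {}
set_http_attributes(span_a, "GET", "/x")
set_http_attributes(span_b, "POST", "/y")

nested_a, nested_b = {}, {}
set_http_attributes_nested(nested_a, "GET", "/x")
set_http_attributes_nested(nested_b, "POST", "/y")
```

In the dotted case both spans reference the very same key objects; in the nested case each span carries its own inner map.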

I should also clarify that I am completely ok with array support. It's the nested map support that I have reservations about.

tigrannajaryan · 2020-06-23T14:05:57Z

@bogdandrutu can you please clarify why this was reopened?

bogdandrutu · 2020-06-23T15:23:27Z

@tigrannajaryan because of last week's discussion and the concerns raised by @mwear

Oberon00 · 2020-06-24T11:15:41Z

Was closing it even intentional? I can't remember any final decision in this issue. At least it's not documented here?

tigrannajaryan · 2022-09-27T18:37:14Z

Another use-case here: open-telemetry/oteps#219

tsloughter · 2022-09-27T22:07:43Z

#2841 is an example where the complexity for both sides (client and backend) is increased by the lack of map support. The semantic convention is also harder to understand when it has to be split into pieces. The whole, like exception.structured_stacktrace, may be optional while some of the fields within it are required. Those fields still have to be marked as optional, because technically they are optional in the attributes: there may be no structured_stacktrace at all.

jmacd · 2022-10-04T23:34:18Z

I support map-valued attributes. On a technical level, this will not present a significant problem for the OTel-Go SDK based on my understanding of the issues. We already admit slice-valued attributes and although there was a bug in this support, it was recently fixed, and we already have a data type (attribute.Set) which represents a unique attribute-value map. To add a Map-valued attribute, we would add the attribute.Set as one of the possible Value types. (Likewise, to add List-of-attributes-valued attributes, we would apply the technique used for primitive-valued slices.)

@Oberon00's list of what it takes to finish this support looks good: #376 (comment)

yurishkuro · 2022-10-15T18:44:35Z

I also support maps as attribute values.

I don't completely agree with @Oberon00's list #376 (comment). I think it broadens the scope of this issue more than necessary and makes it difficult to make progress. Specifically, I am talking about the two points on deciding whether maps should be used in future or existing semantic conventions. Yes, those are important questions, but they do not need to block the API change itself, since there are use cases for map attributes that are not about semantic conventions but about capturing app-specific data.

Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment) or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:

- A standard way of flattening maps and nested objects when converting from OTLP to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.

tigrannajaryan · 2022-10-17T21:46:44Z

Here is a draft PR to discuss: #2888

austinlparker · 2024-04-23T20:21:22Z

Closed by #3858

tigrannajaryan mentioned this issue Dec 6, 2019

Allow array values for attributes #368

Merged

lmolkova mentioned this issue Dec 10, 2019

Experiment: ResourceValue as struct open-telemetry/opentelemetry-dotnet#378

Closed

pyohannes mentioned this issue May 5, 2020

Support passing attributes with AddEvent open-telemetry/opentelemetry-cpp#60

Merged

shengxil added a commit to shengxil/opentelemetry-specification that referenced this issue May 12, 2020

Support nested Attribute values (open-telemetry#376)

d2c6996

shengxil mentioned this issue May 12, 2020

Support nested Attribute values (#376) #596

Closed

tigrannajaryan mentioned this issue May 26, 2020

Align Embedded Logs data model with Standalone Logs data model #622

Open

tigrannajaryan mentioned this issue Jun 8, 2020

Add support for arrays and maps for attribute values open-telemetry/opentelemetry-proto#157

Merged

bogdandrutu closed this as completed in open-telemetry/opentelemetry-proto#157 Jun 15, 2020

bogdandrutu reopened this Jun 16, 2020

carlosalberto mentioned this issue Jun 22, 2020

Declare Trace part of the protocol as Stable open-telemetry/opentelemetry-proto#154

Closed

tigrannajaryan mentioned this issue Sep 27, 2022

semantic conventions: add structured stacktrace to exception attributes #2841

Closed

tigrannajaryan mentioned this issue Oct 17, 2022

Support maps and heterogeneous arrays as attribute values #2888

Closed

MSNev added a commit to MSNev/opentelemetry-specification that referenced this issue Oct 18, 2022

[Common] Spec inconsistency with proto definition of Attributes open-…

68d7937

…telemetry#2581 and Support map values and nested values for attributes open-telemetry#376

MSNev added a commit to MSNev/opentelemetry-specification that referenced this issue Nov 14, 2022

[Common] Spec inconsistency with proto definition of Attributes open-…

ae94042

…telemetry#2581 and Support map values and nested values for attributes open-telemetry#376

tigrannajaryan mentioned this issue May 17, 2023

[Schema] Converting values that cause conflicts #3497

Open

tigrannajaryan mentioned this issue Jul 20, 2023

AnyValue diverging from spec on website open-telemetry/opentelemetry-proto#501

Closed

dvoytenko mentioned this issue Nov 2, 2023

recordException should support chained exceptions open-telemetry/semantic-conventions#941

Open

pellared mentioned this issue Jan 9, 2024

log: Add design doc open-telemetry/opentelemetry-go#4809

Merged

pellared mentioned this issue Jan 11, 2024

Update Attributes field type in Logs Data Model #3816

Closed

austinlparker closed this as completed Apr 23, 2024
