Support maps and heterogeneous arrays as attribute values #2888

Conversation

tigrannajaryan
Member

Resolves #376

Use cases where this is necessary or useful:

  1. Specifying more than one resource in telemetry: "Define a way to specify more than one Resource in telemetry data" #579
  2. Data coming from an external source, e.g. AWS metadata: "Support nested Attribute values (#376)" #596 (comment) and "Support map values and nested values for attributes" #376 (comment)
  3. Capturing HTTP headers: "Support map values and nested values for attributes" #376 (comment)
  4. Structured stack traces: "semantic conventions: add structured stacktrace to exception attributes" #2841
  5. Payloads as attributes: "Trace Payload Collection" oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged, it would be nice to follow it up with:

  • A standard way of flattening maps and nested objects when converting from OTLP to formats that don't support maps/nested objects.
  • Recommendations for semantic conventions to use/not use complex objects.

@mwear
Member

mwear commented Oct 18, 2022

I have a couple of questions. Do we necessarily need nested maps for HTTP headers? Is there a reason why we couldn't use http.headers.<header-name> as keys and primitives (or arrays thereof) for the values? If so, I suspect a similar approach could be applied to structured stack traces.

If we were to accept this change, would this allow attributes such as {a: [{b: ..., c: ...}, {b: ..., c: ...}...]}? If so, how would we flatten such a structure? a.0.b, a.0.c, a.1.b, a.1.c,..., a.n.b, a.n.c?
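
For illustration, a minimal sketch (in Go) of the kind of dot-and-index flattening being asked about here; the key scheme and the helper are hypothetical, not something the spec defines:

```go
package main

import "fmt"

// flatten walks a nested value and emits dot/index keys for every primitive
// leaf, e.g. {a: [{b: "x"}]} becomes "a.0.b" -> "x". Hypothetical sketch only.
func flatten(prefix string, v any, out map[string]any) {
	switch val := v.(type) {
	case map[string]any:
		for k, child := range val {
			flatten(prefix+"."+k, child, out)
		}
	case []any:
		for i, child := range val {
			flatten(fmt.Sprintf("%s.%d", prefix, i), child, out)
		}
	default:
		out[prefix] = val // primitive leaf: string, bool, int64, float64, ...
	}
}

func main() {
	out := map[string]any{}
	flatten("a", []any{
		map[string]any{"b": "x", "c": int64(1)},
		map[string]any{"b": "y", "c": int64(2)},
	}, out)
	fmt.Println(out) // map[a.0.b:x a.0.c:1 a.1.b:y a.1.c:2]
}
```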

Lastly, I'm curious whether any backends allow querying on nested attributes. The ones I am familiar with allow queries such as http.path = "/some/path"; are there any that allow an attribute to be queried as {http: {path: "/some/path"}}?

@tsloughter
Member

If so, I suspect a similar approach could be applied to structured stack traces.

That is what the current PR, #2841, does, but it requires splitting each piece (filename, line number, etc.) into a separate array of values that would then be put back together. It is hardly ideal, even if it technically works with some deconstruction :)
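
To make the contrast concrete, a hedged sketch of the two shapes; the attribute keys below are made up for illustration and are not taken from #2841 or any semantic convention:

```go
package main

import "fmt"

func main() {
	// Parallel-array style: each stack-frame field lives in its own homogeneous
	// array, and consumers must zip them back together by index.
	parallel := map[string]any{
		"exception.structured_stacktrace.functions": []string{"main", "handler"},
		"exception.structured_stacktrace.files":     []string{"main.go", "handler.go"},
		"exception.structured_stacktrace.lines":     []int64{42, 17},
	}

	// With map/heterogeneous-array values: one array of frame maps, keeping
	// each frame's fields together.
	nested := map[string]any{
		"exception.structured_stacktrace": []any{
			map[string]any{"function": "main", "file": "main.go", "line": int64(42)},
			map[string]any{"function": "handler", "file": "handler.go", "line": int64(17)},
		},
	}

	fmt.Println(parallel)
	fmt.Println(nested)
}
```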

@tigrannajaryan
Member Author

Is there a reason why we couldn't use http.headers.<header-name> as keys and primitives (or arrays thereof) for the values? If so, I suspect a similar approach could be applied to structured stack traces.

Yes, we can do that, but I don't think it is a great solution. It gets worse when the values are non-primitives, since you then have to maintain parallel arrays (e.g. the stacktrace case).

Is it doable? Yes. Is it a good way to express such data? I don't think so.

If we were to accept this change, would this allow attributes such as {a: [{b: ..., c: ...}, {b: ..., c: ...}...]}? If so, how would we flatten such a structure? a.0.b, a.0.c, a.1.b, a.1.c,..., a.n.b, a.n.c?

I think we will want to have a follow up PR that makes a proposal for that. This PR doesn't try to address that.

Lastly, I'm curious if any backends allow querying on nested attributes. Ones that I am familiar with will allow queries such as http.path = "/some/path" are there any that allow an attribute to be queried as {http: {path: "/some/path"}}?

I believe there are backends which accept nested attributes. I remember looking at Collector exporters and IIRC I found a few which didn't do flattening but kept the original nesting.

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions

github-actions bot commented Nov 2, 2022

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 2, 2022
@tigrannajaryan
Member Author

I heard several times that we need this. Reopening and moving from a draft to a proper PR to discuss more widely.

@tigrannajaryan tigrannajaryan marked this pull request as ready for review November 16, 2022 17:57
@tigrannajaryan tigrannajaryan requested review from a team as code owners November 16, 2022 17:57
@tigrannajaryan
Member Author

@open-telemetry/specs-approvers PTAL.

@github-actions github-actions bot removed the Stale label Nov 17, 2022
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 25, 2022
@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/allow-complex-attr-values branch 2 times, most recently from 97f8247 to 1243c4b on November 28, 2022 17:08
@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/allow-complex-attr-values branch 4 times, most recently from 34e5311 to d8a488d on November 30, 2022 14:37
@tigrannajaryan
Member Author

I'm keen to understand how this feature operates in a real-world scenario. It would be immensely helpful if we could demonstrate its application with at least one open-source tracing and logging database, showcasing its practical utility. This step, although not typical for us, could significantly clarify its benefits and applicability for users like myself.

@jpkrohling Here is an example from Elastic's demo website readily available:
https://demo.elastic.co/app/discover#/doc/kibana-event-log-data-view/.ds-.kibana-event-log-ds-2024.01.17-000010?id=OKvwGI0BpiGkpO9Rapug (you may need to click the link twice to make it open the log record I am referring to directly).

See for example the "kibana.saved_objects" field. See how it is a map of arrays of strings:


Another example to look at is this: https://demo.elastic.co/app/discover#/doc/filebeat-*/.ds-logs-apm.error-default-2023.12.22-000014?id=n7DyGI0BGTVq0q2e7pZw

See _source and how it contains an "error" field that is a map of arrays of maps of arrays of maps whose values are primitives or further maps.

Complex, nested data happens all the time in the logging world.

@tigrannajaryan
Member Author

@yurishkuro's proposal of storing arrays and complex types as blobs raises concerns about the expected user experience. Storing them but limiting the ability to search within these values seems counterintuitive.

@jpkrohling this is not a new proposal. It is already part of the spec, written in a Stable document. See here: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute

I am quoting the spec:

For protocols that do not natively support non-string values, non-string values SHOULD be represented as JSON-encoded strings.
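
As a concrete, hypothetical sketch of that fallback, an exporter targeting a string-only destination could do something like the following; the helper name is made up and not part of any SDK:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toStringValue passes strings through unchanged and JSON-encodes everything
// else, mirroring the fallback the spec quote above describes. Sketch only.
func toStringValue(v any) string {
	if s, ok := v.(string); ok {
		return s
	}
	b, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprint(v) // best effort; a real exporter might drop instead
	}
	return string(b)
}

func main() {
	headers := map[string]any{
		"accept": []any{"text/html", "application/json"},
	}
	fmt.Println(toStringValue(headers))
	// Output: {"accept":["text/html","application/json"]}
}
```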


@jack-berg
Member

I support this change in its current form, but prefer a slightly different option:

  • In my view, this issue accomplishes:
    • Ensuring that we don't artificially limit what data can be recorded. Note that while attributes on metrics are best thought of as metadata, attributes on logs and spans certainly can be argued to constitute the data itself.
    • Allowing a uniform attributes representation to be used in all places where attributes appear in the API / SDK.
  • Adding support for complex types while carving out an exception for metrics is undesirable.
    • Either we have to have a metrics-specific attributes representation that prevents recording complex types, or the metrics SDK implementation has to validate on the record hot path that attributes do not contain complex types, which incurs unnecessary overhead (a sketch of such a check follows this comment). Checking on the collect path instead is also possible, but that yields weird behavior.
  • Why don't we say that attributes are the same everywhere, including metrics, but:
    • State that semantic conventions MUST NOT include complex types for metric attributes.
    • Heavily discourage use of complex types in metric attributes in docs.
    • Provide a fallback way to serialize complex attributes with JSON-string encoding if an exporter comes across a complex type. Note: this is the same thing that is expected today if the Prometheus exporter comes across an array attribute.

This design accomplishes symmetry while avoiding artificial constraints on how data can be represented for spans and logs where real use cases exist for complex types.
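
For reference, a hedged sketch of the kind of record-path check the metrics carve-out would require; the function name and the allowed-type list are assumptions for illustration, not spec text or SDK code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateMetricAttributes rejects any attribute whose value is not a
// primitive or a homogeneous primitive array. A metrics-only carve-out would
// need a check like this on every Record()/Add() call. Hypothetical sketch.
func validateMetricAttributes(attrs map[string]any) error {
	for k, v := range attrs {
		switch v.(type) {
		case string, bool, int64, float64,
			[]string, []bool, []int64, []float64: // primitives and homogeneous arrays
			// allowed
		default:
			return errors.New("metric attribute " + k + " has a complex value")
		}
	}
	return nil
}

func main() {
	attrs := map[string]any{
		"http.method":          "GET",
		"http.request.header":  map[string]any{"accept": "text/html"}, // complex value
	}
	fmt.Println(validateMetricAttributes(attrs))
}
```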

Member

@pellared pellared left a comment

I also (currently) agree with #2888 (comment).

tigrannajaryan pushed a commit that referenced this pull request Jan 18, 2024
## Changes

Change the type of `Resource` to reference the definition in the SDK so
that the resource's scheme URL is included in logs data model.

## Why

The Logs Data Model misses the scheme URL of the resource. All of the languages reuse a common type representing the resource.
- Java (stable logs): [reuses a common resource
type](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/logs/src/main/java/io/opentelemetry/sdk/logs/data/LogRecordData.java)
- .NET (stable logs): [reuses a common resource
type](https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/OtlpLogRecordTransformer.cs)
- C++ (stable logs): [reuses a common resource
type](https://github.com/open-telemetry/opentelemetry-cpp/blob/main/sdk/src/logs/read_write_log_record.cc)
- Python (experimental logs): [reuses a common resource
type](https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py)

These changes can be seen as breaking. However, there is 100% precedent for how log attributes and the resource are currently handled in all languages. I see the current description as a bug, because we would miss the scheme URL of the resource. AFAIK the intention of the OTel Logs Specification authors was to allow nested values only for Body and Attributes.

Related PRs:
-
#2888
- open-telemetry/opentelemetry-go#4809
@rfratto

rfratto commented Jan 18, 2024

The goal of OpenTelemetry is to capture telemetry in the most effective and semantics-preserving way, not to provide the lowest common denominator and call it a "level playing field" for backend implementations. Vendor neutrality does not mean that you can switch your vendor and keep exactly the same capabilities; what would be the point of competition then? The compatibility matrix is always implicitly there, but it's not OpenTelemetry's concern.

I found this comment particularly surprising, as it was not the definition of "vendor neutrality" I had in mind, and judging by some of the arguments on this issue, it seems that I'm not alone in my confusion. I wasn't able to find anything on the OpenTelemetry website that lays these expectations out with the same clarity as this explanation.

Should this be documented more explicitly (or if it's already documented, moved somewhere easier to find) to avoid reader misunderstanding?

@MrAlias
Contributor

MrAlias commented Jan 18, 2024

  • Why don't we say that attributes are the same everywhere

@jack-berg we have already defined attributes differently for logs than for all the other signals. The Attributes field for the logs bridge API emit operation is defined by the logs data model (map<string, any>), not the Attributes that all the other signals use [1][2][3][4][5].

As was pointed out here, the purpose of this PR is not to change the logs definition. This change will only affect the existing signals and resources that already use these attributes. I don't think we can justify changing the attributes used by the trace and metric signals and by resources so that they can be "unified" with logs, when logs already uses a different definition and will not be migrated.

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#record-1

  2. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#record

  3. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#add-1

  4. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#measurement

  5. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span

@MrAlias
Contributor

MrAlias commented Jan 18, 2024

Have there been any asks by users to add these attribute types to tracing or metrics?

@pellared
Member

pellared commented Jan 18, 2024

@MrAlias Isn't the logs attribute type semantically the same as the common attribute proposed in this PR? If not, can you point out the difference?

Take notice that the common attributes specification already references log attributes (which in my opinion suggests that this is the same type):

## Attribute Collections
[Resources](../resource/sdk.md), Metrics
[data points](../metrics/data-model.md#metric-points),
[Spans](../trace/api.md#set-attributes), Span
[Events](../trace/api.md#add-events), Span
[Links](../trace/api.md#link) and
[Log Records](../logs/data-model.md) may contain a collection of attributes. The

@tigrannajaryan
Member Author

Have there been any asks by users to add these attribute types to tracing or metrics?

@MrAlias Yes for traces, a couple issues are linked in the PR description - the stacktrace and the payload use cases.

@MrAlias
Contributor

MrAlias commented Jan 18, 2024

@MrAlias Isn't the logs attribute type semantically the same as the common attribute proposed in this PR? If not then can you point out the difference?

Take notice that the common attributes specification already references log attributes (which in my opinion suggests that this is the same type):

## Attribute Collections
[Resources](../resource/sdk.md), Metrics
[data points](../metrics/data-model.md#metric-points),
[Spans](../trace/api.md#set-attributes), Span
[Events](../trace/api.md#add-events), Span
[Links](../trace/api.md#link) and
[Log Records](../logs/data-model.md) may contain a collection of attributes. The

Semantically equivalent is not the same as identical. Implementations that have developed a logs attribute type to conform with the specification cannot simply change that type out for another without breaking the API. For example, if in Go we build:

type LogAttribute struct { .... }

func (Record) SetAttribute(...LogAttribute)

We cannot simply redefine to this:

func (Record) SetAttribute(...attribute.KeyValue)

Any user of the API will break when they try to upgrade.

@pellared
Member

pellared commented Jan 18, 2024

Semantically equivalent is not the same as identical.

I agree. This is why I do not want to release a stable OTel Go Logs API module until this issue/PR is resolved.

Implementations that have developed a logs attribute type to conform with the specification cannot simply change out that type to another without breaking the API.

This will "negatively" (breaking changes) affect some non-stable implementations (AFAIK Rust and JS).
CC @lalitb @scheler

But it could "positively" impact all stable implementations (and Python):

@MSNev

MSNev commented Jan 18, 2024

This will "negatively" (breaking changes) affect some non-stable implementations (AFAIK Rust and JS). CC @lalitb Lalit Kumar Bhasin FTE @scheler

I don't believe this will be an issue for JS, as we previously had a SpanAttributes type which is now just an alias for Attributes, so the same would hold for LogAttributes. JavaScript itself has no concept of runtime "type" checking; it is "added" by implementing in TypeScript and enforced at compile time (and can be ignored with some additional casting). TypeScript is flexible enough to determine (mostly, though not always) that two different interface definitions with the same set of properties and types are equivalent.

@MrAlias
Contributor

MrAlias commented Jan 18, 2024

Semantically equivalent is not the same as identical.

I agree. This is why I do not want to release a stable OTel Go Logs API module until this issue/PR is resolved.

Implementations that have developed a logs attribute type to conform with the specification cannot simply change out that type to another without breaking the API.

This will "negatively" (breaking changes) affect some non-stable implementations (AFAIK Rust and JS). CC @lalitb @scheler

But it could "positively" impact all stable implementations (and Python):

If we are able to retroactively make breaking changes to a stable portion of the specification because no stable implementation is compliant, I think we can just replace our versioning and stability guidelines with:

Do whatever, we'll fix it later.

😆

In all seriousness though, that is going to have to be a separate issue from this one. The TC and likely GC (cc @open-telemetry/governance-committee @open-telemetry/technical-committee) will need to weigh the reputational cost of not doing what we said we would against the desire to correct this situation.

I just wanted to point this out as @jack-berg had mentioned conformity and reuse of the existing attribute definition as motivation for this change, but it has been pointed out by @tigrannajaryan and others that that is not the intent of this PR.

@jpkrohling
Member

Thank you for your clarifications and patience, @tigrannajaryan. I also appreciate @jack-berg's options and I agree with them as well. We are also aligned that we shouldn't aim for the lowest common denominator, and I hope my previous messages didn't imply that.

One of my goals with my questions is to ensure that open-source users can take advantage of the solutions we are giving to the problems they have. And if they can't, what does it mean? That the open-source ecosystem doesn't care about this feature, because users don't care about it? Or that there are other, easier ways to solve the same problem? In my opinion, what we shouldn't be doing is creating a feature without taking the broader open-source ecosystem into consideration.

My other major point was about having consistency among signals: if the use case isn't strong enough, we probably shouldn't break the consistency we have now, even if we have inconsistencies elsewhere. I believe those inconsistencies are smaller compared to ruling metrics out of this proposal.

Your example from Elastic helps in showing a solution for logs that uses complex attribute values. While Elasticsearch itself isn't open source, I take it that the same works for OpenSearch as well, although a confirmation would be desirable. And given that we have @yurishkuro's approval on this issue, I believe Jaeger users are represented as well, so I'm happy with the tracing side of things. It's still not clear to me whether users can use that information for their observability scenarios ("show me all traces containing stacktraces for line 53 of KotlinExtensions.kt"), or if it's only for reference once the right data point (log or span) has been found. This is what I meant by an end-to-end example: how do we envision things happening to solve the problem at hand?

We still have a situation regarding metrics, though, and the one thing I'd ask before moving forward is to sync with the Prometheus folks and come up with an agreement on how to handle it. Are they happy to just drop this data? Or to serialize it as JSON? Would they be open to implementing a different way to store this data and not keep it in the TSDB? If we don't engage with them, I predict that they will be mad at us because we are specifying something we know they don't support. Why Prometheus? Because it's one of the most relevant, if not the most relevant, open-source metrics databases out there, especially for our audience. If the Prometheus WG folks (@dashpole comes to mind) are OK with this change, I'm certainly OK with it as well.

- A primitive type: string, boolean, double precision floating point (IEEE 754-1985) or signed 64 bit integer.
- An array of primitive type values. The array MUST be homogeneous,
i.e., it MUST NOT contain values of different types.
- An array of values of primitive type [before 1.29.0].

Should we also include a byte array (as for logs attributes)?

@dashpole
Contributor

Prometheus only supports string label values, so we already need to deal with similar cases today for arrays. Thus far, we've followed https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute, and serialized to json for all of those cases, rather than drop. TBH, I don't think I've ever seen a prometheus metric with an array-typed label from OTel, although I suspect it happens occasionally when we encode Resource as the target_info metric (e.g. if they have process.command_args).

I definitely agree with the sentiment that prometheus users don't want large, structured, json-encoded attribute values in their metrics. Even adding namespaces to metric labels is seen as too verbose by prometheus folks (although I understand the rationale). In practice, many prometheus users set label_value_length_limit to avoid ingesting very large label values. If they did encounter a large, structured json-encoded label value, it would cause scraping to fail, and the user would likely choose to drop that label anyways using prometheus relabel rules.

If, like arrays, maps are rarely used for metric label values, I don't think this will be a huge issue. We can JSON-encode them when they do appear and let users deal with those strings. But I don't see this being useful for metrics, and would personally be against using these in semantic conventions for metrics.

@gouthamve @fstab
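
To picture the target_info scenario described above (a Resource attribute such as process.command_args surfacing as a JSON-encoded label), here is an illustration-only sketch; the values and the label-name normalization shown are assumptions, not output from any real exporter:

```go
package main

import "fmt"

func main() {
	// Illustration only: what the label set of a target_info sample might look
	// like if process.command_args were JSON-encoded into a string label value.
	labels := map[string]string{
		"service_name":         "checkout",
		"process_command_args": `["/usr/bin/app","--port=8080"]`,
	}
	fmt.Println(labels)
}
```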

@tigrannajaryan
Member Author

I spent a bit more time thinking about this issue and came to the conclusion that this change is non-trivial and likely warrants an OTEP. I don't think a PR is the right mechanism for a change like this. I am going to close the PR.

I don't plan to create the OTEP myself right now since I don't have time. If someone else wishes to create the OTEP, I can review it.

If such an OTEP is proposed I think it needs to at least:

  • Explain the current state of the spec. I see comments in this thread that indicate lack of knowledge of what we have today. It would be useful to describe it in the OTEP.
  • Explain how the change is possible to implement in existing stable language API/SDKs within the compatibility promises those SDKs give their users.
  • Resolve an outstanding open question: should metrics discourage, but in principle allow, the same complex attribute value data type as all other signals and flatten it based on destination limitations, OR should metrics outright prevent complex data types (and, if so, whether to prevent it at runtime or at compile time).
  • Attach prototype implementations in multiple languages proving the previous claim. Show what the performance impact (if any) of the change is. We have a Java prototype from @jack-berg and I believe @MrAlias has some preliminary sketches in Go.

To @pellared and @MrAlias, who are waiting on the resolution to this. I see 3 options:

  • Do the OTEP yourself (if you are willing) to unblock yourself.
  • Wait for someone else to do the OTEP.
  • Don't wait, and implement log attributes according to the existing spec. This likely means a different data type and likely makes it impossible to later unify the API across signals.

There is probably some other alternative I am not seeing right now.

Thank you everyone for your patience and for the helpful feedback and comments.

carlosalberto pushed a commit that referenced this pull request Mar 6, 2024
If we aren't going to accept complex attribute types (#2888), we should explicitly rule them out of future designs. Doing so cements the idea that attributes are "metadata" instead of "data", since if attributes were data, we would not want to artificially limit their structure. Once it's clear that attributes are metadata and restricted to a limited set of types, it's easy to determine that use cases which require complex types (like event payloads) should seek to put the data elsewhere (like in a log record body).

While I was in favor of supporting complex attribute types (#2888), I believe it's more important that we commit one way or the other. The uncertainty around the question of whether this type of evolution will occur has muddied the waters of several related conversations.

There was consensus on codifying this in the 1/30/24 spec SIG meeting.
We should capitalize on this momentum and get this over the finish line.
Stalling out just to revisit this same debate in the future is a bad use
of time.