Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(RFC): Add Internal Trace Model. #20170

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

hdost
Copy link
Contributor

@hdost hdost commented Mar 25, 2024

Provides a path forward for integrating sources and sinks for Tracing.

Relates #1444, #17307, #17308

Provides a path forward for integrating `sources` and `sinks` for Tracing.

Relates vectordotdev#1444, vectordotdev#17307, vectordotdev#17308
rfcs/2024-03-22-20170-trace-data-model.md Outdated Show resolved Hide resolved
rfcs/2024-03-22-20170-trace-data-model.md Outdated Show resolved Hide resolved
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @hdost . We'll try to review it next week in more detail.


##### Span Mapping

Vector fields shown using VRL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Vector fields shown using VRL.
Vector fields are shown using VRL path notation.

| Datadog Field | Vector Field | Comments |
|---------------|--------------|----------|
| Name | X | See "Span Name Mapping" |
| TraceID | `.["scope_spans"][i]["spans"][j]["trace_id"]` | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this (and below) valid VRL path notation? I'm thinking it should be more like this:

Suggested change
| TraceID | `.["scope_spans"][i]["spans"][j]["trace_id"]` | |
| TraceID | `.scope_spans[i].spans[j].trace_id` | |

Comment on lines +128 to +129

#####
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Remove this extraneous heading.


##### From Datadog -> Vector

There's not a known advantage to providing this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not receive traces through the datadog_agent interface? Even if not, I would think at some point we will want to be able to pass them through as well.


## Drawbacks

- TBD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to fill this in?


### Out of scope

- Change the `TraceEvent` type internally from a re-typing of `LogEvent`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this must be out of scope, if the desired data model requires a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a bit confused on this one. It seems like the RFC here is proposing a strongly-typed representation, which I think makes sense, but means we'd also have an actual struct representing the trace event.


### Implementation

For the initial Implementation, the data model will be [v1.1.0][otel-proto-110] of the OpenTelemetry Proto.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general description of what is in the data model would be useful here. Also, is this data model so generic that only a Map can describe it, or are more dedicated data types relevant?


As OpenTelemetry is the primary model the this will only describe the data types. The top level
`TraceEvent` will represent a top level proto `message ResourceSpans`.
`Value::Object`. The remaining proto to mappings can be found:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining mappings of what?

Comment on lines +75 to +76
The keys for `message` come the field names for `Value::Object`, and MUST follow a "Snake Case" format.
As an example, instead of `traceId` and `droppedEventCount`, use `trace_id` and `dropped_event_count`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The keys for `message` come the field names for `Value::Object`, and MUST follow a "Snake Case" format.
As an example, instead of `traceId` and `droppedEventCount`, use `trace_id` and `dropped_event_count`.
The keys for `message` come the field names for `Value::Object`, and MUST follow a "snake case" format.
As an example, instead of `traceId` and `droppedEventCount`, use `trace_id` and `dropped_event_count`.

Comment on lines +80 to +81
The basis of Mapping between the Internal Trace format and the Datadog format can be based on the
[Datadog agent][datadog-agent-otlp-ingest-09].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warrants some expansion as well.


##### From Datadog -> Vector

There's not a known advantage to providing this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the advantage be to do datadog agent -> vector -> trace collector be it one that supports OTLP or Zipkin or whatever?


### Using OpenTelemetry-Inspired Model

Rather than needing to respecify the definitions and re-create work done by the wider OpenTelemetry

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the proposal this? It just does conversions of the proto types to Vector's internal Rust types but keeps the structure of Resource -> InstrumentationScope -> Spans the same?


- Define a Vector Trace event model schema.
- Support for Links between `TraceEvent`s.
- Defining a mapping between the Vector trace event model to the Datadog trace event (one direction).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need both directions, no? The datadog_agent source needs to decode it and the datadog_traces sink needs to encode it.


### Out of scope

- Change the `TraceEvent` type internally from a re-typing of `LogEvent`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a bit confused on this one. It seems like the RFC here is proposing a strongly-typed representation, which I think makes sense, but means we'd also have an actual struct representing the trace event.

@bruceg
Copy link
Member

bruceg commented Apr 11, 2024

We discussed this internally and agree with the general direction of this proposal. It makes sense to adopt the the OpenTelemetry data types as the internal model, since the OTel project has defined many mappings to and from their data model for other trace types. So it stands as a useful intermediate representation for many kinds of traces.

However, if we are going to adopt that data model, we would prefer it be implemented in a more structured form. Specifically, we would want all of the data structures, were possible, to become fixed Rust structs instead of a generic map type. This would echo our metric types, which can represent a variety of different kinds of metrics with a fixed structure. This will simplify all kinds of code managing conversion to and from other data types, such as protobufs or other trace models, since they can rely on the presence of certain fields at a source code level.

As such, we would like to see the above reflected in the RFC. It should contain source code specifics of what the structures would look like and, if appropriate, what methods would be made available for manipulating them. Since this isn't a completely different direction, we don't think it requires a complete overhaul of this document but it does require changing the core data types, which means that consideration is very much in scope.

Some examples from existing RFCs to show what that might look like:

Many other RFCs contain Rust code samples, as we intend for these documents to be very opinionated as to the target solution, with many of the implications worked out.

@hdost
Copy link
Contributor Author

hdost commented Apr 12, 2024

Thank you for the thorough review, I'll take some time to reformulate with all that in mind.

@jszwedko jszwedko added the meta: awaiting author Pull requests that are awaiting their author. label May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: rfc meta: awaiting author Pull requests that are awaiting their author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants