Split pdata into multiple packages #4832

dmitryax · 2022-02-10T00:04:20Z

Problem

As a result of discussion on how to handle experimental data types, a decision was made to split pdata package into multiple modules per signal and publish them as separate modules. pdata package incapsulates internal proto data structures into private fields. Given that currently all signal data structures (logs, metrics, traces) use common structures like Resource, AttributeMap, AttributeValue, they need direct access to the private proto fields of the common structures. Splitting them as is cannot be done since it removes the access to the private fields of common data types from other signal packages. There are several ways how this can be solved, all of them have some drawbacks.

Possible solutions

1. Wrap common pdata structs

Common pdata structures like Resource and others can be wrapped into another struct that is accessible in a public module while wrapped part can be placed into an internal package that all other signals have access to.

Cons:

Complicates common data structures
Additional computation overhead to wrap/unwrap

2. Add new exported functions to common data structures to be used only by other pdata signals

Private functions of common data types that currently used to pass private proto fields can be exported. In that case other signal modules can have access to it.

For example:

func newResource(orig *otlpresource.Resource) Resource

can be changed to

func NewResourceFromProto(orig *otlpresource.Resource) Resource {

Pros:

Less additional complexiy

Cons:

Introducing exported functions that are needed for internal use only.

3. Put `pdata` into an internal package as is and use aliases to it in exported modules

The whole pdata package can be put as is into an internal package. And exported packages will contain only aliases to the internal pdata package.

Pros:

Easy to generate and maintain

Cons:

Slightly complicates the module structure and godoc.

Proposal

The third solution seems to be the most appropriate. It keeps the overhead minimal while introducing no changes to existing pdata structures. It just adds another layer of reference in the code and generated documentation.

The text was updated successfully, but these errors were encountered:

bogdandrutu · 2022-02-10T19:23:56Z

My first question/reaction: Considering all these problems... Do we still want to split the packages?

dmitryax · 2022-02-10T22:16:20Z

Also the only downside of the approach 3 can be mitigated to some extent. Signal specific parts of the pdata code that is put under internal in #4833 potentially can be moved to exported p<signal> modules and only pcommon will stay with the aliases.

dmitryax · 2022-02-11T22:15:53Z

@open-telemetry/collector-approvers please let me know if the options outlined in this issue need more clarifications for further discussion.

I don't think we should consider Option 1 due to the overhead. So I believe the question is whether we want to accept one of the following:

Option 2: exposing functions that are needed for internal usage only like NewResourceFromProto. There are must be several of them.

Option 3: having aliases in pcommon module pointing to an internal package.

Aneurysm9 · 2022-02-11T22:24:39Z

I don't feel that this issue provides enough information for us to determine the best path forward. I don't see from here what the structure of independent signal modules would look like. What would be the dependencies among them and the existing modules/packages? What would the wrapping structures for option 1) look like? What would the interfaces for common types in option 2) look like? Is it only exporting initializers or is there more that needs to be exposed? Why would it be a bad thing to allow the initialization of common data types that might be used by multiple signals? It feels like they are currently internal-only but need not be going forward.

I don't think we should consider Option 1 due to the overhead.

What overhead? Is it a performance concern? Do you have benchmarks that demonstrate the magnitude of the overhead?

Option 2: exposing functions that are needed for internal usage only like NewResourceFromProto.

Why should the pdata package be the only place that a resource can be constructed from proto structs? Can external callers not already contstruct a new empy resource and then fill it with attributes from the proto struct? Why not give a direct method, particularly if we're talking about ways to enable support for more, new signal types.

Again, I think we need a lot more detail on all of these options before we can make an informed decision. Please see open-telemetry/opentelemetry-go#2526 where the Go SIG was recently presented with choices for how to structure a metrics API. Having that level of detail made it easier for us to understand the end-state we're moving toward and how we're getting there.

codeboten · 2022-02-11T22:36:48Z

Thanks for putting together a draft PR demonstrating option 3. I don't know how feasible it would be to have a PR for both other options, or if it would be easier to put together a rough sketch of what Options 1 & 2 would look like (as per @Aneurysm9's linked proposal in the go repo)

Looking through the PR for Option 3, I'm not sure how different from option 2 it is, sure we're hiding the internals but exposing the aliases essentially does almost the same as exposing the internals here.

Add new exported functions to common data structures to be used only by other pdata signals

I don't think there's a good way to enfore only pdata signals to use exported functions, so i guess the question is what's the downside in exposing those methods?

dmitryax · 2022-02-11T22:44:44Z

@codeboten

sure we're hiding the internals but exposing the aliases essentially does almost the same as exposing the internals here.

Not really, for Option 2 we would need to expose functions like NewResourceFromProto, but we don't need to provide aliases for them in Option 3. We don't have any internals exposed in the draft PR that I submitted

I don't think there's a good way to enfore only pdata signals to use exported functions, so i guess the question is what's the downside in exposing those methods?

I think the only downside is their actual presence. Users won't be able to do anything with them outside the Collector's model

Aneurysm9 · 2022-02-11T22:51:17Z

I think the only downside is their actual presence. Users won't be able to do anything with them outside the Collector's model

Can you elaborate on this? I'm not sure I understand.

dmitryax · 2022-02-12T05:03:07Z

I think the only downside is their actual presence. Users won't be able to do anything with them outside the Collector's model

Can you elaborate on this? I'm not sure I understand.

Option 2 implies introducing publicly exported functions like the following:

func ResourceFromProto(orig *otlpresource.Resource) Resource {
	return Resource{orig: orig}
}

This function can be imported by any client, but cannot be used anywhere outside of Core Collector's model, because it takes an argument of a type defined in model/internal which is not leaked anywhere else.

Aneurysm9 · 2022-02-12T21:00:05Z

Would that be an issue if we used the same proto structs that everyone else uses? I think that being able to interoperate with other systems at that level would be advantageous.

tigrannajaryan · 2022-02-14T15:20:42Z

Would that be an issue if we used the same proto structs that everyone else uses? I think that being able to interoperate with other systems at that level would be advantageous.

That would be a significant performance degradation (IIRC around 1.3x last time we measured). Also probably (?) makes impossible/more complicated the patching/manual Protobuf serialization code that @codeboten works on.

tigrannajaryan · 2022-02-14T15:33:38Z

My first question/reaction: Considering all these problems... Do we still want to split the packages?

+1.

I think we need to keep asking this question. I still don't see an alternate that works if we keep insisting on our formal approach to introducing experimental API (according to Go semver requirements). However if we relax this requirement then the alternate is much simpler: don't split, mark unstable parts (currently logs) as "unstable" via some sort of labelling (like gRPC does).

I have a feeling by trying to be formal we are complicating our lives disproportionately.

mx-psi · 2022-02-14T16:42:10Z

I still don't see an alternate that works if we keep insisting on our formal approach to introducing experimental API (according to Go semver requirements).

@tigrannajaryan How would the implementation in #4833 'not work' in this sense?

Aneurysm9 · 2022-02-14T17:18:29Z

That would be a significant performance degradation (IIRC around 1.3x last time we measured). Also probably (?) makes impossible/more complicated the patching/manual Protobuf serialization code that @codeboten works on.

What about the other way around, then? Take the internal structs from the collector and make them available for everyone else to use. If they're that much better than the structs in the proto-go repo then why shouldn't we be encouraging everyone to use them? It seems that the perceived complication here is simply stemming from the fact that we're not using a common OTLP implementation and we should look at fixing that rather than throwing up our hands and saying "well, we've tried nothing and we're all out of ideas!"

As for the patches on the existing implementation that @codeboten has developed, my understanding is that they are only necessary because we're not using the structs from proto-go that are generated with a protoc that understands optional fields. Not needing them seems a lot simpler, not more complicated.

I still don't see an alternate that works if we keep insisting on our formal approach to introducing experimental API (according to Go semver requirements). However if we relax this requirement then the alternate is much simpler: don't split, mark unstable parts (currently logs) as "unstable" via some sort of labelling (like gRPC does).

There will be no alternative that works and does not require changes, potentially significant ones. I do not believe it is a sound idea to declare v1.0.0(but-open-to-breaking-changes). When we have discussed this in the SIG there has been little to no support for this approach and issues with gRPC, Prometheus, and others doing this have been cited as the reason why nobody supports it.

I have a feeling by trying to be formal we are complicating our lives disproportionately.

I know that by trying to avoid correctness we will complicate the lives of our users. There are many more of them than there are of us. The cost of choosing to be incorrect here will be high.

tigrannajaryan · 2022-02-14T17:44:25Z

I still don't see an alternate that works if we keep insisting on our formal approach to introducing experimental API (according to Go semver requirements).

@tigrannajaryan How would the implementation in #4833 'not work' in this sense?

By "alternate" I mean a solution that does not require splitting pdata into multiple packages. My understanding is that implementation in #4833 still requires pdata to be split into modules by signals.

tigrannajaryan · 2022-02-14T17:49:18Z

What about the other way around, then? Take the internal structs from the collector and make them available for everyone else to use.

The reason we keep them internal is to allow future optimizations. We don't want in-memory representation to be part of a public API. There are 2 potential optimizations that I would like to try some time in the future:

Faster AnyValue.
Hashmap for AttributesMap.

These don't require pdata API changes but require different in-memory structs.

tigrannajaryan · 2022-02-14T17:54:33Z

I do not believe it is a sound idea to declare v1.0.0(but-open-to-breaking-changes)

Somehow this works for gRPC, which tells me that maybe is not such a big deal. See https://github.com/grpc/grpc/blob/master/doc/versioning.md

I know that by trying to avoid correctness we will complicate the lives of our users. There are many more of them than there are of us. The cost of choosing to be incorrect here will be high.

Yes, it is a tradeoff.

To clarify, I don't insist that we must do this. I only ask every approver/maintainer to critically assess the tradeoffs. I am personally leaning towards grpc approach, but only slightly so, and can be convinced otherwise.

mx-psi · 2022-02-14T18:08:09Z

Somehow this works for gRPC, which tells me that maybe is not such a big deal. See https://github.com/grpc/grpc/blob/master/doc/versioning.md

Here are a bunch of issues related to grpc-go's approach to semver:

Update your SemVer Policy Please - Breaking changes in minor versions causing heartache grpc/grpc-go#3798
Breaking change in non-major release grpc/grpc-go#3726
using go mod tidy , there is error: google.golang.org/grpc/naming: module google.golang.org/grpc@latest found (v1.43.0), but does not contain package google.golang.org/grpc/naming grpc/grpc-go#5072
API stability guarantees grpc/grpc-go#3808
etcd not compatible with grpc v1.30.0 etcd-io/etcd#12124 which I mentioned in the original discussion. It is the issue with the most 👍 ever in the etcd repository
google.golang.org/grpc cannot be upgraded >= v1.30.0 kubernetes/kubernetes#93320 (caused by the above)

Other languages have support for this kind of thing, but Go is not among them (maybe it will, see golang/go#34409, but not in the near future).

tigrannajaryan · 2022-02-14T18:20:40Z

@mx-psi thank you for the links, this is very useful and shows that it indeed causes pains to users. I think I am convinced :-)

The other alternate is to follow proper semver and increase major version number anytime we change part of pdata that depends on a non-stable OTLP signal. The downsides seem to be:

Much more frequent major releases than people are typically used to.
we may crank up a bunch of major version numbers and end up with v120.0.0 pretty soon, which may look weird.
Something else?

bogdandrutu · 2022-02-17T00:05:58Z

During today SIG meeting, @Aneurysm9 pointed to an interesting solution. It seems that you can depend on an internal package from a different module as long as the path is still respected.

dmitryax · 2022-02-17T02:40:49Z

I'm on PTO til Friday, so wasn't able to participate. @Aneurysm9 can you please share more details about the solution you proposed?

Aneurysm9 · 2022-02-18T19:47:32Z

The Go internal package import restrictions pre-date modules and operate solely on the package import path hierarchy. So long as "the package doing the import is within the tree rooted at the parent of the internal directory" that package can import the internal package. This would allow a model/internal/data/protogen/* hierarchy within a module rooted at model to be utilized by modules rooted at model/pdata/trace, model/pdata/logs, etc.

It would also allow for a model/pdata/internal to house functions such as:

func NewResourceFromProto(orig *otlpresource.Resource) Resource {

without exposing them to consumers outside of the tree rooted at model/pdata. I believe that eliminates the only listed downside for approach 2) without the duplication implied by option 3) and presents a clear path to separate modules per signal.

bogdandrutu · 2022-02-18T20:11:16Z

@Aneurysm9 you are correct in your suggestion to make that API part of the "internal" package. I am not sure how NewResourceFromProto will be internal while Resource is public:

pacakge model/pdata:

type Resource struct {
  orig *otlpresource.Resource
}

pacakge model/pdata/internal:

func NewResourceFromProto(orig *otlpresource.Resource) Resource {
  // does not have access to the private member in `return Resource{orig: orig}`
}

Solution to this move Resource to model/pdata/internal and simply create an alias in model/pdata for the Resource, while keep the access to NewResourceFromProto internal/private.

dmitryax · 2022-02-25T01:44:14Z

@Aneurysm9 @bogdandrutu @codeboten @tigrannajaryan

I added couple more PRs to illustrate the proposed approaches:

Next PR for Approach 3: Split pdata package by telemetry signal type #4918
Draft PR for Approach 2: Split pdata package without using aliases #4919

I don't think it's reasonable to illustrate Approach 1 since it's pretty similar to Approach 3, it uses wrapping instead of aliasing witch adds more unnecessary complexity without any benefits.

bogdandrutu · 2022-02-25T16:58:41Z

Here is way I propose version 3:

It has less public API, we are exposing new "helper" funcs that we need to keep forever even though users cannot really use it. Why? Because someone may call AttributeMapFromOrig(nil).
We can go from version 3 -> version 2 in the future but not vice-versa. Why? Because of the extra public API needed in option 2.
If by mistake anyone adds a Getter that returns the internal struct, users can access all their public members/funcs. We had this happened in the past.
CONS: The public API surface may be unclear. A possible solution is to duplicate the code and call into the internal. This is generated code anyway.

dmitryax · 2022-02-25T18:13:08Z

CONS: The public API surface may be unclear. A possible solution is to duplicate the code and call into the internal. This is generated code anyway.

Most of the code operates on orig private field, so not much to duplicate, maybe just wrap the functions calls and duplicate godoc

codeboten · 2022-03-01T22:33:04Z

Thanks for putting together the draft for approach 2 @dmitryax. After reviewing it I'm leaning towards approach 3 to minimize new APIs to add at this point.

mx-psi · 2022-03-02T15:59:53Z

I am also leaning towards approach 3, I think @bogdandrutu's points make a lot of sense

Aneurysm9 · 2022-03-03T01:34:51Z

We can go from version 3 -> version 2 in the future but not vice-versa. Why? Because of the extra public API needed in option 2.

I think the plan laid out in #4918 (comment) makes the congruence between these approaches clearer, though I do wish that #4919 had used the same package structure. The original proposal here for option 3 was missing the detail needed to draw a line through to the final destination. I'm fine with moving in this direction now, though as I question in #4918 (comment) I'm not sure we need #4833 as an interim step before #4918.

If by mistake anyone adds a Getter that returns the internal struct, users can access all their public members/funcs. We had this happened in the past.

Isn't this a problem any time you have internal packages with types that have exported methods? I don't think options 2 or 3 are any different in this regard, are they?

dmitryax added this to the `go.opentelemetry.io/collector/model` 1.0 milestone Feb 10, 2022

dmitryax mentioned this issue Feb 10, 2022

Move pdata into an internal package and use aliases in exported package #4833

Merged

punya mentioned this issue Feb 11, 2022

[Decision to be made] Approach to handle experimental types of telemetry #4705

Closed

mx-psi mentioned this issue Feb 24, 2022

Track @mx-psi interest and contributions for approver #2819

Closed

This was referenced Feb 25, 2022

Split pdata package by telemetry signal type #4918

Closed

Split pdata package without using aliases #4919

Closed

bogdandrutu assigned dmitryax Mar 4, 2022

bogdandrutu mentioned this issue Mar 17, 2022

Split otlp/otlpgrpc #5027

Closed

codeboten added the priority:p1 High label Mar 18, 2022

dmitryax mentioned this issue Apr 8, 2022

Split model by signal type and move it to the new pdata module #5168

Merged

bogdandrutu closed this as completed in #5168 Apr 11, 2022

mx-psi mentioned this issue Nov 17, 2023

[docs, .github] Add new issue template for stabilizing a module #8935

Merged

mx-psi mentioned this issue Jan 3, 2024

Add policy for experimental Go API #9209

Open

mx-psi mentioned this issue May 22, 2024

Proposal: split signal-aware packages into per-signal subpackages #10207

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split pdata into multiple packages #4832

Split pdata into multiple packages #4832

dmitryax commented Feb 10, 2022 •

edited

bogdandrutu commented Feb 10, 2022

dmitryax commented Feb 10, 2022 •

edited

dmitryax commented Feb 11, 2022

Aneurysm9 commented Feb 11, 2022

codeboten commented Feb 11, 2022

dmitryax commented Feb 11, 2022

Aneurysm9 commented Feb 11, 2022

dmitryax commented Feb 12, 2022 •

edited

Aneurysm9 commented Feb 12, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

mx-psi commented Feb 14, 2022

Aneurysm9 commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

mx-psi commented Feb 14, 2022 •

edited

tigrannajaryan commented Feb 14, 2022

bogdandrutu commented Feb 17, 2022

dmitryax commented Feb 17, 2022 •

edited

Aneurysm9 commented Feb 18, 2022 •

edited

bogdandrutu commented Feb 18, 2022 •

edited

dmitryax commented Feb 25, 2022

bogdandrutu commented Feb 25, 2022

dmitryax commented Feb 25, 2022 •

edited

codeboten commented Mar 1, 2022

mx-psi commented Mar 2, 2022

Aneurysm9 commented Mar 3, 2022

Split pdata into multiple packages #4832

Split pdata into multiple packages #4832

Comments

dmitryax commented Feb 10, 2022 • edited

Problem

Possible solutions

1. Wrap common pdata structs

2. Add new exported functions to common data structures to be used only by other pdata signals

3. Put pdata into an internal package as is and use aliases to it in exported modules

Proposal

bogdandrutu commented Feb 10, 2022

dmitryax commented Feb 10, 2022 • edited

dmitryax commented Feb 11, 2022

Aneurysm9 commented Feb 11, 2022

codeboten commented Feb 11, 2022

dmitryax commented Feb 11, 2022

Aneurysm9 commented Feb 11, 2022

dmitryax commented Feb 12, 2022 • edited

Aneurysm9 commented Feb 12, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

mx-psi commented Feb 14, 2022

Aneurysm9 commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

tigrannajaryan commented Feb 14, 2022

mx-psi commented Feb 14, 2022 • edited

tigrannajaryan commented Feb 14, 2022

bogdandrutu commented Feb 17, 2022

dmitryax commented Feb 17, 2022 • edited

Aneurysm9 commented Feb 18, 2022 • edited

bogdandrutu commented Feb 18, 2022 • edited

dmitryax commented Feb 25, 2022

bogdandrutu commented Feb 25, 2022

dmitryax commented Feb 25, 2022 • edited

codeboten commented Mar 1, 2022

mx-psi commented Mar 2, 2022

Aneurysm9 commented Mar 3, 2022

dmitryax commented Feb 10, 2022 •

edited

3. Put `pdata` into an internal package as is and use aliases to it in exported modules

dmitryax commented Feb 10, 2022 •

edited

dmitryax commented Feb 12, 2022 •

edited

mx-psi commented Feb 14, 2022 •

edited

dmitryax commented Feb 17, 2022 •

edited

Aneurysm9 commented Feb 18, 2022 •

edited

bogdandrutu commented Feb 18, 2022 •

edited

dmitryax commented Feb 25, 2022 •

edited