Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New component: OpAMP Configuration Provider. #15295

Closed
2 tasks
pavankrish123 opened this issue Oct 19, 2022 · 16 comments
Closed
2 tasks

New component: OpAMP Configuration Provider. #15295

pavankrish123 opened this issue Oct 19, 2022 · 16 comments
Labels
Sponsor Needed New component seeking sponsor

Comments

@pavankrish123
Copy link
Contributor

The purpose and use-cases of the new component

There has been a lot of discussions and progress to standardize the agent configuration mechanism through OTEL community driven OpAMP specification.

Having a provider that effectively acts like an OpAMP client makes collector receive and listen for updates over OpAMP channels from OpAMP compliant servers.

Example configuration for the component

Something similar to s3 provider s3:// provider but we will be using OpAMP specified apis to poll for configuration updates. Initially we can go with only http and leverage https://github.com/open-telemetry/opamp-go client module for the implementation.

Telemetry data types supported

Configuration Provider.

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

@pavankrish123 pavankrish123 added the needs triage New item requiring triage label Oct 19, 2022
@pavankrish123
Copy link
Contributor Author

@jpkrohling @tigrannajaryan any suggestions on this? Let me know if there is an effort already on the way. I am more than happy to help with the items

@tigrannajaryan
Copy link
Member

There are some limitations in the Collector that prevent implementing full-featured OpAMP management solution. Namely ReportFatalError needs to be eliminated before we can start accepting remote configuration robustly.

I think we should stat with a minimal OpAMP implementation which can support the following subset of OpAMP capabilities:

  • Report AgentDescription
  • Report Health
  • Report Effective Configuration

The more sophisticated capabilities (e.g. accepting remote config, etc) need to wait for now until we have the mechanisms to support them. I also think that full-feature OpAMP implementation likely needs to be an out-of-process Supervisor-based instead of being a Collector component.

I think @amdprophet and @portertech were planning to work no OpAMP provider, so it is best to sync with them to make sure there is no duplicate effort.

@tigrannajaryan
Copy link
Member

Generally, here are issues that are probably blockers for full OpAMP implementation:

Luckily we don't have to solve all of these right now. We can still have partial OpAMP support without these and it will still be valuable.

@amdprophet
Copy link

@tigrannajaryan we've put our work on an OpAMP provider on hold for now. Looking forward to the changes! 👍

@evan-bradley evan-bradley added Sponsor Needed New component seeking sponsor and removed needs triage New item requiring triage labels Oct 24, 2022
@pavankrish123
Copy link
Contributor Author

pavankrish123 commented Oct 24, 2022

Thank you so much @tigrannajaryan. We can definitely help with the OpAMP support even if it is partial. Some follow up questions -

  1. This is for @tigrannajaryan

I also think that full-feature OpAMP implementation likely needs to be an out-of-process Supervisor-based instead of being a Collector component.

Is this primarily driven through the use-cases for binary and process management?

So our most asked for use case in our organization is to remotely drive the configuration into OpenTelemetry collector and other Agents dynamically. Basically, a channel that we can establish with our agent directly. Do you see any issues going that route - granted I do understand having a supervisor based approach but do we really need one in case where

  1. We don't care about process management.
  2. We don't mind having any OpAMP specific code inside our agents.

Our agents are capable of dynamically processing the configuration commands as well.

  1. This is for @amdprophet - @jpkrohling

@tigrannajaryan we've put our work on an OpAMP provider on hold for now. Looking forward to the changes! 👍

This is but unrelated but since you have mentioned OpAMP "provider" - is there any effort to include providers in builder configuration (just as receivers, processors and exporters, extensions)?

For instance to include s3provider, the only way to include the package in the collector builds is by making changes to service.go

cc: @svrnm

@tigrannajaryan
Copy link
Member

Do you see any issues going that route - granted I do understand having a supervisor based approach but do we really need one in case where

It is currently impossible to implement remote configuration feature in a way that is reasonably robust. Any invalid configuration will result in Collector process exiting.

We need to fix open-telemetry/opentelemetry-collector#6344 and open-telemetry/opentelemetry-collector#6226 to make sure receiving a bad config doesn't result in the Collector process termination.

@portertech
Copy link
Contributor

portertech commented Oct 26, 2022

@pavankrish123 here's our initial OpAMP configuration provider POC sensu/sumologic-otel-collector@main...feature/opamp-config-provider I'm keen to continue experimenting with both a minimal (only reporting) and fully functional OpAMP configuration provider when some of these previously mentioned changes are merged. This POC is a little "hacky" and has been set up to work with Bindplane (as the OpAMP server).

If it makes sense, I can strip this POC down to a minimal (only reporting) implementation and push it somewhere where we can collaborate. WDYT?

@pavankrish123
Copy link
Contributor Author

pavankrish123 commented Oct 26, 2022

This is great @portertech - Makes complete sense to me to put the minimal set of features as a starting point and we can expand on it as we solve other gaps @tigrannajaryan pointed out. +1 for your idea.

Another thing worth solving initially is to package providers through builder file. Do you know of any work that is happening already. To summarize, we need a way to declare providers as a part of builder.yml to package them in the collector binary just like we do for other components. cc: @jpkrohling

@svrnm
Copy link
Member

svrnm commented Oct 27, 2022

Do you see any issues going that route - granted I do understand having a supervisor based approach but do we really >>need one in case where

It is currently impossible to implement remote configuration feature in a way that is reasonably robust. Any invalid >configuration will result in Collector process exiting.

@tigrannajaryan: assuming those 3 issues would get fixed, are there any other blockers/concerns from having such an OpAMP provider added as a new component to this repo? Anything you can think of why the @open-telemetry/collector-contrib-maintainer would not accept such a contribution?

@tigrannajaryan
Copy link
Member

It is currently impossible to implement remote configuration feature in a way that is reasonably robust. Any invalid >configuration will result in Collector process exiting.

@tigrannajaryan: assuming those 3 issues would get fixed, are there any other blockers/concerns from having such an OpAMP provider added as a new component to this repo? Anything you can think of why the @open-telemetry/collector-contrib-maintainer would not accept such a contribution?

@svrnm Not that I can think of, but the maintainers haven't explicitly said yes, so it is worth bringing up in the next SIG meeting to be sure.

@tigrannajaryan
Copy link
Member

If we agree to move forward with this we need a mini design-doc to describe:

  • How OpAMP client settings will be specified. Command line is fine for specifying the server endpoint but there are more options that may need to be configurable (e.g. auth headers to use, etc). A couple choices are: cram them into the command line or put them in a separate opamp.yaml config file and reference it from the command line.
  • Which of OpAMP capabilities are supported.
  • If remote config is supported, how we sanitize the config, see see spec recommendations.
  • How is OpAMP Provider notified about the health of the Collector. Somehow healthcheck extension must communicate with it. This also depends on (yet unknown) a mechanism to replace ReportFatalError by health reporting.

@portertech
Copy link
Contributor

@tigrannajaryan is there a design doc template that should be used?

@tigrannajaryan
Copy link
Member

We don't have a template, but here is one example: https://docs.google.com/document/d/1KtH5atZQUs9Achbce6LiOaJxLbksNJenvgvyKLsJrkc/edit#

@svrnm
Copy link
Member

svrnm commented Oct 28, 2022

@svrnm Not that I can think of, but the maintainers haven't explicitly said yes, so it is worth bringing up in the next SIG meeting to be sure.

Thanks, I wanted to clarify if there are any forseeable roadblockers already. Understood that this is not an "explicit yes", we will follow up on a SIG meeting eventually.

@jpkrohling
Copy link
Member

To summarize, we need a way to declare providers as a part of builder.yml to package them in the collector binary just like we do for other components. cc: @jpkrohling

I think we have a tracker for this already. If you want to give it a shot, feel free to assign me as a reviewer for the PR!

@tigrannajaryan
Copy link
Member

Closing this in favour of an approach that uses an extension: #16462

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Sponsor Needed New component seeking sponsor
Projects
None yet
Development

No branches or pull requests

7 participants