Use OTEL for metrics gathering (WIP) #3148

Draft: wants to merge 1 commit into main


michel-laterman
Contributor

@michel-laterman michel-laterman commented Dec 6, 2023

What is the problem this PR solves?

An onweek project to change fleet-server's metrics collection to use otel + the APM exporter as a bridge.

We are currently using a mixture of elastic-agent-libs and prometheus+an APM bridge.
This is a little messy and introduces unneeded dependencies and complexity.

OTEL metrics for the routes are now tagged with server.host and server.port attributes to allow us to determine when the internal/external API ports have issues.
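
To make the tagging concrete, here is a minimal sketch of how the two attribute values could be derived from a listener address using only the Go standard library. The helper name `serverAttrs` and the example addresses are hypothetical and are not taken from this PR; the actual change attaches the attributes through the OTEL metrics API, which is not shown here.

```go
package main

import (
	"fmt"
	"net"
)

// serverAttrs splits a listen address such as "0.0.0.0:8220" into the values
// that would be recorded under the server.host and server.port attribute keys.
// Hypothetical helper for illustration; it is not part of fleet-server.
func serverAttrs(listenAddr string) (map[string]string, error) {
	host, port, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return nil, fmt.Errorf("split %q: %w", listenAddr, err)
	}
	return map[string]string{
		"server.host": host,
		"server.port": port,
	}, nil
}

func main() {
	// Example addresses are assumptions chosen for illustration only.
	for _, addr := range []string{"0.0.0.0:8220", "localhost:8221"} {
		attrs, err := serverAttrs(addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> host=%s port=%s\n", addr, attrs["server.host"], attrs["server.port"])
	}
}
```

With both values on every route metric, a query can group by `server.port` to separate traffic on the internal listener from traffic on the external one.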

A translation/export mechanism is provided so that we can continue to provide fleet-server metrics on the /stats endpoint for metricbeat collection/monitoring until we have determined how elastic-agent will monitor components with otel.
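
One possible shape for such a translation step, sketched with the standard library only: flat, dot-separated metric names are folded into a nested object that can then be serialized as JSON. The function `nest` and the metric names in `main` are hypothetical placeholders; they are not the names or the code used in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nest converts flat, dot-separated metric names into a nested map suitable
// for JSON output. If one name is both a leaf and a prefix of another
// (for example "a" and "a.b"), the result depends on map iteration order,
// so callers should avoid such collisions.
func nest(flat map[string]int64) map[string]any {
	root := map[string]any{}
	for name, value := range flat {
		parts := strings.Split(name, ".")
		node := root
		// Walk or create the intermediate objects for every segment but the last.
		for _, p := range parts[:len(parts)-1] {
			child, ok := node[p].(map[string]any)
			if !ok {
				child = map[string]any{}
				node[p] = child
			}
			node = child
		}
		node[parts[len(parts)-1]] = value
	}
	return root
}

func main() {
	// Placeholder metric names and values, for illustration only.
	flat := map[string]int64{
		"http_server.checkin.requests": 3,
		"http_server.checkin.errors":   1,
		"bulker.queue_depth":           0,
	}
	out, err := json.MarshalIndent(nest(flat), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The nested form keeps the document served to Metricbeat structurally independent of how the metrics are named and stored on the OTEL side.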

Note that it completely removes the option for the prometheus endpoint.
It also removes collecting the generic system/cpu/mem datasets.

TODO

  • windows named pipe support
  • beat.info population
  • metrics server timeout values

How to test this PR locally

Start a server and request /stats on port 5066 to view the stats endpoint that Metricbeat collects from.
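
For a scripted check, a small Go client along these lines could fetch and print the endpoint. This is a sketch under assumptions: the metrics server is taken to be reachable on localhost, the response is taken to be a JSON object, and `fetchStats` is a hypothetical helper rather than anything in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fetchStats performs a GET against url and decodes the JSON object in the
// response body. Hypothetical helper for local testing.
func fetchStats(url string) (map[string]any, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	var stats map[string]any
	if err := json.Unmarshal(body, &stats); err != nil {
		return nil, fmt.Errorf("decode stats: %w", err)
	}
	return stats, nil
}

func main() {
	// Assumption: fleet-server is running locally; pass another URL as the
	// first argument to override.
	url := "http://localhost:5066/stats"
	if len(os.Args) > 1 {
		url = os.Args[1]
	}
	stats, err := fetchStats(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	out, _ := json.MarshalIndent(stats, "", "  ")
	fmt.Println(string(out))
}
```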

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

@michel-laterman michel-laterman force-pushed the otel-metrics branch 5 times, most recently from 045f12a to b0d2120 Compare December 8, 2023 18:24
Change the underlying metrics collection to use OTEL and the APM bridge
instead of prometheus and the elastic-agent-libs metrics collection
mechanism. All API metrics will now have server.host and server.port
attributes associated with the metric so we can easily determine when a
request hits the external or internal port. Provide a translation layer
to expose the OTEL metrics on the /stats endpoint so that Metricbeat can
still collect fleet-server metrics. Note that generic CPU/Mem/System
metrics are no longer collected or exposed on the endpoint.
@michel-laterman
Contributor Author

This draft is unlikely to get worked on or merged anytime soon, as we are still determining the otel roadmap.

Contributor

mergify bot commented Dec 26, 2023

This pull request is now in conflicts. Could you fix it @michel-laterman? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b otel-metrics upstream/otel-metrics
git merge upstream/main
git push upstream otel-metrics

1 similar comment
Contributor

mergify bot commented Apr 15, 2024

This pull request is now in conflicts. Could you fix it @michel-laterman? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b otel-metrics upstream/otel-metrics
git merge upstream/main
git push upstream otel-metrics
