Agent mode support for bootstrap output #3506

michel-laterman · 2024-04-30T23:30:45Z

What is the problem this PR solves?

When running under elastic-agent, fleet-server is not able to use the output settings from its policy.

How does this PR solve the problem?

This PR requires elastic/elastic-agent#4643 to work.

The elastic-agent will inject enrollment configuration options into output.elasticsearch.bootstrap instead of overwriting matching keys in output.elasticsearch.
When running in agent mode, fleet-server injects specific keys from bootstrap that are not present in output.elasticsearch, then tests whether the resulting output can connect to Elasticsearch. If it can, that output is used. If not, the bootstrap config is used instead, and the output is periodically retested in case the failure was caused by a temporary network issue.

How to test this PR locally

Create an elastic-agent package from: elastic/elastic-agent#4643
Replace the fleet-server component with one built from this PR.

If testing with Docker images, docker.elastic.co/observability-ci/elastic-agent:8.15.0-SNAPSHOT-dd4c89e-1715633206 can be used as the BASE_IMAGE when generating a new image/deployment with the make cloud-deploy target in dev-tools/cloud.
Alternatively, the docker.elastic.co/observability-ci/elastic-agent:8.15.0-SNAPSHOT-laterman-1715705581 image can be used, as it contains the changes from both PRs.

I've verified that the following behaviours work:

When deployed to ESS:

fleet-server is available/healthy
logging level for fleet-server can be changed with no issues
diagnostics can be collected
e2e cypress tests for defender succeed (thanks @tomsonpl!)

When deployed locally:

fleet-server is available/healthy
logging level for fleet-server can be changed with no issues
diagnostics can be collected
multiple hosts can be added to the Elasticsearch output (in Kibana) and show up in fleet-server.yml when a diagnostics bundle is collected

Design Checklist

I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

I have commented my code, particularly in hard-to-understand areas
~~I have made corresponding changes to the documentation~~
~~I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool

Related issues

internal/pkg/server/agent.go

Add support for a bootstrap attribute in the output when running in agent mode. If this attribute is missing, the output block is used directly. If it is provided, any attributes within bootstrap that are not in the parent (output) object are recursively injected and the resulting output is tested. If the resulting config passes the test, it is used; if it fails, the bootstrap config is passed instead.
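The recursive injection described above might look roughly like this in Go. This is a sketch under stated assumptions: injectMissing is a hypothetical helper operating on decoded map[string]interface{} configs, and the policy (output) value always wins when both sides set the same key. The PR's later commit "change output injection to use specific keys" and the description's mention of injecting specific keys suggest the merged code is narrower than this fully generic merge. The warning for a key that is an object on only one side is modeled loosely on the PR's commit about warning on non-object ssl attributes, not copied from it.

```go
package main

import "log"

// injectMissing (hypothetical) copies every key from bootstrap into output when
// output does not already define it. Nested objects are merged recursively;
// when both sides set a non-object value for the same key, output is left unchanged.
func injectMissing(output, bootstrap map[string]interface{}) {
	for key, bVal := range bootstrap {
		oVal, exists := output[key]
		if !exists {
			// Key only present in bootstrap: inject it as-is.
			output[key] = bVal
			continue
		}
		oMap, oIsMap := oVal.(map[string]interface{})
		bMap, bIsMap := bVal.(map[string]interface{})
		switch {
		case oIsMap && bIsMap:
			// Both sides are objects (e.g. ssl): merge one level deeper.
			injectMissing(oMap, bMap)
		case oIsMap != bIsMap:
			// Shape mismatch: keep the policy value rather than guess how to merge.
			log.Printf("warning: %q is an object on only one side; keeping output value", key)
		}
		// Otherwise both are scalars or lists and the output value is kept.
	}
}
```

Because the policy value always wins on conflicts, bootstrap can only fill gaps in the output block, never override what the policy already sets.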

michel-laterman · 2024-05-13T22:15:17Z

Testing progress: elastic/elastic-agent#4643 (comment)

I'll now try to implement the areas of improvement above:

skip injecting verification_mode: none if a CA or CA fingerprint is in the retrieved policy.
asynchronous periodic output testing if the bootstrap block has been passed
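For the first item, a minimal sketch, assuming the policy's ssl block is available as a map and uses the Elastic Agent ssl setting names certificate_authorities, ca_trusted_fingerprint and verification_mode. hasCA and sslKeysToInject are hypothetical helpers, not fleet-server functions: bootstrap ssl keys missing from the policy are injected, except verification_mode: none when the policy already specifies a CA or CA fingerprint, so a bootstrap-time "none" does not disable certificate verification for an output that is already configured to verify.

```go
package main

// hasCA (hypothetical) reports whether an ssl block already establishes trust
// through CA files or a CA fingerprint. Reading from a nil map is safe in Go.
func hasCA(ssl map[string]interface{}) bool {
	if _, ok := ssl["certificate_authorities"]; ok {
		return true
	}
	if _, ok := ssl["ca_trusted_fingerprint"]; ok {
		return true
	}
	return false
}

// sslKeysToInject (hypothetical) returns the bootstrap ssl keys that may be added
// to the policy's ssl block: keys the policy does not set, minus
// verification_mode: none when the policy already pins a CA.
func sslKeysToInject(policySSL, bootstrapSSL map[string]interface{}) map[string]interface{} {
	inject := map[string]interface{}{}
	for k, v := range bootstrapSSL {
		if _, exists := policySSL[k]; exists {
			continue // the policy value always wins
		}
		if k == "verification_mode" && v == "none" && hasCA(policySSL) {
			continue // do not weaken verification for a CA-configured output
		}
		inject[k] = v
	}
	return inject
}
```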

pchila

Just a couple of comments, looks good overall

internal/pkg/server/agent.go

cmacknz · 2024-05-21T17:32:44Z

I think we need an automated test proving that Elastic Agent can bootstrap Fleet Server in one of the repositories before this or elastic/elastic-agent#4643 are merged.

If the coordination of the two PRs with the test is annoying enough, I'd be fine with the test being added in a separate PR, but not closing the implementation issue until it exists.

michel-laterman · 2024-05-21T17:47:14Z

buildkite test this

elastic-sonarqube · 2024-05-21T18:05:20Z

Quality Gate passed

Issues
2 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
77.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Apr 30, 2024

michel-laterman force-pushed the fleet-bootstrap-output branch 4 times, most recently from 79978ac to fcec504 Compare May 1, 2024 16:31

cmacknz reviewed May 1, 2024


internal/pkg/server/agent.go (outdated)

michel-laterman force-pushed the fleet-bootstrap-output branch from fcec504 to 07bb754 Compare May 1, 2024 20:24

michel-laterman changed the title ~~wip~~ Agent mode support for bootstrap output May 1, 2024

michel-laterman added 2 commits May 2, 2024 10:54

Add unit tests

1164427

fix linter

92260e3

michel-laterman force-pushed the fleet-bootstrap-output branch from b01efb6 to 92260e3 Compare May 2, 2024 18:24

change output injection to use specific keys

33f1bd0

michel-laterman added 2 commits May 13, 2024 16:02

Ignore bootstrap verification_none if ca is in output

46cf94e

Add periodic output retests on error

070b778

michel-laterman mentioned this pull request May 14, 2024

Send fleet-server elasticsearch config under new bootstrap attribute elastic/elastic-agent#4643

Merged

5 tasks

Merge branch 'main' into fleet-bootstrap-output

dd77d07

michel-laterman marked this pull request as ready for review May 14, 2024 18:34

michel-laterman requested a review from a team as a code owner May 14, 2024 18:34

michel-laterman requested review from AndersonQ and pchila May 14, 2024 18:34

ycombinator removed the request for review from AndersonQ May 16, 2024 20:49

pchila approved these changes May 21, 2024


internal/pkg/server/agent.go

Add warning logs when non-object ssl attributes are found

22d94b0

michel-laterman enabled auto-merge (squash) May 21, 2024 16:49

michel-laterman merged commit 12d1b4a into elastic:main May 21, 2024
8 checks passed

michel-laterman deleted the fleet-bootstrap-output branch May 21, 2024 18:38
