[kubeapiserver/events] Support namespace labels as tags on kubernetes events #25490

Merged
merged 10 commits into from May 13, 2024

Conversation

jennchenn
Member

What does this PR do?

Add support for namespace labels as tags on Kubernetes events. The namespace InformerFactory is also removed, since namespace information is now stored in workloadmeta.

Motivation

Namespace labels as tags are already supported on metrics. As we continue fleshing out the Kubernetes events product, emitted events should carry namespace labels as tags too.

Additional Notes

The code path that fetched namespace labels when the DCA is disabled was removed in this PR. Testing the latest version of the agent showed that the node agent was not storing namespace information (i.e. the paths in metadata_controller.go were not hit). As a result, when the DCA is disabled, the agent is unable to fetch namespace labels. When the DCA is enabled, the logic now fetches namespace information from workloadmeta, so the informer factory for namespaces was removed altogether.

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

  1. Enable both the agent and cluster agent
  2. Set kubernetes_namespace_collection_enabled to true (set to false by default)
  3. Configure namespaceLabelsAsTags, e.g.

         datadog:
           logLevel: DEBUG
           clusterName: test-ns-labels
           apiKeyExistingSecret: datadog-secret
           appKeyExistingSecret: datadog-secret
           collectEvents: true
           namespaceLabelsAsTags:
             "*": prefixing_%%label%%

  4. Deploy both the agent and cluster agent
  5. Add a label to a namespace you expect to see events from, e.g. kubectl label namespace default test=qa
  6. Go to the event explorer and check that you can see kubernetes events being streamed; for events tagged with the namespace you labeled above, check that you also see a tag prefixing_test:qa
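
To make the expected result of the QA steps concrete, here is a minimal Python sketch of how a namespaceLabelsAsTags entry such as "*": prefixing_%%label%% could turn the label test=qa into the tag prefixing_test:qa. This is not the agent's implementation: the function name tags_from_namespace_labels, the use of glob-style matching on label names, and the sorted output are all assumptions made for illustration.

```python
import fnmatch

# Placeholder used in the example config from this PR description.
LABEL_PLACEHOLDER = "%%label%%"


def tags_from_namespace_labels(labels, labels_as_tags):
    """Build "name:value" tags from namespace labels.

    labels:         namespace labels, e.g. {"test": "qa"}
    labels_as_tags: mapping of label-name pattern to tag-name template,
                    e.g. {"*": "prefixing_%%label%%"}

    Hypothetical helper: glob matching on the label name is an assumption,
    not a statement about how the agent matches patterns.
    """
    tags = []
    for label_name, label_value in labels.items():
        for pattern, template in labels_as_tags.items():
            if fnmatch.fnmatchcase(label_name, pattern):
                # Substitute the matched label name into the template.
                tag_name = template.replace(LABEL_PLACEHOLDER, label_name)
                tags.append(f"{tag_name}:{label_value}")
    return sorted(tags)


if __name__ == "__main__":
    config = {"*": "prefixing_%%label%%"}
    print(tags_from_namespace_labels({"test": "qa"}, config))
```

With the config and label from the steps above, the sketch yields the single tag prefixing_test:qa, which is the tag to look for in the event explorer.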

@jennchenn jennchenn requested review from a team as code owners May 9, 2024 20:47
@jennchenn jennchenn added this to the 7.55.0 milestone May 9, 2024
@jennchenn jennchenn force-pushed the jenn/CONTINT-3693_ns-label-tags-k8s-events branch from 2c281f7 to d02573a Compare May 9, 2024 20:48
@davidor
Member

davidor commented May 9, 2024

I opened a PR yesterday that implements part of what has been implemented here: #25460. It covers only the migration of the endpoint that gets the namespace labels.

@pr-commenter

pr-commenter bot commented May 9, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=34031654 --os-family=ubuntu

@pr-commenter

pr-commenter bot commented May 9, 2024

Regression Detector

Regression Detector Results

Run ID: 8626c1f2-763d-4c10-a8db-89e80ac33835
Baseline: 5966dc6
Comparison: 80f0661

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf  experiment         goal               Δ mean %  Δ mean % CI
      file_to_blackhole  % cpu utilization  -27.01    [-32.24, -21.78]

Fine details of change detection per experiment

perf  experiment                  goal                Δ mean %  Δ mean % CI
      uds_dogstatsd_to_api_cpu    % cpu utilization   +0.69     [-2.19, +3.57]
      idle                        memory utilization  +0.67     [+0.63, +0.71]
      basic_py_check              % cpu utilization   +0.64     [-1.77, +3.05]
      tcp_dd_logs_filter_exclude  ingress throughput  +0.05     [+0.01, +0.09]
      trace_agent_msgpack         ingress throughput  +0.00     [-0.00, +0.00]
      trace_agent_json            ingress throughput  -0.00     [-0.01, +0.01]
      uds_dogstatsd_to_api        ingress throughput  -0.03     [-0.24, +0.17]
      otel_to_otel_logs           ingress throughput  -0.21     [-0.60, +0.18]
      file_tree                   memory utilization  -0.37     [-0.46, -0.28]
      pycheck_1000_100byte_tags   % cpu utilization   -1.01     [-5.71, +3.69]
      tcp_syslog_to_blackhole     ingress throughput  -5.28     [-26.16, +15.59]
      file_to_blackhole           % cpu utilization   -27.01    [-32.24, -21.78]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we treat a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
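
The three criteria above can be expressed as a short predicate. The following Python sketch is only an illustration of the stated rule, not the Regression Detector's code: the ExperimentResult class, its field names, and the is_regression function are hypothetical, and the sample numbers are taken from the tables in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentResult:
    """One row of the results table (hypothetical structure)."""
    name: str
    delta_mean_pct: float   # "Δ mean %"
    ci_low: float           # lower bound of "Δ mean % CI"
    ci_high: float          # upper bound of "Δ mean % CI"
    erratic: bool = False   # experiment settings contain erratic: true


def is_regression(result, tolerance_pct=5.0):
    """Apply the three criteria listed in the explanation."""
    # 1. The estimated change is big enough to merit a closer look.
    big_enough = abs(result.delta_mean_pct) >= tolerance_pct
    # 2. The confidence interval does not contain zero.
    ci_excludes_zero = not (result.ci_low <= 0.0 <= result.ci_high)
    # 3. The experiment is not marked erratic.
    return big_enough and ci_excludes_zero and not result.erratic


if __name__ == "__main__":
    rows = [
        ExperimentResult("file_to_blackhole", -27.01, -32.24, -21.78, erratic=True),
        ExperimentResult("tcp_syslog_to_blackhole", -5.28, -26.16, 15.59),
        ExperimentResult("idle", 0.67, 0.63, 0.71),
    ]
    for row in rows:
        print(row.name, is_regression(row))
```

Under this reading, file_to_blackhole is excluded because it is marked erratic, tcp_syslog_to_blackhole because its interval contains zero, and idle because its change is below the 5.00% tolerance, which matches the report's conclusion that there were no significant changes.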

Contributor

@sblumenthal sblumenthal left a comment


lgtm, great job

@jennchenn
Member Author

/merge

@dd-devflow

dd-devflow bot commented May 13, 2024

🚂 MergeQueue

Pull request added to the queue.

There are 2 builds ahead! (estimated merge in less than 1h)

Use /merge -c to cancel this operation!

@jennchenn
Member Author

/merge

@dd-devflow

dd-devflow bot commented May 13, 2024

❌ MergeQueue

PR already in the queue with status queued

If you need support, contact us on Slack #devflow with those details!

@dd-mergequeue dd-mergequeue bot merged commit 010f0bd into main May 13, 2024
203 of 204 checks passed
@dd-mergequeue dd-mergequeue bot deleted the jenn/CONTINT-3693_ns-label-tags-k8s-events branch May 13, 2024 14:15

5 participants