Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tf: update Elasticsearch alerting #1600

Merged
merged 9 commits into from
May 24, 2024
Merged

tf: update Elasticsearch alerting #1600

merged 9 commits into from
May 24, 2024

Conversation

AndrewKostka
Copy link
Contributor

No description provided.

Copy link
Contributor

@tarrow tarrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good; I think we need to take a second look at grouping and it might be easier to do this after updating the google TF provider

condition_absent = "300s"
min_group_by = "metric.label.es_cluster"
},
"es-cluster-available-shards-${var.environment}" = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to see this go; I wonder if we ought to add (in a follow up so not blocking) an alert for the shard size growing beyond a certain size. That would be a comparable alert to this one we are removing for growth in search content we may not have noticed

threshold_value = 1
duration = "60s"
threshold_value = 0
duration = "900s"
condition_absent = "300s"
min_group_by = "metric.label.es_cluster"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was somewhat struggling to confirm my suspicion but I suspect after inverting the COMPARISON_LT to COMPARISON_GT this should become max_group_by (and same logic for the red check)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look's like the provider has undergone a major version bump beyond where we are at (5.29.1) and then looking here https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy is seems like the format may have changed somewhat in general for grouping to become more standard with the other apis and we might now need to us the https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy#cross_series_reducer for this logic.

@AndrewKostka
Copy link
Contributor Author

Generally looks good; I think we need to take a second look at grouping and it might be easier to do this after updating the google TF provider

These need to go out before we update the Google Terraform provider

@tarrow
Copy link
Contributor

tarrow commented May 24, 2024

This patch is a no-op on production and will make these changes on staging

@AndrewKostka AndrewKostka merged commit 08baff5 into main May 24, 2024
2 checks passed
@AndrewKostka AndrewKostka deleted the tf-edit-es-alarms branch May 24, 2024 08:17
@tarrow
Copy link
Contributor

tarrow commented May 27, 2024

This patch is a no-op on production and will make these changes on staging

whoops, this was wrong - I'd not clocked the final commit in this PR also touched production

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants