Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres Performance Optimization: Cache baseline metrics and apply updates incrementaly #17554

Merged

Conversation

amw-zero
Copy link
Contributor

What does this PR do?

#17187 was reverted because we observed that it could lead to inflated metrics. The root cause of that is that computed_derivative_rows currently assumes that all rows in pg_stat_statements are passed as input. It does not compute correct results when a subset of pg_stat_statements is passed in, because that can lead to taking the diff of two different pg_stat_statement rows, which leads to incoherent results.

The approach taken here is to introduce a baseline_metrics cache, which holds one entry for each pg_stat_statements row and is populated once from the full pg_stat_statements dataset. On subsequent check runs, only queries that have been called since the previous run will be returned, and the baseline_metrics will be updated for just those queryids.

This allows the full set of normalized rows to be constructed with one new function: _apply_deltas. This gets run before compute_derivative_rows and uses the baseline_metrics to reconstruct the full set of metrics rows. This relies on the fact that a query that hasn't been called between check runs has no change in metrics, so we can reuse the cached values.

Motivation

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Changelog entries must be created for modifications to shipped code
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Copy link
Contributor

@lu-zhengda lu-zhengda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about increased memory impact of _baseline_metrics because the size of this dict is not bounded by pg_stat_statements.max. In a high cardinality database with lots of evictions in pg_stat_statements, the size of _baseline_metrics could grow quickly.

postgres/datadog_checks/postgres/statements.py Outdated Show resolved Hide resolved
postgres/datadog_checks/postgres/statements.py Outdated Show resolved Hide resolved
postgres/datadog_checks/postgres/statements.py Outdated Show resolved Hide resolved
@amw-zero
Copy link
Contributor Author

Re the unbounded growth due to query eviction, I added an expiry time (currently set to 10 minutes) past which the baseline will be re-fetched: decf7e8.

@amw-zero
Copy link
Contributor Author

@lu-zhengda I tweaked the implementation slightly. I realized that aggregating the metric data by query signature in _apply_deltas duplicates the logic in _merge_duplicate_rows. So instead I'm using the baseline_metrics to re-construct the full set of rows (with one row per query signature), and allowing _merge_duplicate_rows to continue doing the aggregation.

@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch 3 times, most recently from 08deca3 to f7b78bd Compare May 13, 2024 14:39
Copy link

Test Results

296 tests   256 ✅  2m 0s ⏱️
  1 suites   39 💤
  1 files      1 ❌

For more details on these failures, see this check.

Results for commit f7b78bd.

@@ -77,6 +77,93 @@ def test_dbm_enabled_config(integration_check, dbm_instance, dbm_enabled_key, db
assert check._config.dbm_enabled == dbm_enabled


@requires_over_10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This test is failing on PG9. We may have to skip this optimization on that version. It seems like not all expected rows are being returned in the QUERY_CALLS_QUERY which prevents correct functioning.

@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch 2 times, most recently from 1d9227f to 1af5fec Compare May 14, 2024 14:47
@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from 32e8a45 to 2630e2f Compare May 14, 2024 19:20
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

1 similar comment
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from d35d2b3 to e44b0b3 Compare May 14, 2024 19:27
Copy link
Contributor

@lu-zhengda lu-zhengda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. minor change on the config default

postgres/assets/configuration/spec.yaml Outdated Show resolved Hide resolved
@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from 1e11c25 to d252c9a Compare May 14, 2024 21:05
lu-zhengda
lu-zhengda previously approved these changes May 14, 2024
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from 7f0bd81 to 8bdf3c6 Compare May 15, 2024 21:26
@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from 90a6700 to dbc72b2 Compare May 17, 2024 00:11
@amw-zero amw-zero force-pushed the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch from 2a98e2b to 7523552 Compare May 17, 2024 02:13
@amw-zero amw-zero merged commit 031c7c6 into master May 17, 2024
35 checks passed
@amw-zero amw-zero deleted the alex.weisberger/dbmon-4046-pg-optimization-delta-application branch May 17, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants