Unexpected metric values for doc count for elasticsearch integration #16477

jiahuijiang · 2023-12-20T16:55:16Z

Additional environment details (Operating System, Cloud provider, etc.):
Self-hosted Elasticsearch 8.5.0 on AWS

Describe the results you received:
We self-host Elasticsearch 8.5.0 on AWS, and the Datadog Agents are running properly.
However, we recently noticed that the document count metrics differ from what we expected.

In Datadog we sum the values of elasticsearch.docs.count and elasticsearch.docs.deleted over all hosts, but the result is higher than what we get from calling _cluster/stats. As far as I know, both values count replicas, so I expect the two outputs to be exactly the same.

For example, on our cluster, where every index has 1 replica:

Our Datadog dashboard reports elasticsearch.docs.count as 66.5 B, but the _cluster/stats endpoint gives 33.25 B. This looks like we are double counting the number of documents. Every data node reports a value of ~900M documents, and we have 75 data nodes.
Our Datadog dashboard reports elasticsearch.docs.deleted as 9.69 B, but the _cluster/stats endpoint gives 4.65 B. This is more than a 2x difference, and I have no good hypothesis for it.
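A quick check of the arithmetic in the two figures above. The constants are the numbers quoted in this issue; `copies_per_document` encodes a hypothesis (that one source counts every shard copy while the other counts primaries only), which these figures are consistent with but do not prove.

```python
# Figures quoted in this issue (B = billions of documents).
DATADOG_DOCS_COUNT = 66.5e9          # sum of elasticsearch.docs.count over all hosts
CLUSTER_STATS_DOCS_COUNT = 33.25e9   # indices.docs.count from _cluster/stats
DATADOG_DOCS_DELETED = 9.69e9        # sum of elasticsearch.docs.deleted over all hosts
CLUSTER_STATS_DOCS_DELETED = 4.65e9  # indices.docs.deleted from _cluster/stats
DATA_NODES = 75
REPLICAS = 1  # every index has 1 replica


def ratio(datadog_value: float, cluster_value: float) -> float:
    """How many times larger the Datadog sum is than the _cluster/stats value."""
    return datadog_value / cluster_value


def copies_per_document(replicas: int) -> int:
    """Shard copies holding each document: one primary plus `replicas` replicas.

    Hypothesis only: if one source counted every copy and the other counted
    primaries only, their ratio would equal this value.
    """
    return 1 + replicas


if __name__ == "__main__":
    print(f"docs.count ratio:    {ratio(DATADOG_DOCS_COUNT, CLUSTER_STATS_DOCS_COUNT):.3f}")
    print(f"docs.deleted ratio:  {ratio(DATADOG_DOCS_DELETED, CLUSTER_STATS_DOCS_DELETED):.3f}")
    print(f"avg docs.count/node: {DATADOG_DOCS_COUNT / DATA_NODES / 1e6:.0f}M")
    print(f"copies per document: {copies_per_document(REPLICAS)}")
```

Run as a script, this prints a ratio of 2.000 for docs.count and 2.084 for docs.deleted, plus an average of about 887M documents per node, which matches the ~900M per-node value reported above.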

Question: Is the way we aggregate elasticsearch.docs.count and elasticsearch.docs.deleted (summing over all hosts) correct? If so, what else could have caused the inconsistency?
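One way to test whether summing over hosts is the right aggregation is to reproduce both numbers directly from Elasticsearch, with Datadog out of the loop. This sketch sums `indices.docs.count` and `indices.docs.deleted` across every node returned by `GET /_nodes/stats/indices/docs` and reads the same fields from `GET /_cluster/stats`. Assumptions: `ES_URL` is a hypothetical, unauthenticated HTTP address (on 8.x with security enabled you would need to add credentials and TLS), and the Datadog check is assumed, not confirmed here, to report each node's `indices.docs` values.

```python
import json
import urllib.request

# Hypothetical address of one node; authentication and TLS are omitted (assumption).
ES_URL = "http://localhost:9200"


def get_json(path: str) -> dict:
    """GET an Elasticsearch REST endpoint and decode its JSON body."""
    with urllib.request.urlopen(ES_URL + path, timeout=10) as resp:
        return json.load(resp)


def sum_node_docs(nodes_stats: dict) -> tuple[int, int]:
    """Sum indices.docs.count and indices.docs.deleted over every node in a
    _nodes/stats response, mirroring a "sum over all hosts" of a per-node metric."""
    count = deleted = 0
    for node in nodes_stats["nodes"].values():
        docs = node["indices"]["docs"]
        count += docs["count"]
        deleted += docs["deleted"]
    return count, deleted


def cluster_docs(cluster_stats: dict) -> tuple[int, int]:
    """Read indices.docs.count and indices.docs.deleted from a _cluster/stats response."""
    docs = cluster_stats["indices"]["docs"]
    return docs["count"], docs["deleted"]


if __name__ == "__main__":
    try:
        node_count, node_deleted = sum_node_docs(get_json("/_nodes/stats/indices/docs"))
        cl_count, cl_deleted = cluster_docs(get_json("/_cluster/stats"))
    except OSError as exc:  # urllib.error.URLError (and HTTPError) subclass OSError
        print(f"could not query {ES_URL}: {exc}")
    else:
        print(f"docs.count:   sum over nodes={node_count:,}  _cluster/stats={cl_count:,}")
        print(f"docs.deleted: sum over nodes={node_deleted:,}  _cluster/stats={cl_deleted:,}")
```

If the per-node sum here is also roughly twice the `_cluster/stats` value, the gap lies between the two Elasticsearch APIs rather than in how Datadog aggregates the metric.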

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):
