Elastic Search Quantization not always effective and does not match Datadog agent's #9158

Open
ZStriker19 opened this issue May 3, 2024 · 0 comments

ZStriker19 (Contributor) commented May 3, 2024

Summary of problem

Quantization of Elasticsearch URLs does not match that of the Datadog agent and also does not work well, in some cases leading to unhelpful resource names and high cardinality.

https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/contrib/elasticsearch/quantize.py
The current quantizer turns a URL like `/en_search/_doc/fa1117d4-7917-5a2d-b001-76e6e4ca83b2` into `/en_search/_doc/fa?d4-?-5a2d-b?-?e6e4ca?b2` before using it for the resource name. We end up with resource names like `GET /en_search/_doc/fa?d4-?-5a2d-b?-?e6e4ca?b2`, where we would instead want something like `/en_search/_doc/?`.

Currently we can work around this with a trace filter. However, in the next major release (3.0) we should change the quantization to match the Datadog agent's, which changes this URL into `/en_search/_doc/{guid}`. In the meantime, we could also put the new quantization behind a flag.
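To make the target behavior concrete, here is a minimal sketch of GUID-aware quantization. The names `GUID_SEGMENT` and `quantize_guids` are hypothetical, and the pattern only covers the canonical 8-4-4-4-12 hexadecimal form. It is not a port of the agent's rules, just an illustration of the output we would want for this URL.

```python
import re

# Hypothetical pattern: a canonical 8-4-4-4-12 hex GUID that fills a whole path segment.
GUID_SEGMENT = re.compile(
    r"(?<=/)[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|\?|$)"
)


def quantize_guids(url, placeholder="{guid}"):
    """Replace every GUID-shaped path segment in url with placeholder."""
    # A callable replacement avoids any escape processing of the placeholder text.
    return GUID_SEGMENT.sub(lambda match: placeholder, url)


print(quantize_guids("/en_search/_doc/fa1117d4-7917-5a2d-b001-76e6e4ca83b2"))
```

Matching the whole segment rather than individual digit runs is what keeps cardinality low: every document ID collapses to the same resource name.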

Trace filter example:

import re
from typing import List, Optional

from ddtrace import Span, tracer
from ddtrace.filters import TraceFilter

# matches the last path segment when it contains at least one digit, so it can be replaced with "/?"
REMOVE_AFTER_LAST_SLASH = re.compile(r"/[^/]*\d[^/]*$")


class CorrectESResourceNameFilter(TraceFilter):
    """example input: '/en_search/_doc/fa?d4-?-5a2d-b?-?e6e4ca?b2' output: '/en_search/_doc/?'"""

    def process_trace(self, trace):
        # type: (List[Span]) -> Optional[List[Span]]
        for span in trace:
            # if you're changing the service name of elasticsearch, use that instead
            if span.service == "elasticsearch":
                url = span.get_tag("elasticsearch.url")
                method = span.get_tag("elasticsearch.method")
                # skip spans without a URL tag; re.sub would raise TypeError on None
                if url is None:
                    continue
                span.resource = "{method} {url}".format(method=method, url=REMOVE_AFTER_LAST_SLASH.sub(r"/?", url))
        return trace


# And then configure it with
tracer.configure(settings={'FILTERS': [CorrectESResourceNameFilter()]})
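The regex in the filter can be checked on its own, without a tracer. This is a small sketch using only the standard library; `collapse_last_segment` is a hypothetical helper name wrapping the same pattern.

```python
import re

# Same pattern as in the filter above: the final path segment, if it contains a digit.
REMOVE_AFTER_LAST_SLASH = re.compile(r"/[^/]*\d[^/]*$")


def collapse_last_segment(url):
    """Replace the last path segment with '?' when that segment contains a digit."""
    return REMOVE_AFTER_LAST_SLASH.sub("/?", url)


print(collapse_last_segment("/en_search/_doc/fa?d4-?-5a2d-b?-?e6e4ca?b2"))  # /en_search/_doc/?
print(collapse_last_segment("/en_search/_search"))  # unchanged, no digit in the last segment
```

Note the limits of this workaround: it only looks at the final segment, so an ID in the middle of the path, or a final ID made up only of letters, is left untouched.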

Which version of dd-trace-py are you using?

2.8.3

Elastic search

How can we reproduce your problem?

Try the current quantize method on `/en_search/_doc/fa1117d4-7917-5a2d-b001-76e6e4ca83b2`.

What is the result that you get?

`/en_search/_doc/fa?d4-?-5a2d-b?-?e6e4ca?b2`
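This output is consistent with a rule that replaces every run of two or more digits with `?`. The sketch below is a simplified stand-in for that rule, inferred from the result above rather than copied from `quantize.py`; `DIGIT_RUN` and `quantize_digit_runs` are hypothetical names.

```python
import re

# Assumed rule: any run of two or more consecutive digits becomes "?".
DIGIT_RUN = re.compile(r"[0-9]{2,}")


def quantize_digit_runs(url):
    """Replace each run of 2+ digits in url with '?', leaving single digits alone."""
    return DIGIT_RUN.sub("?", url)


print(quantize_digit_runs("/en_search/_doc/fa1117d4-7917-5a2d-b001-76e6e4ca83b2"))
```

Because single digits and hex letters survive, each GUID still produces a distinct resource name, which is where the high cardinality comes from.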

What is the result that you expected?

`/en_search/_doc/?`
