Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to see flame graph for db calls, redis, endpoint etc in CI Visibility #8710

Open
hdadkhah1 opened this issue Mar 18, 2024 · 10 comments
Assignees
Labels

Comments

@hdadkhah1
Copy link

hdadkhah1 commented Mar 18, 2024

Summary of problem

Not able to see flame graph for db calls, redis, endpoint etc in CI Visibility, we want to use this data to see what our test bottlenecks etc is. We have enabled ddtrace to use dd agent (so it can send traces) and also are running with --ddtrace --ddtrace-patch-all . It seems to be it does not connect parent_id from traces to the test run spans.

DD_CIVISIBILITY_AGENTLESS_ENABLED="false"

using datadog agent:
[] [2024-03-18 20:08:01,766] INFO in recorder: Requests mode: agent EVP proxy

I do see in ddtrace debug logs, we do capture postgres call in this example:
Screenshot 2024-03-18 at 2 18 39 PM

Which version of dd-trace-py are you using?

2.7.3

Which version of pip are you using?

pip 23.3.1

Which libraries and their versions are you using?

`pip freeze` ach==0.2 aiofiles==23.1.0 aiohttp==3.9.2 aiosignal==1.3.1 alembic==1.12.0 althaia==3.19.0 amqp==5.1.1 analytics-python==1.2.9 aniso8601==9.0.1 annotated-types==0.6.0 anthropic==0.17.0 anyio==3.7.1 apispec==6.0.2 appdirs==1.4.4 appnope==0.1.3 APScheduler==3.10.1 argcomplete==3.2.2 argh==0.28.1 asn1==2.7.0 asn1crypto==1.5.1 astroid==2.15.2 asttokens==2.2.1 async-timeout==4.0.2 asynch==0.2.2 asyncpg==0.28.0 attrs==23.1.0 Authlib==1.0.1 autoflake==1.3 azure-ai-formrecognizer==3.3.0b1 azure-common==1.1.28 azure-core==1.28.0 Babel==2.12.1 backcall==0.2.0 backoff==2.2.1 basistheory==2.0.0 bcrypt==4.0.1 beautifulsoup4==4.12.2 bidict==0.22.1 billiard==4.1.0 black==22.3.0 blinker==1.6.2 bloom-filter2==2.0.0 boto3==1.34.7 botocore==1.34.7 Brotli==1.0.9 bytecode==0.14.2 CacheControl==0.13.1 cached-property==1.5.1 cachelib==0.9.0 cachetools==4.2.4 cairocffi==1.6.0 CairoSVG==2.7.0 casefy==0.1.7 cattrs==23.1.2 cbor2==5.4.6 celery==5.3.1 celery-redbeat==2.1.1 certifi==2023.7.22 cffi==1.15.1 cfgv==3.3.1 chardet==5.1.0 charset-normalizer==3.2.0 chdb==1.2.1 ciso8601==2.3.0 cli-helpers==2.3.0 click==8.1.5 click-completion==0.5.2 click-didyoumean==0.3.0 click-plugins==1.1.1 click-repl==0.3.0 clickhouse-cityhash==1.0.2.4 clickhouse-driver==0.2.6 clickhouse-sqlalchemy @ file:///Users/hdadkhah/workspace/core/libs/clickhouse_sqlalchemy-0.2.4-py3-none-any.whl cloudpickle==2.2.1 colorama==0.4.6 configobj==5.0.8 confluent-kafka==2.3.0 coverage==7.4.1 coverage-badge==1.1.0 crayons==0.4.0 croniter==1.4.1 cryptography==42.0.5 cssselect2==0.7.0 Cython==0.29.34 dacite==1.8.1 dataclasses-avroschema==0.53.1 datadog==0.46.0 datamodel-code-generator==0.25.4 dateparser==1.1.8 ddsketch==2.0.4 ddtrace==2.7.3 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.14 devcycle-python-server-sdk==3.4.0 diff-cover==7.7.0 dill==0.3.6 distlib==0.3.7 distro==1.8.0 dnspython==2.5.0 docopt==0.6.2 doit==0.36.0 ecdsa==0.18.0 email-validator==2.1.0.post1 enum-compat==0.0.3 envier==0.5.1 et-xmlfile==1.1.0 evervault==4.2.0 evervault_attestation_bindings==0.3.3 exceptiongroup==1.1.2 execnet==2.0.2 executing==1.2.0 expiringdict==1.2.2 factory-boy==3.2.1 Faker==17.6.0 fakeredis==1.10.2 fastavro==1.8.1 filelock==3.12.2 filetype==1.2.0 firebase-admin==6.2.0 flaky==3.7.0 Flask==2.2.5 flask-babel==3.1.0 Flask-Caching==2.1.0 Flask-Cors==3.0.10 flask-marshmallow==0.14.0 Flask-Migrate==2.7.0 Flask-Pydantic==0.11.0 flask-restx==1.1.0 Flask-Script==2.0.6 flask-shell-ipython==0.4.1 Flask-SQLAlchemy==3.0.5 fonttools==4.42.0 fp-sdk @ https://assets.ramp.com/packages/ramp-reduced/fp_sdk-1.1.7-py3-none-any.whl freezegun==1.2.2 frozenlist==1.4.0 fsspec==2023.12.2 future==0.18.3 fuzzysearch==0.7.3 genson==1.2.2 geographiclib==2.0 geopy==2.3.0 gevent==22.10.2 gocardless-pro==1.45.0 google-api-core==2.11.1 google-api-python-client==2.94.0 google-auth==2.22.0 google-auth-httplib2==0.1.0 google-cloud-core==2.3.3 google-cloud-documentai==2.14.0 google-cloud-firestore==2.11.1 google-cloud-storage==2.10.0 google-crc32c==1.5.0 google-i18n-address==3.1.0 google-resumable-media==2.5.0 google-search-results==1.8.3 googleapis-common-protos==1.59.1 gprof2dot==2022.7.29 graphviz==0.20.1 greenlet==2.0.2 grpcio==1.56.0 grpcio-status==1.56.0 gunicorn==20.1.0 guru @ https://github.com/guruhq/py-sdk/archive/refs/tags/1.1.0.zip h11==0.14.0 haversine==2.8.0 html5lib==1.1 httpcore==0.16.3 httplib2==0.22.0 httpx==0.23.3 huggingface-hub==0.20.2 hypothesis==6.81.2 icdiff==2.0.6 identify==2.5.24 idna==3.4 import-deps==0.2.0 import-profiler==0.0.3 importlib-metadata==6.8.0 importlib-resources==6.1.1 inflect==5.6.2 Inflector==3.1.0 iniconfig==2.0.0 inscribe==3.1.0 intuit-oauth==1.2.4 ipython==8.14.0 iso3166==2.1.1 iso4217==1.11.20220401 isodate==0.6.1 isort==5.13.2 itsdangerous==2.1.2 jdcal==1.4.1 jedi==0.18.2 Jinja2==3.1.3 jmespath==1.0.1 jsonschema==4.17.3 jwcrypto==1.5.1 kafka-python==2.0.2 kombu==5.3.1 launchdarkly-eventsource==1.1.0 launchdarkly-server-sdk==8.1.4 lazy-object-proxy==1.9.0 ldclient==0.0.1 leb128==1.0.5 lob==4.5.4 lupa==1.14.1 lxml==4.9.3 lz4==4.3.2 Mako==1.2.4 Markdown==3.5.1 MarkupSafe==2.1.3 marshmallow==3.18.0 marshmallow-dataclass==8.5.14 marshmallow-oneofschema==3.0.1 marshmallow-sqlalchemy==1.0.0 matplotlib-inline==0.1.6 mccabe==0.7.0 mmh3cffi==0.2.1 mockcache @ https://github.com/lunant/mockcache/archive/914749d0f0b2a9230ab2c8592abe6a67aae461cf.zip monotonic==1.6 more-itertools==10.2.0 moto==4.1.3 msal==1.23.0 msgpack==1.0.5 msrest==0.7.1 multidict==6.0.4 mypy==1.0.0 mypy-extensions==1.0.0 nameparser==1.1.2 nodeenv==1.8.0 oauthlib==3.2.2 onelogin==1.9.0 onfido-python==2.7.0 openai==1.10.0 openapi-schema-validator==0.2.3 openapi-spec-validator==0.4.0 openfeature-sdk==0.4.1 openpyxl==3.1.2 opensearch-py==2.2.0 opentelemetry-api==1.23.0 orjson==3.9.15 packaging==23.1 paramiko==3.4.0 parso==0.8.3 pathspec==0.12.1 pdf2image==1.16.3 pdfminer.six==20211012 pendulum==2.1.2 pexpect==4.8.0 pgcli==3.5.0 pgspecial==2.1.0 phonenumbers==8.13.16 pickleshare==0.7.5 Pillow==10.0.1 platformdirs==3.9.1 pluggy==1.2.0 ply==3.11 posthog==3.0.2 pprintpp==0.4.0 pre-commit==2.21.0 probableparsing==0.0.1 prompt-toolkit==3.0.39 proto-plus==1.22.3 protobuf==4.23.4 psutil==5.9.5 psycogreen==1.0.2 psycopg==3.1.9 psycopg2==2.9.6 ptyprocess==0.7.0 pudb==2022.1.3 pure-eval==0.2.2 py==1.11.0 pyasn1==0.4.8 pyasn1-modules==0.3.0 pycountry==22.3.5 pycparser==2.18 pycryptodome==3.18.0 pycryptodomex==3.18.0 pycycle==0.0.8 pydantic==2.6.0 pydantic-extra-types==2.5.0 pydantic-settings==2.1.0 pydantic_core==2.16.1 pydeps==1.12.1 pydyf==0.7.0 pyflakes==3.0.1 Pygments==2.15.1 pyheif==0.7.1 PyJWT==2.7.0 pylint==2.17.2 PyNaCl==1.5.0 pyOpenSSL==24.0.0 pyotp==2.8.0 pyparsing==3.1.0 pypdf==3.17.1 PyPDF2==1.25.1 pyphen==0.14.0 pyRFC3339==1.1 pyroute2==0.7.9 pyrsistent==0.19.3 pysftp==0.2.9 pytest==7.4.0 pytest-aiohttp==1.0.5 pytest-asyncio==0.23.5 pytest-cov==4.1.0 pytest-icdiff==0.6 pytest-incremental==0.6.0 pytest-mock==3.11.1 pytest-profiling==1.7.0 pytest-reportlog==0.4.0 pytest-rerunfailures==11.1.2 pytest-testmon==2.0.9 pytest-xdist==3.3.1 python-crfsuite==0.9.9 python-dateutil==2.8.2 python-dotenv==1.0.0 python-gnupg==0.4.9 python-http-client==3.3.7 python-jose==3.3.0 python-json-logger==2.0.7 python-Levenshtein==0.12.2 python-quickbooks==0.9.1 python-schema-registry-client==2.4.3 python3-saml==1.16.0 pytz==2023.3 pytz-deprecation-shim==0.1.0.post0 pytzdata==2020.1 pyxero==0.9.3 PyYAML==6.0.1 pyzipper==0.3.6 rapidfuzz==2.15.1 rauth==0.7.3 redis==4.4.4 regex==2023.6.3 reportlab==4.0.7 requests==2.31.0 requests-cache==0.9.8 requests-file==1.5.1 requests-oauthlib==1.3.1 requests-toolbelt==1.0.0 responses==0.23.1 retrying==1.3.4 rfc3986==1.5.0 rsa==4.9 rule-engine==3.5.1 s3transfer==0.10.0 scaleapi==2.14.4 schwifty==2022.9.0 semver==2.13.0 sendgrid==6.10.0 sentry-sdk==1.28.1 setproctitle==1.3.2 shellingham==1.5.0.post1 signxml==3.2.2 simple-salesforce==1.12.4 simplejson==3.17.4 six==1.16.0 slack-sdk==3.21.3 slackclient==2.9.4 smartystreets-python-sdk==4.12.0 sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve==2.4.1 splitio-client==9.4.0 sqlakeyset==2.0.1695177552 SQLAlchemy==1.4.49 sqlalchemy-citext==1.8.0 SQLAlchemy-Continuum==1.3.14 SQLAlchemy-Utils==0.39.0 sqlglot==19.4.0 sqlparse==0.4.4 stack-data==0.6.2 starkbank-ecdsa==2.2.0 stdlib-list==0.8.0 stringcase==1.2.0 stripe==2.76.0 structlog==22.3.0 structlog-sentry==2.0.3 svix==0.85.1 tabulate==0.9.0 temporalio==1.5.1 tenacity==8.2.2 tiktoken==0.4.0 time-machine==2.13.0 tinycss2==1.2.1 tldextract==3.4.4 tokenizers==0.13.3 toml==0.10.2 tomli==2.0.1 tomlkit==0.11.7 tqdm==4.65.0 traitlets==5.9.0 tuna==0.5.11 twilio==6.50.1 typeguard==3.0.2 types-babel==2.11.0.8 types-cachetools==4.2.4 types-certifi==2021.10.8.0 types-cryptography==3.3.21 types-Deprecated==1.2.9.3 types-factory-boy==0.4.1 types-jsonschema==4.7.0 types-jwt==0.1.3 types-orjson==3.6.2 types-protobuf==4.24.0.20240106 types-psutil==5.9.5.12 types-python-dateutil==2.8.11 types-pytz==2021.3.2 types-PyYAML==6.0.1 types-redis==4.2.6 types-requests==2.28.1 types-six==1.16.10 types-toml==0.10.8.5 types-urllib3==1.26.25.10 types-xmltodict==0.13.0.1 typing-inspect==0.9.0 typing_extensions==4.7.1 tzdata==2023.3 tzlocal==5.0.1 Unidecode==1.3.6 unsync==1.4.0 uritemplate==4.1.1 url-normalize==1.4.3 urllib3==1.26.18 urllib3-secure-extra==0.1.0 urwid==2.1.2 urwid-readline==0.13 usaddress==0.5.10 uuid_utils==0.6.1 validators==0.18.2 veryfi==3.3.0 vine==5.0.0 virtualenv==20.24.0 wasmtime==9.0.0 watchdog==2.3.1 wcwidth==0.2.6 weasyprint==59.0 webargs==8.3.0 webauthn==1.11.1 webencodings==0.5.1 Werkzeug==2.2.3 workflow-runner @ file:///Users/hdadkhah/workspace/core/rust/workflow_runner-0.1.0-cp310-cp310-macosx_11_0_arm64.whl#sha256=26f7f3b49b6166c11b915b3c91465a8f6d79baa0a19aa85951bdd274570f0b5e wrapt==1.15.0 xmlsec==1.3.13 xmltodict==0.13.0 yarl==1.9.2 zeep==4.2.1 zenpy==2.0.25 zipcodes==1.2.0 zipp==3.16.2 zope.event==5.0 zope.interface==6.0 zopfli==0.2.2 zstd==1.5.5.1

How can we reproduce your problem?

Not sure.

What is the result that you get?

Only top level trace
Screenshot 2024-03-18 at 2 17 06 PM

What is the result that you expected?

full trace with db calls, etc

@emmettbutler
Copy link
Collaborator

Thanks @hdadkhah1. Can you provide any additional detail on "It seems to be it does not connect parent_id from traces to the test run spans"? Some example debug logs could be helpful.

cc @romainkomorndatadog

@romainkomorndatadog romainkomorndatadog self-assigned this Mar 19, 2024
@romainkomorndatadog
Copy link
Collaborator

I noticed this yesterday as well while I was playing around trying to get a sense of what it would take to support tracing in agentless mode (for #7989 ). There are a couple of things going on (at least):

  • we sometimes lose the race, when calling ddtrace.patch_all(), because of conftest.py (in the pytest context)
  • even when we do patch in time (and when we've got the agent), the traces get sent but they don't show up

I suspect it's because we end up setting up separate tracers for CI and for "regular" tracing (even though it happens in the context of CI). I thought that wouldn't be a problem because the traces should just appear as distributed traces, but that's not the case.

@hdadkhah1 , not absolutely necessary, but it would help if you happened to have an easy repro case.

@hdadkhah1
Copy link
Author

We are not doing anything special to be honest, I was able to go to dd traces and see my traces without it being linked to any tests, but it never showed up in CI view, so the info was not useful. Not sure if this helps.

@hdadkhah1
Copy link
Author

hdadkhah1 commented Mar 29, 2024

Is there a chance we can get this expedited?

@romainkomorndatadog
Copy link
Collaborator

@hdadkhah1 , yes, this is one of the things I'm planning on tackling over the next few weeks.

@romainkomorndatadog
Copy link
Collaborator

Somewhat annoyingly, I didn't have the historical context to understand that this was actually a regression, and used to work, so the fix may actually be as trivial as #9016 .

@hdadkhah1 , any chance I could get you to give that change a go with ddtrace installed by

pip install "ddtrace @ git+https://github.com/DataDog/dd-trace-py.git@79366300167b0a00886ee60b8db5839df23ef864"

?

Looks to work nicely with Flask, when using --ddtrace-patch-all:
image

@hdadkhah1
Copy link
Author

hdadkhah1 commented Apr 18, 2024

it worked!!! but something else seems to fail and it seems slower by few mins, is it expected?

..[] [2024-04-18 18:07:24,142] ERROR in writer: failed to encode trace with encoder <ddtrace.internal.ci_visibility.encoder.CIVisibilityEncoderV01 object at 0x78650a0dd530> Traceback (most recent call last): File "ddtrace/internal/_encoding.pyx", line 1008, in ddtrace.internal._encoding.Packer._pack OverflowError: Python int too large to convert to C unsigned long During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/ddtrace/internal/writer/writer.py", line 382, in _flush_queue_with_client encoded = client.encoder.encode() File "/usr/local/lib/python3.10/site-packages/ddtrace/internal/ci_visibility/encoder.py", line 65, in encode payload = self._build_payload(self.buffer) File "/usr/local/lib/python3.10/site-packages/ddtrace/internal/ci_visibility/encoder.py", line 73, in _build_payload return CIVisibilityEncoderV01._pack_payload( File "/usr/local/lib/python3.10/site-packages/ddtrace/internal/ci_visibility/encoder.py", line 79, in _pack_payload return msgpack_packb(payload) File "ddtrace/internal/_encoding.pyx", line 1107, in ddtrace.internal._encoding.packb File "ddtrace/internal/_encoding.pyx", line 1088, in ddtrace.internal._encoding.Packer.pack File "ddtrace/internal/_encoding.pyx", line 1084, in ddtrace.internal._encoding.Packer.pack File "ddtrace/internal/_encoding.pyx", line 1059, in ddtrace.internal._encoding.Packer._pack File "ddtrace/internal/_encoding.pyx", line 1069, in ddtrace.internal._encoding.Packer._pack File "ddtrace/internal/_encoding.pyx", line 1059, in ddtrace.internal._encoding.Packer._pack File "ddtrace/internal/_encoding.pyx", line 1059, in ddtrace.internal._encoding.Packer._pack File "ddtrace/internal/_encoding.pyx", line 1019, in ddtrace.internal._encoding.Packer._pack OverflowError: Integer value out of range ...................................................................... [ 10%]

Thanks so much!

@romainkomorndatadog
Copy link
Collaborator

romainkomorndatadog commented Apr 19, 2024

The "integer out of range" issue is definitely not expected, I'll take a look at that.

For performance/slowness, it's more complicated. I'll list my questions first, so they're easy to find. Answers would help for context (with whatever precision you feel comfortable with on a public issue).

  1. how much of an overhead does "slower by a few minutes" represent, roughly? Is it closer to say, 10%, or 100%?
  2. is your code using a lot of threads under test load? I wonder if our writer service doing more work becomes a bigger problem as threading scales up.
  3. how large is your test suite, in number of tests?
  4. harder to measure, but eyeballing would be good enough: how many more spans do you think your test suite is creating with --ddtrace-patch-all ? eg: if you look at an average test's trace, how many extra spans are you seeing?

The amount of overhead depends on how many extra traces are generated by --ddtrace-patch-all, so it's a bit hard to predict. I usually test with Flask or Starlette , but they're small repos that run quickly, which means it's easy for the perf impact to get lost in as noise (I need to find a bigger repo for testing), but based on my testing so far:

If you're already using --ddtrace-patch-all, around 10% of overhead is expected (as a result of more traces having to be encoded and sent out).

If you're not already using --ddtrace-patch-all, the slowdown would be on the order of 50%, since there's noticeably more instrumentation and trace collection happening.

With Flask, I'm seeing averages of:

  • on dd-trace-py's current main branch:
    • without --ddtrace-patch-all: 1.17s per session
    • with --ddtrace-patch-all: 1.55s per session (32% slower)
    • using ddtrace-run pytest: 1.54s per session (same as with --ddtrace-patch-all)
  • on the PR's branch:
    • without --ddtrace-patch-all: 1.17s per session (not slower than main makes sense since we're not recording any extra traces)
    • with --ddtrace-patch-all: 1.68s per session (43% slower than without --ddtrace-patch-all, 8% slower than main with --ddtrace-patch-all
    • using ddtrace-run pytest : 1.67s per session (same as with --ddtrace-patch-all)

The fact that pytest --ddtrace-patch-all and ddtrace-run pytest (without --ddtrace-patch-all) take the same amount of time makes sense and confirms the overhead's from the combination of our patched modules and trace submissions.

@romainkomorndatadog
Copy link
Collaborator

@hdadkhah1 is the encoding exception happening consistently?

@romainkomorndatadog
Copy link
Collaborator

Looking a bit more into the OverflowError: Integer value out of range error, it happens during encoding which means that we've got some JSONified int that should actually be a string.

The various ints we use as ids for spans etc are already stringified, and calling Span.set_tag() with an int larger than 2**53 converts the value into a string anyway, so, in theory, our spans shouldn't be the source of this error.

Our payload's metadata also shouldn't be to blame since it looks like {'*': {'language': 'python', 'env': 'itr-off', 'runtime-id': '87eadf1ab7c643cda918d30e71f5fbca', 'library_version': '2.9.0.dev85+g793663001.d20240419'}} .

The only way I was able to reproduce this was by manually setting values in our spans' _meta attributes, but that's something no one is supposed to do.

@hdadkhah1 , I put up a new branch with a small change that looks for ints greater than 2**64 and logs errors whenever our packb() attempt fails. Installable with:

pip install "ddtrace @ git+https://github.com/DataDog/dd-trace-py.git@d76a5af7e5cbcb5da5c733cd453bdf4741af1062"

It should log something like:

Found bad int: metadata.*.breaking_nested_dict.baddict.badint: 36893488147419103232
Found bad int: metadata.*.breaking_nested_dict.badlist.[1]: 36893488147419103232
Found bad int: metadata.*.breaking_nested_dict.bad_nested_list.[0].[1]: 36893488147419103232
Found bad int: metadata.*.breaking_nested_dict.bad_nested_list.[1].[0].1.[0]: 36893488147419103232
Found bad int: metadata.*.breaking_nested_dict.bad_nested_list.[2].[1]: 36893488147419103232
Found bad int: metadata.*.breaking_nested_dict.bad_nested_list.[2].[3].1.[0]: 36893488147419103232

If you were able to test it, that'd be fantastic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants