
[BUG] Constant HTTP 500 with preloading enabled on version 0.48.3 #1042

Closed · Majkl578 opened this issue on Oct 8, 2020 · 30 comments
Labels: 🐛 bug, c-extension, cat:app-crash, sev-1

Comments

@Majkl578 commented Oct 8, 2020

Bug description

We're seeing obscure but constant HTTP 500 with PHP preloading enabled and Datadog extension loaded. Once the extension is removed (rm /etc/php/7.4/fpm/conf.d/98-ddtrace.ini /etc/php/7.4/fpm/conf.d/99-ddtrace-custom.ini) everything works fine.

PHP version

7.4.11 NTS

Installed extensions

[PHP Modules]
amqp
bcmath
blackfire
calendar
Core
ctype
curl
date
ddtrace
dom
exif
FFI
fileinfo
filter
ftp
gettext
hash
iconv
igbinary
imap
intl
json
libxml
mbstring
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
Phar
posix
readline
redis
Reflection
session
shmop
SimpleXML
sockets
sodium
SPL
standard
sysvmsg
sysvsem
sysvshm
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zlib

[Zend Modules]
Zend OPcache
blackfire
ddtrace

OS info

Debian slim (Docker)

Diagnostics and configuration

Output of phpinfo() (ddtrace >= 0.47.0)

Output from php -i (same configuration as FPM):

ddtrace

Datadog PHP tracer extension
For help, check out the documentation at https://docs.datadoghq.com/tracing/languages/php/
(c) Datadog 2020

Datadog tracing support => enabled
Version => 0.48.3
DATADOG TRACER CONFIGURATION => {"date":"2020-10-08T21:11:35Z","os_name":"Linux 1fccd4138f01 5.9.0-rc6-amd64 #1 SMP Debian 5.9~rc6-1~exp1 (2020-09-21) x86_64","os_version":"5.9.0-rc6-amd64","version":"0.48.3","lang":"php","lang_version":"7.4.2","env":null,"enabled":true,"service":null,"enabled_cli":false,"agent_url":"http://localhost:8126","debug":false,"analytics_enabled":false,"sample_rate":1.000000,"sampling_rules":null,"tags":null,"service_mapping":null,"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"dd_version":null,"architecture":"x86_64","sapi":"cli","ddtrace.request_init_hook":"/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":false,"generate_root_span":true,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":false,"integrations_disabled":null,"enabled_from_env":true,"opcache.file_cache":"/app/tracking/var/opcache","agent_error":"Failed to connect to localhost port 8126: Connection refused","opcache_file_cache_set":"The opcache.file_cache INI setting is set. This setting can cause unexpected behavior with the PHP tracer due to a bug in OPcache: https://bugs.php.net/bug.php?id=79825"}

Diagnostics
agent_error => Failed to connect to localhost port 8126: Connection refused
opcache_file_cache_set => The opcache.file_cache INI setting is set. This setting can cause unexpected behavior with the PHP tracer due to a bug in OPcache: https://bugs.php.net/bug.php?id=79825
Diagnostic checks => failed

Directive => Local Value => Master Value
ddtrace.disable => Off => Off
ddtrace.request_init_hook => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php
Majkl578 added the 🐛 bug label on Oct 8, 2020
@labbati (Member) commented Oct 9, 2020

Hi @Majkl578, thanks for reporting this and I am sorry you experienced this issue. Looking into it right now.

I have a few questions, though, in case I'm not able to recreate it immediately:

  • are you using any specific frameworks?
  • what does your preload file (e.g. preload.php) look like? Which files do you preload?
  • which approach are you following: require, require_once, or include?
  • [Added] is this happening only on 0.48.3 (the version you mention in the title)? Was it working on any earlier version?

labbati added the c-extension label on Oct 9, 2020
@Majkl578 (Author) commented Oct 9, 2020

are you using any specific frameworks?

The most important bits are IMO Symfony 4.4.15, Doctrine ORM 2.7, Doctrine DBAL 2.11, Guzzle 6.5, Sentry 2.5.

Our app on top of Symfony 5.1 seems to be working.

what does your preload file (e.g. preload.php) look like? Which files do you preload?

which approach are you following: require, require_once, or include?

We use a combination of opcache_compile_file() (for the few PHP files that have side effects) and require_once for the rest. First we load the generated DI container via require_once (var/cache/*/*Container.php) and its preload file (var/cache/*/*Container.preload.php), then we opcache_compile_file() index.php and bootstrap.php, and finally we require_once everything in src/.
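
Roughly, a minimal sketch of that preload.php (paths and the src/ walk are illustrative, not our exact script):

<?php
// preload.php - illustrative sketch of the loading order described above.

// 1. Generated DI container and its Symfony-generated preload file.
require_once __DIR__ . '/var/cache/prod/Container.php';          // placeholder name
require_once __DIR__ . '/var/cache/prod/Container.preload.php';  // placeholder name

// 2. Files with side effects are only compiled, never executed here.
opcache_compile_file(__DIR__ . '/public/index.php');
opcache_compile_file(__DIR__ . '/config/bootstrap.php');

// 3. Everything under src/ is loaded with require_once.
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator(__DIR__ . '/src', FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
    if ($file->getExtension() === 'php') {
        require_once $file->getPathname();
    }
}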

is this happening only on 0.48.3 (the version you mention in the title)? Was it working on any earlier version?

This was happening with 0.45.2 too.

I also tried turning verbose opcache logging on and found nothing useful either. 😕

@Majkl578 (Author) commented Oct 9, 2020

@labbati Are there any debug builds available that I could try to see if they produce anything useful? I can probably also try to compile PHP with debug mode and assertions enabled (we use PHP from deb.sury.org).

@labbati (Member) commented Oct 9, 2020

@Majkl578 thanks for the info.

Are there any debug builds available that I could try to see if they produce anything useful?

  • if you hit a segfault: we distribute .so files with ddtrace debug symbols. You can find them in /opt/datadog-php/extensions/ with the -debug suffix. In your case you would use extension=/opt/datadog-php/extensions/ddtrace-20190902.so. Note, though, that they only add symbols (the -g option); they would not work as a PHP debug-mode build (--with-debug).
  • in case you believe the error happens during the actual request, you can also set env[DD_TRACE_DEBUG]=true in your www.conf and you will see more informative output in your error log.
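
For reference, assuming the default pool file, the directive would look like this (hypothetical excerpt):

; /etc/php/7.4/fpm/pool.d/www.conf (hypothetical excerpt)
env[DD_TRACE_DEBUG] = true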

I have a question for you as well. If you are on a testing machine, can you confirm that after removing the opcache.preload setting everything works with the tracer installed?

Also, so far I have not been able to reproduce. I will keep you posted.

@labbati (Member) commented Oct 9, 2020

Also, I have a favor to ask that might help us move in the right direction: again, if and only if you are able to reproduce in any environment other than production.

Attached is the setup I am using. Inside the zip there is an app directory that is the web root. Would it be possible for you to deploy it to your web root and see if the error shows up? If it does show up, then it means it is a configuration/environment difference; otherwise it's the app.

gh-1042-apmphp-453-php74-preloading.zip

@Majkl578 (Author) commented Oct 9, 2020

can you confirm that after removing the opcache.preload setting everything works with the tracer installed?

Yes, when I remove the opcache.preload directive everything works fine. That's actually how production is running right now as we speak. :)

This is what I got with DD_TRACE_DEBUG=1 enabled. The missing agent locally doesn't seem to be the issue, since the other app works fine in the same setup:

NOTICE: PHP message: DATADOG TRACER DIAGNOSTICS - agent_error: Failed to connect to localhost port 8126: Connection refused
NOTICE: PHP message: DATADOG TRACER DIAGNOSTICS - opcache_file_cache_set: The opcache.file_cache INI setting is set. This setting can cause unexpected behavior with the PHP tracer due to a bug in OPcache: https://bugs.php.net/bug.php?id=79825
NOTICE: PHP message: DATADOG TRACER CONFIGURATION - {"agent_error":"Failed to connect to localhost port 8126: Connection refused","opcache_file_cache_set":"The opcache.file_cache INI setting is set. This setting can cause unexpected behavior with the PHP tracer due to a bug in OPcache: https://bugs.php.net/bug.php?id=79825","date":"2020-10-09T11:42:19Z","os_name":"Linux 3a0d39c9918d 5.9.0-rc8-amd64 #1 SMP Debian 5.9~rc8-1~exp1 (2020-10-05) x86_64","os_version":"5.9.0-rc8-amd64","version":"0.48.3","lang":"php","lang_version":"7.4.2","env":null,"enabled":true,"service":null,"enabled_cli":false,"agent_url":"http://localhost:8126","debug":true,"analytics_enabled":false,"sample_rate":1.000000,"sampling_rules":null,"tags":null,"service_mapping":null,"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"dd_version":null,"architecture":"x86_64","sapi":"fpm-fcgi","ddtrace.request_init_hook":"/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":false,"generate_root_span":true,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":false,"integrations_disabled":null,"enabled_from_env":true,"opcache.file_cache":"/app/xxx/var/opcache"}
NOTICE: PHP message: For additional diagnostic checks such as Agent connectivity, see the 'ddtrace' section of a phpinfo() page. Alternatively set DD_TRACE_DEBUG=1 to add diagnostic checks to the error logs on the first request of a new PHP process. Set DD_TRACE_STARTUP_LOGS=0 to disable this tracer configuration message.

Also no luck after disabling opcache.file_cache.

Attached is the setup I am using. Inside the zip there is an app directory that is the web root. Would it be possible for you to deploy it to your web root and see if the error shows up?

I tried locally with docker-compose up and a valid API key, and then curl http://localhost:8000 worked. I'll try to fiddle with the app dependencies and see if it's possibly an integration issue. Our apps normally run without an agent to connect to locally.

@labbati (Member) commented Oct 13, 2020

Hi @Majkl578 just a quick update to let you know that we were not able to recreate this so far and we are still looking into it.

I'll try to fiddle with app dependencies and see if it's possibly an integration issue. Our apps normally run without an agent to connect to locally.

Please let me know if you have any updates on your side.

@labbati (Member) commented Oct 20, 2020

Just an update on this. We tried hard to replicate the issue, both on Symfony apps and on an app that we use to perform fuzz testing, and we were not able to replicate it, in some cases even after hours under heavy load.

Any chance that you @Majkl578 (or anyone else who upvoted the issue) can provide a sample app or snippet (and, at this point, I assume also the method of installation for PHP) that recreates the issue?

@labbati (Member) commented Oct 22, 2020

Hello @Majkl578, we are still not able to recreate this issue. I am open to joining a pair programming session with you, if you are available, so I can look at your configuration/application and try to see if anything stands out that we can use.

If you are open to that, please reach out on our public Slack, datadoghq.slack.com. My handle is @luca.abbati.

@labbati (Member) commented Oct 27, 2020

Hi @Majkl578, not sure if you had a chance to read my last message. Just checking in to see if we can meet somehow to investigate the issue more.

@labbati (Member) commented Nov 9, 2020

Hi @Majkl578. Unfortunately this issue went stale and we were not able to recreate your problem. My offer to schedule a pair programming session, where we can start disabling features/integrations and enabling them one by one so we can at least isolate the offending piece, is still valid. Otherwise I don't see many options we can attempt.

@Majkl578 (Author) commented Jan 4, 2021

Hi @labbati. Sorry for the lack of response recently, I was quite busy before Christmas and didn't have time for anything here.
I got back to this today and tried a few other things. So far the issue is still present with ddtrace 0.53.0. All requests end up with HTTP 500 when OPcache preloading is enabled and DD_TRACE_ENABLED=1. Note that it's HTTP 500, not 502: it's not a segfault or similar, and no coredump is generated.

Interesting observation: setting DD_TRACE_ENABLED=0 while keeping the ddtrace extension loaded doesn't lead to errors.

I tried disabling integrations while keeping the tracer itself enabled; unfortunately it did not help:

DD_TRACE_ENABLED=1
DD_TRACE_DEBUG=1
DD_TRACE_SYMFONY_ENABLED=0
DD_TRACE_CURL_ENABLED=0
DD_TRACE_GUZZLE_ENABLED=0
DD_TRACE_PDO_ENABLED=0
DD_TRACE_PREDIS_ENABLED=0

I'll ping you on Slack and we can brainstorm how to debug this. Thank you! 👍

@fhferreira commented

+1

@fhferreira commented

[screenshot of the error attached]

same behavior here with opcache

@labbati (Member) commented Jun 10, 2021

Hello @fhferreira, we tried several combinations but we were previously not able to recreate this error. Are you able to share a minimal reproduction case? It must be a non-obvious combination of preload + various PHP/library versions + configurations.

@fhferreira commented

Hello @fhferreira, we tried several combinations but we were previously not able to recreate this error. Are you able to share a minimal reproduction case? It must be a non-obvious combination of preload + various PHP/library versions + configurations.

Hi @labbati, after many tries we got it to work. It runs inside a Bitbucket pipeline, and when we started the Docker containers they didn't get the real AWS IP needed to reach the agent on port :8126 and connect to it.

We use this inside the Dockerfile:

COPY ./scripts/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh
ENTRYPOINT [""]
CMD ["/bin/bash", "/usr/local/bin/entrypoint.sh"]

./scripts/entrypoint.sh

#!/bin/sh
echo 'Starting entrypoint.sh' \
       && export DD_AGENT_HOST=$(curl --silent http://169.254.169.254/latest/meta-data/local-ipv4) \
       && echo env[DD_AGENT_HOST]= \'$DD_AGENT_HOST\' >> /etc/php/7.4/fpm/pool.d/www.conf \
       && echo $DD_AGENT_HOST \
       && echo 'Ending entrypoint.sh' \
       && cat /etc/php/7.4/fpm/pool.d/www.conf | grep 'env' \
       && /usr/bin/supervisord -n -c /etc/supervisor/conf.d/supervisord.conf

It works

@labbati (Member) commented Jun 10, 2021

@fhferreira I think that your problem was with the datadog agent not being reachable, rather than opcache.preload causing your application to crash. In case I am missing something please let me know.

Good that you got it to work, though.

@fhferreira commented

@labbati it appears to work with opcache preload here

@labbati (Member) commented Jun 15, 2021

I am closing this issue since, without a core dump and/or a minimal reproduction case, we are not able to proceed with the investigation. In case you are able to get a core dump, or need assistance configuring your system to generate one, please reach out to our support team at https://www.datadoghq.com/support/ or feel free to Slack me as you suggested (#1042 (comment)).

labbati closed this as completed on Jun 15, 2021
@olsavmic commented Aug 7, 2021

Hi @labbati, I was debugging the same codebase (as the author of this thread) once again after some time, with preloading enabled (and ddtrace 0.62.1). Although I was not able to extract any specific error logs, I found out that the issue happens only when any of the DDTrace classes (DDTrace\GlobalTracer, DDTrace\NoopTracer) is listed in the preload.php file.

We have the Tracer registered as a service in the Symfony DI container with GlobalTracer::get as a factory. Symfony automatically adds it to preload.php, and during preload all types discovered as method parameters in those classes are preloaded as well (symfony/dependency-injection -- Dumper/Preloader.php::doPreload).

Everything seems to work fine if the DDTrace namespace is used only inside method bodies (e.g. retrieving the tracer via GlobalTracer::get()).

Do you have any idea what might be happening? We'll get rid of the service definition in favour of static GlobalTracer::get calls everywhere, but I'd still like to know what the root cause is.
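
For context, the service wiring in question looks roughly like this (a sketch using Symfony's PHP configurator; the service id and the DDTrace\Contracts\Tracer type are assumptions, the real definition differs slightly):

<?php
// config/services.php -- hypothetical sketch of the wiring described above.

use DDTrace\GlobalTracer;
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

return static function (ContainerConfigurator $configurator): void {
    $services = $configurator->services();

    // Registering the tracer under a DDTrace type makes the dumped container
    // reference DDTrace classes, so Symfony lists them in preload.php.
    $services->set('app.tracer', \DDTrace\Contracts\Tracer::class)
        ->factory([GlobalTracer::class, 'get']);
};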

@olsavmic commented Aug 8, 2021

From what I see in the bridge/_generated.php file, it looks like the cause is a collision in declared class names: the same class is declared twice, once during preload via the specific files resolved by the composer autoloader, and a second time by the require_once of _generated.php (all DDTrace classes/functions/symbols are copied over there using ClassPreloader). It worked fine without preloading because all DDTrace symbols were already known, so the project autoloader never tried to import them.
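
A stripped-down illustration of what I think is happening (paths and the loading mechanism are simplified; the fatal error shown is the generic PHP redeclaration error I would expect here):

<?php
// preload.php (run via opcache.preload): the composer autoloader makes the
// vendor/ copy of e.g. DDTrace\GlobalTracer part of every worker's memory.
require __DIR__ . '/vendor/autoload.php';
class_exists(\DDTrace\GlobalTracer::class); // class is now declared for all requests

// Later, on a request, the extension loads its own copy of the same symbols
// (simplified; the real file is bridge/_generated.php):
require_once '/opt/datadog-php/dd-trace-sources/bridge/_generated.php';
// Declaring DDTrace\GlobalTracer a second time is a fatal error, e.g.:
//   PHP Fatal error: Cannot declare class DDTrace\GlobalTracer,
//   because the name is already in use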

I don't have much insight into the PHP module internals though, so can you validate this please? @labbati

If that's the case, I can see two options:

  1. Adding class_exists to the _generated.php file for each entry (see the sketch after this list)

  2. Creating a PR in the Symfony repository to explicitly forbid some classes from preloading, since [DI] add tags container.preload/.no_preload to declare extra classes to preload/services to not preload (symfony/symfony#36195) does not help in this case.
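
For option 1, a guarded entry could look something like this (a sketch only, not the actual _generated.php content):

<?php
namespace DDTrace;

// Hypothetical guarded entry: declare the class only if preload (or the
// composer autoloader) has not already declared it. The second argument
// disables autoloading so the check itself cannot trigger a load.
if (!\class_exists(GlobalTracer::class, false)) {
    class GlobalTracer
    {
        // ... copied class body ...
    }
}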

@labbati (Member) commented Aug 16, 2021

Thank you @olsavmic for the additional information. I am reopening the issue since this is now actionable. We will look into this as soon as possible.

labbati reopened this on Aug 16, 2021
@labbati (Member) commented Aug 19, 2021

Just a quick heads up. The issue is clear now and I was able to recreate it. While in the long term we are working to remove the root cause of this (all the PHP code in _generated.php will go away for good), in the short term it requires a fix.

Problem

The problem was hard to find because it is caused by the combination of 3 factors and your message #1042 (comment) hints in the right direction:

  1. preload;
  2. you do manual instrumentation, hence you require "datadog/dd-trace" in composer;
  3. you need DDTrace\* classes from composer at preload time, hence you load the composer autoloader in the preload script.

When our module is loaded, we hard-load _generated.php since we assume that we run before any require __DIR__ . '/vendor/autoload.php';. This is not true in the case of an opcache preload script that REQUIRES the composer autoloader.

Long term

This issue will go away entirely once we are able to get rid of _generated.php, but that will take months.

Short term

Adding class_exists to the _generated.php file for each entry

Loading each class separately clashes with the original reason we moved to _generated.php in the first place, which is lowering latency overhead: the additional I/O operations can be more than noticeable in simple scripts (ones not using a heavy framework like Symfony or Laravel) or when users do not use opcache. We have to be compatible with as many different setups as possible.

Creating a PR in Symfony repository to explicitly forbid some classes from preloading

This is an interesting idea, but I'd prefer to solve the root cause, so that if another framework decides to follow a similar approach users don't see their apps failing in production.

One possibility I see (but I want to discuss it internally with the team and get their feedback/preference before committing to any solution) is to divide _generated.php into _generated_api.php and _generated_internal.php. The first one would contain only the classes and interfaces provided by composer from src/api; we would do one single class_exists() check on one representative class from src/api before deciding to load them. The second one (classes not provided by composer) would always be loaded by our module, no matter what.
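
As a rough sketch of that idea (the file names are from the proposal above; the representative class and the loader logic are assumptions, not the actual implementation):

<?php
// Hypothetical sketch of the proposed split; not the actual module code.

// _generated_internal.php: classes never shipped by the composer package,
// so the module can always load them unconditionally.
require __DIR__ . '/_generated_internal.php';

// _generated_api.php: classes/interfaces that datadog/dd-trace also ships
// from src/api. Load them only if a representative class is not already
// declared (e.g. by opcache.preload pulling in vendor/autoload.php).
if (!\class_exists(\DDTrace\GlobalTracer::class, false)) {
    require __DIR__ . '/_generated_api.php';
}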

There might be other options, though, that I am not thinking about. We will get back to you soon.

@olsavmic commented

@labbati Thanks a lot!

@labbati (Member) commented Aug 23, 2021

@olsavmic we are considering all possible use cases and I have one final question.

Some context first, there are two topics here: (1) our tracing library MUST NEVER break your application even if manual tracing does not work, and (2) you should be able to do manual tracing in any scenario.

I think we have a good solution for (1) in preloaded scenarios. About (2):

  • would it be possible for you to use Symfony's lazy initialization?
  • other than retrieving GlobalTracer::get() and registering it in the container, are you doing anything else with our classes during preload?

@olsavmic commented

Thanks @labbati.

  • Lazy initialization would not be a problem (I already tried this approach while debugging this issue; it did not help with preloading, but it did not break anything).
  • No, preloading is just about including those classes registered in our container; nothing else is done.

@Majkl578 (Author) commented

Thank you @olsavmic for picking this up where I stopped and @labbati for working on the issue. 🚀

@labbati (Member) commented Sep 3, 2021

We have released 0.64.0, which fixes this issue. Please let us know if you still experience problems and I will reopen it. Thanks for your patience and the excellent investigation.

labbati closed this as completed on Sep 3, 2021
@phroggyy commented

@labbati I'm currently experiencing an issue that seems very similar to this. When deployed with opcache & preload disabled, everything works and traces end up in DD. With it enabled, the entire FPM process fails to start.

I'm just about to start debugging this but thought it worth sharing already.

@phroggyy commented

It seems that this particular line is what's causing issues for us:

require_once __DIR__ . '/vendor/autoload.php';

I'd guess that it's because of how the DD extension wraps the autoloader. The reason we require the autoloader in our preload script is that we rely on autoloading to pull in the full dependency tree, so we can load as much of our application as possible while only listing the top-level dependencies.
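
For reference, our preload script is roughly of this shape (a simplified sketch; the class names are placeholders, not our real entry points):

<?php
// preload.php -- simplified sketch of the approach described above.
// Requiring the composer autoloader lets class_exists() below resolve
// classes on demand, so listing a few top-level classes is enough.
require_once __DIR__ . '/vendor/autoload.php';

foreach ([\App\Kernel::class, \App\Controller\HomeController::class] as $class) {
    // Triggers the autoloader and compiles the class (plus its parents,
    // interfaces and traits) into the preloaded opcache state.
    class_exists($class);
}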
