Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP-FPM: 8.2 random lockups of daemon #12449

Open
Elkropac opened this issue Oct 16, 2023 · 21 comments
Open

PHP-FPM: 8.2 random lockups of daemon #12449

Elkropac opened this issue Oct 16, 2023 · 21 comments
Assignees

Comments

@Elkropac
Copy link

Elkropac commented Oct 16, 2023

Description

Hi,
we have long standing problem with php-fpm 8.2 which forces us to stay on php8.1.
We experience random lockups of php-fpm daemon, which is caused by all child processes staying in D or R state and not finishing.

Our production machine was running Debian 11, which we updated to Debian 12 recently. It's amd64 architecture, running VM in vmware esxi 6.7 on Dell Poweredge servers. We use php packages from deb.sury.org repository.

Our test machine is Debian 11 with same settings, running on same server.

We are testing it on php 8.2.11 , but when we tried to switch to php8.2 initialy, it was version 8.2.3 , i believe.

I can reproduce it by running concurent load on 3 php-fpm pools under single master process:

  • user1 (user1) - i run ab -c 2 -t 1800 url1 , url1 has phpinfo() function in index.php
  • user2 (user2) - i run ab -c 2 -t 1800 url2 , url2 has phpinfo() function in index.php
  • user3 (user3) - customer provided application written in Nette. It's used to upload image galleries. You select multiple images on your disk and it uploads them using separate ajax requests per image, resizes them using GD library and moves the images to final destionation on disk (i select around 20 images for upload)

When i start to upload the images while ab processes are hitting the other users, our php-fpm locks up, here is process list from one of the test runs:

root     2605722  0.2  0.1 125060  6540 ?        Ds   11:32   0:11 php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)
www-data 2605723  0.0  0.1 125320  5896 ?        S    11:32   0:00  \_ php-fpm: pool www
www-data 2605724  0.0  0.1 125320  5896 ?        S    11:32   0:00  \_ php-fpm: pool www
user3+   2605796  0.2  3.8 272720 153200 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605797  0.2  3.8 272720 153988 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605798  0.2  0.0 270648  3436 ?        D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605799  0.2  3.5 270648 144800 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605800  0.2  3.6 270648 147576 ?       D    11:32   0:14  \_ php-fpm: pool user3
user3+   2605801  0.2  3.4 270648 139888 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605802  0.2  1.3 270648 55772 ?        D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605807  0.0  0.5 270648 23072 ?        S    11:32   0:01  \_ php-fpm: pool user3
user3+   2605810  0.2  0.0 138032  3440 ?        D    11:32   0:12  \_ php-fpm: pool user3
user3+   2605814  0.2  0.1 140920  5064 ?        D    11:32   0:12  \_ php-fpm: pool user3
user3+   2605819  0.2  3.7 270672 151336 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605822  0.2  0.0 268600  3864 ?        D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605835  0.2  0.4 268600 18640 ?        D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605841  0.2  3.4 268600 140204 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605853  0.2  3.6 268600 147264 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605861  0.2  3.6 268604 146504 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605868  0.2  3.7 268600 149392 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605871  0.0  0.4 138032 18000 ?        S    11:32   0:00  \_ php-fpm: pool user3
user3+   2605872  0.2  3.7 268600 150344 ?       D    11:32   0:13  \_ php-fpm: pool user3
user3+   2605875  0.2  0.0 133800  3628 ?        D    11:32   0:12  \_ php-fpm: pool user3
user3+   2605876  0.2  0.1 133800  4500 ?        D    11:32   0:12  \_ php-fpm: pool user3
user3+   2606291  0.3  7.0 402592 283984 ?       D    11:34   0:16  \_ php-fpm: pool user3
user3+   2606292  0.2  3.6 264536 146268 ?       D    11:34   0:10  \_ php-fpm: pool user3
user3+   2606891  0.2  0.1 125352  7100 ?        D    11:35   0:11  \_ php-fpm: pool user3
user1     2607258  0.2  0.1 125352  6380 ?        D    11:37   0:11  \_ php-fpm: pool user1
user2     2607261  0.2  0.1 125352  6696 ?        D    11:37   0:11  \_ php-fpm: pool user2
user2     2607270  0.2  0.1 125352  7340 ?        D    11:37   0:11  \_ php-fpm: pool user2
user2     2607287  0.2  0.1 125352  6092 ?        D    11:37   0:11  \_ php-fpm: pool user2
user1     2607292  0.2  0.1 125352  6124 ?        D    11:37   0:11  \_ php-fpm: pool user1
user2     2607294  0.2  0.1 125352  7336 ?        D    11:37   0:11  \_ php-fpm: pool user2
root     2625624  0.2  0.2 125252 10548 ?        D    12:43   0:02  \_ php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)

You can see, there is some new child in the end unable to switch to it's final user.
When opcache is enabled , i can see this locks

lslocks  | grep php
php-fpm8.2      2649780 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650194 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650192 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650212 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650213 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2649782 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650238 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650267 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)
php-fpm8.2      2650195 POSIX      READ   0     1   1 /tmp/.ZendSem.Rpie5w (deleted)

These locks are not there with opcache disabled.

When i try to run whole php-fpm inside strace, running the upload does not block the whole php-fpm, but child processes of user user3 slowly die out and no new processes are spawn.

When we use php8.1, with the same settings as in php8.2, we cannot lock it, it runs ok.

I tried to

No success with any of that, lock still occurs.

Our php ini modification is

[PHP]
short_open_tag = On
max_input_vars = 3000
max_input_time = 180
memory_limit = 256M
error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT & ~E_NOTICE
post_max_size = 256M
upload_max_filesize = 256M

[mail function]
mail.add_x_header = On

[Session]
;session.use_strict_mode = 0
session.use_cookies = 1
;session.cookie_secure = 1
session.use_only_cookies = 1
session.cookie_httponly = 1

[opcache]
opcache.memory_consumption=512
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=10000
;opcache.revalidate_freq=2

our fpm pool config is the same for all users (only socket path changes)

[user1]
user = user1
group = user1
listen = /var/run/php8.2-fpm-user1.sock
listen.owner = www-data
listen.group = www-data

pm = ondemand
pm.max_children = 30
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
pm.process_idle_timeout = 60s;
pm.max_requests = 100
pm.status_path = /status

ping.path = /ping
chdir = /
security.limit_extensions = .php .phtml

I can add more information when asked.
Thanks

PHP Version

PHP 8.2.11

Operating System

Debian 11

@Elkropac
Copy link
Author

Elkropac commented Oct 17, 2023

Customer prepared minimal application in nette to reproduce it (thanks @kravcik)

I have attached it in zip.

Directories user1, user2 and user3 contains php code for three separate php-fpm users (pools).

You need to replace image.jpg with some real file, our test photo is around 7MBs in size.

Script test starts apache benchmark on first two urls and then it tries to upload file in 16 concurent curl processes) for url3

@Elkropac
Copy link
Author

php-test.zip

@Elkropac
Copy link
Author

Hi, our test machine has updated to php 8.2.12 and i'm unable to reproduce the problem anymore. Was there some change? I don't think i see anything related in changelog

@bukka
Copy link
Member

bukka commented Nov 3, 2023

I have been looking into this and don't see anything suspicious in the config or app code.

I'm not exactly sure what could get process to D (Uninterruptible sleep) in FPM. It's usually caused by device waiting on IO from what I gather.

I have been checking the diff between 8.2.11 and 8.2.12 and I don't see anything that could have any impact on this. I'm wondering if you maybe updated other system packages as well that might be causing this?

Are you able to recreate this locally with 8.2.11 ideally with vanilla PHP build (compile PHP) or only on this machine / system with Debian packages? Also is there anything filesystem specific used (e.g. NFS) or some device that could affect this? It would be also good to provide more info about web server used (configuration) and system if you can't recreate it locally.

@Elkropac
Copy link
Author

Elkropac commented Nov 8, 2023

Hi,
we are not exactly sure what causes this problem either.

We switched our production to php8.2 and problem occured again. I will have to be more creative/aggressive on our test server I think.

Filesystem is btrfs on production, ext4 on test server. Apache is configured with mpm-event and proxy_fcgi to php. Both VMs are running on Dell servers inside vmware. Storage is Dell ME5024 over SAS, all-flash storage.

It always locks only the php8.2-fpm , all other processes (like webs running under php8.1, behind the same apache web server) are not affected.

@bukka
Copy link
Member

bukka commented Nov 8, 2023

When you say We switched our production to php8.2 does that mean that only PHP changed or it's the whole new image with different kernel?

It's possible that some change in PHP might have caused this but it might be just some other OS / kernel issue that it hit. Unfortunately it's hard to debug this because D means that process is in kernel space waiting on some IO so it is not possible to attach debugger to it. I found some hints here that might help with debugging: https://unix.stackexchange.com/questions/303613/find-the-cause-of-a-permanently-blocked-i-o-process-in-uninterruptible-sleep . So maybe try that.

@Elkropac
Copy link
Author

Elkropac commented Nov 8, 2023

No images, it's ordinary web server with apache2 daemon and some php-fpm daemons (each having multiple pools, with unix socket, per user).

When i say, that we switchted production, we swtiched fcgi in apache for example from

        <Directory /path/user1/web/www>
                <FilesMatch \.php$>
                        SetHandler "proxy:unix:///run/php8.1-fpm-user1.sock|fcgi://localhost"
                </FilesMatch>
        </Directory>

to

        <Directory /path/user1/web/www>
                <FilesMatch \.php$>
                        SetHandler "proxy:unix:///run/php8.2-fpm-user1.sock|fcgi://localhost"
                </FilesMatch>
        </Directory>

Thanks for the link. I will try to figure out something. I noticed, it's difficult to debug with strace. When i start whole php-fpm under strace, it does not lock, but the children crash instead

@Elkropac
Copy link
Author

Elkropac commented Nov 8, 2023

I captured this on our production today, just before restarting php-fpm

First capture - the last process is new child?

root     2931215  2.0  0.2 972808 37724 ?        Rs   10:55   0:13 php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)
user1    2931217  1.1  0.5 1050248 106292 ?      R    10:55   0:07  \_ php-fpm: pool user1   
user1    2931219  1.2  0.6 1050012 111688 ?      R    10:55   0:08  \_ php-fpm: pool user1   
user2    2931677  1.1  0.3 974164 57664 ?        D    10:56   0:07  \_ php-fpm: pool user2   
user2    2931783  1.0  0.3 974180 58076 ?        R    10:56   0:06  \_ php-fpm: pool user2   
user3    2931790  0.1  0.4 979976 76536 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2931813  0.2  0.4 1194416 85400 ?       S    10:56   0:01  \_ php-fpm: pool user3   
user3    2931837  0.1  0.4 980020 81396 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user1    2931951  1.9  0.5 978088 96260 ?        R    10:56   0:12  \_ php-fpm: pool user1   
user1    2931959  1.1  0.5 1050036 104908 ?      R    10:56   0:07  \_ php-fpm: pool user1   
user1    2931978  1.1  0.5 1052028 99620 ?       D    10:56   0:07  \_ php-fpm: pool user1   
user1    2931979  1.1  0.5 1052072 94460 ?       R    10:56   0:07  \_ php-fpm: pool user1   
user1    2931993  1.1  0.6 1050444 114696 ?      R    10:56   0:07  \_ php-fpm: pool user1   
user1    2931998  2.4  0.6 1050024 110740 ?      R    10:56   0:15  \_ php-fpm: pool user1   
user1    2932003  1.4  0.5 1051608 99108 ?       R    10:56   0:09  \_ php-fpm: pool user1   
user1    2932027  1.6  0.5 1052044 108996 ?      R    10:56   0:10  \_ php-fpm: pool user1   
user1    2932038  1.1  0.5 1051888 110468 ?      D    10:56   0:07  \_ php-fpm: pool user1   
user1    2932065  1.0  0.5 1051744 101412 ?      R    10:56   0:06  \_ php-fpm: pool user1   
user3    2932106  0.1  0.4 1193060 74464 ?       S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932107  0.2  1.4 1193124 271228 ?      S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932108  0.0  0.3 976000 66484 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932109  0.1  1.4 1196168 274044 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932110  0.3  1.5 1196276 278316 ?      S    10:56   0:02  \_ php-fpm: pool user3   
user3    2932111  0.1  1.2 1196256 226404 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932112  0.0  0.3 978032 63744 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932113  0.0  0.3 982000 69900 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932114  0.0  0.4 977972 75720 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932115  0.2  1.4 1196456 270640 ?      S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932116  0.1  1.4 1196240 273452 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932117  0.1  0.5 1196420 96876 ?       S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932118  0.0  0.3 978960 63484 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932119  0.0  0.3 982100 61792 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932121  0.0  0.3 978052 62960 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932122  0.0  0.3 979876 72536 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932123  3.0  1.4 1195028 274992 ?      R    10:56   0:19  \_ php-fpm: pool user3   
user3    2932124  0.0  0.3 978032 58016 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932125  0.0  0.3 976004 68400 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user4    2932437  0.4  0.5 1074352 92868 ?       S    10:56   0:02  \_ php-fpm: pool user4   
user4    2932438  0.4  0.4 983696 78756 ?        S    10:56   0:02  \_ php-fpm: pool user4   
user4    2932439  0.3  0.4 978992 82620 ?        S    10:56   0:02  \_ php-fpm: pool user4   
user4    2934711  0.5  0.3 988968 73364 ?        S    10:58   0:02  \_ php-fpm: pool user4   
user4    2934712  0.3  0.4 990936 90548 ?        S    10:58   0:01  \_ php-fpm: pool user4   
user4    2934713  0.2  0.4 992468 77484 ?        S    10:58   0:01  \_ php-fpm: pool user4   
user4    2934714  0.4  0.3 988216 72428 ?        S    10:58   0:02  \_ php-fpm: pool user4   
user4    2934715  0.6  0.3 984280 73088 ?        S    10:58   0:03  \_ php-fpm: pool user4   
user4    2934716  0.0  0.3 976828 64748 ?        S    10:58   0:00  \_ php-fpm: pool user4   
user4    2934717  1.9  0.4 995960 78284 ?        S    10:58   0:09  \_ php-fpm: pool user4   
user4    2934718  0.2  0.3 985656 71868 ?        S    10:58   0:01  \_ php-fpm: pool user4   
user2    2940292  1.8  0.2 973880 51728 ?        R    11:01   0:06  \_ php-fpm: pool user2   
www-data 2948228  7.1  0.0 972992 11112 ?        R    11:05   0:06  \_ php-fpm: pool www
user3    2948328  0.5  0.1 973880 28368 ?        S    11:05   0:00  \_ php-fpm: pool user3   
user3    2948468 16.2  0.0 973044 10440 ?        R    11:05   0:12  \_ php-fpm: pool user3   
root     2950407 15.8  0.0 972552  6556 ?        R    11:06   0:02  \_ php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)

Second capture - the last process is new child? It did not change it's caption to php-fpm: pool user1

root     2931215  2.8  0.1 972808 36372 ?        Ds   10:55   0:19 php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)
user1    2931217  1.9  0.5 1050248 104992 ?      D    10:55   0:13  \_ php-fpm: pool user1   
user1    2931219  1.9  0.5 1050012 110432 ?      R    10:55   0:13  \_ php-fpm: pool user1   
user2    2931677  1.9  0.3 974164 56388 ?        D    10:56   0:13  \_ php-fpm: pool user2   
user2    2931783  1.8  0.3 974180 56692 ?        D    10:56   0:13  \_ php-fpm: pool user2   
user3    2931790  0.1  0.4 979976 75200 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2931813  0.2  0.4 1194416 84064 ?       S    10:56   0:01  \_ php-fpm: pool user3   
user3    2931837  0.0  0.4 980020 80060 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user1    2931951  2.8  0.5 978088 94976 ?        R    10:56   0:19  \_ php-fpm: pool user1   
user1    2931959  2.0  0.5 1050036 103832 ?      D    10:56   0:14  \_ php-fpm: pool user1   
user1    2931978  1.9  0.5 1052028 98800 ?       R    10:56   0:13  \_ php-fpm: pool user1   
user1    2931979  1.9  0.5 1052072 93476 ?       D    10:56   0:13  \_ php-fpm: pool user1   
user1    2931993  2.0  0.6 1050444 113148 ?      D    10:56   0:13  \_ php-fpm: pool user1   
user1    2931998  3.2  0.5 1050024 109096 ?      R    10:56   0:22  \_ php-fpm: pool user1   
user1    2932003  2.2  0.5 1051608 97692 ?       R    10:56   0:15  \_ php-fpm: pool user1   
user1    2932027  2.4  0.5 1052044 107680 ?      R    10:56   0:17  \_ php-fpm: pool user1   
user1    2932038  2.0  0.5 1051888 109048 ?      R    10:56   0:13  \_ php-fpm: pool user1   
user1    2932065  1.8  0.5 1051744 100816 ?      R    10:56   0:12  \_ php-fpm: pool user1   
user3    2932106  0.1  0.3 1193060 73260 ?       S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932107  0.2  1.4 1193124 270024 ?      S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932108  0.0  0.3 976000 65280 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932109  0.1  1.4 1196168 272784 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932110  0.3  1.5 1196276 276980 ?      S    10:56   0:02  \_ php-fpm: pool user3   
user3    2932111  0.1  1.2 1196256 225036 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932112  0.0  0.3 978032 62408 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932113  0.0  0.3 982000 68640 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932114  0.0  0.4 977972 74384 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932115  0.2  1.4 1196456 268768 ?      S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932116  0.1  1.4 1196240 272116 ?      S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932117  0.1  0.5 1196420 95672 ?       S    10:56   0:01  \_ php-fpm: pool user3   
user3    2932118  0.0  0.3 978960 62148 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932119  0.0  0.3 982100 60512 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932121  0.0  0.3 978052 61644 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932122  0.0  0.3 979876 71200 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932123  4.1  1.4 1195028 274116 ?      D    10:56   0:28  \_ php-fpm: pool user3   
user3    2932124  0.0  0.3 978032 56812 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user3    2932125  0.0  0.3 976004 67196 ?        S    10:56   0:00  \_ php-fpm: pool user3   
user4    2932437  0.4  0.4 1074352 91528 ?       S    10:56   0:02  \_ php-fpm: pool user4   
user4    2932438  0.3  0.4 983696 77416 ?        S    10:56   0:02  \_ php-fpm: pool user4   
user4    2932439  0.3  0.4 978992 81280 ?        S    10:56   0:02  \_ php-fpm: pool user4   
user4    2934711  0.7  0.3 988968 71644 ?        D    10:58   0:03  \_ php-fpm: pool user4   
user4    2934712  0.3  0.4 990936 89212 ?        S    10:58   0:01  \_ php-fpm: pool user4   
user4    2934713  0.9  0.4 992468 76068 ?        R    10:58   0:05  \_ php-fpm: pool user4   
user4    2934714  1.1  0.3 988216 70696 ?        R    10:58   0:06  \_ php-fpm: pool user4   
user4    2934715  1.3  0.3 984280 71212 ?        R    10:58   0:07  \_ php-fpm: pool user4   
user4    2934716  0.0  0.3 976828 63404 ?        S    10:58   0:00  \_ php-fpm: pool user4   
user4    2934717  1.8  0.4 995960 76940 ?        S    10:58   0:09  \_ php-fpm: pool user4   
user4    2934718  0.1  0.3 985656 70528 ?        S    10:58   0:01  \_ php-fpm: pool user4   
user2    2940292  3.2  0.2 973880 50252 ?        D    11:01   0:12  \_ php-fpm: pool user2   
www-data 2948228  6.7  0.0      0     0 ?        Z    11:05   0:08  \_ [php-fpm8.2] <defunct>
user3    2948328  0.3  0.1 973880 27984 ?        S    11:05   0:00  \_ php-fpm: pool user3   
user3    2948468 15.3  0.0 973044  9716 ?        R    11:05   0:19  \_ php-fpm: pool user3   
user1    2950407 13.8  0.0 973044 11872 ?        R    11:06   0:08  \_ php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)

@bukka
Copy link
Member

bukka commented Nov 9, 2023

I have been thinking about this and there is one change in PHP 8.2 that might have potentially effect on this. The stream copy function is now using copy_file_range. I can't see it used anywhere in the image code but maybe there's some other place that I missed. Considerng this app seems to be doing lots of image copying and this is probably the only change that I can think of, I think it's worth to verify it.

You might potentially see some hints which syscall is used by the locked up process so I would really recommend to check /proc/PID/syscall and also /proc/PID/stack as linked above.

You could also verify if disabling copy_file_range helps but you would need to recompile PHP with following change applied for that:

diff --git a/main/streams/streams.c b/main/streams/streams.c
index 4cd211ad85..0262160d90 100644
--- a/main/streams/streams.c
+++ b/main/streams/streams.c
@@ -1556,7 +1556,7 @@ PHPAPI zend_result _php_stream_copy_to_stream_ex(php_stream *src, php_stream *de
                return SUCCESS;
        }
 
-#ifdef HAVE_COPY_FILE_RANGE
+#if 0
        if (php_stream_is(src, PHP_STREAM_IS_STDIO) &&
                        php_stream_is(dest, PHP_STREAM_IS_STDIO) &&
                        src->writepos == src->readpos) {

If you want to test it on your test server I would recommend you to set it up in the same system configuration as prod - especially with use of btrfs (copy_file_range is fylesystem specific so this can play a role).

@Elkropac
Copy link
Author

Ok, i need to find out how to get our test server to lockup again. I will update it to Debian12 and i will try to use btrfs.

@CasperLinden2
Copy link

CasperLinden2 commented May 13, 2024

I'm also seeing this exact same issue.

  • On multiple servers running different applications
  • One server is primarily just running mediawiki
  • As with OP, servers are configured with multiple fpm pools
  • As with OP, it seems to trigger after periods of high load
  • Also using btrfs, server running ext4 appears unaffected
  • As with OP, also using php-8.2 and issues started after upgrading from 8.1
  • PHP 8.2.18

However I tried the test Nette application and i'm unable to reproduce with this.

@Elkropac
Copy link
Author

Hi, try checking how much memory (and swap) you have assigned to php-fpm unit (service). Most of our problems dissapeared when we lifted the limits. When php-fpm ran out of memory, it started to swap heavily, causing IO. It was hard to diagnose, becasue there was plenty of free memory in the OS

@Elkropac
Copy link
Author

Elkropac commented May 13, 2024

There may be some problem with memory freeing in php8.2, we have one server, where php-fpm slice consumes around 2-3GB of RAM. Sometimes, it starts to creep up, we have had to increase memory limit of slice/unit up to 12GB. It acts strangely, newly spawned processes already have bigger memory footprint then usually

@CasperLinden2
Copy link

@Elkropac What limits are you referring to?

@Elkropac
Copy link
Author

We have had file /etc/systemd/system/php8.2-fpm.service.d/override-memory.conf
with

[Service]
MemoryHigh=2G
MemoryMax=3G
MemorySwapMax=1G

When it uses all available memory, it starts to swap heavily. Rest of system has plenty of memory.

We increased these limits and most problems went away

@CasperLinden2
Copy link

CasperLinden2 commented May 13, 2024

Hm, I don't have that set, and i'm not convinced that actually does anything... it would just cap the memory usage of the php-fpm daemon itself, not the workers, surely?

@CasperLinden2
Copy link

I've enabled pm.max_requests = 10 in php-fpm just in case there may be a leak, but I don't think it is, there was plenty available in my case.

@Elkropac
Copy link
Author

Elkropac commented May 13, 2024

Hm, I don't have that set, and i'm not convinced that actually does anything... it would just cap the memory usage of the php-fpm daemon itself, not the workers, surely?

No, it should cap memory of entire unit, ie. main daemon and all children.

You can see it in output of systemctl status php8.2-fpm.service (if you have the limits set), like this

     Memory: 1.6G (high: 10.0G max: 12.0G swap max: 2.0G available: 8.3G)

I've enabled pm.max_requests = 10 in php-fpm just in case there may be a leak, but I don't think it is, there was plenty available in my case.

We use this setting for all users

pm = ondemand
pm.max_children = 50
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
pm.process_idle_timeout = 360s;
pm.max_requests = 500

but i think, i was experimenting with lowering pm.max_requests too

@CasperLinden2
Copy link

Hello

The issue just struck again.

srv ~ # ps aux | grep 1451966
legacy 1451966 0.0 0.1 227220 21220 ? S 21:21 0:00 php-fpm: pool inworld

srv ~ # cat /proc/1451966/syscall
43 0xf 0x7fffa59d0ad0 0x7fffa59d0ac8 0x0 0x0 0x0 0x7fffa59d0aa8 0x7fd6f7854964

srv ~ # cat /proc/1451966/stack
[<0>] inet_csk_accept+0x22c/0x450
[<0>] inet_accept+0x43/0xa0
[<0>] do_accept+0xf5/0x170
[<0>] __sys_accept4+0x5b/0xc0
[<0>] __x64_sys_accept+0x17/0x20
[<0>] do_syscall_64+0x3d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0xe2

All processes are in "S" state

Is there anything which can be gleamed from this?

@CasperLinden2
Copy link

Hey - I found the root cause. The issue appears to be a kernel NFS client bug. I had a misconfiguration which was causing files to be served via nfs rather than the local filesystem. Normally this had little impact, but at times of heavy load it occasionally caused the kernel nfs driver to just completely lock up, putting any process which tried to read from nfs into an uninterruptible sleep.

When the problem occurs, it's a transient issue (subsequent attempts to read from NFS work fine) but the processes never recover - they have to be killed and restarted.

So - not a PHP issue. But i thought i'd leave this here in case it helps anyone.

@TheRaven500
Copy link

I think i have the same problem. Randomly (more or less every 2 months) my server is loocking up and all services from apache (nextcloud, wiki etc.) are not accessable. After restarting fpm it works again. I am running Debian 12 with ext4. Until now i have not looked deeper in to the problem, i just restarted fpm and it worked again. But it would be nice if i don't need to do this 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants