Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dokku retire service fails to acquire ps:retire lock #5432

Closed
robotski opened this issue Oct 28, 2022 · 12 comments · Fixed by #5489
Closed

Dokku retire service fails to acquire ps:retire lock #5432

robotski opened this issue Oct 28, 2022 · 12 comments · Fixed by #5489

Comments

@robotski
Copy link

Description of problem

We're seeing frequent failures of dokku-retire.service in systemd.
Server is an EC2 running Ubuntu 20.04.5 LTS

Steps to reproduce

dokku-retire.timer fires.

dokku report $APP_NAME

-----> uname: Linux prod 5.15.0-1022-aws #26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
-----> memory:
total used free shared buff/cache available
Mem: 3921 527 2159 1 1234 3158
Swap: 0 0 0
-----> docker version:
Client: Docker Engine - Community
Version: 20.10.21
API version: 1.41
Go version: go1.18.7
Git commit: baeda1f
Built: Tue Oct 25 18:02:21 2022
OS/Arch: linux/amd64
Context: default
Experimental: true

   Server: Docker Engine - Community
    Engine:
     Version:          20.10.21
     API version:      1.41 (minimum version 1.12)
     Go version:       go1.18.7
     Git commit:       3056208
     Built:            Tue Oct 25 18:00:04 2022
     OS/Arch:          linux/amd64
     Experimental:     false
    containerd:
     Version:          1.6.9
     GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f
    runc:
     Version:          1.1.4
     GitCommit:        v1.1.4-0-g5fd4c4d
    docker-init:
     Version:          0.19.0
     GitCommit:        de40ad0

-----> docker daemon info:
Client:
Context: default
Debug Mode: true
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
scan: Docker Scan (Docker Inc., v0.21.0)

   Server:
    Containers: 5
     Running: 1
     Paused: 0
     Stopped: 4
    Images: 24
    Server Version: 20.10.21
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
     userxattr: false
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Cgroup Version: 1
    Plugins:
     Volume: local
     Network: bridge host ipvlan macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: inactive
    Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 1c90a442489720eec95342e1789ee8a5e1b9536f
    runc version: v1.1.4-0-g5fd4c4d
    init version: de40ad0
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 5.15.0-1022-aws
    Operating System: Ubuntu 20.04.5 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 2
    Total Memory: 3.829GiB
    Name: prod
    ID: HU6U:7ZXG:25SG:KZE7:7OSN:AC66:IGYU:VJNJ:YUOZ:OXSD:P3EC:DAWK
    Docker Root Dir: /var/lib/docker
    Debug Mode: false
    Username: XXXXXX
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false

-----> git version: git version 2.25.1
-----> sigil version: 0.9.0build+bc921b7
-----> herokuish version:
herokuish: 0.5.37
buildpacks:
heroku-buildpack-multi v1.2.0
heroku-buildpack-ruby v244
heroku-buildpack-nodejs v198
heroku-buildpack-clojure v90
heroku-buildpack-python v214
heroku-buildpack-java v72
heroku-buildpack-gradle v38
heroku-buildpack-scala v94
heroku-buildpack-play v26
heroku-buildpack-php v223
heroku-buildpack-go v166
heroku-buildpack-nginx v16
buildpack-null v3
-----> dokku version: dokku version 0.28.4
-----> plugn version: plugn: 0.12.0build+3a27594
-----> dokku plugins:
00_dokku-standard 0.28.4 enabled dokku core standard plugin
20_events 0.28.4 enabled dokku core events logging plugin
app-json 0.28.4 enabled dokku core app-json plugin
apps 0.28.4 enabled dokku core apps plugin
builder 0.28.4 enabled dokku core builder plugin
builder-dockerfile 0.28.4 enabled dokku core builder-dockerfile plugin
builder-herokuish 0.28.4 enabled dokku core builder-herokuish plugin
builder-lambda 0.28.4 enabled dokku core builder-lambda plugin
builder-null 0.28.4 enabled dokku core builder-null plugin
builder-pack 0.28.4 enabled dokku core builder-pack plugin
buildpacks 0.28.4 enabled dokku core buildpacks plugin
caddy-vhosts 0.28.4 enabled dokku core caddy-vhosts plugin
certs 0.28.4 enabled dokku core certificate management plugin
checks 0.28.4 enabled dokku core checks plugin
common 0.28.4 enabled dokku core common plugin
config 0.28.4 enabled dokku core config plugin
cron 0.28.4 enabled dokku core cron plugin
docker-options 0.28.4 enabled dokku core docker-options plugin
domains 0.28.4 enabled dokku core domains plugin
enter 0.28.4 enabled dokku core enter plugin
git 0.28.4 enabled dokku core git plugin
letsencrypt 0.13.0 enabled Automated installation of let's encrypt TLS certificates
logs 0.28.4 enabled dokku core logs plugin
network 0.28.4 enabled dokku core network plugin
nginx-vhosts 0.28.4 enabled dokku core nginx-vhosts plugin
plugin 0.28.4 enabled dokku core plugin plugin
postgres 1.19.3 enabled dokku postgres service plugin
proxy 0.28.4 enabled dokku core proxy plugin
ps 0.28.4 enabled dokku core ps plugin
registry 0.28.4 enabled dokku core registry plugin
repo 0.28.4 enabled dokku core repo plugin
resource 0.28.4 enabled dokku core resource plugin
run 0.28.4 enabled dokku core run plugin
scheduler 0.28.4 enabled dokku core scheduler plugin
scheduler-docker-local 0.28.4 enabled dokku core scheduler-docker-local plugin
scheduler-null 0.28.4 enabled dokku core scheduler-null plugin
shell 0.28.4 enabled dokku core shell plugin
ssh-keys 0.28.4 enabled dokku core ssh-keys plugin
storage 0.28.4 enabled dokku core storage plugin
trace 0.28.4 enabled dokku core trace plugin
traefik-vhosts 0.28.4 enabled dokku core traefik-vhosts plugin
=====> portal app-json information
App json computed selected: app.json
App json global selected: app.json
App json selected:
=====> portal app information
App created at: 1666907959
App deploy source: portal
App deploy source metadata: portal
App dir: /home/dokku/portal
App locked: false
=====> portal builder information
Builder build dir:
Builder computed build dir:
Builder computed selected:
Builder global build dir:
Builder global selected:
Builder selected:
=====> portal builder-dockerfile information
Builder dockerfile computed dockerfile path: Dockerfile
Builder dockerfile global dockerfile path: Dockerfile
Builder dockerfile dockerfile path:
=====> portal builder-lambda information
Builder lambda computed lambdayml path: lambda.yml
Builder lambda global lambdayml path: lambda.yml
Builder lambda lambdayml path:
=====> portal builder-pack information
Builder pack computed projecttoml path: project.toml
Builder pack global projecttoml path: project.toml
Builder pack projecttoml path:
=====> portal buildpacks information
Buildpacks computed stack: gliderlabs/herokuish:latest-20
Buildpacks global stack:
Buildpacks list:
Buildpacks stack:
=====> portal ssl information
Ssl dir: /home/dokku/portal/tls
Ssl enabled: true
Ssl hostnames: *.XXXXXX XXXXXX
Ssl expires at: Jan 11 22:20:08 2023 GMT
Ssl issuer: C = US, O = Let's Encrypt, CN = R3
Ssl starts at: Oct 13 22:20:09 2022 GMT
Ssl subject: subject=CN = *.XXXXXX
Ssl verified: self signed
=====> portal checks information
Checks disabled list: none
Checks skipped list: none
=====> portal cron information
Cron task count: 0
=====> portal docker options information
Docker options build:
Docker options deploy: --restart=on-failure:10
Docker options run:
=====> portal domains information
Domains app enabled: true
Domains app vhosts: portal.XXXXXX
Domains global enabled: true
Domains global vhosts: XXXXXX
=====> portal git information
Git deploy branch: master
Git global deploy branch: master
Git keep git dir: false
Git rev env var: GIT_REV
Git sha: 1acea54
Git last updated at: 1665693701
=====> portal logs information
Logs computed max size: 10m
Logs global max size: 10m
Logs global vector sink:
Logs max size:
Logs vector sink:
=====> portal network information
Network attach post create:
Network attach post deploy:
Network bind all interfaces: false
Network computed attach post create:
Network computed attach post deploy:
Network computed bind all interfaces: false
Network computed initial network:
Network computed tld:
Network global attach post create:
Network global attach post deploy:
Network global bind all interfaces: false
Network global initial network:
Network global tld:
Network initial network:
Network static web listener:
Network tld:
Network web listeners: 172.17.0.3:5000
=====> portal nginx information
Nginx access log format:
Nginx access log path: /var/log/nginx/portal-access.log
Nginx bind address ipv4:
Nginx bind address ipv6: ::
Nginx client max body size:
Nginx disable custom config: false
Nginx error log path: /var/log/nginx/portal-error.log
Nginx global hsts: true
Nginx computed hsts: true
Nginx hsts:
Nginx hsts include subdomains: true
Nginx hsts max age: 15724800
Nginx hsts preload: false
Nginx proxy buffer size: 4096
Nginx proxy buffering: on
Nginx proxy buffers: 8 4096
Nginx proxy busy buffers size: 8192
Nginx proxy read timeout: 60s
Nginx last visited at: 1666977368
Nginx x forwarded for value: $remote_addr
Nginx x forwarded port value: $server_port
Nginx x forwarded proto value: $scheme
Nginx x forwarded ssl:
=====> portal proxy information
Proxy enabled: true
Proxy port map: http:80:5000 https:443:5000
Proxy type: nginx
=====> portal ps information
Deployed: true
Processes: 1
Ps can scale: true
Ps computed procfile path: Procfile
Ps global procfile path: Procfile
Ps procfile path:
Ps restart policy: on-failure:10
Restore: true
Running: true
Status web 1: running (CID: a0219d12b15)
=====> portal registry information
Registry computed image repo: dokku/portal
Registry computed push on release: false
Registry computed server:
Registry global push on release:
Registry global server:
Registry image repo:
Registry push on release:
Registry server:
Registry tag version:
=====> portal resource information
=====> portal scheduler information
Scheduler computed selected: docker-local
Scheduler global selected: docker-local
Scheduler selected:
=====> portal scheduler-docker-local information
Scheduler docker local disable chown:
Scheduler docker local init process: true
Scheduler docker local parallel schedule count:
=====> portal storage information
Storage build mounts:
Storage deploy mounts:
Storage run mounts:

Additional information

No response

Output of failing commands after running: dokku trace:off

Oct 27 18:55:01 prod systemd[1]: Starting Dokku retire service...
Oct 27 18:55:01 prod dokku[2747215]:  !     Failed to acquire ps:retire lock
Oct 27 18:55:01 prod systemd[1]: dokku-retire.service: Main process exited, code=exited, status=1/FAILURE
Oct 27 18:55:01 prod systemd[1]: dokku-retire.service: Failed with result 'exit-code'.
Oct 27 18:55:01 prod systemd[1]: Failed to start Dokku retire service.

Output of failing commands after running: dokku trace:on

Oct 28 17:20:01 prod systemd[1]: Starting Dokku retire service...
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_HOST_ROOT=/home/dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_HOST_ROOT=/home/dokku
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_DISTRO
Oct 28 17:20:01 prod dokku[371840]: ++ . /etc/os-release
Oct 28 17:20:01 prod dokku[371840]: ++ echo ubuntu
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_DISTRO=ubuntu
Oct 28 17:20:01 prod dokku[371836]: + export DOCKER_BIN=docker
Oct 28 17:20:01 prod dokku[371836]: + DOCKER_BIN=docker
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_IMAGE=gliderlabs/herokuish:latest-20
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_IMAGE=gliderlabs/herokuish:latest-20
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_CNB_BUILDER=heroku/buildpacks
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_CNB_BUILDER=heroku/buildpacks
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_LIB_ROOT=/var/lib/dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_LIB_ROOT=/var/lib/dokku
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_PATH=/var/lib/dokku/plugins
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_PATH=/var/lib/dokku/plugins
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_AVAILABLE_PATH=/var/lib/dokku/plugins/available
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_AVAILABLE_PATH=/var/lib/dokku/plugins/available
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_ENABLED_PATH=/var/lib/dokku/plugins/enabled
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_ENABLED_PATH=/var/lib/dokku/plugins/enabled
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_CORE_PATH=/var/lib/dokku/core-plugins
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_CORE_PATH=/var/lib/dokku/core-plugins
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_CORE_AVAILABLE_PATH=/var/lib/dokku/core-plugins/available
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_CORE_AVAILABLE_PATH=/var/lib/dokku/core-plugins/available
Oct 28 17:20:01 prod dokku[371836]: + export PLUGIN_CORE_ENABLED_PATH=/var/lib/dokku/core-plugins/enabled
Oct 28 17:20:01 prod dokku[371836]: + PLUGIN_CORE_ENABLED_PATH=/var/lib/dokku/core-plugins/enabled
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_SYSTEM_GROUP=dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_SYSTEM_GROUP=dokku
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_SYSTEM_USER=dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_SYSTEM_USER=dokku
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_API_VERSION=1
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_API_VERSION=1
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_NOT_IMPLEMENTED_EXIT=10
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_NOT_IMPLEMENTED_EXIT=10
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_VALID_EXIT=0
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_VALID_EXIT=0
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_PID=371836
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_PID=371836
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_LOGS_DIR=/var/log/dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_LOGS_DIR=/var/log/dokku
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_LOGS_HOST_DIR=/var/log/dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_LOGS_HOST_DIR=/var/log/dokku
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_EVENTS_LOGFILE=/var/log/dokku/events.log
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_EVENTS_LOGFILE=/var/log/dokku/events.log
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_CONTAINER_LABEL=dokku
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_CONTAINER_LABEL=dokku
Oct 28 17:20:01 prod dokku[371836]: + export 'DOKKU_GLOBAL_BUILD_ARGS=--label=org.label-schema.schema-version=1.0 --label=org.label-schema.vendor=dokku --label=dokku'
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_GLOBAL_BUILD_ARGS='--label=org.label-schema.schema-version=1.0 --label=org.label-schema.vendor=dokku --label=dokku'
Oct 28 17:20:01 prod dokku[371836]: + export 'DOKKU_GLOBAL_RUN_ARGS=--label=org.label-schema.schema-version=1.0 --label=org.label-schema.vendor=dokku --label=dokku'
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_GLOBAL_RUN_ARGS='--label=org.label-schema.schema-version=1.0 --label=org.label-schema.vendor=dokku --label=dokku'
Oct 28 17:20:01 prod dokku[371836]: + source /var/lib/dokku/core-plugins/available/common/functions
Oct 28 17:20:01 prod dokku[371836]: ++ set -eo pipefail
Oct 28 17:20:01 prod dokku[371836]: ++ [[ -n 1 ]]
Oct 28 17:20:01 prod dokku[371836]: ++ set -x
Oct 28 17:20:01 prod dokku[371836]: + parse_args ps:retire
Oct 28 17:20:01 prod dokku[371836]: + declare 'desc=top-level cli arg parser'
Oct 28 17:20:01 prod dokku[371836]: + local next_index=1
Oct 28 17:20:01 prod dokku[371836]: + local skip=false
Oct 28 17:20:01 prod dokku[371836]: + args=("$@")
Oct 28 17:20:01 prod dokku[371836]: + local args
Oct 28 17:20:01 prod dokku[371836]: + local flags
Oct 28 17:20:01 prod dokku[371836]: + for arg in "$@"
Oct 28 17:20:01 prod dokku[371836]: + [[ false == \t\r\u\e ]]
Oct 28 17:20:01 prod dokku[371836]: + case "$arg" in
Oct 28 17:20:01 prod dokku[371836]: + [[ false == \t\r\u\e ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire == \-\-\a\p\p ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire =~ ^--.* ]]
Oct 28 17:20:01 prod dokku[371836]: + next_index=2
Oct 28 17:20:01 prod dokku[371836]: + [[ -z '' ]]
Oct 28 17:20:01 prod dokku[371848]: ++ echo -e ''
Oct 28 17:20:01 prod dokku[371849]: ++ sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_GLOBAL_FLAGS=
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_GLOBAL_FLAGS=
Oct 28 17:20:01 prod dokku[371836]: + return 0
Oct 28 17:20:01 prod dokku[371836]: + args=("$@")
Oct 28 17:20:01 prod dokku[371836]: + skip_arg=false
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire =~ ^--.* ]]
Oct 28 17:20:01 prod dokku[371836]: + has_tty
Oct 28 17:20:01 prod dokku[371836]: + declare 'desc=return 0 if we have a tty'
Oct 28 17:20:01 prod dokku[371836]: + [[ '' == \t\r\u\e ]]
Oct 28 17:20:01 prod dokku[371850]: ++ LC_ALL=C
Oct 28 17:20:01 prod dokku[371850]: ++ /usr/bin/tty
Oct 28 17:20:01 prod dokku[371850]: ++ true
Oct 28 17:20:01 prod dokku[371836]: + [[ not a tty == \n\o\t\ \a\ \t\t\y ]]
Oct 28 17:20:01 prod dokku[371836]: + return 1
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_QUIET_OUTPUT=1
Oct 28 17:20:01 prod dokku[371855]: ++ id -un
Oct 28 17:20:01 prod dokku[371836]: + [[ dokku != \d\o\k\k\u ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire =~ ^plugin:.* ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire == \s\s\h\-\k\e\y\s\:\a\d\d ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire == \s\s\h\-\k\e\y\s\:\r\e\m\o\v\e ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ -n '' ]]
Oct 28 17:20:01 prod dokku[371836]: + dokku_auth ps:retire
Oct 28 17:20:01 prod dokku[371836]: + declare 'desc=calls user-auth plugin trigger'
Oct 28 17:20:01 prod dokku[371836]: + export SSH_USER=dokku
Oct 28 17:20:01 prod dokku[371836]: + SSH_USER=dokku
Oct 28 17:20:01 prod dokku[371836]: + export SSH_NAME=default
Oct 28 17:20:01 prod dokku[371836]: + SSH_NAME=default
Oct 28 17:20:01 prod dokku[371836]: + export DOKKU_COMMAND=ps:retire
Oct 28 17:20:01 prod dokku[371836]: + DOKKU_COMMAND=ps:retire
Oct 28 17:20:01 prod dokku[371864]: ++ wc -l
Oct 28 17:20:01 prod dokku[371863]: ++ find /var/lib/dokku/plugins/enabled/20_events/user-auth
Oct 28 17:20:01 prod dokku[371836]: + local user_auth_count=1
Oct 28 17:20:01 prod dokku[371836]: + [[ 1 == 0 ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ 1 == 1 ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ -f /var/lib/dokku/plugins/enabled/20_events/user-auth ]]
Oct 28 17:20:01 prod dokku[371836]: + return 0
Oct 28 17:20:01 prod dokku[371836]: + case "$1" in
Oct 28 17:20:01 prod dokku[371836]: + execute_dokku_cmd ps:retire
Oct 28 17:20:01 prod dokku[371836]: + declare 'desc=executes dokku sub-commands'
Oct 28 17:20:01 prod dokku[371836]: + local PLUGIN_NAME=ps:retire
Oct 28 17:20:01 prod dokku[371836]: + local PLUGIN_CMD=ps:retire
Oct 28 17:20:01 prod dokku[371836]: + local implemented=0
Oct 28 17:20:01 prod dokku[371836]: + local script
Oct 28 17:20:01 prod dokku[371836]: + argv=("$@")
Oct 28 17:20:01 prod dokku[371836]: + local argv
Oct 28 17:20:01 prod dokku[371836]: + case "$PLUGIN_NAME" in
Oct 28 17:20:01 prod dokku[371866]: ++ readlink -f /var/lib/dokku/plugins/enabled/ps
Oct 28 17:20:01 prod dokku[371836]: + [[ /var/lib/dokku/core-plugins/available/ps == *core-plugins* ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ ps:retire == \p\s\:\r\e\t\i\r\e ]]
Oct 28 17:20:01 prod dokku[371836]: + shift 1
Oct 28 17:20:01 prod dokku[371836]: + [[ -n '' ]]
Oct 28 17:20:01 prod dokku[371836]: + set -- ps:retire
Oct 28 17:20:01 prod dokku[371836]: + [[ -x /var/lib/dokku/plugins/enabled/ps:retire/subcommands/default ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ -x /var/lib/dokku/plugins/enabled/ps:retire/subcommands/ps:retire ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ -x /var/lib/dokku/plugins/enabled/ps/subcommands/retire ]]
Oct 28 17:20:01 prod dokku[371836]: + [[ -n retire ]]
Oct 28 17:20:01 prod dokku[371836]: + /var/lib/dokku/plugins/enabled/ps/subcommands/retire ps:retire
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect [--global]
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr: + source /var/lib/dokku/core-plugins/available/common/functions
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr: ++ set -eo pipefail
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr: ++ [[ -n 1 ]]
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr: ++ set -x
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr: + [[ ! -n '' ]]
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stderr:
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stdout: docker-local
Oct 28 17:20:01 prod dokku[371871]:  ?     plugn trigger scheduler-detect stdout:
Oct 28 17:20:01 prod dokku[371871]:  !     Failed to acquire ps:retire lock
Oct 28 17:20:01 prod systemd[1]: dokku-retire.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 17:20:01 prod systemd[1]: dokku-retire.service: Failed with result 'exit-code'.
Oct 28 17:20:01 prod systemd[1]: Failed to start Dokku retire service.
@josegonzalez
Copy link
Member

When you see that, are you running any other dokku commands? I wonder if you have a ps:retire command still running in the background (there are no timeouts implemented at this time), which to me points to a funky docker issue (since docker is performing the majority of the work here, with a small amount of shell scripting on top).

@robotski
Copy link
Author

I'm not running any other dokku commands manually, it probably is the previous scheduled retire command.

@josegonzalez
Copy link
Member

What is the output of docker ps -a? It might be that the host has a ton of old docker containers that its retiring or that docker itself is wedged and isn't allowing you to kill the containers in a reasonable timeframe.

@robotski
Copy link
Author

robotski commented Nov 14, 2022

CONTAINER ID   IMAGE                 COMMAND         CREATED        STATUS       PORTS      NAMES
befb1a12e047   dokku/portal:latest   "supervisord"   13 days ago    Up 13 days   5000/tcp   portal.web.1
3e408c53f394   8eada9fa95b3          "supervisord"   5 months ago   Created                 portal.web.1.upcoming-22645
84a32b8ebb85   7014a61219bf          "supervisord"   5 months ago   Created                 portal.web.1.upcoming-11897
9b8a42b06883   86444adf9b13          "supervisord"   5 months ago   Created                 portal.web.1.upcoming-29482
dc7dfd80746b   057b0d3e8fb7          "supervisord"   5 months ago   Created                 portal.web.1.upcoming-16840

@josegonzalez
Copy link
Member

Can you docker rm -f those last three? And also what is the output of you manually calling dokku ps:retire?

@robotski
Copy link
Author

Removed the last three containers. Output of dokku ps:retire both before and after was

ubuntu@prod:~$ dokku ps:retire
-----> Retiring old containers and images
       Attempting to retire portal image 1b5f12d1762db4f1d2af709125a0eb49a7f2009ae5ee9ece1eaa492b2eb6fa66
 !     Image 1b5f12d1762db4f1d2af709125a0eb49a7f2009ae5ee9ece1eaa492b2eb6fa66 has children images and has 1 tags, skipping rm

@josegonzalez
Copy link
Member

Alright well I guess its working now? It should only error about locks if there is a concurrent retire occurring (which could happen if a previous one takes longer than expected).

Can you see if you're still getting errors?

@robotski
Copy link
Author

The systemd job running every five minutes is still failing intermittently.

@josegonzalez
Copy link
Member

I'll add some extra logging to note if the lock already exists (which would indicate that a separate ps:retire call somehow exited without cleaning up) but otherwise, what would you like to happen in this case?

@robotski
Copy link
Author

I would do:

  • Have dokku ps:retire return some code higher than 1 if we can't acquire the lock, and add that new code to SuccessExitStatus in dokku-retire.service (here we assume that another instance is still alive and doing work)
  • Check the age of the lock, if it's older than some timeout then we're probably in a deadlock state and should actually return a failure exit code.

@josegonzalez
Copy link
Member

There isn't a a handle on the lock age (see gofrs/flock#62 for maybe future developments) so for now, I can only implement the first part there (which is ~good enough).

@josegonzalez
Copy link
Member

Closing as there is a pull request open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants