Old EFS certificates not removed #124

ballock · 2022-03-21T22:54:59Z

We are running an EC2 instance with 512MB memory with 3 EFS mounts, using the EFS helper.

After 6 months of uptime, the EFS mounts on the instance failed and a number of other issues appeared, all caused by a full /run filesystem.
du shows
13556 ./fs-3ac8f8f3.efs.ROTATED_OUT.20137+/certs
certs# ls -l |wc -l
3390

The directory holds the hourly certificates from the last 6 months. There are 3 EFS mounts on the machine, so together their certificates filled up the 47MB /run filesystem.
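For anyone who wants to check their own machine, here is a minimal Python sketch that counts the files in each certs directory and sums their sizes. It assumes the layout suggested by the du output above, one certs directory per mount under a state directory; the function name `certs_usage` is made up for this example and is not part of efs-utils.

```python
from pathlib import Path


def certs_usage(state_dir):
    """Return {certs_dir: (file_count, total_bytes)} for each <state_dir>/*/certs.

    Hypothetical helper. It sums apparent file sizes (st_size), so the totals
    can be lower than what du reports, because du counts allocated blocks.
    """
    usage = {}
    for certs_dir in sorted(Path(state_dir).glob("*/certs")):
        if not certs_dir.is_dir():
            continue
        files = [p for p in certs_dir.iterdir() if p.is_file()]
        total = sum(p.stat().st_size for p in files)
        usage[str(certs_dir)] = (len(files), total)
    return usage
```

Calling `certs_usage("/var/run/efs")` as a user who can read that directory should list one entry per mount.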

Please implement a garbage collector for the certs.
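To illustrate what I mean, here is a rough sketch of such a garbage collector. It is not efs-utils code: the function name `prune_old_certs`, the `*.pem` filter, and the age and keep-count defaults are all assumptions for illustration. It removes pem files older than a threshold while always keeping the newest few, so a certificate that is still in use is not deleted.

```python
import os
import time


def prune_old_certs(certs_dir, max_age_sec=24 * 3600, keep_newest=2, now=None):
    """Delete *.pem files older than max_age_sec, always keeping the newest few.

    Hypothetical helper, not part of efs-utils. Returns the removed paths.
    """
    now = time.time() if now is None else now
    pems = [
        entry for entry in os.scandir(certs_dir)
        if entry.is_file(follow_symlinks=False) and entry.name.endswith(".pem")
    ]
    # Newest first, so the first keep_newest entries are never touched.
    pems.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    removed = []
    for entry in pems[keep_newest:]:
        if now - entry.stat().st_mtime > max_age_sec:
            os.remove(entry.path)
            removed.append(entry.path)
    return removed
```

Files that do not end in .pem are left alone, so any bookkeeping files in the same directory would survive.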

OS: Debian 10.11 (has /var/run symlinked to /run, which is on tmpfs)
EFS helper version: 1.30.2 (but I checked, and the latest version doesn't have cleanup either)

Cappuccinuo · 2022-03-24T05:53:35Z

Hey,

Thanks for the report.

The certs are stored on /var/run/efs, instead of /run. From the log you posted, the certs take 13556KB, which is 13MiB. Can you double-check that the efs-utils pem files are causing the system issue?

We do have cleanup logic running in our watchdog (https://github.com/aws/efs-utils/blob/master/src/watchdog/__init__.py#L824-L834). If the file system is unmounted and then mounted again, the certs should be cleaned up at that point. Can you elaborate on the failed mounts: did the mount command fail, or did the mount stop making progress? If the folder cannot be cleaned up, can you:

- Make sure the watchdog is running by using systemctl status amazon-efs-mount-watchdog
- Turn on debug logging by setting logging_level = DEBUG in the config file (/etc/amazon/efs/efs-utils.conf), restart the watchdog process, and see whether there is an error when removing the mount state directories?
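If it helps to verify the config change before restarting the watchdog, the following is a small sketch using Python's standard configparser. The helper name `debug_logging_enabled` is hypothetical, and because the exact section that holds logging_level is not stated here, the sketch simply scans every section, including the defaults.

```python
import configparser


def debug_logging_enabled(conf_text):
    """Return True if any section of the INI text sets logging_level to DEBUG."""
    # Interpolation is disabled so that '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    sections = [parser.defaults()] + [parser[name] for name in parser.sections()]
    return any(
        section.get("logging_level", "").strip().upper() == "DEBUG"
        for section in sections
    )
```

It could be fed the contents of /etc/amazon/efs/efs-utils.conf, for example via `open(path).read()`.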

ballock · 2022-03-30T14:54:37Z

> The certs are stored on /var/run/efs, instead of /run,

At least on Debian-based systems, /var/run is a symlink to /run. Thus, /var/run/efs is effectively in the /run tmpfs filesystem.
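A quick way to confirm this on a given machine is to resolve both paths and compare them. This is only a sketch; `same_location` is a name I made up for the example.

```python
import os


def same_location(path_a, path_b):
    """Return True if both paths resolve to the same place once symlinks are followed."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)
```

On a system where /var/run is a symlink to /run, `same_location("/var/run/efs", "/run/efs")` should return True.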

> from the log you posted, the certs take 13556KB, which is 13MiB.

That's correct for one filesystem mount. I have 3 EFS mounts on the machine, and together with some system files normally in /run they take up all 47MB that are available on the 512MB memory machine.

This is the output from the current run, taken after cleaning up the pem files. You can see that /run has room for up to 39M more pem files.

admin@VM:~$ df -h /var/run
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            47M  7.7M   39M  17% /run
admin@VM:~$ free
              total        used        free      shared  buff/cache   available
Mem:         476032      159396       68440        7884      248196      296536
Swap:             0           0           0
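As a back-of-the-envelope check of how long that headroom lasts, here is a small Python sketch. It assumes one new certificate per mount per hour (matching the hourly certs mentioned earlier) and takes the per-cert size from the du and ls figures in my first post; `hours_until_full` is a hypothetical helper, and the 39M value is rounded by df, so treat the result as a rough estimate.

```python
def hours_until_full(avail_kib, kib_per_cert, mounts, certs_per_mount_per_hour=1):
    """Rough number of hours before avail_kib is consumed by new certificates."""
    kib_per_hour = kib_per_cert * mounts * certs_per_mount_per_hour
    return avail_kib / kib_per_hour


# Figures from this thread: 13556 KB across roughly 3390 entries is about 4 KB per cert.
kib_per_cert = 13556 / 3390
hours = hours_until_full(avail_kib=39 * 1024, kib_per_cert=kib_per_cert, mounts=3)
print(f"{kib_per_cert:.2f} KiB per cert, about {hours / 24:.0f} days until /run is full")
```

With these inputs the estimate comes out at a little under five months, which is in the same range as the uptime after which the mounts failed.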

> Can you double-check that the efs-utils pem files are causing the system issue?

Yes.

> We do have cleanup logic running in our watchdog (https://github.com/aws/efs-utils/blob/master/src/watchdog/__init__.py#L824-L834). If the file system is unmounted and then mounted again, the certs should be cleaned up at that point. Can you elaborate on the failed mounts: did the mount command fail, or did the mount stop making progress?

I guess I was ambiguous. There was no umount attempt on the 3 EFS mounts. They had been running for 6 months, after which the EFS watchdog failed to create new pem certificates and so couldn't provide them to stunnel. Stunnel then failed to re-establish the link, and the mounts became stale. It was also impossible to re-mount the EFS file systems.

I guess you can reproduce the problem by filling up /run on a Debian machine with random data and waiting for the next re-keying attempt from the EFS watchdog.

Cappuccinuo · 2022-04-06T07:21:23Z

Thanks, got your point.

While we have someone investigating the issue, can you for now unmount the file system once a month so that the watchdog can clean up the state file directory?

ballock · 2022-04-06T20:58:52Z

Thanks for taking this seriously. I'll work around the issue for the time being.

Cappuccinuo added the enhancement label Apr 6, 2022

RyanStan mentioned this issue Jan 23, 2023

/usr/bin/amazon-efs-mount-watchdog - OSError: [Errno 28] No space left on device #154

Open
