EFS mount fails on ECS task with awsvpc network mode on host using systemd DNS stub resolver #143

Open
ltm opened this issue Oct 3, 2022 · 2 comments

ltm commented Oct 3, 2022

If the host system is configured to use the systemd DNS stub resolver, then an ECS task using the awsvpc network mode will fail to mount an EFS volume.

When the systemd DNS stub resolver is enabled, /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf and specifies 127.0.0.53 as the name server:

$ readlink /etc/resolv.conf
/run/systemd/resolve/stub-resolv.conf
$ grep nameserver /etc/resolv.conf
nameserver 127.0.0.53
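
The same check can be scripted. The following is a minimal sketch, not part of efs-utils; the function names `parse_nameservers` and `uses_stub_resolver` are hypothetical and only illustrate the test performed by the `grep` above.

```python
STUB_ADDRESS = "127.0.0.53"  # address of the systemd-resolved stub listener


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Blank lines, comments and other directives are skipped;
        # only "nameserver <address>" entries are collected.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_stub_resolver(resolv_conf_text):
    """True if the stub listener address is among the nameservers."""
    return STUB_ADDRESS in parse_nameservers(resolv_conf_text)
```

Calling `uses_stub_resolver(open("/etc/resolv.conf").read())` on an affected host would be expected to return `True`.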

Note: On Amazon Linux 2022, upgrading the systemd-resolved package enables the DNS stub resolver even if it was previously disabled.

ECS containers using the awsvpc network mode are isolated from the host by a network namespace and are therefore not able to use 127.0.0.53 as a name server. Docker detects this condition and configures the containers to use the VPC name server configured in /run/systemd/resolve/resolv.conf.

$ journalctl -u docker
Sep 29 06:54:47 ip-10-2-128-71.us-west-2.compute.internal dockerd[3402328]: time="2022-09-29T06:54:47.583869871Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
$ grep nameserver /run/systemd/resolve/resolv.conf
nameserver 10.2.0.2
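
The behaviour suggested by the Docker log message can be sketched as follows. This is an illustration only, not Docker's code: the function name `select_resolv_conf` is hypothetical, and the exact condition Docker applies may differ (here any 127.0.0.53 entry triggers the fallback, which is an assumption).

```python
STUB_ADDRESS = "127.0.0.53"
DEFAULT_RESOLV_CONF = "/etc/resolv.conf"
UPSTREAM_RESOLV_CONF = "/run/systemd/resolve/resolv.conf"


def select_resolv_conf(default_text):
    """Pick the resolv.conf path a namespaced process should read.

    default_text is the content of /etc/resolv.conf. If it names the
    stub listener, return the file that lists the real upstream servers.
    """
    for line in default_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] == STUB_ADDRESS:
            return UPSTREAM_RESOLV_CONF
    return DEFAULT_RESOLV_CONF
```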

When the amazon-ecs-volume-plugin service, through the mount.efs script, attempts to mount an EFS volume for an ECS container using the awsvpc network mode, it uses nsenter to invoke the stunnel and mount.nfs4 commands in the same network namespace as the container.

$ cat /var/log/amazon/efs/mount.log
2022-09-29 07:00:54 UTC - INFO - version=1.33.2 options={'rw': None, 'tls': None, 'netns': '/proc/3404229/ns/net'}
2022-09-29 07:00:54 UTC - INFO - binding 20240
2022-09-29 07:00:54 UTC - WARNING - stunnel does not support "b'libwrap'"
2022-09-29 07:00:54 UTC - INFO - Starting TLS tunnel: "nsenter --net=/proc/3404229/ns/net /usr/bin/stunnel /var/run/efs/stunnel-config.fs-[efs_id].var.lib.ecs.volumes.ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601.20240"
2022-09-29 07:00:54 UTC - INFO - Started TLS tunnel, pid: 3404373
2022-09-29 07:00:54 UTC - INFO - Executing: "nsenter --net=/proc/3404229/ns/net /sbin/mount.nfs4 127.0.0.1:/ /var/lib/ecs/volumes/ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601 -o rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20240" with 15 sec time limit.
2022-09-29 07:01:09 UTC - ERROR - Mounting fs-[efs_id].efs.us-west-2.amazonaws.com to /var/lib/ecs/volumes/ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601 failed due to timeout after 15 sec, mount attempt 1/3, wait 0 sec before next attempt. 
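
The command lines in this log follow a simple pattern: when a `netns` option is present, each command is prefixed with `nsenter --net=<path>`. A minimal sketch of that pattern is shown below; `wrap_in_netns` is a hypothetical name and this is not the actual mount.efs implementation.

```python
def wrap_in_netns(command, netns=None):
    """Prefix a command with nsenter when a network namespace path is given.

    command is a list of arguments; netns is a path such as
    /proc/<pid>/ns/net, or None to run in the current namespace.
    """
    if netns is None:
        return list(command)
    # --net switches only the network namespace; the process keeps the
    # caller's mount namespace and therefore the caller's /etc/resolv.conf.
    return ["nsenter", "--net=" + netns] + list(command)
```

For example, `wrap_in_netns(["/usr/bin/stunnel", "config"], "/proc/3404229/ns/net")` yields an argument list of the same shape as the "Starting TLS tunnel" line above.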

Because the amazon-ecs-volume-plugin service and its children run outside Docker, they are not subject to Docker's workaround and still use 127.0.0.53 as the name server. Ultimately, stunnel fails to resolve the EFS endpoint: it reads the host's /etc/resolv.conf, but the DNS stub resolver at 127.0.0.53 is not reachable from inside the container's network namespace.

$ journalctl -u amazon-ecs-volume-plugin
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: 2022/09/29 07:00:54 Entering go-plugins-helpers getPath
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: 2022/09/29 07:00:54 Entering go-plugins-helpers createPath
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: level=info time=2022-09-29T07:00:54Z msg="Creating new volume ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601"
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: level=info time=2022-09-29T07:00:54Z msg="Creating mount target for new volume ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601"
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: level=info time=2022-09-29T07:00:54Z msg="Validating create options for volume ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601"
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal amazon-ecs-volume-plugin[3393399]: level=info time=2022-09-29T07:00:54Z msg="Mounting volume ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601 of type efs at path /var/lib/ecs/volumes/ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601"
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: stunnel 5.58 on x86_64-koji-linux-gnu platform
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: Compiled with OpenSSL 3.0.0 7 sep 2021
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: Running  with OpenSSL 3.0.3 3 May 2022
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,OCSP,PSK,SNI
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: Reading configuration from file /run/efs/stunnel-config.fs-[efs_id].var.lib.ecs.volumes.ecs-[service_name]-12-[volume_name]-e49ee4deabc3f5939601.20240
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: UTF-8 byte order mark not detected
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: FIPS mode disabled
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[ui]: Configuration successful
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[0]: Service [efs] accepted connection from 127.0.0.1:40200
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG3[0]: Error resolving "fs-[efs_id].efs.us-west-2.amazonaws.com": Neither nodename nor servname known (EAI_NONAME)
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG3[0]: No remote host resolved
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[0]: Connection reset: 0 byte(s) sent to TLS, 0 byte(s) sent to socket
[..]
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[166]: Service [efs] accepted connection from 127.0.0.1:41742
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG3[166]: Error resolving "fs-[efs_id].efs.us-west-2.amazonaws.com": Neither nodename nor servname known (EAI_NONAME)
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG3[166]: No remote host resolved
Sep 29 07:00:54 ip-10-2-128-71.us-west-2.compute.internal stunnel[3404373]: LOG5[166]: Connection reset: 0 byte(s) sent to TLS, 0 byte(s) sent to socket
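
To confirm the same failure on another host, the journal output can be scanned for these resolution errors. The following is a minimal sketch; `unresolved_hosts` is a hypothetical helper, and the pattern assumes the stunnel message format shown in the log above.

```python
import re

# Matches stunnel lines such as:
#   LOG3[0]: Error resolving "<host>": ... (EAI_NONAME)
RESOLVE_ERROR = re.compile(r'Error resolving "([^"]+)": .*\(EAI_NONAME\)')


def unresolved_hosts(journal_lines):
    """Count stunnel name-resolution failures per host name."""
    counts = {}
    for line in journal_lines:
        match = RESOLVE_ERROR.search(line)
        if match:
            host = match.group(1)
            counts[host] = counts.get(host, 0) + 1
    return counts
```

Feeding it the lines from `journalctl -u amazon-ecs-volume-plugin` gives a per-host count of failed lookups.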

RyanStan added the bug label May 15, 2023


jbarnett1981 commented Oct 13, 2023

Experiencing the same issue and would love to see what the solution is here. For reference, we're seeing this on Amazon Linux 2023.

@nthienan

Experiencing the same issue
