
libimage: pull: increase timeout running under systemd #1616

Conversation

vrothberg
Member

Set the EXTEND_TIMEOUT_USEC over DBUS when pulling an image from a registry and when running under systemd. This will prevent a frequent issue when running Quadlets and exceeding the default systemd start timeout of 90 seconds when pulling the image takes too long.

Fixes: containers/podman/issues/18353

Entirely untested but it may aid in deciding what to do with containers/podman/issues/18353.
@rhatdan @ygalblum WDYT?
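
For readers unfamiliar with the mechanism, here is a minimal sketch of the idea (not the code in this PR): while the pull is running and we detect systemd via NOTIFY_SOCKET, a goroutine periodically sends an EXTEND_TIMEOUT_USEC message. It assumes the github.com/coreos/go-systemd/v22/daemon package for the sd_notify transport; the helper name and the 25 s/30 s values are illustrative.

    package main

    import (
        "fmt"
        "os"
        "time"

        "github.com/coreos/go-systemd/v22/daemon"
    )

    // extendTimeoutWhileRunning asks systemd to extend the start timeout by
    // `extension` every `interval` until done is closed.  It is a no-op when
    // NOTIFY_SOCKET is unset, i.e., when not running under systemd.
    func extendTimeoutWhileRunning(done <-chan struct{}, interval, extension time.Duration) {
        if os.Getenv("NOTIFY_SOCKET") == "" {
            return
        }
        msg := fmt.Sprintf("EXTEND_TIMEOUT_USEC=%d", extension.Microseconds())
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-done:
                return
            case <-ticker.C:
                // A failed notification must not fail the pull; ignore errors.
                _, _ = daemon.SdNotify(false, msg)
            }
        }
    }

    func main() {
        done := make(chan struct{})
        // Extend by 30s every 25s, i.e., before the previous extension expires.
        go extendTimeoutWhileRunning(done, 25*time.Second, 30*time.Second)
        // ... perform the image pull here ...
        close(done)
    }

The interval is deliberately shorter than the extension so a new message always lands before the previous window runs out; 30 seconds is the value the thread settles on below.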

@ygalblum
Contributor

Looks like this will work. I do have two comments, though:

  1. Since the intention is to never let systemd time out during the pull, I think we can extend it by a larger value and have fewer loops.
  2. Not sure I love the fact that the extend mechanism is not controlled by the caller at all. That is, the decision is made based on the existence of the environment variable, without allowing the caller to do anything about it.

@vrothberg
Member Author

vrothberg commented Aug 16, 2023

Looks like this will work. I do have two comments, though:

1. Since the intention is to never let systemd time out during the pull, I think we can extend it by a larger value and have fewer loops.

What do you consider a reasonable number? 20 seconds?

2. Not sure I love the fact that the extend mechanism is not controlled by the caller at all. That is, the decision is made based on the existence of the environment variable, without allowing the caller to do anything about it.

I was going back and forth on making it conditional. In a way, it is already conditional via the env variable. It doesn't really make sense for podman system service, but for that, the server should unset the SOCKET after using it.

@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from 393d0fb to b95e0be Compare August 16, 2023 13:03
@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from b95e0be to 7b49c21 Compare August 16, 2023 13:23
@ygalblum
Contributor

What do you consider a reasonable number? 20 seconds?

In the first PR, @rhatdan proposed 30 seconds. Maybe go with that

the server should unset the SOCKET after using it

Do we know that it does?

@vrothberg
Member Author

What do you consider a reasonable number? 20 seconds?

In the first PR, @rhatdan proposed 30 seconds. Maybe go with that

SGTM

the server should unset the SOCKET after using it

Do we know that it does?

With containers/podman#19644 it will. I was convinced it was doing it (but double checked).

@ygalblum
Contributor

With containers/podman#19644 it will. I was convinced it was doing it (but double checked).

Then I think we're good

@rhatdan
Member

rhatdan commented Aug 16, 2023

LGTM

@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from 7b49c21 to 72981d0 Compare August 17, 2023 09:02
@vrothberg
Member Author

Now with 30 seconds. I'll keep it as a draft until I've figured out how to test it.

@Luap99
Member

Luap99 commented Aug 17, 2023

The one problem with this approach is that it never checks for actual data progress. This works fine in the good case, but if we ever run into a deadlock in the pull code, this goroutine will extend the timeout forever and the systemd unit will never time out. I don't think it is likely to happen, but it is certainly a thing to consider, and it is why I think the correct thing to do is to check for actual pull progress and only send a new message if we know new progress has been made.
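
For illustration, a hedged sketch of what such progress gating could look like, assuming the pull code exposed a channel of progress events; progressCh and the function are hypothetical, not this PR's API:

    package sketch

    import (
        "fmt"
        "time"

        "github.com/coreos/go-systemd/v22/daemon"
    )

    // extendOnProgress only sends an extension when at least one progress event
    // arrived since the previous tick, so a wedged pull eventually lets systemd
    // time the unit out.
    func extendOnProgress(done <-chan struct{}, progressCh <-chan struct{}, interval, extension time.Duration) {
        msg := fmt.Sprintf("EXTEND_TIMEOUT_USEC=%d", extension.Microseconds())
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        sawProgress := false
        for {
            select {
            case <-done:
                return
            case <-progressCh:
                sawProgress = true
            case <-ticker.C:
                if sawProgress {
                    _, _ = daemon.SdNotify(false, msg)
                    sawProgress = false
                }
            }
        }
    }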

@vrothberg
Member Author

[...] I think the correct thing to do is to check for actual pull progress and only send a new message if we know new progress has been made

Dan asked the same thing in #1616 (comment), but I don't feel very comfortable doing it. The standard lib will take care of throwing an error in case the downloads starve. There may well be a bug in our stack that causes it to block, but that would also be true if we checked for pull progress, where some bug might indicate progress indefinitely. Schema 1 images don't indicate the size of blobs.

@ygalblum
Contributor

The standard lib will take care of throwing an error in case the downloads starve

I agree with @vrothberg here with one caveat. Can a call to podman pull hang forever (assuming the user does not kill it)? If no, then we're good. If yes, maybe the fix should be there.

@vrothberg
Member Author

Can a call to podman pull hang forever (assuming the user does not kill it)?

In theory, any call to podman can hang forever if there's a bug. My main point is that using pull progress as a line of defense isn't much more robust either, since the standard lib already times out; at the moment we rely on that. If we also relied on pull progress being made (i.e., listened on the channel), I don't think we'd add much robustness.

Maybe we should limit the number of timeout extensions to 10 or something like that? Something high enough that it should largely be sufficient, but deterministic. If users have gigantic images or bandwidth so low that 5 minutes + 90 seconds are not enough, then I'd argue they need to tweak the timeout in their Quadlet files themselves.
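
A hedged sketch of such a cap, reusing the shape of the earlier sketch; maxExtensions and the values are illustrative (10 extensions of 30 s each, sent every 25 s, add up to roughly 5 minutes of extended start time):

    package sketch

    import (
        "fmt"
        "os"
        "time"

        "github.com/coreos/go-systemd/v22/daemon"
    )

    // extendTimeoutBounded stops extending after maxExtensions messages, so even
    // a truly stuck pull eventually runs into a deterministic systemd timeout.
    func extendTimeoutBounded(done <-chan struct{}, interval, extension time.Duration, maxExtensions int) {
        if os.Getenv("NOTIFY_SOCKET") == "" {
            return
        }
        msg := fmt.Sprintf("EXTEND_TIMEOUT_USEC=%d", extension.Microseconds())
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for sent := 0; sent < maxExtensions; sent++ {
            select {
            case <-done:
                return
            case <-ticker.C:
                _, _ = daemon.SdNotify(false, msg)
            }
        }
        // Past this point no more extensions are sent; systemd times the unit
        // out if the pull still has not finished within the last window.
    }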

@Luap99
Member

Luap99 commented Aug 17, 2023

Well, I don't think there is any known deadlock in the pull code. This was more about being correct in all cases, even if we introduce a deadlock in the future. Of course, a normal podman pull would face the same thing.

I agree that this solution is simple enough and fixes an important problem that many users run into. My comment was only meant to ask whether there is an even better way. If we agree that there is not, short of adding a lot of complexity and work, I am in favour of merging.

@Luap99
Member

Luap99 commented Aug 17, 2023

If users have gigantic images or bandwidth so low that 5 minutes + 90 seconds are not enough

I don't think this is how it works; from sd_notify(3):

 EXTEND_TIMEOUT_USEC=...
 Tells the service manager to extend the startup, runtime or shutdown service timeout corresponding the current state. The value specified is a time in microseconds during which the service
 must send a new message. A service timeout will occur if the message isn't received, but only if the runtime of the current state is beyond the original maximum times of TimeoutStartSec=,
 RuntimeMaxSec=, and TimeoutStopSec=. See systemd.service(5) for effects on the service timeouts.

So it will time out when it does not receive a new EXTEND_TIMEOUT_USEC message within the given time and the runtime is already over the normal configured default timeout. It does not add the time to the default value.
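
To make the quoted semantics concrete (my reading of the docs, not a tested claim): with the default TimeoutStartSec=90s and an EXTEND_TIMEOUT_USEC message worth 30 s sent every 25 s, nothing changes during the first 90 seconds; beyond that point the start only stays alive as long as a fresh extension message arrives within 30 seconds of the previous one, and it fails as soon as one is missed. The extensions are a rolling window, not time added on top of the 90 seconds.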

@vrothberg
Member Author

So it will time out when it does not receive a new EXTEND_TIMEOUT_USEC message within the given time and the runtime is already over the normal configured default timeout. It does not add the time to the default value.

Thanks. Yes, once the runtime goes beyond the original timeout, the default value is effectively out of the picture and only the extension messages count.

@vrothberg
Member Author

My comment was only meant to ask whether there is an even better way.

Absolutely. I think it's more than fair to reflect on how to prevent a potential deadlock. So far, I haven't liked the progress-check proposal, for the reasons mentioned above.

What do you think about a fixed limit on how often Podman would increase the timeout? I'd guess that's fair and would help in all but the extreme cases.

@Luap99
Member

Luap99 commented Aug 17, 2023

I think having a limit of 5 min sounds good to me. If someone pulls super-large images, they likely should pre-populate them and not have their service start-up delayed that long; and if they truly want that, I agree they can still set the normal TimeoutStartSec= in the unit.
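
As a concrete (hedged) example of that escape hatch: a Quadlet .container file can carry a [Service] section that is passed through to the generated unit, so a user with a known-huge image can put something like TimeoutStartSec=600 there and get a ten-minute start window regardless of what the pull-time extension does.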

@vrothberg
Member Author

In that case, I think we can simplify the entire thing and try handling it in Quadlet?

If the default timeout is bigger than 5 minutes (not sure if/how Quadlet can access this value) or if a custom limit is set, Quadlet leaves things as they are. Otherwise, it'll add TimeoutStartSec=5min to the unit.

@Luap99
Member

Luap99 commented Aug 17, 2023

You do not need to access anything: if the default is bigger, systemd will use the bigger value and not time out. Again, the docs are very explicit about the behaviour:

but only if the runtime of the current state is beyond the original maximum times of TimeoutStartSec=,
RuntimeMaxSec=, and TimeoutStopSec=. See systemd.service(5) for effects on the service timeouts.

And setting a default TimeoutStartSec=5min is also not a good idea, because that would affect all of Podman, and there we definitely have more deadlocks. To me, the extend feature is great because we only extend for things we know should take longer, i.e. pulls. There might be other special cases in the future where we want to extend it, for example chown or SELinux label setting on large volumes.

@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from 72981d0 to 20a1a0f Compare August 17, 2023 14:49
@ygalblum
Contributor

Now I'm a bit confused. @vrothberg do you want to go forward with it?

@Luap99 another advantage of this change is that it limits the time extension to the pull process, while in the Quadlet solution the longer value would apply to subsequent runs as well (runs that will not require pulling the image).

Having said that, now I'm also thinking about the health checks and the new sd-notify option. IIUC, once the extend message is sent, the process is expected to send additional ones, or report readiness, within the extension period. Am I correct in my understanding? If yes, can this cause an issue?

@vrothberg
Member Author

And another Quadlet problem is kube support: if I have a YAML with 10 images, 5 mins may not be enough, but this EXTEND_TIMEOUT_USEC patch here would be per image, so it would give us much more time.

Very good point!

Of course, this can be seen as a negative point as well.

Yes, I find it difficult to decide. There are pros and cons to both alternatives. I am OK with moving forward with this idea. We can always change it in the future. The important part is that we don't see a big blocker.

@vrothberg
Member Author

vrothberg commented Aug 21, 2023

Now I'm a bit confused. @vrothberg do you want to go forward with it?

I am OK with moving forward. I pinged you and Dan to get some more thoughts, as I find it hard to balance the alternatives.

Having said that, now I'm also thinking about the health checks and the new sd-notify option. IIUC, once the extend message is sent, the process is expected to send additional ones, or report readiness, within the extension period.

AFAIUI: once the default timeout has been exceeded, the EXTEND_TIMEOUT_USEC message extends the timeout by the specified time, and on each send it is extended by that value again. If 10 seconds are left after a successful pull, then Podman has 10 seconds to finish the job before the timeout kicks in.

Am I correct in my understanding? If yes, can this cause an issue?

At the moment, I am not worried about health checks etc. If users expect a delayed start I think it's fair to make them responsible for taking care of the timeout. Probably something to highlight in the docs?

@ygalblum
Contributor

Podman has 10 seconds to finish the job before the timeout kicks in

If users expect a delayed start I think it's fair to make them responsible for taking care of the timeout. Probably something to highlight in the docs?

Combining these two comments, it means that the 10 seconds are not only for Podman but also for the health check to pass. Now, if the checks take 15 seconds from the start of the container, the service will fail. So, maybe the extension period should be longer than 30 seconds?

@vrothberg
Member Author

Now, if the checks take 15 seconds from the start of the container, the service will fail. So, maybe the extension period should be longer than 30 seconds?

In that case, I would expect the --sdnotify=wait code to get a similar extension mechanism to the one we add here for pull. But I advocate doing that outside this PR, and always at the specific code sites.

@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from 20a1a0f to b7e3379 Compare August 21, 2023 13:29
@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch 3 times, most recently from 061adba to e1e02eb Compare August 23, 2023 11:31
@vrothberg vrothberg changed the title WIP - libimage: pull: increase timeout running under systemd libimage: pull: increase timeout running under systemd Aug 23, 2023
@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from e1e02eb to 81cb638 Compare August 23, 2023 12:33
Set the `EXTEND_TIMEOUT_USEC` over DBUS when pulling an image from a
registry and when running under systemd.  This will prevent a frequent
issue when running Quadlets and exceeding the default systemd start
timeout of 90 seconds when pulling the image takes too long.

Fixes: containers/podman/issues/18353
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg vrothberg force-pushed the fix-containers/podman/issues/18353 branch from 81cb638 to 4488724 Compare August 23, 2023 12:38
@vrothberg vrothberg marked this pull request as ready for review August 24, 2023 11:28
@vrothberg
Member Author

containers/podman#19644 is passing (minus the CI farts), so this PR is good to go from my end.

@ygalblum @Luap99 @rhatdan PTanotherL

@Luap99
Member

LGTM

@rhatdan
Member

rhatdan commented Aug 24, 2023

/approve
/lgtm
Nice work.

@openshift-ci
Contributor

openshift-ci bot commented Aug 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, rhatdan, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,rhatdan,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit b27c2ba into containers:main Aug 24, 2023
7 checks passed
@vrothberg vrothberg deleted the fix-containers/podman/issues/18353 branch August 24, 2023 14:11
@mtrmac
Collaborator

mtrmac commented Aug 24, 2023

The standard lib will take care of throwing an error in case the downloads starve.

It… will? In principle, a very slow HTTP response body is not unusual. I can't see any default timeout on reading an HTTP body, neither when it is making progress nor when it is idle (if there were one, I think we would already have received bug reports about layer downloads not fitting into the default value).

AFAICS Dialer.KeepAlive ensures the socket is live, but not that the download is making any progress. So checking that at least one byte has arrived in the last 25 seconds seems reasonable to me.

(Warning: I didn’t actually try any of this in practice, I have just briefly read the HTTP 1.x version of the net/http implementation and public API.)
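
To illustrate the kind of check meant here, a hedged sketch (not something c/image does today; the names and the idle window are illustrative): wrap the response body so a watchdog can cancel the request's context when no bytes have arrived for too long, which makes the blocked Read fail. The request would have to be created with http.NewRequestWithContext so that cancel() actually tears down the transfer.

    package sketch

    import (
        "context"
        "io"
        "sync/atomic"
        "time"
    )

    // idleReader records when data last arrived from the wrapped reader.
    type idleReader struct {
        r        io.Reader
        lastRead atomic.Int64 // unix nanoseconds of the last read that returned data
    }

    func newIdleReader(r io.Reader) *idleReader {
        ir := &idleReader{r: r}
        ir.lastRead.Store(time.Now().UnixNano())
        return ir
    }

    func (ir *idleReader) Read(p []byte) (int, error) {
        n, err := ir.r.Read(p)
        if n > 0 {
            ir.lastRead.Store(time.Now().UnixNano())
        }
        return n, err
    }

    // watchIdle cancels the request context once no data has arrived for maxIdle;
    // canceling the context aborts the in-flight body read with an error.
    func watchIdle(ctx context.Context, cancel context.CancelFunc, ir *idleReader, maxIdle time.Duration) {
        ticker := time.NewTicker(maxIdle / 4)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if time.Since(time.Unix(0, ir.lastRead.Load())) > maxIdle {
                    cancel()
                    return
                }
            }
        }
    }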

@vrothberg
Member Author

vrothberg commented Aug 24, 2023

AFAICS Dialer.KeepAlive ensures the socket is live, but not that the download is making any progress. So checking that at least one byte has arrived in the last 25 seconds seems reasonable to me.

At least TLS timeouts are common. I expected similar timeouts when a download stalls. Wouldn't it otherwise be just too easy to create a DoS server/registry?

@mtrmac
Collaborator

mtrmac commented Aug 24, 2023

It’s quite possible that the server has timeouts set up to protect itself.

I haven’t found any in the standard library (although I didn’t look that hard), and AFAIK c/image does nothing special other than respect the provided context.Context, if that is configured with a timeout. There is a separate timeout for setting up the connection, but not one for consuming the response body.

@vrothberg
Member Author

@mtrmac would it make sense to move such monitoring logic into c/image/copy directly? I see a benefit for all consumers, even outside the context of running under systemd.

@mtrmac
Collaborator

mtrmac commented Aug 24, 2023

Compare https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/ .


I think adding a c/image option to limit… something (e.g. the time the connection can be idle while downloading the body) could make sense; it would make more sense if we had a user with a specific use case.

But that would, without other work, do nothing to extend the systemd-imposed timeout.

@mtrmac
Collaborator

mtrmac commented Aug 24, 2023

(Note that it is already possible to set a global timeout. I think that one is easy for users to understand; once we start explaining network idle times and various connection states, we have probably lost a lot of users' attention.)
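
For completeness, the global-timeout knob is just the standard context pattern; c/image respects the context it is given (see above), so an overall deadline bounds the whole copy. A minimal sketch; the 10-minute value and the callback shape are illustrative:

    package sketch

    import (
        "context"
        "time"
    )

    // pullWithGlobalTimeout derives a context with an overall deadline and hands
    // it to the pull; when the deadline expires, the in-flight HTTP requests are
    // aborted along with everything else using that context.
    func pullWithGlobalTimeout(pull func(context.Context) error) error {
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
        defer cancel()
        return pull(ctx)
    }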
