Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout #4115

Closed
wants to merge 1 commit into from
Closed

Timeout #4115

wants to merge 1 commit into from

Conversation

drewwells
Copy link

relates #4102

Description of the change:
Adds a timeout flag to helm operator's manager. This instructs the helm action to timeout after a set duration, default is set to 5m just like helm cli.

I updated the context passed down from reconcile. The packages being called ignore their context, so it's not currently useful for timing out those packages.

Motivation for the change:
Helm by default implements a timeout. This timeout is used when using the --wait command to have helm give up after attempting an action for a period of time. The current operator does not implement any timeout and instead returns immediately after any action.

Checklist

If the pull request includes user-facing changes, extra documentation is required:

@joelanford
Copy link
Member

The current operator does not implement any timeout and instead returns immediately after any action.

This is by design based on the discussion in #4102. Though as I've researched this, I realize that we're using the default .Timeout (0s) for install/upgrade/uninstall. This translates to infinite waits when the timeout is actually used, which is to wait for hook jobs and pods to be ready. It is also used to wait for release resources to be ready when .Wait is true, but we currently never set .Wait to true.

The packages being called ignore their context, so it's not currently useful for timing out those packages.

Since Helm's kube.Interface client interface doesn't support contexts, I don't think we should use anything other than context.TODO() for now. Using contexts with timeouts in the reconciler may confuse users and contributors, who will get the impression that the underlying manager calls will be cancelled by those timeouts. Based on comments in helm/helm#7958, it seems unlikely that this will change before Helm v4 is released.

The fact that we're using infinite waits and context.TODO() means that many client calls cannot be cancelled, which is not ideal. But since the Helm SDK doesn't honor or use contexts, there isn't a whole lot we can do. The one clear place that can be improved is the .Timeout fields used in hooks and waits.

Would you be okay with holding on this PR and opening an issue about all the cancellation concerns in the helm operator reconciler. I think given the direction of #4102, the cancellation concerns I can think of are:

  1. waiting for hook jobs to complete and hook pods to be ready (we currently don't timeout)
  2. cancelling helm client calls (seems unlikely before helm v4)
  3. any other cancellable operations outside the scope of the helm action package
  4. if any changes will be necessary to handle the fact that controller-runtime v0.7.0 will pass a context.Context in the Reconcile method signature.

@drewwells
Copy link
Author

Since wait is not currently supported, it’s fairly harmless to allow timeout to be specified despite it doing nothing. It can be added in helm’s half implemented way just as well here as in the binary.

I’m fine with reverting contexts back to TODO(). I think either approach TODO() or WithTimeout() is confusing as the context are not used.

@joelanford
Copy link
Member

Since wait is not currently supported, it’s fairly harmless to allow timeout to be specified despite it doing nothing. It can be added in helm’s half implemented way just as well here as in the binary.

Perhaps I'm missing something, but what's the value in adding a configurable timeout if it does nothing? That would lead to confusion and users rightfully opening issues asking why setting the timeout doesn't work.

I think either approach TODO() or WithTimeout() is confusing as the context are not used.

One of the stated purposes of context.TODO() is to signal that the use of a context is not yet available or when the surrounding function doesn't support it. Both of these cases apply to the helm operator's Reconcile function.

The SDK release.Manager interface has existed since helm v2 when the tiller libraries we used did accept a context. When we made the switch to Helm v3, we lost the ability to pass contexts but did not want to break the Manager interface by removing contexts from its methods.

@camilamacedo86
Copy link
Contributor

HI @drewwells,

Thank you for your contribution. 🥇

Since wait is not currently supported, it’s fairly harmless to allow timeout to be specified despite it doing nothing.

I agree with @joelanford position as well That would lead to confusion and users rightfully opening issues asking why setting the timeout doesn't work.

I do not think that we should add because any additional should address a need and adds real value to the project. Otherwise, it only brings confusion for end-users and maintainers as well. I mean, reduce the maintainability of the project.

In this, I hope that would be fine we close this one and we open an issue to address it as suggested by @joelanford on he comment:

opening an issue about all the cancellation concerns in the helm operator reconciler. I think given the direction of #4102, the cancellation concerns I can think of are:

  • waiting for hook jobs to complete and hook pods to be ready (we currently don't timeout)
  • cancelling helm client calls (seems unlikely before helm v4)
  • any other cancellable operations outside the scope of the helm action package
  • if any changes will be necessary to handle the fact that controller-runtime v0.7.0 will pass a context.Context in the Reconcile method signature.

WDYT? Could you please open this issue?

@camilamacedo86
Copy link
Contributor

camilamacedo86 commented Oct 30, 2020

I hope that is fine we close this one based on the previous comments see; #4115 (comment) and #4115 (comment). however, please feel free to ping us if you think that it should be re-opened and would be very nice to get your help to track the issue.

@drewwells drewwells deleted the timeout branch October 31, 2020 00:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants