Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource: Retry helper exits before callback finishes #529

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Commits on Aug 11, 2020

  1. resource: Retry helper exits before callback finishes

    On conditions where callback supplied to resource.Retry() takes a long
    time to execute, a timeout error is returned to caller while the
    callback is still running.
    
    To remind the reader, WaitForState creates a separate goroutine for
    calling Refresh() callback from StateChangeConf. The execution of
    Refresh() goroutine is monitored in the main thread with a timeout.
    
    However, while timeout does issue a cancellation for the Refresh()
    goroutine, it does not interrupt the already started Refresh() calls.
    In addition, cancellation mechanism itself gave up after grace period
    expired on timeouts. This potentially left the Refresh() callback
    running in the background while returning TimeoutError to the caller.
    
    When timeout error is returned, plugin(s) incorrectly assume that it is
    safe to rerun the Refresh() content. In reality, the state after
    timeout is completely unknown.
    
    Annotated example from AWS plugin:
    
        var createResp *iam.CreateRoleOutput
        err := resource.Retry(30*time.Second, func() *resource.RetryError {
            var err error
            createResp, err = iamconn.CreateRole(request)                <-- Has internally a retry loop, can block more then 30 seconds
            // IAM users (referenced in Principal field of assume policy)
            // can take ~30 seconds to propagate in AWS
            if isAWSErr(err, "MalformedPolicyDocument", "Invalid principal in policy") {
                return resource.RetryableError(err)
            }
            return resource.NonRetryableError(err)
        })
        if isResourceTimeoutError(err) {                                <-- Goroutine started in Retry (WaitForState) can still be running
            createResp, err = iamconn.CreateRole(request)               <-- Issues another blocking CreateRole
        }
    
    The fix is simply to remove the grace period and the matching select
    timeout. This makes WaitForState() block until Refresh() goroutine
    returns something to result channel.
    enool committed Aug 11, 2020
    Configuration menu
    Copy the full SHA
    9d9ae7c View commit details
    Browse the repository at this point in the history