Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement retry on transient errors #53

Merged
merged 16 commits into from May 21, 2020

Conversation

stuartleeks
Copy link
Contributor

@stuartleeks stuartleeks commented May 14, 2020

As discussed in #33, there are cases where the API endpoint returns errors such as "Succeeded{"error_code":"INVALID_PARAMETER_VALUE","message":"Unknown worker environment WorkerEnvId(workerenv-4312344789891641)"}.

These errors have been observed shortly after a workspace has been created. Additionally, we have seen these errors after successful API calls, so simply waiting for a successful API response isn't sufficient.

The PR changes the service.DBApiClientConfig to use retryablehttp.Client instead of the standard http.Client. HTTP responses with a status code >= 400 and where the message contains any of the strings identified in #33 are treated as retryable errors.

We included logging, so if TF_LOG=debug is set then the log output will include Failed request detected: Retryable type found. Attempting retry... when a retryable error is encountered.

@lawrencegripper has run a number of test deployments where we have seen this output, demonstrating that the retry logic is kicking in.

lawrencegripper and others added 7 commits May 14, 2020 13:55
Co-authored-by: Lawrence Gripper <info@grippers.co.uk>
…ound.

Co-authored-by: Stuart Leeks <stuart@leeks.net>
Handle empty lines
Handle lines starting with
Handle final line not having a newline
Handle spaces in values
lawrencegripper and others added 7 commits May 20, 2020 06:48
Co-authored-by: Lawrence Gripper <info@grippers.co.uk>
…ound.

Co-authored-by: Stuart Leeks <stuart@leeks.net>
Handle empty lines
Handle lines starting with
Handle final line not having a newline
Handle spaces in values
@stuartleeks stuartleeks changed the title WIP Implement retry on transient errors Implement retry on transient errors May 20, 2020
@stuartleeks
Copy link
Contributor Author

stuartleeks commented May 20, 2020

@stikkireddy - have rebased this on master now that you have merged #51
Will take a look at the test failure later...

@codecov-commenter
Copy link

codecov-commenter commented May 20, 2020

Codecov Report

Merging #53 into master will decrease coverage by 0.17%.
The diff coverage is 76.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #53      +/-   ##
==========================================
- Coverage   49.10%   48.93%   -0.18%     
==========================================
  Files          44       44              
  Lines        5908     5902       -6     
==========================================
- Hits         2901     2888      -13     
- Misses       2959     2963       +4     
- Partials       48       51       +3     
Flag Coverage Δ
#client 89.42% <83.72%> (-0.35%) ⬇️
#provider 37.32% <0.00%> (-0.41%) ⬇️
Impacted Files Coverage Δ
databricks/azure_auth.go 0.00% <0.00%> (-21.28%) ⬇️
databricks/provider.go 59.60% <0.00%> (-0.40%) ⬇️
client/service/client.go 70.12% <83.72%> (+0.12%) ⬆️

@lawrencegripper
Copy link
Contributor

I've fixed up the build. I'd like to run another local test of the full integration test suite to check there isn't anything I've missed along the same lines.

@lawrencegripper
Copy link
Contributor

Looks like this is behaving nicely when testing with ./integration-environment-azure/run.sh acceptance tests for mounts.

Unrelated: Should I update make terraform-acc-azure to run this script too?

@stikkireddy
Copy link
Contributor

This looks good to me @lawrencegripper I will integrate this into make terraform-acc-azure as part of the travis-ci branch to be able to run these tests along with the aws during merge to master. But if you would like to make the changes in this PR please go right ahead. Just let me know if this is ready for me to test & merge.

@stikkireddy
Copy link
Contributor

NOTE: This PR is marked as WIP as it builds on some of the capabilities added in #55 so doesn't make sense to merge until #55 is merged.

I think thats the incorrect issue number/pr number please let me know if that is the case @stuartleeks. Issue #55 is a potential nil pointer panic due to not verifying if the pointer is not nill.

@stuartleeks
Copy link
Contributor Author

Oops! That's also not relevant now, so will just remove it!

Copy link
Contributor

@stikkireddy stikkireddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment/thought wanted your thoughts on removing a few additional lines in the azure_auth file.

databricks/azure_auth.go Show resolved Hide resolved
@stuartleeks
Copy link
Contributor Author

@stikkireddy re the travis integration, it looks like you have that underway in another PR? If so, would you be happy to merge this PR and pick up the integration in the other PR?

@stikkireddy
Copy link
Contributor

Awesome this looks great to me! Thanks for the contribution! Merging this in.

@stikkireddy stikkireddy merged commit c72fd7e into databricks:master May 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants