Issue #21 Fix client_secret special characters #27
Conversation
…ing azure go adal library to fetch management token and platform token. Updated integration test to use new env variables
…atabricks-terraform into issue-21 # Conflicts: # go.sum
databricks/azure_auth.go
Outdated
	return dbClient, err
}

func retry(numAttempts int, sleep time.Duration, do func() error) error {
This retry should help fix #33: above, we will retry for about 1 minute to attempt to list the node types, which should verify that the APIs are ready for requests. This is a hack, but I am internally trying to identify the root cause of why the APIs are not initialized properly.
After testing this a couple of times... this PR fixes #21 but I still get #33
I'd expect increasing the retry count would resolve the issue in #33. Just ran a quick test to see how long it usually takes to resolve itself in the previous hacky provider: roughly 30 seconds. The current code is only going to wait about 30 seconds, so it's likely just a little too short on some occasions, assuming the timing varies with how long some resources take to be created.
…ry 10 seconds. This should resolve the clusters API not starting properly after workspace creation. This is a hack for now, until a proper fix in the near future.
@sdebruyn hey, I just made some changes to use a much longer timeout of 30 minutes; can you please run again and verify that this works? Unfortunately I have not gotten much better news in regards to resolving the root cause of the APIs not working correctly after workspace creation. At this point in time I don't know if the problem is caused by the Databricks deployment itself or if the azurerm provider is not waiting long enough. And I can't seem to reproduce the issue any more with the following change made: I was able to run 5/5 times after destroying the environment without hitting that issue.
Codecov Report
@@ Coverage Diff @@
## master #27 +/- ##
==========================================
- Coverage 49.05% 48.85% -0.20%
==========================================
Files 44 44
Lines 5853 5872 +19
==========================================
- Hits 2871 2869 -2
- Misses 2931 2954 +23
+ Partials 51 49 -2
It worked on the first try now :)
I don't know if it's related, but just a heads-up, since moving to this branch, I am getting the following errors when I try to create secret scopes:
I thought it was a temporary issue, but it's been there for a few days now. This does not happen with the code that is currently on master.
@sdebruyn is this only right after you make a workspace or at all times?
@stikkireddy at all times, even hours after creating the workspace and the cluster |
Hmm, let me look into that and see if I can recreate this scenario. @sdebruyn, what region are you deploying into?
@stikkireddy eastus2 |
Would it be possible to merge the auth fixes and continue to look at the root cause of #33 in a separate PR? This would be super useful to unblock us, as the temporary fix, while not ideal, does work. It would be great to get a longer-term fix in at some point in the future.
… auth. Still using adal based auth but still generating databricks personal access token for SP for 1 hour and using that to provision resources. This is because AAD tokens have issues when dealing with secrets apis.
…der deployment much easier for the acceptance testing.
…igure the azure provider to only fetch sp credentials from env. This makes testing much easier.
…les & roles where Azure does not have that capability. Also, when creating a SCIM user, there is a weird behavior that makes tests fail: sometimes the API accidentally creates an entitlement of "allow-cluster-create" even though entitlements are empty. Thus an update was added right after creation to issue a PUT/update request.
…This should not be nil.
…rallel test because parallel tests were causing issues with consistent outcome.
… added terraform-acc-azure and aws targets. Added readme for integration testing
…fo to the make target's echo statement
Hey @sdebruyn, please take a look at the new changes; I have reverted the AAD token to using the PAT token. I realized there are still some issues in using AAD tokens against the secrets, scopes, and scope ACLs APIs, so I reverted to just using the 1-hour PAT tokens (the tokens within Databricks). I also fixed up the acceptance tests and split the AWS and Azure tests. The acceptance tests were failing for users & groups due to roles (AWS IAM roles) not existing on Azure. This should all be fixed now. Please give this a try; the retry functionality in the Azure auth and the revert to using PAT tokens should resolve both issues #21 and #33 and also avoid the following for secrets: TL;DR: @lawrencegripper this should resolve the issues. At least after the changes I cannot seem to reproduce the problem.
Now all of the following succeed:
But then it fails when creating the mounts. Logs at https://gist.github.com/sdebruyn/e86699686b120b444b2d0be4dd10c922
I think that this might not be related to this PR, but to my service principal instead. I'll further look into it in the coming hours. |
That seems to be an auth-level issue; you may need to make your SP a Storage Blob Data Owner. Hmm, not sure what it is at the moment. But I will merge this, as your mount failure seems to be an auth issue (403), and @lawrencegripper is going to work on better logging for the mounts and help Terraform propagate the error a bit better.
correct, one of the permissions was missing :) |
client_secret had special characters which were not escaped correctly when creating the management token and platform token. This caused API calls to fail because the secret became invalid. This solution swaps the hand-rolled API calls for the MSFT autorest adal library for authentication. This should resolve issue #21. I have run the integration test as well as a test with an SP password of:
value = "!@##@%#$^$#&%^*^(^*(&()l}$GWolISz+kyXy"