Determine provider SDK strategy for minimizing API calls #7

Open · radeksimko opened this issue Jun 11, 2019 · 14 comments
Labels: enhancement (New feature or request), upstream-protocol (Requires change of protocol specification, i.e. can't be done under the current protocol)

Comments

radeksimko (Member) commented Jun 11, 2019

Data sources

Data sources currently can't take advantage of Etag-based (or really any other kind of) caching. Whenever a data source is refreshed, the SDK does not read any existing state, so the etag is ignored even if it's stored there as a computed attribute.

Resources

While resources can technically be cached on an Etag basis, I think we could do better and make it easier for arbitrary APIs which follow the convention and use the Etag header, perhaps through some helper functions that can be plugged in as HTTP transports, or something along those lines.

To start with, we could adopt https://github.com/terraform-providers/terraform-provider-github/blob/2eaf170dbd1efa403d3bce08ade826c827a4063b/github/transport.go#L21-L40 into the SDK in some form.
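For illustration only, here's a minimal sketch of what such a pluggable transport could look like: an `http.RoundTripper` that remembers Etags per URL and replays the cached body on a `304 Not Modified`. The type and field names (`etagCachingTransport`, `cachedResponse`) are invented for this example and are not the SDK's API or the linked provider code, and an in-memory map is assumed as the cache:

```go
package provider

import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

// cachedResponse holds the last successful body and its Etag for a URL.
type cachedResponse struct {
	etag string
	body []byte
}

// etagCachingTransport sends If-None-Match based on previously seen Etag
// headers and replays the cached body on a 304 Not Modified, so callers
// always see a normal 200 response.
type etagCachingTransport struct {
	inner http.RoundTripper
	mu    sync.Mutex
	cache map[string]cachedResponse
}

func (t *etagCachingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	key := req.URL.String()

	t.mu.Lock()
	if c, ok := t.cache[key]; ok && req.Method == http.MethodGet {
		req.Header.Set("If-None-Match", c.etag)
	}
	t.mu.Unlock()

	resp, err := t.inner.RoundTrip(req)
	if err != nil {
		return nil, err
	}

	t.mu.Lock()
	defer t.mu.Unlock()

	switch {
	case resp.StatusCode == http.StatusNotModified:
		// Nothing changed upstream: replay the cached body.
		c := t.cache[key]
		resp.StatusCode = http.StatusOK
		resp.Body = io.NopCloser(bytes.NewReader(c.body))
	case resp.StatusCode == http.StatusOK && req.Method == http.MethodGet && resp.Header.Get("Etag") != "":
		// Remember the Etag and body for the next request to this URL.
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr != nil {
			return nil, readErr
		}
		t.cache[key] = cachedResponse{etag: resp.Header.Get("Etag"), body: body}
		resp.Body = io.NopCloser(bytes.NewReader(body))
	}
	return resp, nil
}
```

A provider could then wrap its API client with something like `&http.Client{Transport: &etagCachingTransport{inner: http.DefaultTransport, cache: map[string]cachedResponse{}}}` during configuration. Note that an in-memory cache like this only helps within a single run; carrying an etag across runs would still require storing it in state, which is the data source limitation described above.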

radeksimko added the enhancement label on Jun 11, 2019
radeksimko changed the title from "Allow etag-based caching for data sources" to "Implement Etag-based caching for data sources and resources" on Oct 29, 2019
radeksimko (Member, Author):

May be related to #134

radeksimko added the upstream-protocol label on Dec 8, 2019
paultyng changed the title from "Implement Etag-based caching for data sources and resources" to "Determine provider SDK strategy for minimizing API calls" on Jan 26, 2020
paultyng (Contributor):

Merging all the caching, batching, etc. issues into one to think through at once.

armsnyder (Contributor):

Hi, I'm coming here from #66, which was merged with this issue.

My problem is that I am trying to write a new Terraform provider for an upstream API that ONLY supports batch API calls. I could code each resource to make a batch API call containing a single entry; however, this is not what the API is designed for. #blockchainreasons

I also wanted to highlight this comment from @apparentlymart which summarized the design challenges nicely: #66 (comment)

bcsgh commented Jan 27, 2021

Where do things actually get throttled? If the core isn't throttling (or can be told not to) and the plugin controls things, then a plugin should be able to merge requests on its own without any further support. (If a request would block on capacity, queue it up; if there is already a batchable request of a compatible type queued, merge with it.)
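As a very rough sketch of that idea (nothing below is SDK API; `coalescer`, `batchFunc`, and the string item type are invented stand-ins for whatever the upstream batch endpoint actually takes), a plugin could queue and merge compatible requests like this:

```go
package provider

import (
	"sync"
	"time"
)

// batchFunc sends one upstream batch call for a group of compatible items
// and returns one result per item, in the same order.
type batchFunc func(items []string) ([]string, error)

type result struct {
	value string
	err   error
}

type pending struct {
	item string
	done chan result
}

// coalescer merges compatible requests into a single batch call, flushing
// either when the batch is full or after a short linger window.
type coalescer struct {
	mu      sync.Mutex
	queue   []pending
	maxSize int
	linger  time.Duration
	send    batchFunc
}

// Do enqueues one item and blocks until the batch containing it completes.
func (c *coalescer) Do(item string) (string, error) {
	p := pending{item: item, done: make(chan result, 1)}

	c.mu.Lock()
	c.queue = append(c.queue, p)
	if len(c.queue) == 1 {
		// First item in an empty queue: flush after the linger window
		// so a lone request never waits forever.
		time.AfterFunc(c.linger, c.flush)
	}
	full := len(c.queue) >= c.maxSize
	c.mu.Unlock()

	if full {
		c.flush()
	}

	r := <-p.done
	return r.value, r.err
}

func (c *coalescer) flush() {
	c.mu.Lock()
	batch := c.queue
	c.queue = nil
	c.mu.Unlock()

	if len(batch) == 0 {
		return
	}

	items := make([]string, len(batch))
	for i, p := range batch {
		items[i] = p.item
	}

	values, err := c.send(items)
	for i, p := range batch {
		if err != nil {
			p.done <- result{err: err}
			continue
		}
		p.done <- result{value: values[i]}
	}
}
```

The linger window is what lets concurrent calls from separate resources land in the same batch; a real provider would also need to decide how per-item errors map back to individual resources.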

paultyng (Contributor):

Parallelism defaults to 10 in the CLI, so a provider can get up to that number of requests in parallel (or whatever the user overrides it to be). There are no guarantees it's the same "resource type" or anything, though.

Some providers (Google Cloud, I think) use some naive batching for these kinds of reasons, which can be taken better advantage of by increasing that parallelism via the CLI flag, but there is no definitive protocol method to achieve this (or a "commit"-type step), at least not yet. And there is a possibility you could cause deadlocking or other problems if you're not careful in the implementation.

bcsgh commented Jan 28, 2021

As long as a naive provider implementation doesn't create new deadlock risks when that parallelism is increased, this seems like not-the-SDK's-problem. Dealing with any potential for deadlocks sounds like something providers already need to worry about given any parallelism, and they are also in a far better position to deal with it than the CLI or SDK would be.

How hard would it be to allow providers to override that default? That by itself would open up the ability for providers to solve this for themselves and would avoid the CLI/SDK needing to try to understand the domain-specific details.

jason-swissre commented May 18, 2021

My issue was also merged here. My feature request was for Terraform to keep a bit more knowledge about resources so that, on a destroy, it knows it doesn't have to destroy every individual resource. If, for example, you destroy a key vault, that will destroy all of its secrets, policies, etc. without having to delete them all individually as is done now. Likewise, if you destroy the resource group, then everything in the resource group is destroyed.

This would radically improve the speed of the destroy command on Azure.

jason-swissre:

This issue is mentioned in regard to batching, but in the delete case no batching is needed, just some kind of meta system which detects which things must be explicitly deleted and which things will be deleted by something higher up that is already scheduled for deletion.
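A minimal sketch of the kind of pruning such a meta system might do, assuming the provider could expose a parent/child relation for cascading deletes (the function and map names below are hypothetical; nothing like this exists in the SDK or protocol today):

```go
package provider

// pruneDeletes drops from the planned delete set every resource whose parent
// (or any ancestor) is also scheduled for deletion, on the assumption that
// deleting the ancestor cascades. parentOf maps a resource ID to the ID of
// its container, or "" when it has none; only the provider can know which
// deletes actually cascade, so it would have to supply this relation.
func pruneDeletes(toDelete map[string]bool, parentOf map[string]string) []string {
	var explicit []string
	for id := range toDelete {
		covered := false
		for p := parentOf[id]; p != ""; p = parentOf[p] {
			if toDelete[p] {
				covered = true
				break
			}
		}
		if !covered {
			explicit = append(explicit, id)
		}
	}
	return explicit
}
```

In the key vault example, the secrets and access policies would all map to the vault's ID, so a full destroy collapses to a single delete call for the vault (and, one level up, a single call for the resource group).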

AlexHunterCodes commented Jul 23, 2022

> detects what things must be explicitly deleted and which things will be deleted by something higher up scheduled for deletion

This would be risky if you have any child resources deployed which the Terraform config or state doesn't know about, either because:

  1. They're automatically created by the cloud API, in which case you either don't care and probably do want them to be automatically destroyed, OR you may want to keep them (VM disks, static IPs) but not know they exist, or not know they will be destroyed rather than simply detached/orphaned.
  2. They are deployed from a different Terraform state, perhaps by a different team, or as part of cross-team/cross-environment shared infrastructure.

It would be great if Terraform could understand these relationships to reduce API calls and save time, but it would also create an opportunity to identify out-of-state resources and warn the user in the plan.

jason-swissre:

But cases 1 and 2 are not a new risk, because this is already how Terraform behaves. For example, if I make a key vault with Terraform and someone else makes secrets that I don't know about and that aren't in Terraform, then when I run terraform destroy it will destroy the secrets it knows about and then destroy the key vault. Destroying the key vault will destroy the secrets it was not aware of.

I do wish there were more options for out-of-state resources. At the moment Terraform just fails or ignores them (depending on what it was doing). It would be nice to have options for things like "import if already exists".

AlexHunterCodes:

They're not a new risk, but if this issue turns into some new Terraform feature, they are a new way to break your apply.

A lot of cloud services I work with won't let you delete parent resources at all if they still have children, but those providers have no way of knowing that in advance when they decide to cancel out a batch of child API calls in favour of one parent call.

Terraform would have to make it very clear when it is about to merge deletions, and let you configure that behaviour on a per-resource basis.

jason-swissre:

There is already precedent for this: there is a flag for telling Terraform not to remove a resource group if it's not empty.

gk-drw commented Feb 14, 2024

Why not focus on batch Get requests? They are the ones likely to make the most difference (due to extra-long refresh times when running terraform plan), and their dependency issues are the easiest to figure out, since no deletion is involved.

bcsgh commented Feb 14, 2024

@gk-drw I second that point. I strongly suspect that API calls are heavily biased (10x or greater) toward read-only calls, and also that for non-read-only calls the overhead (quota, throttling, wall-clock time, etc.) of a non-batched call is likely negligible compared to the work that's actually done.

It's probably worth thinking through, in general terms, how read-write operations would be batched, to avoid SDK mismatches down the line, but the read-only cases shouldn't block on the details of something that may never even be worth implementing.
