idle GRPC connections in controller #117

Open
Madhu-1 opened this issue Feb 11, 2022 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@Madhu-1
Member

Madhu-1 commented Feb 11, 2022

As we already know, currently when a csiaddons node object is created we open the connections and keep them until that addons node object is deleted. There are advantages and disadvantages to this. Since csiaddons is meant to be a generic component, it will be used by multiple CSI drivers. For example, in a 10-node cluster with 2 CSI drivers using csiaddons, we have 20 or more connections (when both the provisioner and node plugin sidecars are deployed) opened and kept in memory. Thinking about scale, what about 100-node clusters, or even more CSI drivers in a cluster?

Advantages

  • Reuse of connection for faster communication
  • (anything else?)

Disadvantages

  • More idle connections held in memory (when there are no addons operations)
  • Higher resource utilization in the controller pod
  • More network calls, since connection keep-alives need to be checked at intervals to make sure the connections are not broken (see the keepalive sketch below)
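
For context, here is a minimal sketch of how client-side gRPC keepalive pings can be configured in Go. The helper name, endpoint, and interval values are illustrative only, not the actual csi-addons settings; with a pool of cached connections, each of them emits these periodic pings even when no addons operation is running.

```go
package example

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

// dialWithKeepalive is a hypothetical helper showing the knobs behind the
// "keep-alive checked at intervals" point above.
func dialWithKeepalive(endpoint string) (*grpc.ClientConn, error) {
	return grpc.Dial(endpoint,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // ping the sidecar after 30s of inactivity
			Timeout:             10 * time.Second, // how long to wait for the ping ack
			PermitWithoutStream: true,             // keep pinging even with no active RPCs
		}),
	)
}
```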

I would like to hear thoughts from everyone on this one. cc @nixpanic @humblec @Rakshith-R @pkalever

@Madhu-1 Madhu-1 added the question Further information is requested label Feb 11, 2022
@Rakshith-R
Member

As we already know, currently when a csiaddons node object is created we open the connections and keep them until that addons node object is deleted. There are advantages and disadvantages to this. Since csiaddons is meant to be a generic component, it will be used by multiple CSI drivers. For example, in a 10-node cluster with 2 CSI drivers using csiaddons, we have 20 or more connections (when both the provisioner and node plugin sidecars are deployed) opened and kept in memory. Thinking about scale, what about 100-node clusters, or even more CSI drivers in a cluster?

Consider x csiaddons node objects for the same csidriver,
without connections being kept in memory.

For each operation:

  • The specific controller would have to list all the csiaddons node objects
  • For each csiaddonsnode object:
    • Open a gRPC connection to the pod
    • Fetch its capabilities,
    • if they do not match, close the connection and pick the next one
    • if they do match, use it.

I think the above is reason enough to keep a single pool of open connections that can be shared by all operations (a rough sketch of the per-operation cost follows).
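
Here is a rough Go sketch of the flow described above, assuming connections are not cached; the function and helper names (findCapableConnection, fetchCapabilities) are hypothetical, not the actual controller code. The point is that every operation pays a dial plus a capability query per csiaddonsnode object until a match is found.

```go
package example

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// fetchCapabilities stands in for a call to the CSI-Addons Identity service's
// GetCapabilities RPC; the real client code is omitted from this sketch.
func fetchCapabilities(ctx context.Context, conn *grpc.ClientConn) (map[string]bool, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

// findCapableConnection dials every known sidecar endpoint until one
// advertises the wanted capability, closing the connections that do not.
func findCapableConnection(ctx context.Context, endpoints []string, wanted string) (*grpc.ClientConn, error) {
	for _, ep := range endpoints {
		// A fresh connection per csiaddonsnode object, per operation.
		conn, err := grpc.DialContext(ctx, ep, grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			continue // try the next csiaddonsnode object
		}
		caps, err := fetchCapabilities(ctx, conn)
		if err != nil || !caps[wanted] {
			conn.Close() // mismatch: the dial and the RPC were wasted work
			continue
		}
		return conn, nil // the caller uses it and closes it after the operation
	}
	return nil, fmt.Errorf("no csiaddonsnode advertises capability %q", wanted)
}
```

With a shared pool, the dial and the capability lookup happen once per csiaddonsnode object instead of once per operation.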

@Madhu-1
Member Author

Madhu-1 commented Feb 14, 2022

As we already know, currently when a csiaddons node object is created we open the connections and keep them until that addons node object is deleted. There are advantages and disadvantages to this. Since csiaddons is meant to be a generic component, it will be used by multiple CSI drivers. For example, in a 10-node cluster with 2 CSI drivers using csiaddons, we have 20 or more connections (when both the provisioner and node plugin sidecars are deployed) opened and kept in memory. Thinking about scale, what about 100-node clusters, or even more CSI drivers in a cluster?

Consider x csiaddons node objects for the same csidriver, without connections being kept in memory.

For each operation:

  • The specific controller would have to list all the csiaddons node objects

  • For each csiaddonsnode object:

    • Open a gRPC connection to the pod
    • Fetch its capabilities,
    • if they do not match, close the connection and pick the next one
    • if they do match, use it.

I think the above is reason enough to keep a single pool of open connections that can be shared by all operations.

Keep the connection in memory and remove it after a certain interval of time; there is no need to keep it open all the time.

@Rakshith-R
Member

Disadvantages

  • More idle connections held in memory (when there are no addons operations)
  • Higher resource utilization in the controller pod
  • More network calls, since connection keep-alives need to be checked at intervals to make sure the connections are not broken

Any other method will share these disadvantages too.
I would prefer to stick with the current implementation unless we certainly have a better approach, with a complete design that covers all the requirements and performance metrics.
Otherwise, saving resources/performance in one step while spending them in another does not make sense to me.

@Madhu-1
Member Author

Madhu-1 commented Feb 14, 2022

Disadvantages

  • More idle connections held in memory (when there are no addons operations)
  • Higher resource utilization in the controller pod
  • More network calls, since connection keep-alives need to be checked at intervals to make sure the connections are not broken

Any other method will share these disadvantages too.

What other methods would have these disadvantages?

I would prefer to stick with the current implementation unless we certainly have a better approach, with a complete design that covers all the requirements and performance metrics. Otherwise,

This issue was opened to examine the problems with the current approach and how it can be improved. I don't understand the reasoning here: why keep consuming resources when they are not needed?

unless we certainly have a better approach, with a complete design that covers all the requirements and performance metrics. Otherwise,

What do you mean by covering all the requirements/perf metrics? We don't need to scale-test at large numbers, but we can identify what can cause issues and how to fix them in the long run.

saving resources/performance in one step while spending them in another does not make sense to me.

Can you please elaborate on this?

@nixpanic
Collaborator

I think it is fine to close idle connections. The CSIAddonsNode state that is kept in the controller (currently with the connection alive) can be kept, with an additional IsConnect kind of attribute. There is no real need to connect to all CSIAddonsNodes on startup of the controller; it can connect on demand instead. A certain idle-timeout can be used to disconnect, and when needed the controller can call Connect() again.

This might improve stability in case of network issues too, where the connection between the controller and a CSIAddonsNode is interrupted for some unknown amount of time.

@Rakshith-R
Member

There is no real need to connect to all CSIAddonsNodes on startup of the controller; it can connect on demand instead.

This is needed to fetch the capabilities and store them.

I think it is fine to close idle connections. The CSIAddonsNode state that is kept in the controller (currently with the connection alive) can be kept, with an additional IsConnect kind of attribute.
A certain idle-timeout can be used to disconnect, and when needed the controller can call Connect() again.
This might improve stability in case of network issues too, where the connection between the controller and a CSIAddonsNode is interrupted for some unknown amount of time.

I think this can be done.
Start a goroutine that checks lastUsedTime and closes the connection if it is older than x seconds (probably 5 minutes = 300 seconds). Every time the connection getter is called, check the connected state; if it is not connected, reconnect before returning the connection. A rough sketch follows.
This should be completely internal to the connection package.
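
A minimal sketch of that idea, assuming hypothetical names (conn, ConnectionPool, Get, reapIdle, idleTimeout); this is not the existing connection package API, only an illustration of a lastUsedTime reaper plus reconnect-on-Get behaviour.

```go
package connection

import (
	"context"
	"fmt"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
	"google.golang.org/grpc/credentials/insecure"
)

// conn keeps the per-CSIAddonsNode state; cached data such as capabilities can
// stay around even while the gRPC connection itself is closed by the reaper.
type conn struct {
	endpoint string
	client   *grpc.ClientConn
	lastUsed time.Time
}

type ConnectionPool struct {
	mu          sync.Mutex
	conns       map[string]*conn
	idleTimeout time.Duration // e.g. 5 * time.Minute
}

// reapIdle runs in its own goroutine and closes connections that have not been
// used for longer than idleTimeout; the pool entries themselves are kept.
func (p *ConnectionPool) reapIdle(ctx context.Context) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			p.mu.Lock()
			for _, c := range p.conns {
				if c.client != nil && time.Since(c.lastUsed) > p.idleTimeout {
					c.client.Close()
					c.client = nil
				}
			}
			p.mu.Unlock()
		}
	}
}

// Get returns a live connection for key, redialing if the cached one was
// closed by the reaper or has shut down, and refreshes lastUsed.
func (p *ConnectionPool) Get(ctx context.Context, key string) (*grpc.ClientConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	c, ok := p.conns[key]
	if !ok {
		return nil, fmt.Errorf("no csiaddonsnode registered for %q", key)
	}
	if c.client == nil || c.client.GetState() == connectivity.Shutdown {
		cc, err := grpc.DialContext(ctx, c.endpoint,
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			return nil, err
		}
		c.client = cc
	}
	c.lastUsed = time.Now()
	return c.client, nil
}
```

Callers never see the reconnect: Get always hands back a usable connection, which keeps the change internal to the connection package as suggested.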

@Madhu-1 wdyt ?

@Madhu-1
Member Author

Madhu-1 commented Feb 15, 2022

Start a goroutine that checks lastUsedTime and closes the connection if it is older than x seconds (probably 5 minutes = 300 seconds). Every time the connection getter is called, check the connected state; if it is not connected, reconnect before returning the connection. This should be completely internal to the connection package.

This looks okay 👍

@Madhu-1 Madhu-1 added enhancement New feature or request and removed question Further information is requested labels Feb 15, 2022
@Rakshith-R Rakshith-R self-assigned this Feb 15, 2022
nixpanic pushed a commit to nixpanic/kubernetes-csi-addons that referenced this issue Mar 4, 2024
Syncing latest changes from upstream main for kubernetes-csi-addons