Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: healer controller design proposal #94

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pkalever
Copy link

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

@pkalever
Copy link
Author

As discussed, had opened ceph/ceph-csi#2794 for the VOLUME_CONDITION capability.
Please comment below if we can split this into more granular parts.

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
Comment on lines 44 to 45
* The side-car works as relay, upon receiving a gRPC call from healer controller, the CSI-Addons side-car will then check volume's mounting conditions collected via the CSI NodeGetVolumeStats RPC.
* In case if any volume condition is reported abnormal, the CSI-Addons side-car will act on it, it does some internal operation like fetching secrets, constructing staging path and finally sends NodeStageVolume request to the CSI driver and finally returns back the response to the controller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just sent a heal call to csidriver and let it decide whether node stage req is required or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it looks simple but, I think this needs more cycles for fetching secrets, formatting the staging path, and capturing/collecting more details needed to make the NodeStageCall(), for almost all the volume even if they are healthy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it looks simple but, I think this needs more cycles for fetching secrets, formatting the staging path, and capturing/collecting more details needed to make the NodeStageCall(), for almost all the volume even if they are healthy.

sidecar is just a relay,
its better to have more api calls in single request flow from controller to sidecar to csidriver
than to have back and forth between sidecar and csi-driver while controller sits waiting.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a bit debatable topic on its own.

  1. For restart of the nodeplugin case, it will mostly need NodeStageVolume() for 100% of PVs
  2. For periodic checks, I'm optimistic that most of the time the health of the volume is in a good state, only the corner cases are process restarts (considering userspace mounters). Hence I think sending NodeStageVolume() all the time is a bit costly.

So maybe a hybrid model is needed to satisfy and save some jiffies here, like

  • For restart case send a NodeStageVolume()
  • For periodic health checks scenarios, send a NodeGetVolumeStats() first and then send a NodeStageVolume() if the status of the PV is set to abnormal?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSI-Addons can not send CSI commands, those are reserved for the CO.

CSI-drivers can decide on their own implementation of healing, maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.

There is a difference between regular health checks, and an informational message "you restarted, and you should make sure the volume is still available". The last part is expected when a CSIAddonsNode is detected, the 1st is for https://github.com/kubernetes-csi/external-health-monitor/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we can implement a function like HealVolume() that can be called from sidecar which does NodeGetVolumeStats() and Heal() in one go, as @Rakshith-R pointed above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function like HealVolume() would need implementation in the CSI-driver (called from the sidecar). CSI-drivers needs to handle healing differently, I expect.

NodeGetVolumeStats() is a CSI call, so can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The design for Heal() and a HealthCheck() function need to go to csi-addons/spec.

Copy link
Author

@pkalever pkalever Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function like HealVolume() would need implementation in the CSI-driver (called from the sidecar). CSI-drivers needs to handle healing differently, I expect.

Yes

NodeGetVolumeStats() is a CSI call, so can not be called from CSI-Addons operations. You need to design an operation that the CSI-Addons sidecar can call on the CSI-driver. The design for Heal() and a HealthCheck() function need to go to csi-addons/spec.

@nixpanic
oh! sorry if that confused you, I now understand why we do not want to use NodeGetVolumeStats() served by the CSI driver which was for CSI Spec. I mean NodeGetVolumeStats() RPC is also served by CSI driver for CSI-addons sidecar. Do we see any reason why we do not want to use the same RPC signature logically, but by a second server? do you think it just creates some additional confusion? if so we can just name it NodeGetVolumeStatus().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe I will just have the new RPC naming it as NodeGetVolumeStatus() which is mostly a copy of NodeGetVolumeStats()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design for Heal() and a HealthCheck() function need to go to csi-addons/spec.

@nixpanic I expect I will have more clarity on Heal() and a HealthCheck() functions while I work on this RFE, so I hope to start on the draft implementation first and then send a PR to csi-addons/spec, only after the spec gets merged I will send the actual controller PR.

Hope that won't hurt.

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to have a description of the procedures that a CSI-driver should implement for this functionality. That description needs to get reviewed and merged in the csi-addons/spec repository before a Kubernetes implementation can be proposed.

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
Comment on lines 44 to 45
* The side-car works as relay, upon receiving a gRPC call from healer controller, the CSI-Addons side-car will then check volume's mounting conditions collected via the CSI NodeGetVolumeStats RPC.
* In case if any volume condition is reported abnormal, the CSI-Addons side-car will act on it, it does some internal operation like fetching secrets, constructing staging path and finally sends NodeStageVolume request to the CSI driver and finally returns back the response to the controller.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSI-Addons can not send CSI commands, those are reserved for the CO.

CSI-drivers can decide on their own implementation of healing, maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.

There is a difference between regular health checks, and an informational message "you restarted, and you should make sure the volume is still available". The last part is expected when a CSIAddonsNode is detected, the 1st is for https://github.com/kubernetes-csi/external-health-monitor/

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
@pkalever
Copy link
Author

We still need to have a description of the procedures that a CSI-driver should implement for this functionality. That description needs to get reviewed and merged in the csi-addons/spec repository before a Kubernetes implementation can be proposed.

I had already opened ceph/ceph-csi#2794, and will update the implementation details there soon.

I will also send the spec changes after some more clarity on the implementation side details(more on the internal structures). Maybe in parallel to the implementation PR.

This is just the overall design idea, for everyone's reference.

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented?
I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

CSI-drivers can decide on their own implementation of healing, maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.

Yes, definitely the CSI-Addons can implement its own RPC, just wanted to avoid another copy-paste.

@nixpanic
Copy link
Collaborator

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented? I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like a good practice, or very clean.

If both the CO and CSI-Addons Controller send CSI operations, we break the spec https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency

A CSI-driver will need to take care that healing does not interfere with actions the CO does. The only practical way to do so, is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.

@pkalever
Copy link
Author

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented? I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like a good practice, or very clean.

If both the CO and CSI-Addons Controller send CSI operations, we break the spec https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency

A CSI-driver will need to take care that healing does not interfere with actions the CO does. The only practical way to do so, is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.

@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?

Copy link
Member

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expected that CSI-Driver will run two different servers one for CSI Spec and one for CSI-Addons spec.

I think, from CSIAddons sidecar perspective, we should only call CSI-Addons spec rpc calls and not mix both of them.

docs/design/proposals/healer-controller.md Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
* Watch for the `CSIAddonsNode` object, if the object is created/modified by the side-car do below operations
* Fetch all the VolumeAttachment` list
* Filter the `VolumeAttachment` list through matching driver name, node name and status attached
* For each `VolumeAttachment` get the respective PersistentVolume information and check the criteria of PersistentVolume Bound (also we can have configuration options like mounter type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send request irrespective of any particular PV config option ?
Or
let the driver include more information in its capabilities response, so this decision can be made with the help of the extra info provided by the driver itself ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is similar to the latter one, somehow get the choice like, what kind of mounter types need what services.
Example:

{
  rbd-nbd: healing
  krbd: monitoring
  xyz: disable
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to leave this more for an implementation-specific detail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSI-Addons capabilities from a CSI-driver should include if it supports healing a volume. If the CSI-driver does not need healing, it can omit the capability.
In case the CSI-driver needs to perform heal checks on a volume (depending on the volume options?), it can either return things like HEALTH_OK (to recheck later), HEALTH_UNSUPPORTED (no need to check, this volume does not support healing), HEALTH_BAD (needs healing) and so on. But this is a discussion for csi-addons/spec.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSI-Addons should not need to know about different mounter types, that would require extending the list of types and the implementation for each CSI-driver. Better have a generic approach that works for all CSI-drivers, so that any CSI-driver can opt-in without changing CSI-Addons itself.

Copy link
Author

@pkalever pkalever Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable.

HEALTH_OK, HEALTH_UNSUPPORTED, HEALTH_BAD are some kind of volume attributes that CSI driver sets on the persistent volume?

In my previous comment, I mean something like:

  • capability response returns some string that holds
{
  Mounter1: NEED_HEALING
  Mounter2: NEED_MONITORING
  Mounter3: UNKNOWN
}
  • If the PV attribute mounter matches with Mounter1, attempt to heal it.
  • If it is of type Mounter2 only check the health,
  • If it is of type Mounter3 do not attempt anything from the healer controller.

isn't it generic enough? The CSI-addons controller doesn't have to know what is Mounter1, it just reads the capability options and matches with the PV attributes, we don't hardcode the mounter types in the addons.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interactions between CSI-Addons and the CSI-driver need to be documented in a csi-addons/spec. The protocol then describes what a CSI-driver needs to implement for supporting the healer operations. This will probably include a description for capabilities of the CSI-driver. Certain operations on a volume can return attributes that healing is not supported for the volume, but other volumes from the same driver might support more features.

I do not think CSI-Addons should know about internals of the CSI-driver, so mounters would not be needed,

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
@pkalever
Copy link
Author

pkalever commented Feb 2, 2022

@Madhu-1 @nixpanic @Rakshith-R Can we finalize this, please? Let me know if we are missing any major calrifications in the design. If not, I hope we can discuss and address any nitty-gritty details in the implementation PR.

Thanks!

@nixpanic
Copy link
Collaborator

nixpanic commented Feb 2, 2022

@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?

Yes. CSI-Addons should not mix calling CSI operations. A CSI-driver will need to provide dedicated CSI-Addons operations that can be called instead. The CSI-driver is then in charge of making sure no conflicting calls from the CO are being processed.

Copy link
Member

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more question and I think we do without gRPC description in the doc ?
Everything else look good to me.

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
Rakshith-R
Rakshith-R previously approved these changes Feb 2, 2022
docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
* Log the final response returned from the side-car

## CSI-Addons side-car
* When a nodeplugin pod is restarted the side-car will create [CSIAddonsNode](https://github.com/csi-addons/kubernetes-csi-addons/blob/main/api/v1alpha1/csiaddonsnode_types.go) object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will create or update the object as per

if err != nil && !apierrors.IsAlreadyExists(err) {
return fmt.Errorf("failed to create csiaddonsnode object: %w", err)
}
?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced with a comment by Niels earlier:

I think you are concerned that the CSI-driver container in the pod restarted, and not the pod was re-created.
Only in that case, the CSIAddonsNode object would exist. It is not clear to me if this is something that can happen in reality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the objects already exist there should be an update operation to let the controller know that sidecar is restarted. am missing this point.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1
As far as I understand from Niels previous comments, the object will be existing only when the container is restarted and it will be absent when the whole pod is restarted. If that is the case, when do we expect the container to restart, practically? Hence is it worth mentioning it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct i don't have a practical answer for this one either. i was thinking object update might help which is not present currently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's take it later? maybe based on future discussions?

docs/design/proposals/healer-controller.md Outdated Show resolved Hide resolved
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Copy link
Collaborator

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see this in combination with a proposal in csi-addons/spec, with that it is much clearer how an interface between CSI-Addons and the CSI-driver looks like.

* Watch for the `CSIAddonsNode` object, if the object is created/modified by the side-car do below operations
* Fetch all the VolumeAttachment` list
* Filter the `VolumeAttachment` list through matching driver name, node name and status attached
* For each `VolumeAttachment` get the respective PersistentVolume information and check the criteria of PersistentVolume Bound (also we can have configuration options like mounter type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interactions between CSI-Addons and the CSI-driver need to be documented in a csi-addons/spec. The protocol then describes what a CSI-driver needs to implement for supporting the healer operations. This will probably include a description for capabilities of the CSI-driver. Certain operations on a volume can return attributes that healing is not supported for the volume, but other volumes from the same driver might support more features.

I do not think CSI-Addons should know about internals of the CSI-driver, so mounters would not be needed,

nixpanic pushed a commit to nixpanic/kubernetes-csi-addons that referenced this pull request Jan 12, 2024
Syncing latest changes from main for kubernetes-csi-addons
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants