Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Provide publish context during ControllerUnpublishVolume #496

Open
sumukhs opened this issue Nov 5, 2021 · 15 comments
Open

Proposal: Provide publish context during ControllerUnpublishVolume #496

sumukhs opened this issue Nov 5, 2021 · 15 comments

Comments

@sumukhs
Copy link

sumukhs commented Nov 5, 2021

Problem Overview:

ControllerPublishVolume has a contract to return FAILED_PRECONDITION with the status indicating the node id that the volume is currently published on.
This helps detecting stale ControllerPublishVolume API calls by the CO to the SP controller and take appropriate action.

However, during the controller unpublish, consider the following ordering of events:

  1. CO issues a ControllerUnpublishVolume(node1) to the Plugin.

  2. The above TCP message is delayed indefinitely over the network.

  3. Node on which CO is running experiences a network partition and is not part of a federated cluster anymore. As a result, the CO is now running in a different node that decides to place it on node2. NOTE that the old TCP connection from the CO is still alive to the Plugin.

  4. CO issues ControllerPublishVolume(node2) to the Plugin.

  5. CO then decides to place the volume on node1 again hence issues ControllerUnpublishVolume(node2) subsequently doing a ControllerPublishVolume(node1).

  6. Finally, the delayed TCP message from 2 arrives indicating a ControllerUnpublishVolume(node1)

Expectation:

The Plugin must be able to identify the stale API call that comes in step 6 and ignore it.

Proposal:

If the ControllerUnpublishVolume includes the original publish context that was returned as part of the ControllerPublishVolume, the Plugin/SP can sequence these and detect staless.

@jdef
Copy link
Member

jdef commented Nov 5, 2021 via email

@sumukhs
Copy link
Author

sumukhs commented Nov 5, 2021

Isn't the co storing/plumbing around the volume id and volume context already?

I didn't get how that is different from the publish context.

If the co is brain dead and loses all its context, how would it be able to controller unpublish anyway?

@jdef
Copy link
Member

jdef commented Nov 5, 2021 via email

@sumukhs
Copy link
Author

sumukhs commented Nov 5, 2021

Isn't the assumption that the volume_id is also lost though?

CO would still have to go through the flow of all the idempotent calls like CreateVolume (which gives back the volume_id), then ControllerPublishVolume(which being idempotent can give back the publish context again) and finally invoke the ControllerUnpublishVolume(publish_context)?

@sumukhs
Copy link
Author

sumukhs commented Nov 5, 2021

Unless you are referring to manual intervention by a human who is trying to patch this up and wants to unpublish the volume - I see your point then.

@jdef
Copy link
Member

jdef commented Nov 5, 2021 via email

@sumukhs
Copy link
Author

sumukhs commented Nov 5, 2021

That makes sense. Any suggestions on how this can be tackled or pointers to other approaches are appreciated!

Going back to your original response, what did you imply by "CO must do more". A CO doing anything more wouldn't be csi conformant and hence lose the flexibility of plugging in different SP's right?

@jdef
Copy link
Member

jdef commented Nov 5, 2021 via email

@sumukhs
Copy link
Author

sumukhs commented Nov 6, 2021

Yes, for the apis in csi that I cared about (create,controller publish/unpublish, node stage/unstage, node publish/unpublish, delete), it seems like we can deal with a network partition for all the cases except the controller unpublish and hence the question here.

I can take a look at the other proposal to see how the co could do more to handle this situation - can you point me to the issue number?

The way I think of a co, in spite of all its complexity and how many ever intermediate states it persists, a network partition causing a stale api call to arrive at a plugin during unpublish will not be distinguished because the spec only allows for a volume id.

I can think of passing in an optional "etag" that if absent will be treated as the safety net/human intervention where the call will be honored and the unpublish will be accepted.

However, if the value is set to something (which will be the case in the stale api call), it is ignored due to the etag checks. It is similar to database optimistic concurrency. Would this address the drawback of the principles of csi you pointed out in my original proposal?

@sumukhs
Copy link
Author

sumukhs commented Nov 9, 2021

@jdef - gentle ping!

@jdef
Copy link
Member

jdef commented Nov 9, 2021 via email

@sumukhs
Copy link
Author

sumukhs commented Nov 9, 2021

I can think of passing in an optional "etag" that if absent will be treated as the safety net/human intervention where the call will be honored and the unpublish will be accepted.

However, if the value is set to something (which will be the case in the stale api call), it is ignored due to the etag checks. It is similar to database optimistic concurrency. Would this address the drawback of the principles of csi you pointed out in my original proposal?

Wondering what your take on this approach is:

@jdef
Copy link
Member

jdef commented Nov 9, 2021

I think I misread the original proposal, mistakenly confusing volume-context for publish-context. That said...

"etag" sounds like additional state that:
(a) CO would need to track/remember as a result of controller-publish-volume, so that...
(b) CO would present to the controller-unpublish-volume call

"etag" seems like "special publish volume context". Let's avoid special case things like this.

So let's say that publish-context was added as an always-OPTIONAL field for controller-unpublish-volume:

message ControllerUnpublishVolumeRequest {
...
  map<string,string> publish_context = x; // OPTIONAL, always
}

... and the new RPC semantics become ...

when publish-context is empty:
(a) because the plugin never provided one as a result of controller-publish-volume; no change from today's behavior
(b) because the CO forgot the old value; no change from today's behavior
(c) because the CO implements an older version of CSI, it is not aware of publish-context for this unpublish call; no change in today's behavior

when publish-context is non-empty:
(d) the CO remembered it from the publish RPC, and ...
(d.1) it matches what the plugin expects; no change from today's behavior
(d.2) it does NOT match what the plugin expects; ** new behavior: RPC results in FAILED_PRECONDITION **
(e) the plugin is mid-upgrade, the older plugin version doesn't process the non-empty publish-context (because it implements an older CSI spec); no change from today's behavior

At first glance, this seems to be backwards compatible and the new behavior is completely opt-in from a plugin perspective.

I'm curious - did you actually run into this situation in a live cluster, or this proposal born of hypotheticals?

@sumukhs
Copy link
Author

sumukhs commented Nov 9, 2021

At first glance, this seems to be backwards compatible and the new behavior is completely opt-in from a plugin perspective.

Yes, given its optional behavior it would be a non-breaking change for existing plugin's/CO's. Is the expectation for the feature proposer to send out a PR/do one of the maintainers would take care of incorporating this in the next release?

I'm curious - did you actually run into this situation in a live cluster, or this proposal born of hypotheticals?

We did not hit this in a live cluster but we have a mock CO and CSI simulator that exercised this code path and ran into a coding assert.

@jdef
Copy link
Member

jdef commented Nov 9, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants