docs: add doc to recover from pod from lost node #7282

48 changes: 48 additions & 0 deletions Documentation/ceph-csi-troubleshooting.md
@@ -441,3 +441,51 @@ $ rbd ls --id=csi-rbd-node -m=10.111.136.166:6789 --key=AQDpIQhg+v83EhAAgLboWIbl
```

Where `-m` is one of the mon endpoints and the `--key` is the key used by the CSI driver for accessing the Ceph cluster.

## Node Loss

When a node is lost, you will see application pods on the node stuck in the `Terminating` state while another pod is rescheduled and is in the `ContainerCreating` state.
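
For example, you can spot the affected pods quickly with a filter like the following (an illustrative command; the `rook-ceph` namespace here is an assumption, use your application's namespace):

```console
$ kubectl -n rook-ceph get pods -o wide | grep -E 'Terminating|ContainerCreating'
```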

To allow the application pod to start on another node:

### Force delete your application pod. After some timeout (8-10 minutes), your pod will be rescheduled

To force delete the pod stuck in the `Terminating` state:

```console
$ kubectl -n rook-ceph delete pod my-app-69cd495f9b-nl6hf --grace-period 0 --force
```
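
Before force deleting, it can help to confirm that the node is really gone (an optional sanity check, not part of the original steps):

```console
# The lost node should report a NotReady status
$ kubectl get nodes
```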

Then wait for the timeout (8-10 minutes). If the pod is still not in the `Running` state after that, either wait longer or blacklist the watcher as described below.

### Shorten the timeout

To shorten the timeout, you can mark the node as "blacklisted" so Rook can safely fail over the pod sooner.

```console
$ PVC_NAME= # enter the PVC name
$ IMAGE=$(kubectl get pv $(kubectl get pv | grep $PVC_NAME | awk '{ print $1 }') -o jsonpath='{.spec.csi.volumeHandle}' | cut -d '-' -f 6- | awk '{print "csi-vol-"$1}') # derive the RBD image name backing the PVC
$ echo $IMAGE
```
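
An equivalent lookup (a sketch, assuming the PVC is bound and `<app-namespace>` is replaced with the application's namespace) resolves the PV name from the PVC itself instead of grepping the PV list:

```console
$ PV_NAME=$(kubectl -n <app-namespace> get pvc $PVC_NAME -o jsonpath='{.spec.volumeName}')
$ kubectl get pv $PV_NAME -o jsonpath='{.spec.csi.volumeHandle}'
```

The image name is then `csi-vol-` followed by the trailing UUID of the volume handle, as in the one-liner above.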

The solution is to blacklist the watcher, using the commands below from the [Rook toolbox](ceph-toolbox.md):

```console
$ rbd status <image> --pool=replicapool # use the image name from the output above
```
>```
> Watchers:
> watcher=10.130.2.1:0/2076971174 client.14206 cookie=18446462598732840961
>```

```console
$ ceph osd blacklist add 10.130.2.1:0/2076971174 # use the watcher address from the rbd status output above
blacklisting 10.130.2.1:0/2076971174 until 2021-02-21T18:12:53.115328+0000 (3600 sec)
```
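
To verify that the eviction worked, you can re-run `rbd status` from the toolbox (an optional check); the blacklisted address should eventually no longer appear in the `Watchers` list:

```console
$ rbd status <image> --pool=replicapool
```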

After running the above command, the pod should be running within a few minutes. **Don't forget to remove the blacklist entry afterwards** (see below).
Review discussion on this point:

**Member:** At what point should they remove the watcher? When the node is confirmed dead? What if they don't remove the watcher? Does it never get removed automatically?

**Member:** If the node is dead, the watcher gets removed automatically. If the control plane is dead, the application might still be running; in that case the watcher never gets removed and we need to blacklist it.

**Member:** So we shouldn't tell them to un-blacklist, right? Or else we should be clear about when to un-blacklist.

**Contributor (author):** So we only need to un-blacklist when the control plane is dead, because that is the only case in which we block the watcher, right?

**Member:** IIUC from this comment, `rbd status` may not be reliable at showing a watcher, or at least it doesn't guarantee the node is dead. So it seems like we need to recommend they blacklist a whole node until they can guarantee that node is dead. Otherwise, they risk corruption.

**Member:** Yes, a node might be disconnected from the Kubernetes cluster; we can blacklist the node IP in the Ceph cluster.

To remove the blacklist entry:

```console
$ ceph osd blacklist rm 10.130.2.1:0/2076971174
un-blacklisting 10.130.2.1:0/2076971174
```
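
To confirm the entry is gone (an optional check, also run from the toolbox):

```console
$ ceph osd blacklist ls
```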