Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add option (timeout) to switch from drain to drain with disable-eviction #303

Open
damoon opened this issue Feb 9, 2021 · 11 comments
Open
Labels
help wanted keep This won't be closed by the stale bot.

Comments

@damoon
Copy link
Contributor

damoon commented Feb 9, 2021

Details for the disable-eviction option
kubernetes/kubernetes#83307
kubernetes/kubernetes#85571

LD/DR
Allows to drain pod from a node while ignoring PodDisruptionBudgets.
Included in k8s 1.18.

Use case:
We run Kubernetes for CI. A lot of test environments contain broken containers. It is almost impossible to drain pods from a node respecting the pdb. For each node we need manual intervention and delete a few "bad" pods by hand.
I would like to have the option to say: After e.g. 1 hour of draining without success, drain the pods ignoring the PodDisruptionBudgets.
This would allow me to reboot nodes without supervision and free up a lot of time.

@github-actions
Copy link

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@evrardjp
Copy link
Collaborator

Let's keep this opened

@evrardjp
Copy link
Collaborator

It's a good feature, I think we probably want to expose this. Should we target this for 1.8.0? Will you tackle it @damoon ? That would be awesome.

@evrardjp evrardjp mentioned this issue Apr 14, 2021
@github-actions
Copy link

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@dholbach dholbach added the keep This won't be closed by the stale bot. label Jun 22, 2021
@atighineanu
Copy link
Contributor

I might have a glance at this feature. Testing/reproducing such an environment, however, might take longer than writing the feature itself :)

@ffilippopoulos
Copy link

The ability to drain nodes without respecting PDBs is an important feature for us as well. Currently the only way to really use kured in dev environments is by using --force-reboot flag, which is not ideal. Ideally, after the drain timeout is reached, we could try to drain with disable-eviction and reboot without releasing the lock at all.

@ckotzbauer
Copy link
Member

We would be happy to receive a PR for this. It seems to be a useful feature 👍

@ffilippopoulos
Copy link

reading a bit more regarding this, I think that kube's graceful shutdown: https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown should cover us in case we use --force-reboot and delete any pods stuck in the node before attempting a reboot. We'll try this for a while in our lower envs and if we see any issue I am happy to PR an optional disable evictions flag for the drainer.

@fa-at-pulsit
Copy link

Maybe it is better instead to implement the following logic, like in this article:

  1. collect all PDBs in the Cluster
kubectl get pdb -A -o json | jq -r '.items[] | select(.spec.minAvailable >= 1)  | [.metadata.name , .metadata.namespace, .spec.minAvailable] | @tsv' >> pdb_namespace_originalMinimalAvailable_list.txt
  1. patch the existing PDBs to have minAvailable=0
while read p; do \
PDB=$(echo $p | awk '{print $1}') ; \
echo PDB: ${PDB} ; \
NS=$(echo $p | awk '{print $2}') ; \
echo NS: ${NS} ; \
kubectl patch pdb $PDB -n $NS --type=merge -p '{"spec":{"minAvailable":0}}' ; \
done <  pdb_namespace_originalMinimalAvailable_list.txt
  1. drain & reboot
  2. restore original minAvailable setting
while read p; do \
PDB=$(echo $p | awk '{print $1}') ; \
echo PDB: ${PDB} ; \
NS=$(echo $p | awk '{print $2}') ; \
echo NS: ${NS} ; \
Original_Min_Available=$(echo $p | awk '{print $3}') ; \
echo  Original_Min_Available: ${Original_Min_Available} ; \
command=`echo "kubectl patch pdb ${PDB} -n ${NS} --type=merge -p 
'{\"spec\":{\"minAvailable\":${Original_Min_Available}}}'"` ;\
echo ${command} ;\
eval ${command} ;\
done < pdb_namespace_originalMinimalAvailable_list.txt

@ckotzbauer
Copy link
Member

@fa-at-pulsit I strongly disagree to implement such a logic in kured. This would mean that kured would change the settings of all PDBs in a way, that they loose there effect. This is the same behaviour as just deleting them (and restore them afterwards). This raises the question of why the cluster operator introduced PDBs in the first place then. Kured should not touch PDBs at all.

I'm 👍 to implement a timeout-solution as @damoon initially suggested, but all applications must have the possibility of a graceful shutdown.

@fa-at-pulsit
Copy link

@ckotzbauer, yes, you are right.... from this side, timeout-solution will be better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted keep This won't be closed by the stale bot.
Projects
None yet
Development

No branches or pull requests

7 participants