[Enhancement]: Reduce Latency to Create a new PVC when old PVC is removed during a manual rolling update #9732

Open
jonathan-innis opened this issue Feb 22, 2024 · 1 comment

@jonathan-innis

Related problem

The Kafka reconciliation seems to wait until its 5-minute (300000 ms) timeout expires before concluding that the pending pod it tried to roll manually did not become ready. This adds a lot of latency to the reconciliation, even though we already know the pod will never become Ready because it references a PVC that doesn't exist. Here are the logs from rolling one of these pods:

2024-02-15 06:36:30 INFO  AbstractOperator:265 - Reconciliation #123(watch) Kafka(kafka/cluster-13000): Kafka cluster-13000 will be checked for creation or modification
2024-02-15 06:36:30 WARN  AbstractOperator:557 - Reconciliation #102(timer) Kafka(kafka/cluster-13000): Failed to reconcile
io.strimzi.operator.common.operator.resource.TimeoutException: Exceeded timeout of 300000ms while waiting for Pods resource cluster-13000-zookeeper-0 in namespace kafka to be ready
	at io.strimzi.operator.common.VertxUtil$1.lambda$handle$1(VertxUtil.java:126) ~[io.strimzi.operator-common-0.39.0.jar:0.39.0]
	at io.vertx.core.impl.future.FutureImpl$4.onFailure(FutureImpl.java:188) ~[io.vertx.vertx-core-4.5.0.jar:4.5.0]
	at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:75) ~[io.vertx.vertx-core-4.5.0.jar:4.5.0]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[io.netty.netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]
2024-02-15 06:36:30 WARN  ZookeeperLeaderFinder:248 - Reconciliation #123(watch) Kafka(kafka/cluster-13000): ZK cluster-13000-zookeeper-0.cluster-13000-zookeeper-nodes.kafka.svc.cluster.local:2181: failed to connect to zookeeper:

Suggested solution

Ideally, we could short-circuit this operation. As with initial pod startup, when we roll pods manually, Strimzi should recognize immediately that a pod referencing a nonexistent PVC will never become ready, bail out early, and proceed to create the PVC for the new pod.

In principle, this wouldn't even require a separate loop. The waiter that polls for the new pod to become ready could itself check whether the PVCs the pod references exist in the cluster, and fail fast if any of them is missing.
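A minimal sketch of such a fail-fast waiter follows, to make the idea concrete. It is not Strimzi's code: `PodView`, `Outcome`, `waitForReady`, and the PVC names in `main` are hypothetical, and the pod and PVC lookups are abstracted as a `Supplier` and a `Predicate` instead of real Kubernetes client calls. It is also blocking, whereas Strimzi's actual readiness wait is asynchronous and built on Vert.x (see `VertxUtil` in the stack trace above).

```java
import java.time.Duration;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FailFastPodWaiter {

    /** Hypothetical, simplified view of a pod: its readiness and the PVC names its volumes reference. */
    record PodView(String name, boolean ready, List<String> claimNames) {}

    /** Possible results of waiting for the pod. */
    enum Outcome { READY, MISSING_PVC, TIMED_OUT }

    /**
     * Polls until the pod is ready, the timeout elapses, or the pod is found to reference
     * a PVC that does not exist (assumed here to mean it can never become ready).
     */
    static Outcome waitForReady(Supplier<PodView> podSupplier,
                                Predicate<String> pvcExists,
                                Duration timeout,
                                Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            PodView pod = podSupplier.get();
            if (pod.ready()) {
                return Outcome.READY;
            }
            // Fail fast: a pod whose PVC does not exist cannot start until that PVC is created,
            // so waiting for the rest of the timeout only adds latency.
            for (String claim : pod.claimNames()) {
                if (!pvcExists.test(claim)) {
                    return Outcome.MISSING_PVC;
                }
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Illustrative state: the PVC for zookeeper-0 was deleted, only zookeeper-1's PVC exists.
        Set<String> existingPvcs = Set.of("data-cluster-13000-zookeeper-1");
        PodView pending = new PodView("cluster-13000-zookeeper-0", false,
                List.of("data-cluster-13000-zookeeper-0"));
        Outcome outcome = waitForReady(() -> pending, existingPvcs::contains,
                Duration.ofMinutes(5), Duration.ofSeconds(1));
        System.out.println(outcome); // prints MISSING_PVC immediately instead of after 5 minutes
    }
}
```

Returning a distinct `MISSING_PVC` outcome, rather than throwing the same timeout error, would let the caller tell "this pod will never become ready" apart from "this pod is slow" and move straight on to creating the PVC.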

Alternatives

No response

Additional context

No response

@scholzj
Member

scholzj commented Mar 7, 2024

Triaged on the community call on 7 March 2024: this would speed things up in certain edge cases. However, it would not be trivial to implement, so it should go through a proposal first.
