
[Intermittent - RHEL cluster] - Knative activator pod is restarting continuously from crashloop backoff with liveness and readiness probe failures #15171

Open
Subhankar-Adak opened this issue Apr 30, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

Subhankar-Adak commented Apr 30, 2024

What version of Knative?

v1.11.0

Expected Behavior

As part of the KServe deployment, we deploy Istio, Cert Manager, and Knative as dependencies. Intermittently, the Knative deployment step fails: the activator pod does not run properly and goes into CrashLoopBackOff repeatedly, while the other pods in the knative-serving namespace run fine.

Versions of Dependencies:

  • Istio: 1.17.0
  • Certificate Manager: 1.13.0
  • Knative: 1.11.0

Environment Details:

  • Kubernetes Version: v1.26.12 (deployed via Kubespray on bare metal)
  • Operating System: RHEL 8.8 (issue mostly observed here, not on Ubuntu clusters)
  • Verification: Before deploying, we verified that all Kubernetes system pods were running successfully (see the check sketched below).
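
A pre-install check along these lines (a minimal sketch, not necessarily the exact commands used) confirms that nodes and system pods are healthy before the stack is deployed:

  kubectl get nodes -o wide                              # every node should report Ready
  kubectl get pods -A | grep -vE 'Running|Completed'     # anything left over deserves a look before installing Knative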

Activator pod description (kubectl describe pod) output:

Knative activator:

  Exit Code:    0
  Started:      Thu, 25 Apr 2024 12:43:47 +0000
  Finished:     Thu, 25 Apr 2024 12:46:50 +0000
Ready:          False
Restart Count:  4
Limits:
  cpu:     1
  memory:  600Mi
Requests:
  cpu:      300m
  memory:   60Mi
Liveness:   http-get http://:8012/ delay=15s timeout=1s period=10s #success=1 #failure=12
Readiness:  http-get http://:8012/ delay=0s timeout=1s period=5s #success=1 #failure=5
Environment:
  GOGC:                       500
  POD_NAME:                   activator-59dff6d45c-wqt8w (v1:metadata.name)
  POD_IP:                      (v1:status.podIP)
  SYSTEM_NAMESPACE:           knative-serving (v1:metadata.namespace)
  CONFIG_LOGGING_NAME:        config-logging
  CONFIG_OBSERVABILITY_NAME:  config-observability
  METRICS_DOMAIN:             knative.dev/internal/serving
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8nkbz (ro)

Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-8nkbz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  12m                    default-scheduler  Successfully assigned knative-serving/activator-59dff6d45c-wqt8w to v16regressionnode00002
  Normal   Pulled     12m                    kubelet            Container image "gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:6b98eed95dd6dcc3d957e673aea3d271b768225442504316d713c08524f44ebe" already present on machine
  Normal   Created    12m                    kubelet            Created container activator
  Normal   Started    12m                    kubelet            Started container activator
  Warning  Unhealthy  11m (x5 over 12m)      kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy  2m20s (x137 over 12m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
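
The failing probes come from the activator Deployment spec; a quick way to pull just that section (a sketch, assuming the default Knative Serving manifest names) is:

  kubectl -n knative-serving get deploy activator \
    -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}{"\n"}{.spec.template.spec.containers[0].livenessProbe}{"\n"}'

The health handler entries in the pod logs below ("connection has not yet been established") suggest the 500s are a symptom of the activator's failed connection to the autoscaler rather than an independent fault.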

Activator pod logs:

[root@v16regressionnode00003 ~]# kubectl logs activator-7bcc758ddd-wk7cd -n knative-serving
2024/04/25 11:22:05 Registering 2 clients
2024/04/25 11:22:05 Registering 3 informer factories
2024/04/25 11:22:05 Registering 4 informers
{"severity":"INFO","timestamp":"2024-04-25T11:22:05.716400581Z","logger":"activator","caller":"activator/main.go:140","message":"Starting the knative activator","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"INFO","timestamp":"2024-04-25T11:22:05.718542578Z","logger":"activator","caller":"activator/main.go:200","message":"Connecting to Autoscaler at ws://autoscaler.knative-serving.svc.cluster.local:8080","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"INFO","timestamp":"2024-04-25T11:22:05.718768882Z","logger":"activator","caller":"websocket/connection.go:161","message":"Connecting to ws://autoscaler.knative-serving.svc.cluster.local:8080","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"INFO","timestamp":"2024-04-25T11:22:05.719123778Z","logger":"activator","caller":"profiling/server.go:65","message":"Profiling enabled: false","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"INFO","timestamp":"2024-04-25T11:22:05.7237912Z","logger":"activator","caller":"activator/request_log.go:45","message":"Updated the request log template.","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd","template":""}
{"severity":"WARNING","timestamp":"2024-04-25T11:22:06.685484891Z","logger":"activator","caller":"handler/healthz_handler.go:36","message":"Healthcheck failed: connection has not yet been established","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"WARNING","timestamp":"2024-04-25T11:22:07.686801714Z","logger":"activator","caller":"handler/healthz_handler.go:36","message":"Healthcheck failed: connection has not yet been established","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd"}
{"severity":"ERROR","timestamp":"2024-04-25T11:22:08.719008181Z","logger":"activator","caller":"websocket/connection.go:144","message":"Websocket connection could not be established","commit":"f1617ef","knative.dev/controller":"activator","knative.dev/pod":"activator-7bcc758ddd-wk7cd","error":"dial tcp: lookup autoscaler.knative-serving.svc.cluster.local: i/o timeout","stacktrace":"knative.dev/pkg/websocket.NewDurableConnection.func1\n\tknative.dev/pkg@v0.0.0-20230718152110-aef227e72ead/websocket/connection.go:144\nknative.dev/pkg/websocket.(*ManagedConnection).connect.func1\n\tknative.dev/pkg@v0.0.0-20230718152110-aef227e72ead/websocket/connection.go:225\nk8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1\n\tk8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:222\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\tk8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:235\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\tk8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:228\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\tk8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:423\nknative.dev/pkg/websocket.(*ManagedConnection).connect\n\tknative.dev/pkg@v0.0.0-20230718152110-aef227e72ead/websocket/connection.go:222\nknative.dev/pkg/websocket.NewDurableConnection.func2\n\tknative.dev/pkg@v0.0.0-20230718152110-aef227e72ead/websocket/connection.go:162"}

Actual Behavior

Steps to Reproduce the Problem

  1. Deploy Kubernetes v1.26.12 using Kubespray on RHEL 8.8 cluster.
  2. Deploy Istio v1.17.0.
  3. Deploy Cert Manager v1.13.
  4. Deploy Knative using the following steps:
Subhankar-Adak added the kind/bug label Apr 30, 2024

skonto (Contributor) commented May 15, 2024

Hi @Subhankar-Adak, could you provide the status of the pods in the knative-serving namespace? From your logs it seems that the activator cannot connect to the autoscaler pod and fails. Could you provide the logs of the autoscaler pod too?
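
For reference, something like this collects both (a sketch, assuming the default deployment name):

  kubectl get pods -n knative-serving -o wide
  kubectl logs -n knative-serving deployment/autoscaler --tail=100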
