Upgrade from v0.10 to v0.11 breaks the InferenceService at routing level #3611

Open
bgalvao opened this issue Apr 16, 2024 · 1 comment

bgalvao commented Apr 16, 2024

/kind bug

cc: @gilcardoai

What steps did you take and what happened:

Upgraded from v0.10 to v0.11 to prepare for a cluster upgrade.
Spun up a new InferenceService to get the associated v0.11 resources to check if everything was working as before.
Whenever I try to get a prediction, the route is no longer found:

[2024-04-16T09:08:29.503Z] "POST /v1/models/lendingclub-application-model-predictor:predict HTTP/1.1" 404 NR route_not_found - "-" 0 0 0 - "10.0.61.171" "Python/3.10 aiohttp/3.8.3" "dce966b0-daf6-4bdd-a813-73a871195ce7" "lendingclub-default-prediction.model-serving-prod.example.com" "-" - - 10.0.123.64:8080 10.0.61.171:33928 - -
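
To make the log line above easier to read, here is a minimal Python sketch that pulls out the fields most relevant to a routing failure. It assumes the line follows Istio's default text access-log format, so the field positions used here are an assumption and would differ with a custom format. `parse_access_log` is a helper name made up for this example.

```python
import shlex

# The access log line from the report, split across literals for readability.
LOG_LINE = (
    '[2024-04-16T09:08:29.503Z] '
    '"POST /v1/models/lendingclub-application-model-predictor:predict HTTP/1.1" '
    '404 NR route_not_found - "-" 0 0 0 - "10.0.61.171" '
    '"Python/3.10 aiohttp/3.8.3" "dce966b0-daf6-4bdd-a813-73a871195ce7" '
    '"lendingclub-default-prediction.model-serving-prod.example.com" '
    '"-" - - 10.0.123.64:8080 10.0.61.171:33928 - -'
)


def parse_access_log(line):
    """Extract a few fields from one access log line.

    Field positions are an assumption (Istio's default text format).
    """
    fields = shlex.split(line)  # keeps each double-quoted field as one token
    method, path, protocol = fields[1].split(" ")
    return {
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(fields[2]),
        "response_flags": fields[3],         # NR: Envoy found no route for the request
        "response_code_details": fields[4],
        "authority": fields[14],             # Host / :authority sent by the client
    }


entry = parse_access_log(LOG_LINE)
for key, value in entry.items():
    print(f"{key}: {value}")
```

The `authority` value is the host that any configured route would need to match, so it is the value to compare against the routing resources in the cluster.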

What did you expect to happen:

A smooth upgrade to v0.11 where the InferenceService works as before.

What's the InferenceService yaml:

inference_service.yaml

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: [redacted]
  creationTimestamp: '2024-04-16T11:19:30Z'
  finalizers:
    - inferenceservice.finalizers
  generation: 1
  labels:
    argocd.argoproj.io/instance: lendingclub-default-prediction-kserve-prod
    env: production
  managedFields:
    - apiVersion: serving.kserve.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:labels:
            .: {}
            f:argocd.argoproj.io/instance: {}
            f:env: {}
        f:spec:
          .: {}
          f:predictor:
            .: {}
            f:containers: {}
            f:minReplicas: {}
            f:serviceAccountName: {}
      manager: argocd-application-controller
      operation: Update
      time: '2024-04-16T11:19:28Z'
    - apiVersion: serving.kserve.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"inferenceservice.finalizers": {}
      manager: manager
      operation: Update
      time: '2024-04-16T11:19:30Z'
    - apiVersion: serving.kserve.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:address:
            .: {}
            f:url: {}
          f:components:
            .: {}
            f:predictor:
              .: {}
              f:address:
                .: {}
                f:url: {}
              f:latestCreatedRevision: {}
              f:latestReadyRevision: {}
              f:latestRolledoutRevision: {}
              f:traffic: {}
              f:url: {}
          f:conditions: {}
          f:modelStatus:
            .: {}
            f:copies:
              .: {}
              f:failedCopies: {}
              f:totalCopies: {}
            f:states:
              .: {}
              f:activeModelState: {}
              f:targetModelState: {}
            f:transitionStatus: {}
          f:observedGeneration: {}
          f:url: {}
      manager: manager
      operation: Update
      subresource: status
      time: '2024-04-16T11:19:54Z'
  name: lendingclub-default-prediction
  namespace: model-serving-prod
  resourceVersion: '944337397'
  uid: 765c64e7-f0cd-46f8-91f9-512d3eaff0cc
  selfLink: >-
    /apis/serving.kserve.io/v1beta1/namespaces/model-serving-prod/inferenceservices/lendingclub-default-prediction
status:
  address:
    url: >-
      http://lendingclub-default-prediction.model-serving-prod.svc.cluster.local/v1/models/lendingclub-default-prediction:predict
  components:
    predictor:
      address:
        url: >-
          http://lendingclub-default-prediction-predictor-default.model-serving-prod.svc.cluster.local
      latestCreatedRevision: lendingclub-default-prediction-predictor-default-00001
      latestReadyRevision: lendingclub-default-prediction-predictor-default-00001
      latestRolledoutRevision: lendingclub-default-prediction-predictor-default-00001
      traffic:
        - latestRevision: true
          percent: 100
          revisionName: lendingclub-default-prediction-predictor-default-00001
      url: >-
        http://lendingclub-default-prediction-predictor-default.model-serving-prod.example.com
  conditions:
    - lastTransitionTime: '2024-04-16T11:19:54Z'
      status: 'True'
      type: IngressReady
    - lastTransitionTime: '2024-04-16T11:19:53Z'
      severity: Info
      status: 'True'
      type: PredictorConfigurationReady
    - lastTransitionTime: '2024-04-16T11:19:53Z'
      status: 'True'
      type: PredictorReady
    - lastTransitionTime: '2024-04-16T11:19:53Z'
      severity: Info
      status: 'True'
      type: PredictorRouteReady
    - lastTransitionTime: '2024-04-16T11:19:54Z'
      status: 'True'
      type: Ready
  modelStatus:
    copies:
      failedCopies: 0
      totalCopies: 1
    states:
      activeModelState: Loaded
      targetModelState: Loaded
    transitionStatus: UpToDate
  observedGeneration: 1
  url: http://lendingclub-default-prediction.model-serving-prod.example.com
spec:
  predictor:
    containers:
      - env:
          - name: STORAGE_URI
            value: >-
              [redacted]
          - name: MOUNT_DIR
            value: /mnt/models/
        image: [redacted]
        imagePullPolicy: Always
        name: kserve-container
        resources:
          limits:
            cpu: '1'
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 1Gi
    minReplicas: 1
    serviceAccountName: lendingclub-default-prediction-sa
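
As a quick cross-check of the status block above, this small Python sketch lists any condition whose status is not 'True'. The conditions are transcribed by hand from the YAML; in practice they would presumably be loaded from the resource's JSON output, which is an assumption not shown here.

```python
# Conditions transcribed from the status block of the InferenceService above.
conditions = [
    {"type": "IngressReady", "status": "True"},
    {"type": "PredictorConfigurationReady", "status": "True"},
    {"type": "PredictorReady", "status": "True"},
    {"type": "PredictorRouteReady", "status": "True"},
    {"type": "Ready", "status": "True"},
]

# Collect the types of all conditions that are not reporting 'True'.
not_ready = [c["type"] for c in conditions if c["status"] != "True"]
print(not_ready)  # → []
```

Every condition reports 'True', so the resource itself gives no hint of the 404 seen when requesting a prediction.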

Anything else you would like to add:

🤷🏼‍♂️

Environment:

  • Istio Version: v1.15
  • Knative Version: v1.7.0
  • KServe Version: v0.11
  • Kubeflow version: not installed
  • Cloud Environment: AWS EKS
  • Kubernetes version: (use kubectl version): 1.25
@Syntax-Error-1337

Check Virtual Services and Routes

Verify that the appropriate VirtualServices and routes are configured in Istio. KServe/Knative deployments that use Istio depend on them for routing.

kubectl get virtualservices -n model-serving-prod
kubectl describe virtualservice lendingclub-default-prediction -n model-serving-prod

Ensure that the domain lendingclub-default-prediction.model-serving-prod.example.com resolves to the correct address and matches the expected service URL. Check DNS or service mesh settings if there's a mismatch.
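
The domain check described above can be sketched in Python as a comparison between the request's Host header and the `hosts` list of a VirtualService. This is a simplified approximation of host matching (exact match, a bare `*`, or a `*.` prefix wildcard), not Istio's actual implementation. `host_matches` is a hypothetical helper, and the `vs_hosts` values are placeholders standing in for whatever `.spec.hosts` contains in the cluster.

```python
def host_matches(request_host, vs_hosts):
    """Return True if request_host matches any entry in vs_hosts.

    Simplified: strips a trailing :port, ignores case, and supports only
    exact names, "*", and "*.suffix" wildcards. IPv6 literals are not handled.
    """
    host = request_host.split(":")[0].lower()
    for pattern in vs_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # "*.example.com" matches any host ending in ".example.com"
        if pattern.startswith("*.") and host.endswith(pattern[1:]):
            return True
    return False


# Host header taken from the access log in the report.
requested = "lendingclub-default-prediction.model-serving-prod.example.com"

# Placeholder values (assumption): replace with the real .spec.hosts entries.
vs_hosts = [
    "lendingclub-default-prediction.model-serving-prod.svc.cluster.local",
    "lendingclub-default-prediction.model-serving-prod",
]

print(host_matches(requested, vs_hosts))
print(host_matches(requested, ["*.model-serving-prod.example.com"]))
```

If the external domain is missing from the list, as in the first placeholder case, a request carrying that Host header would have nothing to match against.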

Examine Istio Configuration

Check Istio’s configuration for any conflicts or misconfigurations, especially around gateways and virtual services.

kubectl logs -n istio-system -l app=istiod
