GRPC endpoint not responding properly after the InferenceService reports as Loaded #146

Open
kpouget opened this issue Nov 9, 2023 · 5 comments

kpouget commented Nov 9, 2023

As part of my automated scale test, I observe that the InferenceService sometimes reports as Loaded, but calls to the gRPC endpoint still fail.

Examples:

<command>
set -o pipefail;
i=0;

GRPCURL_DATA=$(cat "subprojects/llm-load-test/openorca-subset-006.json" | jq .dataset[$i].input )

grpcurl \
  -insecure \
  -d "$GRPCURL_DATA" \
  -H "mm-model-id: flan-t5-small-caikit" \
  u0-m7-predictor-watsonx-serving-scale-test-u0.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443 \
  caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
</command>

<stderr> ERROR:
<stderr>   Code: Unavailable
<stderr>   Message: connections to all backends failing; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused
<command>
set -o pipefail;
set -e;
dest=/mnt/logs/016__watsonx_serving__validate_model_all/u0-m6/answers.json
queries=/mnt/logs/016__watsonx_serving__validate_model_all/u0-m6/questions.json
rm -f "$dest" "$queries"

for i in $(seq 10); do
  GRPCURL_DATA=$(cat "subprojects/llm-load-test/openorca-subset-006.json" | jq .dataset[$i].input )
  echo $GRPCURL_DATA >> "$queries"
  grpcurl \
    -insecure \
    -d "$GRPCURL_DATA" \
    -H "mm-model-id: flan-t5-small-caikit" \
    u0-m6-predictor-watsonx-serving-scale-test-u0.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443 \
    caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict \
    >> "$dest"
  echo "Call $i/10 passed"
done
</command>

<stdout> Call 1/10 passed
<stdout> Call 2/10 passed
<stdout> Call 3/10 passed
<stdout> Call 4/10 passed
<stdout> Call 5/10 passed
<stdout> Call 6/10 passed
<stdout> Call 7/10 passed
<stdout> Call 8/10 passed
<stdout> Call 9/10 passed
<stderr> ERROR:
<stderr>   Code: Unavailable
<stderr>   Message: error reading from server: EOF
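
For context, the test only starts sending gRPC traffic once the InferenceService reports as loaded. A minimal sketch of that gate is below; the resource name, namespace, and the use of the Ready condition are assumptions for illustration, not the exact test code:

#!/bin/bash
# Sketch: wait for the InferenceService to report Ready before sending gRPC traffic.
# Names and timeout are illustrative.
isvc=u0-m7                          # hypothetical InferenceService name
ns=watsonx-serving-scale-test-u0    # hypothetical namespace

kubectl wait --for=condition=Ready "inferenceservice/${isvc}" -n "${ns}" --timeout=10m

# Even after the model is reported as Loaded, the first grpcurl calls can
# still fail, which is the behavior reported in this issue.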

Versions

NAME                          DISPLAY                                          VERSION    REPLACES                                   PHASE
jaeger-operator.v1.47.1-5     Red Hat OpenShift distributed tracing platform   1.47.1-5   jaeger-operator.v1.47.0-2-0.1696814090.p   Succeeded
kiali-operator.v1.65.9        Kiali Operator                                   1.65.9     kiali-operator.v1.65.8                     Succeeded
rhods-operator.2.3.0          Red Hat OpenShift Data Science                   2.3.0      rhods-operator.2.2.0                       Succeeded
serverless-operator.v1.30.1   Red Hat OpenShift Serverless                     1.30.1     serverless-operator.v1.30.0                Succeeded
servicemeshoperator.v2.4.4    Red Hat OpenShift Service Mesh                   2.4.4-0    servicemeshoperator.v2.4.3                 Succeeded
quay.io/opendatahub/text-generation-inference@sha256:0e3d00961fed95a8f8b12ed7ce50305acbbfe37ee33d37e81ba9e7ed71c73b69
quay.io/opendatahub/caikit-tgis-serving@sha256:ed920d21a4ba24643c725a96b762b114b50f580e6fee198f7ccd0bc73a95a6ab

kpouget commented Nov 10, 2023

I could work around the issue by increasing the memory limit of the Istio egress/ingress Pods (to 4GB, to be safe):

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: minimal
  namespace: istio-system
spec:
  gateways:
    egress:
      runtime:
        container:
          resources:
            limits:
              memory: 4Gi
    ingress:
      runtime:
        container:
          resources:
            limits:
              memory: 4Gi
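
For reference, after saving the YAML above to a file (name is arbitrary), it can be applied as a merge patch against the existing control plane:

# Apply the memory-limit override to the existing ServiceMeshControlPlane.
kubectl patch smcp/minimal -n istio-system --type=merge --patch-file smcp-memory-limits.yaml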

(screenshots attached)

But this wasn't happening a few weeks ago with RHOAI 2.1.0 and 300 models (that test ran on AWS with 35 nodes, whereas this bug occurred on a single-node OpenShift).

(screenshots attached)

Can this be a regression, or is it somehow expected?

@bartoszmajsak

@kpouget I am wondering if we can get some insights into these metrics as well:

  • pilot_xds_push_time_bucket
  • pilot_proxy_convergence_time_bucket
  • pilot_proxy_queue_time_bucket
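
If Prometheus is scraping istiod, a query along these lines should surface those histograms. This is only a sketch: the Prometheus URL is a placeholder, and on OpenShift the monitoring stack usually sits behind an authenticated route, which is not handled here.

#!/bin/bash
# Sketch: query Prometheus for the p99 proxy convergence time.
PROM_URL=https://prometheus.example.com   # placeholder endpoint
QUERY='histogram_quantile(0.99, sum(rate(pilot_proxy_convergence_time_bucket[5m])) by (le))'

curl -skG "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}"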

@bartoszmajsak

But this wasn't happening a few weeks ago with RHOAI 2.1.0 and 300 models (that test ran on AWS with 35 nodes, whereas this bug occurred on a single-node OpenShift).

@kpouget Was it also running on Istio underneath? If so, how was it configured?


kpouget commented Nov 29, 2023

@kpouget Was it also running on Istio underneath? If so, how was it configured?

Yes, it was. Istio was configured with these files (pinned to the commit used at the time of the test).


bartoszmajsak commented Nov 30, 2023

I managed to reduce resource consumption by roughly half. Here's a script you can apply.

In short, this script:

  • sets resource constraints for pilot and the gateways
  • enables PILOT_FILTER_GATEWAY_CLUSTER_CONFIG
    • This reduces the amount of configuration data that Pilot pushes to the Istio gateways, specifically the egress and ingress gateways, by filtering out service registry information that is not relevant to a particular gateway.
  • limits the outbound endpoints populated in the sidecar proxies of each project by using a Sidecar resource.
    • This is based on the assumption that there is no cross-namespace communication in place. If that is not true, we have to revise the Sidecar settings @Jooho @israel-hdez
#!/bin/bash

cat <<EOF > smcp-patch.yaml 
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:  
  name: data-science-smcp
  namespace: istio-system  
spec:
  gateways:
    egress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
    ingress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
  runtime:
    components:
      pilot:
        container:
          env:
            PILOT_FILTER_GATEWAY_CLUSTER_CONFIG: "true"
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1024Mi

EOF

trap '{ rm -rf -- smcp-patch.yaml; }' EXIT

kubectl patch smcp/data-science-smcp -n istio-system --type=merge --patch-file smcp-patch.yaml 

namespaces=$(kubectl get ns -ltopsail.scale-test -o name | cut -d'/' -f 2)


# limit each sidecar proxy's endpoints to its own namespace and istio-system
for ns in $namespaces; do
    cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: $ns
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"
EOF
done

# force changes to take effect
for ns in $namespaces; do
    kubectl delete pods --all -n "${ns}"
done


# force re-creation of all pods with envoy service registry rebuilt
kubectl delete pods --all -n istio-system

Initial state

❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052

❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
1065

❯ kubectl top pods -n istio-system
NAME                                        CPU(cores)   MEMORY(bytes)   
istio-egressgateway-6b7fdb6cb9-lh5jg        100m         2519Mi          
istio-ingressgateway-7dbdc66dd7-nkxxq       91m          2320Mi          
istiod-data-science-smcp-65f4877fff-tndf4   82m          1392Mi 

❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD                                               NAME                    CPU(cores)   MEMORY(bytes)   
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   POD                     0m           0Mi             
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   istio-proxy             14m          372Mi           
...

After modifications

❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052 // it knows the whole world, so that is the same

❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
34

❯ kubectl top pods -n istio-system
NAME                                        CPU(cores)   MEMORY(bytes)   
istio-egressgateway-5778df8594-j869r        83m          444Mi           
istio-ingressgateway-6847d4b974-sk25z       77m          946Mi           
istiod-data-science-smcp-5568884d7d-45zkz   36m          950Mi 

❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD                                               NAME                    CPU(cores)   MEMORY(bytes)   
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   POD                     0m           0Mi             
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   istio-proxy             6m           136Mi           
...
