Authorino operator pod fails on some OpenShift clusters #274
The version which fails is 0.11.1.
Thanks for opening the issue and sorry for the struggle, @eyalcha. Are there any logs you could share?
@eyalcha, why are you changing the name on the Authorino side, though? Is it because of this problem: opendatahub-io/opendatahub-operator#892? If that is the case, change the configuration of the Service Mesh Control Plane on the ODH side instead, to match the actual Authorino service name.
@eyalcha, please follow @bartoszmajsak's recommendations (not mine!) before editing any Authorino resource manually. Those changes may be reconciled away by the ODH Operator and/or by the Authorino Operator. In general, you should not change the name of resources managed by these operators. Either way, I'd be interested in understanding the reported restarts.
@guicassolato FWIW, I'm seeing a crash in the authorino pod (the instance, not the operator nor the webhook) in a cluster, running the latest v0.11.1 from OperatorHub. These are the logs:
@israel-hdez, this seems to be a problem with the ext_authz protocol. It looks like the gRPC message does not comply with the expected format: the http attributes of the request are missing from the payload sent to the ext_authz service. Can you please enable debug log level in the Authorino CR, so we can try to inspect further? I could also use the logs from the Istio proxy.
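For reference, turning on debug logging in the Authorino CR typically looks like the sketch below. This is a hedged example, not copied from the thread; the CR name and namespace are assumptions based on the service name mentioned later, and field names follow the Authorino Operator's CRD, which may differ across versions:

```yaml
apiVersion: operator.authorino.kuadrant.io/v1beta1
kind: Authorino
metadata:
  name: authorino                        # assumed CR name
  namespace: opendatahub-auth-provider   # assumed namespace
spec:
  logLevel: debug    # default is "info"; "debug" surfaces the ext_authz payloads
  logMode: production
```

The operator should roll out a new authorino pod with the updated log level after the CR change.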
@israel-hdez, I've just sent a patch upstream with an improved handler for those ext_authz request errors. If you can, please try overriding the Authorino image in the Authorino CR with the following one:
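The actual image reference was lost in this transcript, but overriding the image in the Authorino CR is usually a one-field change (a hedged sketch; the image value below is a placeholder, not the one shared in the thread):

```yaml
apiVersion: operator.authorino.kuadrant.io/v1beta1
kind: Authorino
metadata:
  name: authorino                        # assumed CR name
  namespace: opendatahub-auth-provider   # assumed namespace
spec:
  image: quay.io/example/authorino:patched   # placeholder for the patched image tag
```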
I wonder if the debugging has to be performed from @eyalcha's cluster.
These are the logs:
BTW, this image does not crash. Also, the updated …
We may have managed to fix the crash, but there are still a couple of weird things here. The first one is …
Then, there's … What kind of API client is this? Is it gRPC? Do you have settings such as …?
4.15.2
I had to use another cluster for this, as the original one had hibernated. But the setup should be similar (if not the same). See attached file: authorino-can-i.txt
I'm not sure how to answer the first two :-) I can say there are two services deployed: one is a RESTful service (kserve-test/sklearn-v2-iris in the logs) and the other one should be gRPC (kserve-auth/llama-auth). Also, I know that I didn't have to do any request, and it was still crashing. It must have been some request coming from the networking stack (either Istio or Knative), or a request from within the cluster (perhaps the readiness/liveness probes) that would fire the check with Authorino. About the third question, see the following YAML, which contains the SMCP of the cluster:

```yaml
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: data-science-smcp
  namespace: istio-system
spec:
  addons:
    grafana:
      enabled: false
    jaeger:
      name: jaeger
    kiali:
      enabled: false
      name: kiali
    prometheus:
      enabled: false
  gateways:
    ingress:
      service:
        metadata:
          labels:
            knative: ingressgateway
    openshiftRoute:
      enabled: false
  profiles:
  - default
  proxy:
    networking:
      trafficControl:
        inbound:
          excludedPorts:
          - 8444
          - 8022
        outbound: {}
  security:
    dataPlane:
      mtls: true
    identity:
      type: ThirdParty
  techPreview:
    meshConfig:
      defaultConfig:
        terminationDrainDuration: 35s
      extensionProviders:
      - envoyExtAuthzGrpc:
          port: 50051
          service: authorino-authorino-authorization.opendatahub-auth-provider.svc.cluster.local
        name: opendatahub-auth-provider
  tracing:
    type: None
  version: v2.5
```
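As an aside not shown in the thread: an extension provider declared this way in the mesh config is normally consumed by an Istio AuthorizationPolicy with a `CUSTOM` action whose `provider.name` matches the `extensionProviders[].name` entry. A hedged sketch, with hypothetical policy name and host match:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: example-ext-authz          # hypothetical name
  namespace: istio-system
spec:
  action: CUSTOM
  provider:
    name: opendatahub-auth-provider   # must match extensionProviders[].name in the SMCP
  rules:
  - to:
    - operation:
        hosts: ["*.example.com"]      # hypothetical match; adjust for real workloads
```

Requests matching the rules are then forwarded to Authorino over the `envoyExtAuthzGrpc` channel configured above.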
@israel-hdez, can you please point me to the instructions in the docs to try to reproduce this myself? I see a couple of differences in the RBAC rules compared to what I was expecting. However, running Authorino in OpenShift 4.15 passed in QE. So, I guess for now I'm deeming those boot-up errors a hiccup... at least until proven otherwise. (We'd need authz requests passing through to tell for sure whether those are affecting the flow or not. Maybe they aren't.) Let's focus on the …
Is this still relevant? I reckon a couple of different issues were mentioned in the description and comments (e.g. the failing Authorino Operator pod, the invalid authorization payload, etc.). It's not clear to me if it's still an Authorino issue or a deployment/management one. There's also possibly a distinction to be made between a boot-up issue and an authorization-request issue. Maybe we want to confirm which is which and close/open proper new issues accordingly? Hopefully with detailed steps to reproduce.
@guicassolato, I'll close this ticket. We are not observing any issues around RBAC policies. About the …
/kind bug
I have two OpenShift clusters on version 4.13.xx. The Authorino operator pod fails when I change the service to
authorino-authorino-authorization.opendatahub-auth-provider.svc.cluster.local.
It doesn't happen on the other cluster.

Failing cluster:
Other cluster which doesn't fail:
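As a hedged aside on the naming seen in this issue: the Authorino Operator typically exposes the authorization endpoint as a Service named `<authorino-cr-name>-authorino-authorization` in the CR's namespace, which would explain the doubled "authorino" in the FQDN above if the CR is itself named `authorino`. A hypothetical illustration of the resulting Service (not taken from the thread):

```yaml
# Hypothetical: the Service the operator would create for an Authorino CR
# named "authorino" in namespace "opendatahub-auth-provider"
apiVersion: v1
kind: Service
metadata:
  name: authorino-authorino-authorization   # <cr-name>-authorino-authorization
  namespace: opendatahub-auth-provider
spec:
  ports:
  - name: grpc
    port: 50051   # must match envoyExtAuthzGrpc.port in the SMCP
```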