packages.operators apiregistration fails to authenticate to packageserver endpoint. #3136

Open
epheo opened this issue Dec 20, 2023 · 1 comment


epheo commented Dec 20, 2023

Hi,
After installing OLM (with either operator-sdk or install.sh), packageserver logs connect: connection refused when connecting to operatorhubio-catalog, although I see no problem when connecting from a grpc_cli debugging container.

This is a very simple single-node Kubernetes install with all pods attached to the same bridge.

$ kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0

The packageserver ClusterServiceVersion stays in the Installing phase.

$ kubectl get csv packageserver -n olm
NAME            DISPLAY          VERSION   REPLACES   PHASE
packageserver   Package Server   0.26.0               Installing
$ k get apiservices v1.packages.operators.coreos.com -o yaml
[...]
  conditions:
  - lastTransitionTime: "2023-12-19T22:40:59Z"
    message: 'failing or missing response from https://10.32.0.29:5443/apis/packages.operators.coreos.com/v1:
      bad status from https://10.32.0.29:5443/apis/packages.operators.coreos.com/v1:
      403'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available
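
To pull the HTTP status out of that condition in a script, a small filter is enough. This is only a sketch: discovery_status is a hypothetical helper name, and the pattern assumes the message keeps the "bad status from <url>: <code>" wording shown above.

```shell
# Hypothetical helper: reads an APIService condition message on stdin and
# prints the three-digit HTTP status that follows "bad status from <url>:".
# Prints nothing when the message does not contain that wording.
discovery_status() {
  # The message wraps over several lines, so collapse whitespace first.
  tr -s '[:space:]' ' ' |
    sed -n 's/.*bad status from [^ ]*: *\([0-9][0-9][0-9]\).*/\1/p'
}

# Possible use against a live cluster (not run here):
#   kubectl get apiservice v1.packages.operators.coreos.com \
#     -o jsonpath='{.status.conditions[?(@.type=="Available")].message}' |
#     discovery_status
```

For the condition above the helper would print 403.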

From a grpc_cli debugging container I can reach the operatorhubio-catalog.olm.svc endpoint and list the methods of its api.Registry service.

$ kubectl run -it --rm --restart=Never --image=webplates/grpc-cli:latest grpccli ls operatorhubio-catalog.olm.svc.cluster.local:50051 api.Registry
ListPackages
GetPackage
GetBundle
GetBundleForChannel
GetChannelEntriesThatReplace
GetBundleThatReplaces
GetChannelEntriesThatProvide
GetLatestChannelEntriesThatProvide
GetDefaultBundleThatProvides
ListBundles

Within the operatorhubio-catalog pod, the served configs seem OK.

<<K9s-Shell>> Pod: olm/operatorhubio-catalog-r52b7 | Container: registry-server
/ $ ps
PID   USER     TIME  COMMAND
    1 1001      0:35 /bin/opm serve /configs --cache-dir=/tmp/cache
 1922 1001      0:00 sh
 1942 1001      0:00 ps

/ $ grpc_health_probe -addr 127.0.0.1:50051
status: SERVING

/ $ /bin/opm validate /configs
/ $ /bin/opm version
Version: version.Version{OpmVersion:"v1.33.0", GitCommit:"5e23ef59", BuildDate:"2023-11-28T15:00:47Z", GoOs:"linux", GoArch:"amd64"}
/ $

All containers appear as Running and the liveness probes seem to have been satisfied.

$ k get all -n olm
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME                                                                  READY   STATUS      RESTARTS      AGE
pod/0b9f8e8106e6bc92a5b3edb6791ceaab0e8a22f5493895798082899af768bmj   0/1     Completed   0             12h
pod/9b9d47c94b554c8bd984f185a7385db635c1dbd74e304e2f4d34960f8bdvm5j   0/1     Completed   0             12h
pod/catalog-operator-7676fc5cc8-jr6th                                 1/1     Running     0             13h
pod/olm-operator-7c897bd449-jgnlk                                     1/1     Running     0             13h
pod/operatorhubio-catalog-r52b7                                       1/1     Running     0             13h
pod/packageserver-5966d674f8-fmjsn                                    1/1     Running     0             13h
pod/packageserver-5966d674f8-hwpxn                                    1/1     Running     0             13h

NAME                            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)     AGE
service/operatorhubio-catalog   ClusterIP   10.32.0.101   <none>        50051/TCP   13h
service/packageserver-service   ClusterIP   10.32.0.138   <none>        5443/TCP    51s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/catalog-operator   1/1     1            1           13h
deployment.apps/olm-operator       1/1     1            1           13h
deployment.apps/packageserver      2/2     2            2           13h

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/catalog-operator-7676fc5cc8   1         1         1       13h
replicaset.apps/olm-operator-7c897bd449       1         1         1       13h
replicaset.apps/packageserver-5966d674f8      2         2         2       13h

NAME                                                                        COMPLETIONS   DURATION   AGE
job.batch/0b9f8e8106e6bc92a5b3edb6791ceaab0e8a22f5493895798082899af72da17   1/1           9s         12h
job.batch/9b9d47c94b554c8bd984f185a7385db635c1dbd74e304e2f4d34960f8bdc287   1/1           7s         12h

But the log of a packageserver pod shows:

time="2023-12-20T11:26:48Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.32.0.101:50051: connect: connection refused\"" source="{operatorhubio-catalog olm}"
W1220 11:26:49.978844       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {operatorhubio-catalog.olm.svc:50051 operatorhubio-catalog.olm.svc:50051 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.32.0.101:50051: connect: connection refused". Reconnecting...

I have attached what felt relevant from the olm-operator, operatorhubio-catalog and packageserver logs.

catalog-operator.log
operatorhubio-catalog.log
packageserver.log
olm-operator.log


epheo commented Dec 20, 2023

Update:
The connection refused messages in the packageserver log only occur while opm is starting up; package-server connects correctly over gRPC afterward.
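
One way to check that the refused connections are confined to startup is to list the timestamps of those warnings and compare them with the start time of the catalog pod. A minimal sketch, assuming the time="..." line prefix seen in the packageserver excerpt above; refused_timestamps is a hypothetical helper name, and klog-style lines (such as the W1220 one) carry no such prefix and are skipped.

```shell
# Hypothetical helper: reads log lines on stdin and prints the timestamp of
# every "connection refused" entry that starts with time="...".
refused_timestamps() {
  sed -n '/connection refused/s/^time="\([^"]*\)".*/\1/p'
}

# Possible use against a live cluster (not run here):
#   kubectl logs -n olm deploy/packageserver | refused_timestamps
```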

The actual issue appears to concern authentication to the packageserver endpoint: the healthz, livez and readyz endpoints all return 200 OK, but the apis/packages.operators.coreos.com/v1 endpoint returns 403 Forbidden.

message: 'failing or missing response from https://10.32.0.210:5443/apis/packages.operators.coreos.com/v1:
  bad status from https://10.32.0.210:5443/apis/packages.operators.coreos.com/v1:
      403'

If I run another package-server with --authorization-always-allow-paths /apis/packages.operators.coreos.com/v1, the endpoint returns the expected result.

/bin/package-server -v=4 --secure-port 5444 --global-namespace olm --debug --authorization-always-allow-paths /apis/packages.operators.coreos.com/v1
dnstools# curl -k https://10.200.0.94:5444/apis/packages.operators.coreos.com/v1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "packages.operators.coreos.com/v1",
  "resources": [
    {
      "name": "packagemanifests",
      "singularName": "packagemanifest",
      "namespaced": true,
      "kind": "PackageManifest",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "packagemanifests/icon",
      "singularName": "",
      "namespaced": true,
      "kind": "PackageManifest",
      "verbs": [
        "get"
      ]
    }
  ]
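
The difference between the health endpoints and the API group path, and the effect of the extra flag, can be pictured with a toy model of an always-allow path list. This is a sketch under assumptions, not the server's actual code: authz_decision is a hypothetical name, the default list of /healthz, /readyz and /livez is assumed from the 200 responses reported above, and the matching rule (exact match, or prefix match when an entry ends in *) is my understanding of how such lists are usually evaluated.

```shell
# Hypothetical model of an always-allow path list.
# Usage: authz_decision REQUEST_PATH ALLOWED_PATH...
# Prints "skip" when the request path is on the list (no authorization
# check is made), otherwise "delegate" (a normal authorization check applies).
authz_decision() {
  request_path="${1#/}"   # compare without the leading slash
  shift
  for allowed in "$@"; do
    allowed="${allowed#/}"
    case "$allowed" in
      *\*)
        # Assumed rule: a trailing * turns the entry into a prefix match.
        prefix="${allowed%\*}"
        case "$request_path" in
          "$prefix"*) echo "skip"; return 0 ;;
        esac
        ;;
      "$request_path")
        echo "skip"; return 0 ;;
    esac
  done
  echo "delegate"
}

# Assumed default list, matching the endpoints that answered 200.
defaults="/healthz /readyz /livez"
authz_decision /healthz $defaults
authz_decision /apis/packages.operators.coreos.com/v1 $defaults
authz_decision /apis/packages.operators.coreos.com/v1 $defaults /apis/packages.operators.coreos.com/v1
```

In this model only the last call skips the check for the API group path, which is consistent with the flag hiding the 403 rather than explaining why the delegated check is refused.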

https://github.com/openshift/library-go/blob/7a65fdb398e28782ee1650959a5e0419121e97ae/pkg/config/serving/server.go#L63
refers to system:masters, which matches the certificate I use to create OLM resources.

What component or configuration might I be missing in my Kubernetes deployment?

epheo changed the title from "packageserver can't connect to operatorhubio-catalog while grpc_cli can" to "packages.operators apiregistration fails to authenticate to packageserver endpoint." on Dec 21, 2023