Object store user create/update reconcile failed due to invalid certificate #8993

Closed
ashangit opened this issue Oct 18, 2021 · 20 comments · Fixed by #9020
@ashangit
Member

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Expected behavior:
Using an invalid certificate for the S3 RGW causes the object store user create/update reconcile to fail:

2021-10-18 13:46:07.309996 E | ceph-object-store-user-controller: failed to reconcile failed to create/update object store user "s3-monitor-user": failed to get details from ceph object user "s3-monitor-user": Get "https://rook-ceph-rgw-backup.par-ns1-preprod.svc:19000/admin/user?display-name=User%20used%20to%20get%20ceph%20admin%20radosgw%20metrics&format=json&max-buckets=1000&uid=s3-monitor-user": x509: certificate is valid for *.XXXX.CCCC, XXXX.CCCC, not rook-ceph-rgw-backup.par-ns1-preprod.svc

Since #8712, the bucket health check no longer verifies the certificate, but the user reconcile still does.
I'm wondering whether a security concern led to insecure being set to false in rgw.go: https://github.com/rook/rook/pull/8712/files#diff-00d4604932102df57560a4811e89064acd51ec541a5ef439b3f14cbf0a54d791R364 or whether it is fine to just set it to true.
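
For readers unfamiliar with where the x509 error comes from, here is a minimal sketch of Go's standard hostname check. It is not Rook code: coversHost is a hypothetical helper, and example.com stands in for the redacted *.XXXX.CCCC names. The certificate only lists external names, so the in-cluster service name is rejected.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// coversHost reports whether a certificate whose DNS SANs are dnsNames would
// pass Go's standard hostname check for host. It returns nil on a match.
// Hypothetical helper for illustration only.
func coversHost(dnsNames []string, host string) error {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host)
}

func main() {
	// Placeholder SANs standing in for the redacted "*.XXXX.CCCC, XXXX.CCCC".
	sans := []string{"*.example.com", "example.com"}

	// An external name covered by the wildcard is accepted (nil error).
	fmt.Println(coversHost(sans, "s3.example.com"))

	// The in-cluster service name used by the operator is rejected.
	fmt.Println(coversHost(sans, "rook-ceph-rgw-backup.par-ns1-preprod.svc"))
}
```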

How to reproduce it (minimal and precise):


Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 1.7.5
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@leseb
Member

leseb commented Oct 18, 2021

The healthchecker uses insecure TLS because the execution is really contained, we know it's our own internal check so there is no real security implication. Using insecure TLS is not recommended as it can lead to man-in-the-middle type of attacks. I doubt we want to go as far as using insecure TLS for the object store create/update by default.

We could have another key in the secret's data to pass HTTP client options, like use Insecure TLS. @thotz thoughts?
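
A rough sketch of what such an opt-in could look like, assuming the option is read from the secret's data map. The key name insecureSkipVerify is the one used later in this thread; tlsConfigFromSecret is a hypothetical helper and not Rook's actual implementation. Verification stays on unless the key is present and parses as true.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"strconv"
)

// insecureSkipVerifyKey is the secret data key discussed in this thread.
const insecureSkipVerifyKey = "insecureSkipVerify"

// tlsConfigFromSecret is a hypothetical helper (not Rook's code). It returns a
// TLS config that verifies certificates unless the secret data explicitly
// opts out with insecureSkipVerify set to a true value.
func tlsConfigFromSecret(secretData map[string][]byte) (*tls.Config, error) {
	cfg := &tls.Config{}
	raw, ok := secretData[insecureSkipVerifyKey]
	if !ok {
		return cfg, nil // secure by default
	}
	skip, err := strconv.ParseBool(string(raw))
	if err != nil {
		return nil, fmt.Errorf("invalid %s value %q: %w", insecureSkipVerifyKey, raw, err)
	}
	cfg.InsecureSkipVerify = skip
	return cfg, nil
}

func main() {
	cfg, err := tlsConfigFromSecret(map[string][]byte{insecureSkipVerifyKey: []byte("true")})
	if err != nil {
		panic(err)
	}
	// The config would then be attached to the HTTP client used for admin calls.
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}
	fmt.Println("client skips verification:",
		client.Transport.(*http.Transport).TLSClientConfig.InsecureSkipVerify)
}
```

Failing on an unparsable value, rather than silently falling back, is a design choice of this sketch: a typo in the secret should not quietly change the security posture.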

@thotz
Contributor

thotz commented Oct 19, 2021

> The healthchecker uses insecure TLS because the execution is really contained, we know it's our own internal check so there is no real security implication. Using insecure TLS is not recommended as it can lead to man-in-the-middle type of attacks. I doubt we want to go as far as using insecure TLS for the object store create/update by default.
>
> We could have another key in the secret's data to pass HTTP client options, like use Insecure TLS. @thotz thoughts?

That works for CephObjectStoreUser, but not for OBC. I am trying to understand the complete workflow here:
we have S3 clients connected to the load balancer, and the load balancer tries to redirect the requests to the RGW server. Can we use TLS authentication only between the S3 client and the load balancer, and not with the RGW server? Maybe I forgot what is the issue with adding the rgw host address to the certificate. If we are using always insecure with requests I am not fully sure why enable TLS ?
The other way around is to create two certs: one between Rook and RGW, and another between applications and the RGW server, then pass both certs in the CephObjectStore TLS secret.

@ashangit
Member Author

> Maybe I forgot what is the issue with adding the rgw host address to the certificate

This is an internal limitation: the certificate is issued by an external provider and only contains a set of externally reachable hosts, not the internal Kubernetes one.

> If we are using always insecure with requests I am not fully sure why enable TLS ?

We only need this for the Rook operator; our clients are using valid host entries.

@leseb
Member

leseb commented Oct 21, 2021

Proposal here #9020

@thotz
Contributor

thotz commented Oct 25, 2021

> Maybe I forgot what is the issue with adding the rgw host address to the certificate
> This is an internal limitation with certificate provided by external provider only containing a set of externally reachable host and not the internal kubernetes host one.
> If we are using always insecure with requests I am not fully sure why enable TLS ?
> We only need this for the rook operator, our clients are using valid host entries

Thanks for explaining. How about passing two certs in the secret? Could that work as a workaround for the time being?

@logan2211

I'm struggling with usability of the workaround implemented in #9020.

I use cert-manager Certificate resources to issue RGW SSL certs. How can I make a Certificate resource that generates a TLS secret containing the insecureSkipVerify flag?

Currently I'm struggling with the following error, which is preventing OBCs from working.

x509: certificate is valid for objects.k8s.domain.com, not rook-ceph-rgw-ceph-rgw.rook-ceph.svc

@thotz
Contributor

thotz commented May 9, 2022

@logan2211: If you are using a k8s secret of type TLS to store the RGW certs, then this flag cannot be added. But if you are using a normal k8s secret of type Opaque, then you can add this value in the data field of the secret.

@ibotty

ibotty commented Jun 23, 2022

How can I configure radosgw to serve using a certmanager-issued certificate while still allowing rook to reconcile?

@thotz
Contributor

thotz commented Jun 24, 2022

sslCertificateRef in https://rook.io/docs/rook/v1.9/CRDs/Object-Storage/ceph-object-store-crd/#gateway-settings clearly describes how RGW can be configured with TLS certs. What specifically is your use case, and how does it differ from the docs?

@ibotty

ibotty commented Jun 24, 2022

Yes it does, but afaict it's impossible to instruct certmanager to inject the extra key in the secret.

> If adding the service DNS name to the certificate is not possible, another key can be specified in the secret's data: insecureSkipVerify: true to skip the certificate verification. It is not recommended to enable this option since TLS is susceptible to machine-in-the-middle attacks unless custom verification is used.
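
To illustrate what "custom verification" in that quote can mean, below is a sketch of the pattern documented for Go's crypto/tls: the built-in check is disabled, and a VerifyConnection callback still validates the chain, but against a chosen external name instead of the dialed service name. This is an assumption about how one could do it, not something Rook is known to implement; verifyAs, selfSigned and checkPresented are hypothetical names, and the self-signed certificate exists only to make the demo self-contained.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// verifyAs returns a TLS config that still validates the server's chain
// against roots, but checks the certificate against expectedName instead of
// the host that was dialed (for example, the in-cluster service name).
func verifyAs(roots *x509.CertPool, expectedName string) *tls.Config {
	return &tls.Config{
		// Turns off the built-in check; VerifyConnection below replaces it.
		InsecureSkipVerify: true,
		VerifyConnection: func(cs tls.ConnectionState) error {
			if len(cs.PeerCertificates) == 0 {
				return errors.New("server presented no certificate")
			}
			opts := x509.VerifyOptions{
				Roots:         roots,
				DNSName:       expectedName,
				Intermediates: x509.NewCertPool(),
			}
			for _, c := range cs.PeerCertificates[1:] {
				opts.Intermediates.AddCert(c)
			}
			_, err := cs.PeerCertificates[0].Verify(opts)
			return err
		},
	}
}

// selfSigned creates a throwaway self-signed certificate for the demo only.
func selfSigned(dnsName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		DNSNames:              []string{dnsName},
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// checkPresented runs the custom verification against one presented cert,
// trusting that same cert as the only root.
func checkPresented(cert *x509.Certificate, expectedName string) error {
	roots := x509.NewCertPool()
	roots.AddCert(cert)
	cfg := verifyAs(roots, expectedName)
	return cfg.VerifyConnection(tls.ConnectionState{
		PeerCertificates: []*x509.Certificate{cert},
	})
}

func main() {
	cert, err := selfSigned("objects.k8s.domain.com")
	if err != nil {
		panic(err)
	}
	// Verifying against the external name the cert was issued for succeeds.
	fmt.Println(checkPresented(cert, "objects.k8s.domain.com"))
	// Verifying against the service name still fails, as in the errors above.
	fmt.Println(checkPresented(cert, "rook-ceph-rgw-ceph-rgw.rook-ceph.svc"))
}
```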

@ibotty

ibotty commented Jun 24, 2022

To elaborate: if you want, say, letsencrypt-verified certificates, you'll need another process that postprocesses the generated secret, every time the certificate is updated.

In general: I'd argue that using a certificate signed by a public CA is, or ought to be, common. In that case it's impossible to get a signed certificate that includes the service name.

@thotz
Contributor

thotz commented Jun 28, 2022

If I understand correctly, cert-manager creates a k8s TLS secret, in which the skip-certificate-check option cannot be set. Can cert-manager just create the certs, so that you can manually create the secret from them? For the time being, IMO the workaround is to convert the k8s TLS secret into a k8s Opaque one.

@ibotty

ibotty commented Jun 28, 2022

Of course that works, but it has to be done every time the certificate gets rotated, which is quite often with ACME-issued certificates (Let's Encrypt).
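
A sketch of the data transformation such a post-processing step would have to repeat on every rotation. Several things here are assumptions: toOpaqueData is a hypothetical helper, the cert key for Opaque secrets and the single PEM bundle layout follow my reading of the Rook docs for sslCertificateRef, and tls.crt/tls.key are the standard keys of a kubernetes.io/tls secret. Reading and writing the actual Secret objects is left out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// toOpaqueData builds the data map for an Opaque secret from the data of a
// kubernetes.io/tls secret. Hypothetical helper: the "cert" key and the
// bundled PEM layout are assumptions, not confirmed in this thread.
func toOpaqueData(tlsData map[string][]byte) (map[string][]byte, error) {
	crt, key := tlsData["tls.crt"], tlsData["tls.key"]
	if len(crt) == 0 || len(key) == 0 {
		return nil, errors.New("tls secret must contain tls.crt and tls.key")
	}

	// Concatenate certificate and key into one PEM bundle, each terminated by
	// exactly one newline. A fresh buffer avoids mutating the input slices.
	var bundle bytes.Buffer
	bundle.Write(bytes.TrimSpace(crt))
	bundle.WriteByte('\n')
	bundle.Write(bytes.TrimSpace(key))
	bundle.WriteByte('\n')

	return map[string][]byte{
		"cert":               bundle.Bytes(),
		"insecureSkipVerify": []byte("true"),
	}, nil
}

func main() {
	out, err := toOpaqueData(map[string][]byte{
		"tls.crt": []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"),
		"tls.key": []byte("-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"),
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("keys: cert (%d bytes), insecureSkipVerify=%s\n",
		len(out["cert"]), out["insecureSkipVerify"])
}
```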

@thotz
Contributor

thotz commented Jun 29, 2022

Oh I see, now I understand your issue completely. @travisn @BlaineEXE currently we have an option to skip the SSL check if the k8s secret is of the opaque type. It won't work if the k8s secret type is tls. Should we move this option to the ObjectStoreSpec?

@travisn
Member

travisn commented Jun 29, 2022

What's the proposal? Do we need an option for the k8s secret type to be tls? Please open a new issue with a summary of the issue and the potential proposal.

@ibotty

ibotty commented Jun 30, 2022

Another option is to use the insecure port for in-cluster communication.

@thotz
Contributor

thotz commented Jun 30, 2022

It works if both ports are open; otherwise we end up in the same situation.

@galbeniluz

Hi, I'm having a similar issue: when trying to create a bucket, it fails with an error saying that the certificate is not valid for rook-ceph-rgw-store-deck.rook-ceph.svc

@jansmets

jansmets commented Aug 18, 2022

Same issue here, the insecureSkipVerify: true doesn't help.

2 problems:

a) bucket provisioning

2022-08-08 17:29:45.553555 I | op-bucket-prov: Provision: creating bucket "bucketname-dedc75fc-ebf5-4102-9788-182d8dd403a6" for OBC "bucketname"
E0808 17:29:45.599636       1 controller.go:205] error syncing 'default/bucketname: error provisioning bucket: Provision: can't create ceph user: no user name provided and unable to generate a unique name: failed to get ceph user "ceph-user-Wjfu27Jq": Get "https://rook-ceph-rgw-s3.rook-ceph.svc/admin/user?format=json&uid=ceph-user-Wjfu27Jq": x509: certificate is valid for host.example.com, not rook-ceph-rgw-s3.rook-ceph.svc, requeuing

b) bucket health check
  Bucket Status:
    Details:       failed to get details from ceph object user "rook-ceph-internal-s3-user-checker-37301072-0641-4ce6-b927-2b9c4fd1320b": Get "https://rook-ceph-rgw-s3.rook-ceph.svc/admin/user?format=json&uid=rook-ceph-internal-s3-user-checker-xyz": x509: certificate is valid for host.example.com, not rook-ceph-rgw-s3.rook-ceph.svc

@travisn @leseb : do we need/want a new issue for this?

@kwrobert

kwrobert commented Sep 21, 2022

@logan2211 I was able to work around this with a Cert Manager certificate that is signed by a publicly trusted CA using the uris field (which I think corresponds to certificate SANs?) of the Certificate object. So you need a manifest that looks something like this:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: rook-object-store-cert
  namespace: rook-ceph
spec:
  # Secret names are always required.
  secretName: rook-object-store-tls
  dnsNames:
    - public-dns-name-for-external-use.example.com
  uris:
      # We need this extra hostname for some internal health checks that happen
      # when deploying new buckets. This gets put in the SAN field
    - rook-ceph-rgw-ceph-objectstore.rook-ceph.svc # IMPORTANT PART THAT FIXES THE INVALID CERT ISSUE

Make sure the Secret generated by the Certificate is an actual tls secret, not an opaque one. Then your CephObjectStore should refer to the name of the generated Secret in the sslCertificateRef field, like so:

sslCertificateRef: rook-object-store-tls

EDIT: Never mind, I still have the same issue after the cluster deploys. The cluster was able to deploy and reconcile, but now I can't provision any new buckets.
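
One possible reason the uris approach does not hold up, offered as a guess rather than a diagnosis: as far as I know, cert-manager's uris field produces URI SANs, while Go's hostname check only consults DNS and IP SANs. The sketch below shows the difference using only the standard library; the helper names are hypothetical, and the certificates are bare structs rather than real issued certs.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net/url"
)

// svc is the in-cluster service name from the manifest above.
const svc = "rook-ceph-rgw-ceph-objectstore.rook-ceph.svc"

// matchesService reports whether Go's standard hostname check accepts cert
// for the in-cluster service name.
func matchesService(cert *x509.Certificate) bool {
	return cert.VerifyHostname(svc) == nil
}

// withURISAN models a certificate that carries the service name as a URI SAN.
func withURISAN() *x509.Certificate {
	return &x509.Certificate{
		DNSNames: []string{"public-dns-name-for-external-use.example.com"},
		URIs:     []*url.URL{{Path: svc}},
	}
}

// withDNSSAN models a certificate that carries the service name as a DNS SAN.
func withDNSSAN() *x509.Certificate {
	return &x509.Certificate{
		DNSNames: []string{"public-dns-name-for-external-use.example.com", svc},
	}
}

func main() {
	fmt.Println("URI SAN accepted:", matchesService(withURISAN()))
	fmt.Println("DNS SAN accepted:", matchesService(withDNSSAN()))
}
```

Note that moving the service name to dnsNames only helps with an issuer willing to sign it, which, as discussed earlier in the thread, a public CA generally is not.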
