object: virtualhostnames is not required in the endpoint for rgw #14034

Open
wants to merge 1 commit into
base: master

thotz
Contributor

@thotz thotz commented Apr 5, 2024

The endpoint of a CephObjectStore is populated with the help of BuildDNSEndpoint, which combines the domain name from getDomainName with the port in objectstore.spec.gateway. This approach is used in most of the Rook code base, including the admin ops API that Rook uses to communicate with RGW.
When the vhost feature is enabled, getDomainName returns a domain from the rgw DNS name list, but the domain picked may not be served on the same port as the rgw internal port. This fails for the admin ops use case, so the populated endpoint is wrong in this case.
Even when the feature is enabled, the rgw service endpoint is part of the rgw DNS names by default, so we can pick the RGW service endpoint for all internal communication.
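
To make the port mismatch concrete, here is a minimal sketch. `buildEndpoint` is a hypothetical stand-in written for this illustration and is not Rook's actual BuildDNSEndpoint; the service domain, the DNS name and the port value are assumptions too.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// buildEndpoint is a hypothetical helper (not Rook's implementation) that
// joins a domain name with the gateway port taken from the object store spec.
func buildEndpoint(domain string, port int, secure bool) string {
	scheme := "http"
	if secure {
		scheme = "https"
	}
	return scheme + "://" + net.JoinHostPort(domain, strconv.Itoa(port))
}

func main() {
	const gatewayPort = 80 // assumed internal RGW port from spec.gateway

	// Service domain: the gateway port belongs to this service, so the
	// resulting endpoint is one the operator can reach.
	fmt.Println(buildEndpoint("rook-ceph-rgw-my-store.rook-ceph.svc", gatewayPort, false))

	// User-supplied DNS name: it may be served on a different port (for
	// example 443 behind an ingress), so pairing it with the gateway port
	// can produce an endpoint that nothing listens on.
	fmt.Println(buildEndpoint("s3.example.com", gatewayPort, false))
}
```

Both calls produce a syntactically valid URL; only the first is guaranteed to match a listening port, which is the failure mode described above for the admin ops client.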

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

The virtualhostnames for rgw are added to the list of possible endpoints.
This endpoint is mainly used by the Rook operator for communicating with
the RGW server. When the virtualhostname feature is enabled for rgw, the
service endpoint is added to the list by default. So the service endpoint
can be accessed even if the feature is enabled.

Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
@@ -348,9 +348,6 @@ func getDomainName(s *cephv1.CephObjectStore, returnRandomDomainIfMultiple bool)
 		for _, e := range s.Spec.Gateway.ExternalRgwEndpoints {
 			endpoints = append(endpoints, e.String())
 		}
-	} else if s.Spec.Hosting != nil && len(s.Spec.Hosting.DNSNames) > 0 {
-		// if the store is internal and has DNS names, pick a random DNS name to use
-		endpoints = s.Spec.Hosting.DNSNames
 	} else {
 		return domainNameOfService(s)
Member

Are you saying the dns host name is already added to this service name that would be returned here?

What was the bug? Were we returning an invalid domain name with line 353?

Member

I'm also somewhat confused. Please update the commit description and PR description text to be clear about what problem exists and how this resolves it.

Contributor Author

@travisn and @BlaineEXE, it is not because of an invalid DNS name.
The endpoint of a CephObjectStore is populated with the help of BuildDNSEndpoint, which combines the domain name from getDomainName with the port in objectstore.spec.gateway. This approach is used in most of the Rook code base, including the admin ops API that Rook uses to communicate with RGW.
When the vhost feature is enabled, getDomainName returns a domain from the rgw DNS name list, but the domain picked may not be served on the same port as the rgw internal port. This fails for the admin ops use case, so the populated endpoint is wrong in this case.
Even when the feature is enabled, the rgw service endpoint is part of the rgw DNS names by default, so we can pick the RGW service endpoint for all internal communication. As I mentioned earlier, since the rgw service endpoint is part of the list by default, there is no specific need to pick the domain name from the rgw DNS list.

I added this change in the first version based on the assumption that the rgw service endpoint would not be added to the rgw DNS names by default when the vhost feature is enabled. But later we decided to include the Rook service endpoint in the rgw DNS names, because otherwise it would break access for existing clusters.

Member

I'm still struggling to understand this.

However, I do understand this statement, and I don't think the logic makes sense to me:

> rgw service endpoint will be part of the rgw DNS names so we can pick RGW service endpoint for all the internal communications. As I mentioned earlier by default rgw service endpoint is part of the list, so there is no specific need to pick the domain name from the rgw DNS list

If the RGW service endpoint exists, I don't think that necessarily means that we should pick it when reporting/selecting the endpoint. From what I understand, the point of adding rgw dns names is to allow users to use wildcard addressing, which is not possible for default service endpoints. Only the endpoints added by the user in .spec.hosting.dnsNames are wildcard-addressable, and those endpoints should be preferred selections for OBCs and CephObjectStoreUsers. This is especially true because the wildcard-addressability is actually supposed to be the S3 default (the older path-style addressing is deprecated).
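
As a minimal illustration of the two addressing styles contrasted here, the sketch below builds both URL forms for the same object. The function names, the `https` scheme and the example host are assumptions made for this illustration, not Rook or RGW code.

```go
package main

import "fmt"

// pathStyleURL puts the bucket in the URL path. It works with any endpoint
// host, including a plain Kubernetes service name.
func pathStyleURL(endpoint, bucket, key string) string {
	return fmt.Sprintf("https://%s/%s/%s", endpoint, bucket, key)
}

// virtualHostStyleURL puts the bucket in the host name. It only works when
// DNS (and the gateway) accept arbitrary <bucket>.<endpoint> host names,
// which is what a wildcard record for the endpoint provides.
func virtualHostStyleURL(endpoint, bucket, key string) string {
	return fmt.Sprintf("https://%s.%s/%s", bucket, endpoint, key)
}

func main() {
	endpoint := "s3.example.com" // assumed entry from .spec.hosting.dnsNames
	fmt.Println(pathStyleURL(endpoint, "photos", "cat.png"))
	fmt.Println(virtualHostStyleURL(endpoint, "photos", "cat.png"))
}
```

The second form is the one that requires `photos.s3.example.com` to resolve, which is why the choice of advertised endpoint matters for clients that default to vhost-style requests.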

@travisn travisn requested a review from BlaineEXE April 5, 2024 15:30
@thotz
Contributor Author

thotz commented Apr 12, 2024

@BlaineEXE: At least according to https://access.redhat.com/solutions/7032743, the k8s endpoint can be wildcarded with the help of the DNS service (the doc uses the NooBaa k8s service endpoint for S3). Secondly, even if we pick a domain from the list of rgw DNS names, we cannot guarantee that it supports wildcards, because the list contains all the domains required by the user, including domains for which no wildcard is configured, so that existing access is not broken. IMO the Rook operator cannot test whether the domains in the rgw DNS names support wildcards or not. If the user configures spec.hosting.dnsNames, then they should use that endpoint rather than relying on the information from the OBC k8s ConfigMap or the CephObjectStoreUser secret, IMHO.

@thotz thotz requested a review from travisn April 16, 2024 17:01
@BlaineEXE
Member

> even if we pick a domain from list of rgw dns names we cannot guarantee that it is wildcard supported because the list contains all the required domains from the user which also contains domain in which wildcard is not configured so that access not broken. IMO Rook-Op cannot test whether domains in rgw dns names are wildcard supported or not.

I agree with this.

> If the user configures spec.hosting.dnsNames then he should use the endpoint rather than relying upon information from OBC k8s configmap or CephObjectStoreUser secret IMHO

I disagree with this for the reasons I explained here: #14034 (comment)

To expand on this: Because vhost-style bucket addressing is the only non-deprecated S3 access method, it is only natural that Rook should treat it as the preferred method. It is against the ethos of Kubernetes and of Rook to foist onto end users the responsibility of configuring their CR-based (OBC-based) clients based on the input parameters of the dependent resource (CephObjectStore) that those end users likely don't have read permissions to.

Rook must figure out how to use vhost-style dnsNames addresses for use in OBCs and COSI without encountering errors.

However, if a user mis-configures wildcarding for an endpoint they add to dnsNames, there is not much Rook can do. The best Rook could do is return an error and add some text that hints to users that the endpoint may not be working properly, with suggestions.
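
A rough sketch of what such a hint could look like, assuming a DNS probe is considered acceptable at all. `checkWildcard`, the probe label and the message text are hypothetical; the resolver is injected so that `net.LookupHost` (which has the same signature) can be passed in real use and a fake can be used in tests. A successful lookup only shows that a wildcard record resolves, not that RGW serves the name correctly.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// lookupFunc has the same signature as net.LookupHost.
type lookupFunc func(host string) ([]string, error)

// checkWildcard resolves an arbitrary subdomain of domain and returns an
// error with a hint when that fails. Hypothetical helper, not Rook code.
func checkWildcard(domain string, lookup lookupFunc) error {
	probe := "wildcard-probe." + domain
	addrs, err := lookup(probe)
	if err == nil && len(addrs) > 0 {
		return nil
	}
	if err == nil {
		err = errors.New("no addresses returned")
	}
	return fmt.Errorf("dnsName %q may not support vhost-style access: lookup of %q failed: %w (check that a *.%s record exists)",
		domain, probe, err, domain)
}

func main() {
	// Fake resolver for the demo: only names under s3.example.com resolve.
	fake := func(host string) ([]string, error) {
		if strings.HasSuffix(host, ".s3.example.com") {
			return []string{"192.0.2.10"}, nil
		}
		return nil, errors.New("no such host")
	}
	for _, d := range []string{"s3.example.com", "plain.example.net"} {
		if err := checkWildcard(d, fake); err != nil {
			fmt.Println("warning:", err)
			continue
		}
		fmt.Println(d, "resolves for an arbitrary bucket subdomain")
	}
}
```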

@thotz
Contributor Author

thotz commented Apr 23, 2024

@BlaineEXE @travisn there is another issue which I found while testing this feature with QA in ODF. If we enable both the SSL and the normal ports, then the DNS list needs to contain domains which listen on 443, and the SSL cert used needs to support all of the DNS domains as well. Otherwise, the Rook operator will fail to communicate (by default Rook prefers the secure port). In ODF, certs are automatically managed by the cert operator by default and RGW consumes them. I am not sure what the right approach is here.
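
The certificate side of this concern can be sketched with the standard library's `(*x509.Certificate).VerifyHostname`. The helper names and the example SANs below are assumptions for illustration; `certFromSANs` only builds an in-memory certificate for the demo, whereas a real check would parse the certificate that RGW actually serves.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// certFromSANs builds an in-memory certificate that carries only DNS SANs.
// Demo helper (hypothetical); it does not create a usable TLS certificate.
func certFromSANs(sans ...string) *x509.Certificate {
	return &x509.Certificate{DNSNames: sans}
}

// uncoveredNames returns the entries of dnsNames that cert is not valid for.
func uncoveredNames(cert *x509.Certificate, dnsNames []string) []string {
	var missing []string
	for _, name := range dnsNames {
		if err := cert.VerifyHostname(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	cert := certFromSANs("rook-ceph-rgw-my-store.rook-ceph.svc", "*.s3.example.com")
	dnsNames := []string{
		"rook-ceph-rgw-my-store.rook-ceph.svc",
		"bucket1.s3.example.com",
		"s3.example.com", // a wildcard SAN does not cover the bare domain
	}
	for _, name := range uncoveredNames(cert, dnsNames) {
		fmt.Println("certificate does not cover:", name)
	}
}
```

Any name reported here would fail TLS verification if it were picked as the operator's endpoint on the secure port.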

3 participants