Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object: remove default value us-east-1 as region from s3 client #9754

Closed

Conversation

thotz
Copy link
Contributor

@thotz thotz commented Feb 16, 2022

Currently s3 client apis sets default value as us-east-1. This patch
removes that, region need to map with zonegroup in the RGW server so we
set that accordingly

Signed-off-by: Jiffin Tony Thottan thottanjiffin@gmail.com

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

Copy link
Member

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not passing CI.

@thotz thotz force-pushed the remove-default-value-for-region-s3client branch from 572bbae to e8ba957 Compare February 17, 2022 06:53
@thotz thotz requested a review from leseb February 17, 2022 07:38
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

us-east-1 is still used in the integration tests in ceph_base_object_test.go. How about changing that to something else so we can see the non-default works?

@@ -157,9 +157,9 @@ func (b *BucketOperation) CheckBucketNotificationSetonRGW(namespace, storeName,
s3AccessKey, _ := helper.BucketClient.GetAccessKey(obcName)
s3SecretKey, _ := helper.BucketClient.GetSecretKey(obcName)
if tlsEnabled {
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, "", true)
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, storeName, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a different region is specified in the storage class, will this client connection work with the store name? Or does it need to match the storage class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn : I wondered magic behind us-east-1 as region value. The actual value for s3 client needs to storeName and ideally we should have filled same value for storage class. The us-east-1 works for RGW, I am guessing it skips checks(or maybe bug)

Copy link
Contributor Author

@thotz thotz Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn : [Another update]: us-east-1 as a region no longer valid in master, it works with pacific though. So IMO we might see a similar issue when we start using new ceph versions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we remove region from the default examples in rook, it will work with ceph pacific and master? Or will we need different examples for pacific vs quincy? And to be clear, does the ceph master change also affect quincy, or just post-quincy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it will work, this fills internal by Rook. Earlier we are using us-east-1 from this patch we add the actual zone of the rgw server that's the difference.
The us-east-1 worked in quincy as well, but I don't see much code difference in that path for master and Quincy. Need to have a bit of time to understand that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documented examples we are working on in the other comments only apply to Quincy and newer, correct? So what is the behavior for region in Pacific? The docs need to give that detail, especially since Pacific is still the default version with Rook.

Copy link
Contributor Author

@thotz thotz Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn : Sorry if I created confusion before for RGW in any versions, region is the zone group but (again true for all versions) if we give region as us-east-1 it will bypass some checks in RGW and request succeed. I felt it is kind of a hack. But without knowing much about this context Rook was setting us-east-1 for the internal aws s3 SDK and this was working fine for our use case. But this may create confusion for users when they use us-east-1 works but other region's requests are failing. By this PR we are avoiding hack and using the required value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused about how users can understand the region. If us-east-1 is a workaround and a hack, maybe we should just keep that for the simple, local rgw case. What does the user gain by using a different region? Seems like it should only be important for external or multi-site where we need the region name in order to connect. But if it's just the local cluster, can we hide the region from the user so they don't need to worry about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's find a time to discuss this next week since I'm having such a hard time. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn: Sure :). Sounds good to me

@@ -9,7 +9,7 @@ reclaimPolicy: Delete
parameters:
objectStoreName: my-store
objectStoreNamespace: rook-ceph # namespace:cluster
region: us-east-1
# region: us-east-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this needs to match with the zone group, perhaps we should add documentation here in the example and also in the official docs to help users set this value correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE : I will try to add #9556 with this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thotz Since the default is us-east-1, should we just leave this default uncommented?

Suggested change
# region: us-east-1
region: us-east-1

@thotz
Copy link
Contributor Author

thotz commented Feb 21, 2022

us-east-1 is still used in the integration tests in ceph_base_object_test.go. How about changing that to something else so we can see the non-default works?

@travisn : do u want to check whether with regions and see s3 client works or not right? It's more or like s3 client kind of test, but I can definitely add those

@travisn
Copy link
Member

travisn commented Feb 21, 2022

us-east-1 is still used in the integration tests in ceph_base_object_test.go. How about changing that to something else so we can see the non-default works?

@travisn : do u want to check whether with regions and see s3 client works or not right? It's more or like s3 client kind of test, but I can definitely add those

Yes sounds to add some testing for this.

@thotz
Copy link
Contributor Author

thotz commented Feb 22, 2022

us-east-1 is still used in the integration tests in ceph_base_object_test.go. How about changing that to something else so we can see the non-default works?

@travisn : do u want to check whether with regions and see s3 client works or not right? It's more or like s3 client kind of test, but I can definitely add those

Yes sounds to add some testing for this.

The only create bucket request needs the correct region value, all the objects operation works fine with the
invalid region for s3 client. So I guess I need to add a test case for cephobjectstoreuser by creating a bucket to test this out

@thotz thotz force-pushed the remove-default-value-for-region-s3client branch 5 times, most recently from dcc908a to 238178d Compare February 22, 2022 13:55
Documentation/ceph-object-bucket-claim.md Show resolved Hide resolved
Documentation/ceph-object-bucket-claim.md Outdated Show resolved Hide resolved
@@ -157,9 +157,9 @@ func (b *BucketOperation) CheckBucketNotificationSetonRGW(namespace, storeName,
s3AccessKey, _ := helper.BucketClient.GetAccessKey(obcName)
s3SecretKey, _ := helper.BucketClient.GetSecretKey(obcName)
if tlsEnabled {
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, "", true)
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, storeName, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we remove region from the default examples in rook, it will work with ceph pacific and master? Or will we need different examples for pacific vs quincy? And to be clear, does the ceph master change also affect quincy, or just post-quincy?

@thotz thotz force-pushed the remove-default-value-for-region-s3client branch from 1b7bcce to cd70ca5 Compare February 23, 2022 09:48
@thotz thotz requested a review from travisn February 23, 2022 09:57
Documentation/ceph-object-bucket-claim.md Outdated Show resolved Hide resolved
Documentation/ceph-object-bucket-claim.md Outdated Show resolved Hide resolved
@thotz thotz force-pushed the remove-default-value-for-region-s3client branch from cd70ca5 to 380dd49 Compare February 24, 2022 06:26
@thotz thotz requested a review from travisn February 24, 2022 06:27
Documentation/ceph-object-bucket-claim.md Outdated Show resolved Hide resolved
Documentation/ceph-object-bucket-claim.md Outdated Show resolved Hide resolved
@@ -157,9 +157,9 @@ func (b *BucketOperation) CheckBucketNotificationSetonRGW(namespace, storeName,
s3AccessKey, _ := helper.BucketClient.GetAccessKey(obcName)
s3SecretKey, _ := helper.BucketClient.GetSecretKey(obcName)
if tlsEnabled {
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, "", true)
s3client, err = rgw.NewInsecureS3Agent(s3AccessKey, s3SecretKey, s3endpoint, storeName, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documented examples we are working on in the other comments only apply to Quincy and newer, correct? So what is the behavior for region in Pacific? The docs need to give that detail, especially since Pacific is still the default version with Rook.

@thotz thotz force-pushed the remove-default-value-for-region-s3client branch from 380dd49 to 89a7cbe Compare March 2, 2022 12:50
@thotz thotz requested a review from travisn March 2, 2022 12:50
Currently s3 client apis sets default value as `us-east-1`. This patch
removes that, region need to map with zonegroup in the RGW server so we
set that accordingly

Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
The default value for region in RGW is us-east-1, if users want to use
other regions then they need to create zone/zonegroup accordingly before
defining it in storage class.

Fixes: rook#9486
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
@thotz thotz force-pushed the remove-default-value-for-region-s3client branch from 89a7cbe to 5a2f5ad Compare March 7, 2022 05:24
bucketName: ceph-bucket [4]
reclaimPolicy: Delete [5]
```
1. `label`(optional) here associates this `StorageClass` to a specific provisioner.
1. `provisioner` responsible for handling `OBCs` referencing this `StorageClass`.
1. **all** `parameter` required.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we mention the default here? Otherwise, I'm not sure what it means for Rook to "fill this value".

Suggested change
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, the default is `us-east-1`. Please check example at the end of the doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn : I don't what I missed earlier if the user does not define the region, Rook will fill it from Zonegroup from ceph object store context and don't use us-east-1 at all(at least that was my first intention of PR).

us-east-1 is not the default value from RGW(at least not in RGW code base). It is a value that always works mysteriously. Maybe because a lot of s3 clients use this as default for region, but again not defined as default by RGW

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, does that mean rook can just take care of this internally? Why do we need to expose the region setting at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisn : The region defined in storage class of libbucket provisioner. Not sure whether Rook can hide it from user, maybe we can ignore the value. But this was available earlier release not sure whether it right to remove the variable. AFAIR I have sent #8766 as per request from the user, he wants to set the custom region for his use case.

@@ -9,7 +9,7 @@ reclaimPolicy: Delete
parameters:
objectStoreName: my-store
objectStoreNamespace: rook-ceph # namespace:cluster
region: us-east-1
# region: us-east-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thotz Since the default is us-east-1, should we just leave this default uncommented?

Suggested change
# region: us-east-1
region: us-east-1

@@ -8,7 +8,7 @@ reclaimPolicy: Retain
parameters:
objectStoreName: my-store # port 80 assumed
objectStoreNamespace: rook-ceph # namespace:cluster
region: us-east-1
# region: us-east-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# region: us-east-1
region: us-east-1

} else {
s3client, err = rgw.NewS3Agent(s3AccessKey, s3SecretKey, s3endpoint, "", true, nil)
s3client, err = rgw.NewS3Agent(s3AccessKey, s3SecretKey, s3endpoint, storeName, true, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the storeName used here, but the zonegroup is used in the example for using a different region? Shouldn't this use the zonegroup as well? Or in the test, the storeName and zonegroup are the same anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this should be zone group, since we testing only internal ceph cluster without multisite, I just set directly storeName which is value of zonegroup as well

1. `bucketName` is required for access to existing buckets but is omitted when provisioning new buckets.
Unlike greenfield provisioning, the brownfield bucket name appears in the `StorageClass`, not the `OBC`.
1. rook-ceph provisioner decides how to treat the `reclaimPolicy` when an `OBC` is deleted for the bucket. See explanation as [specified in Kubernetes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#retain)
+ _Delete_ = physically delete the bucket.
+ _Retain_ = do not physically delete the bucket.

##### Example: Custom Region
If user is providing the `region` value other than `us-east-1` then additional CRDs for zonegroup/zone/realm need to be created. For example, here user is providing `region` value as `us-west-1`. So make sure following [multisite related CRDs](ceph-object-multisite-crd.md) are defined as well. The Rook Operator will create required `zonegroup` for RGW server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If user is providing the `region` value other than `us-east-1` then additional CRDs for zonegroup/zone/realm need to be created. For example, here user is providing `region` value as `us-west-1`. So make sure following [multisite related CRDs](ceph-object-multisite-crd.md) are defined as well. The Rook Operator will create required `zonegroup` for RGW server.
To specify a `region` other than `us-east-1`, a custom zone group must be created that matches the desired region name. In the following example, the `region` is set to `us-west-1`. For more details, see the [related multisite CRDs](ceph-object-multisite-crd.md).

Comment on lines +516 to +522
if len(sc.Parameters[key]) > 0 {
p.region = sc.Parameters[key]
} else {
// If user does not define, then set region to Zonegroup, since RGW internally maps
// aws region to ZoneGroup
p.region = p.objectContext.ZoneGroup
}
Copy link
Member

@BlaineEXE BlaineEXE Mar 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit confused. What is the point of overriding region if the default is mapped to ZoneGroup and if the user has to create a ZoneGroup resource if they want to override region? Would it be simpler to just remove this parameter and rely on users to create a ZoneGroup instead?

Is this only useful if there are multiple zone groups? What happens in the case where there are multiple zone groups? Then what is the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE : region is an optional value for storage-class. If it is not defined then we need to fill up with Zonegroup from object store context, This value changes in different scenarios:
1.) normal cluster, it is name of object store
2.) external cluster it is default
3.) if multiple zones are configured then it depends on value set in object-store CR.

If the user wants to use a custom region then he defines Zonegroup that's different scenario. Hope this avoids the confusion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all three of those cases, I don't think we need to have the region parameter.

  1. If external cluster with no zone group defined, use default
  2. If external cluster with external zone group defined, use the zone group
  3. If internal cluster with no zone group defined, use name of object store
  4. If internal cluster with zone group defined, use the zone group

Is this not correct? Are there any cases missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE : Please see #9754 (comment) why the region is needed.
Above cases are right, but please note 2.) is not support in Rook atm

Copy link
Member

@BlaineEXE BlaineEXE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can fill in a default value for "region" from the zone group, is region even a valid parameter?

bucketName: ceph-bucket [4]
reclaimPolicy: Delete [5]
```
1. `label`(optional) here associates this `StorageClass` to a specific provisioner.
1. `provisioner` responsible for handling `OBCs` referencing this `StorageClass`.
1. **all** `parameter` required.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to the [zone group](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.

bucketName: ceph-bucket [4]
reclaimPolicy: Delete [5]
```
1. `label`(optional) here associates this `StorageClass` to a specific provisioner.
1. `provisioner` responsible for handling `OBCs` referencing this `StorageClass`.
1. **all** `parameter` required.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not very clear. What is the region set to if the user doesn't define it? "Accordingly" isn't clear enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE : earlier version was Rook Operato fill this value from object store context internally. But @travisn suggested it better not to give internal code details for the end-user. Hence changed like this. I can revert back the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even the text "fill this value from object store context internally" doesn't explain to the user what the value will be. In order for the user to understand when they need to modify the value, they must understand what the default value is. In words good for a user, what value will be the default?

This is also assuming that region is even a necessary parameter to have still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will update it again next push

bucketName: ceph-bucket [4]
reclaimPolicy: Delete [5]
```
1. `label`(optional) here associates this `StorageClass` to a specific provisioner.
1. `provisioner` responsible for handling `OBCs` referencing this `StorageClass`.
1. **all** `parameter` required.
1. `region`(optional) defines the region in which the bucket should be created. For RGW, this maps to [zonegroup](https://docs.ceph.com/en/latest/radosgw/multisite/#zone-groups) of the server. If user does not define `region`, Rook will fill this value accordingly. Please check example at the end of the doc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a link to the example in the doc for users.

1. `bucketName` is required for access to existing buckets but is omitted when provisioning new buckets.
Unlike greenfield provisioning, the brownfield bucket name appears in the `StorageClass`, not the `OBC`.
1. rook-ceph provisioner decides how to treat the `reclaimPolicy` when an `OBC` is deleted for the bucket. See explanation as [specified in Kubernetes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#retain)
+ _Delete_ = physically delete the bucket.
+ _Retain_ = do not physically delete the bucket.

##### Example: Custom Region
If user is providing the `region` value other than `us-east-1` then additional CRDs for zonegroup/zone/realm need to be created. For example, here user is providing `region` value as `us-west-1`. So make sure following [multisite related CRDs](ceph-object-multisite-crd.md) are defined as well. The Rook Operator will create required `zonegroup` for RGW server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do users need to create all of the resources, or only zone group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly zonegroup is not an independent resource. Rooks mandates to have realm for zonegroup and in RGW does not make sense to have zonegroup without zones (zonegroup is collection of zones and zone is collection of RGW servers in short)

@alimaredia alimaredia self-requested a review March 7, 2022 22:52
@alimaredia
Copy link
Contributor

@thotz I was taking at glance at this PR and I don't quite understand the use case/problem this is trying to solve, even after reading the doc changes. Could you summarize what the object-store/obc's couldn't do before this PR was written and what the PR solves?

@thotz
Copy link
Contributor Author

thotz commented Mar 8, 2022

@thotz I was taking at glance at this PR and I don't quite understand the use case/problem this is trying to solve, even after reading the doc changes. Could you summarize what the object-store/obc's couldn't do before this PR was written and what the PR solves?
@alimaredia : Please refer #9754 (comment) for details

@thotz thotz requested review from BlaineEXE and travisn March 8, 2022 06:31
@thotz
Copy link
Contributor Author

thotz commented Mar 8, 2022

@BlaineEXE @travisn : I have minor nits need to address in this PR. But before refreshing the PR I want to make sure everyone is on the same page

@BlaineEXE
Copy link
Member

BlaineEXE commented Mar 8, 2022

@BlaineEXE @travisn : I have minor nits need to address in this PR. But before refreshing the PR I want to make sure everyone is on the same page

It is still unclear what is wrong currently. The description in #9754 (comment) isn't quite enough for me to understand. Please be very clear about whether behavior is broken or not. Is there broken behavior? Can you make a test case that causes any failures? If yes, outline the failing case.
If we can't make a failing test case, then why are we making the code more complex? Wouldn't that mean the parameter is just unnecessary?

Once we understand what is wrong, we can do a much better job of reviewing the PR for behavior and usability and not merely code.

@thotz
Copy link
Contributor Author

thotz commented Mar 9, 2022

@BlaineEXE @travisn : I have minor nits need to address in this PR. But before refreshing the PR I want to make sure everyone is on the same page

It is still unclear what is wrong currently. The description in #9754 (comment) isn't quite enough for me to understand. Please be very clear about whether behavior is broken or not. Is there broken behavior? Can you make a test case that causes any failures? If yes, outline the failing case. If we can't make a failing test case, then why are we making the code more complex? Wouldn't that mean the parameter is just unnecessary?

Once we understand what is wrong, we can do a much better job of reviewing the PR for behavior and usability and not merely code.
@BlaineEXE : there are two commits in this PR :

  • replacing instance of us-east-1 with zone group of RGW server
  • document how to use custom region in storage class of OBC for the user

The second part is clear I guess. The reason for the why region is needed is explained in earlier replies

Now talking about the first part, most of the cases will work with current(i.e using us-east-1) but it is semantically correct to use zone-group instead.It is based on @cbodley's suggestion.
I have added a test case with the invalid region for the s3 client https://github.com/rook/rook/pull/9754/files#diff-5f2c0f5640c927aba71e73e05b71d0f3116c972f7e0f99a40c6e2b1a986b0431R210. EIther zonegroup or us-east-1 as the region for s3 client the request will succeed, so not sure whether it requires to have test cases based on it, If need I can add those as well

@BlaineEXE
Copy link
Member

@BlaineEXE @travisn : I have minor nits need to address in this PR. But before refreshing the PR I want to make sure everyone is on the same page

It is still unclear what is wrong currently. The description in #9857 isn't quite enough for me to understand. Please be very clear about whether behavior is broken or not. Is there broken behavior? Can you make a test case that causes any failures? If yes, outline the failing case. If we can't make a failing test case, then why are we making the code more complex? Wouldn't that mean the parameter is just unnecessary?
Once we understand what is wrong, we can do a much better job of reviewing the PR for behavior and usability and not merely code.
@BlaineEXE : there are two commits in this PR :

  • replacing instance of us-east-1 with zone group of RGW server
  • document how to use custom region in storage class of OBC for the user

The second part is clear I guess. The reason for the why region is needed is explained in earlier replies

@thotz as I explained in the message you replied to, it is not clear to those of us trying to understand what is going on. Please summarize again, and be thorough. Several of us have concerns about this approach, and we need better information to make sure we are applying the correct fix for this.

Now talking about the first part, most of the cases will work with current(i.e using us-east-1) but it is semantically correct to use zone-group instead. It is based on @cbodley's suggestion.

From that discussion, it has become clear to me that there is something wrong with the region setting, but I am not confident that the current proposed changes are the best solution. I still want some sort of proof that simply removing the field as a configurable option isn't a better solution.

I have added a test case with the invalid region for the s3 client https://github.com/rook/rook/pull/9754/files#diff-5f2c0f5640c927aba71e73e05b71d0f3116c972f7e0f99a40c6e2b1a986b0431R210. EIther zonegroup or us-east-1 as the region for s3 client the request will succeed, so not sure whether it requires to have test cases based on it, If need I can add those as well

I should have been more clear. I am not interested in seeing a unit test. Unit tests cannot show the actual behavior of a running RGW. I want to see the details for creating an end-to-end scenario where there is a failure.

From the discussion linked, it seems that we are be able to induce some failure by setting region to some value other than us-east-1. If changing this value is the only way to induce a failure, then the right solution is probably to ignore the region option in Rook.

In order to determine if we need changes other than removing the value entirely, we want to need to have a failing scenario with region left unspecified (it gets its default value). If we can make this failing scenario, then we need better defaults (but we still may not need the region option).

Once we have better defaults, the question becomes: can we make another failing scenario (internal or external ceph cluster) such that we can't compute the default ourselves? If we can, then users will need an option to override region. If we can't then the updated computation is all we needed, and we can ignore the region setting.

@BlaineEXE
Copy link
Member

This documentation for RHCS 3 states that the RGW now uses the terminology "zone-group" instead of "region". https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/object_gateway_guide_for_red_hat_enterprise_linux/rgw-multisite-rgw

Since Rook now has multisite configs zone/zonegroup/realm, I think we probably no longer need the "realm" option in the OBC storageclass. I think it's probably vestigial. I think that we likely only need to figure out what the zone group is and use that value.

My best guess at what a failing scenario would be is to :

  1. Create a ceph cluster (cluster A) with an RGW using a non-"default" zone group
  2. Create a Rook cluster in external mode (cluster B) whose CephObjectStore references cluster A, and do not set zone/zonegroup info.

I suspect this scenario will cause a failure because the S3 client in cluster B won't be able to connect properly to the RGW in cluster A since the zone group in cluster A is not "default". (This PR did some related work recently: #9751)

From the information I have, I think the most appropriate fix for this would be to make sure that creating a zone and zone group that reference the external cluster are created in cluster B. And we must also update docs and examples to remove the "realm" config.

@alimaredia has an open PR to better allow configuring zone/zonegroup endpoints for external clusters. I'm not sure if that PR is directly related to this discussion. #9622

@alimaredia does this breakdown make some sense to you?

@thotz
Copy link
Contributor Author

thotz commented Mar 14, 2022

After discussing with developers from Rook and RGW planning to close this PR infavour of #9906

@thotz thotz closed this Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants