
Reconsider Horizontal Pod Autoscaler docs for CephObjectStore #10001

Open
BlaineEXE opened this issue Apr 5, 2022 · 8 comments

Labels: docs, object (Object protocol - S3)

BlaineEXE (Member) commented Apr 5, 2022

I am concerned that the KEDA Horizontal Pod Autoscaler (HPA) example documented for CephObjectStore won't work in production. I believe that any time the CephObjectStore is reconciled, Rook will reset the replicas to 1, which could disrupt user workloads.

Originally posted by @BlaineEXE in #9202 (review)

We should investigate whether this is a concern and reassess. I believe the example is a good one, but we may need to design a Rook feature to assist with this use case. If we decide not to proceed with a Rook feature and there is an issue, we may want to revert the commits.

In any scenario, we should remove the KEDA CRDs from the current location in github-action-helper.sh since they are of no use to our CI.
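
For context, here is a minimal sketch of the kind of KEDA ScaledObject the documentation describes for scaling the RGW deployment. This is not a copy of deploy/examples/keda-rgw.yaml; the store name, deployment name, Prometheus address, and trigger query below are placeholder assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rgw-scale
  namespace: rook-ceph
spec:
  scaleTargetRef:
    kind: Deployment                  # KEDA drives the Deployment's replica count directly
    name: rook-ceph-rgw-my-store-a    # assumed RGW deployment for a store named "my-store"
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090  # assumed Prometheus endpoint
        query: sum(rate(ceph_rgw_req[1m]))                             # assumed RGW request-rate metric
        threshold: "75"
```

Because KEDA writes the replica count onto the Deployment while Rook also manages that Deployment, any reconcile that resets replicas to 1 would undo the autoscaler's decision until KEDA scales the Deployment back up.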

BlaineEXE added the docs and object (Object protocol - S3) labels on Apr 5, 2022
thotz (Contributor) commented Apr 6, 2022

> I am concerned that the KEDA Horizontal Pod Autoscaler (HPA) example documented for CephObjectStore won't work in production. I believe that any time the CephObjectStore is reconciled, Rook will reset the replicas to 1, which could disrupt user workloads.

I am not quite sure how reconciliation will impact HPA. By default, HPA can work with memory and CPU without any special configuration, so I highly doubt that reconciliation of resources can affect HPA.
Would restarting the Rook operator after HPA scaling validate this use case?

> Originally posted by @BlaineEXE in #9202 (review)

> We should investigate whether this is a concern and reassess. I believe the example is a good one, but we may need to design a Rook feature to assist with this use case. If we decide not to proceed with a Rook feature and there is an issue, we may want to revert the commits.

> In any scenario, we should remove the KEDA CRDs from the current location in github-action-helper.sh since they are of no use to our CI.

The CI validates all the YAMLs in deploy/examples, and since the KEDA CRDs were not installed, creation of keda-rgw.yaml was failing. Another way to work around it would be via skip-yaml, but @travisn suggested doing it by installing the CRDs; please refer to #9202 (comment).

thotz (Contributor) commented Apr 14, 2022

> I am concerned that the KEDA Horizontal Pod Autoscaler (HPA) example documented for CephObjectStore won't work in production. I believe that any time the CephObjectStore is reconciled, Rook will reset the replicas to 1, which could disrupt user workloads.

> I am not quite sure how reconciliation will impact HPA. By default, HPA can work with memory and CPU without any special configuration, so I highly doubt that reconciliation of resources can affect HPA. Would restarting the Rook operator after HPA scaling validate this use case?

From my Rook operator restart testing, the Rook operator does reset the replica count to 1 as you mentioned, and I expect HPA then quickly scales it back up to the previous replica value for the workload.

> Originally posted by @BlaineEXE in #9202 (review)
> We should investigate whether this is a concern and reassess. I believe the example is a good one, but we may need to design a Rook feature to assist with this use case. If we decide not to proceed with a Rook feature and there is an issue, we may want to revert the commits.
> In any scenario, we should remove the KEDA CRDs from the current location in github-action-helper.sh since they are of no use to our CI.

> The CI validates all the YAMLs in deploy/examples, and since the KEDA CRDs were not installed, creation of keda-rgw.yaml was failing. Another way to work around it would be via skip-yaml, but @travisn suggested doing it by installing the CRDs; please refer to #9202 (comment).

BlaineEXE (Member, Author) commented Apr 14, 2022

We should update our CephObjectStore reconciliation. I think a small design would make sense. Questions I have:

  • Should Rook continue to create one deployment per RGW for its own scaling, or should it update the replica count on a single deployment?
  • Should Rook set a replica count on RGW deployments only when they don't already exist, allowing the value to differ for any pre-existing deployments? Or is there a different strategy Rook could use to update deployments without disrupting replicas?
  • Possible other strategy: can KEDA adjust the CephObjectStore server count directly rather than modifying the deployment replica count? (See the sketch after this list.)
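
To illustrate the last bullet: KEDA can scale a custom resource directly, but only if the CRD exposes a scale subresource, which the CephObjectStore CRD does not appear to do today. A hypothetical sketch, assuming Rook added a scale subresource mapped to spec.gateway.instances and a store named "my-store":

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: objectstore-scale
  namespace: rook-ceph
spec:
  scaleTargetRef:
    apiVersion: ceph.rook.io/v1   # scale the Rook CR instead of the rgw Deployment
    kind: CephObjectStore
    name: my-store                # hypothetical store name
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090  # assumed Prometheus endpoint
        query: sum(rate(ceph_rgw_req[1m]))                             # assumed RGW request-rate metric
        threshold: "75"
```

With this approach Rook would remain the only writer of the Deployment's replica count, so a reconcile could not fight with the autoscaler.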

travisn (Member) commented Apr 14, 2022

@thotz KEDA updates the replica count on the rgw deployment, right? So all Rook will really notice during a reconcile is that the deployment replica count is incorrect? In that case, I'm thinking we should do the following to keep it simple to support KEDA:

  • When the object store is initialized, create the rgw deployment with replicas according to the setting in the CR (gateway.instances)
  • During future reconciles, generate the pod spec as we always do, but set the deployment replicas to the replicas of the existing rgw deployment.

Another consideration is a new property on the object store CR, for example allowAutoExpansion: true, which would explicitly allow this behavior. But this setting doesn't really seem necessary if we just document the behavior.
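
A minimal sketch of how the object store CR might look under this proposal, assuming a store named "my-store"; allowAutoExpansion is the hypothetical property discussed above and does not exist today:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  gateway:
    port: 80
    instances: 2                # used when the rgw deployment is first created
    # allowAutoExpansion: true  # hypothetical opt-in: later reconciles would keep the live replica count
```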

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

BlaineEXE (Member, Author) commented:

Not a wontfix, but probably not able to prioritize until 1.11+

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

thotz added the keepalive label and removed the wontfix label on Aug 22, 2022
thotz reopened this on Aug 22, 2022
travisn removed the keepalive label on Mar 27, 2024