Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Allow child of cluster_impl LB to change #10091

Merged
merged 3 commits into from Apr 24, 2023

Conversation

temawi
Copy link
Contributor

@temawi temawi commented Apr 21, 2023

Under normal conditions the child LB of ClusterImplLoadBalancer does not fluctuate, based on the field used to configure load balancing in the xDS Cluster proto it is either:

1. WrrLocalityLoadBalancer if the newer load_balancing_policy field is used
2. WeightedTargetLoadBalancer if the legacy lb_policy field is used

(The premise set here is off, see comment)

ClusterImplLoadBalancer currently assumes that this child does not change and so does not change the child LB when the name resolver sends an update. If the control plane does switch to using a different field for LB config, that update will have an LB config meant for the other child LB type. This will result in a ClassCastException and a channel panic.

To address this, ClusterImplLoadBalancer will now use GracefulSwitchLoadBalancer and makes sure if the child policy changes the correct LB implementation is switched to.

Closes #10006

@temawi
Copy link
Contributor Author

temawi commented Apr 21, 2023

@ejona86 FYI

Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
@temawi temawi force-pushed the clusterimpl-graceful-switch branch from 00995af to d50d989 Compare April 21, 2023 22:40
@larry-safran
Copy link
Contributor

Everything looks good except that GracefulSwitchLoadBalancer is marked experimental. Since ClusterImplLoadBalancer isn't, I believe that it isn't supposed to depend on an experimental class.

@larry-safran
Copy link
Contributor

I've added GracefulSwitchLoadBalancer to the top of the stabilization agenda.

@temawi
Copy link
Contributor Author

temawi commented Apr 21, 2023

Everything looks good except that GracefulSwitchLoadBalancer is marked experimental. Since ClusterImplLoadBalancer isn't, I believe that it isn't supposed to depend on an experimental class.

It's already being used by several other LBs, so I think we can use it here.

+1 to your other comment about stabilizing it.

@ejona86
Copy link
Member

ejona86 commented Apr 21, 2023

Everything looks good except that GracefulSwitchLoadBalancer is marked experimental. Since ClusterImplLoadBalancer isn't, I believe that it isn't supposed to depend on an experimental class.

This is apples and oranges. We can use experimental APIs, because we can change the callers at the same time as breaking the API. That's why it is safe(r) for us to use @Internal.

ClusterImplLoadBalancer is not experimental. The class is not exposed at all. The only way the class would be exposed is via the LB registry, and its name there is also experimental. And the LoadBalancer API itself is experimental.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks to be the right shape.

xds/src/main/java/io/grpc/xds/ClusterImplLoadBalancer.java Outdated Show resolved Hide resolved
@temawi temawi merged commit 68b67b6 into grpc:master Apr 24, 2023
14 checks passed
@temawi temawi deleted the clusterimpl-graceful-switch branch April 24, 2023 18:01
temawi added a commit to temawi/grpc-java that referenced this pull request Apr 24, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
@temawi
Copy link
Contributor Author

temawi commented Apr 25, 2023

While the problem is explained correctly, the premise is off.

There is no alternating between WrrLocalityLoadBalancer or WeightedTargetLoadBalancer based on what the control plane load balancer configuration is. In fact when the load_balancing_policy field is used, the control plane has complete control over the child policy. Any change in LB type from the control plane can trigger this problem.

temawi added a commit to temawi/grpc-java that referenced this pull request Apr 25, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
temawi added a commit to temawi/grpc-java that referenced this pull request Apr 25, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
temawi added a commit that referenced this pull request Apr 25, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
temawi added a commit that referenced this pull request Apr 25, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
temawi added a commit that referenced this pull request Apr 25, 2023
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:

1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
   is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used

`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.

To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clusterimpl: Switch Child LB to Graceful Switch
3 participants