Multiple services are not spread with L2 #2284

Open
nvlan opened this issue Feb 13, 2024 · 2 comments

Comments

@nvlan

nvlan commented Feb 13, 2024

Is your feature request related to a problem?

When I create multiple services that share the same endpoint, metallb hashes nodeName + serviceIP and uses the result to pick which node wins the election and announces a given IP.
However, I found that most of the time metallb ends up choosing the same node to announce more than one IP: in a cluster with 7 nodes and 3 ingress load balancers, the same node hosts two IPs while the remaining IP is assigned to a different node.
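
To illustrate what I mean, the election as I understand it behaves roughly like the toy sketch below (the node names, IPs and the "#" separator are made up by me, this is not MetalLB's actual code, just my mental model of it):

```go
// Toy sketch of the hash-based election: for a given service IP, every
// candidate node is scored by hashing nodeName + serviceIP, the nodes are
// sorted by that score, and the first one announces the IP.
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

func electNode(nodes []string, serviceIP string) string {
	candidates := append([]string(nil), nodes...)
	sort.Slice(candidates, func(i, j int) bool {
		hi := sha256.Sum256([]byte(candidates[i] + "#" + serviceIP))
		hj := sha256.Sum256([]byte(candidates[j] + "#" + serviceIP))
		return string(hi[:]) < string(hj[:])
	})
	return candidates[0]
}

func main() {
	nodes := []string{"worker10", "worker11", "worker12", "worker13",
		"worker14", "worker15", "worker16"}
	for _, ip := range []string{"192.0.2.10", "192.0.2.11", "192.0.2.12"} {
		fmt.Printf("%s is announced by %s\n", ip, electNode(nodes, ip))
	}
}
```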

Describe the solution you'd like

In the hope of avoiding most traffic going through a single node, the election algorithm should be random enough to pick different nodes for different services: in my setup I use ingress-nginx and create three services that all share the same endpoint, each with its own IP from a contiguous range.
I was wondering if it would make sense to use the stable version of sort.Slice (sort.SliceStable) here: https://github.com/fedepaol/metallb/blob/3df9cbd68b9d24edeaba6d84f30fcf3b446a7289/speaker/layer2_controller.go#L116-L126 since it is the only explanation I could think of for this behavior.
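
Concretely, the change I'm asking about would just swap the sort call in the sketch above (shown here only to make the suggestion explicit, not as a patch against the real file):

```go
// Same election as in the sketch above, but using the stable variant of the
// sort instead of sort.Slice.
sort.SliceStable(candidates, func(i, j int) bool {
	hi := sha256.Sum256([]byte(candidates[i] + "#" + serviceIP))
	hj := sha256.Sum256([]byte(candidates[j] + "#" + serviceIP))
	return string(hi[:]) < string(hj[:])
})
```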

Additional context

No response

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
@fedepaol
Member

I am not sure using a stable sort would help here (if I got your suggestion right), as the keys are different by construction (same IP, different node name, hashed).
Changing the behaviour to something more even would require keeping state that we currently don't want to introduce.

I can try to do a few simulations to see if we are missing something obvious in the current logic, or if you were just not lucky.
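
Something like the toy simulation below could give a rough idea (made-up node names and a hashing scheme along the lines of the sketch above, not the speaker's real code):

```go
// Toy simulation: for many triplets of "contiguous" service IPs, count how
// often the same node wins the hash-based election for more than one IP.
package main

import (
	"crypto/sha256"
	"fmt"
)

func winner(nodes []string, ip string) string {
	best, bestKey := "", ""
	for _, n := range nodes {
		h := sha256.Sum256([]byte(n + "#" + ip))
		key := string(h[:])
		if best == "" || key < bestKey {
			best, bestKey = n, key
		}
	}
	return best
}

func main() {
	nodes := []string{"worker10", "worker11", "worker12", "worker13",
		"worker14", "worker15", "worker16"}
	trials, collisions := 0, 0
	for subnet := 0; subnet < 1000; subnet++ {
		seen := map[string]bool{}
		repeated := false
		for host := 10; host < 13; host++ { // three contiguous IPs per trial
			w := winner(nodes, fmt.Sprintf("10.0.%d.%d", subnet, host))
			if seen[w] {
				repeated = true
			}
			seen[w] = true
		}
		trials++
		if repeated {
			collisions++
		}
	}
	fmt.Printf("same node announced more than one of the 3 IPs in %d/%d trials\n",
		collisions, trials)
}
```

For what it's worth, with 7 nodes and 3 IPs even a perfectly uniform assignment would put two of the IPs on the same node in 1 − (7·6·5)/7³ ≈ 39% of cases, so observing that pattern once doesn't necessarily mean the hashing is biased.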

@nvlan
Author

nvlan commented Feb 14, 2024

That was my suggestion, yes. I originally had three separate L2 advertisements, each for a pool with a single IP, and used a nodeSelector label to pin those IPs to nodes 1-3. After an outage (the labels were not reapplied once the first three nodes were rebuilt), I switched to a single pool with three IPs and a single L2 advertisement that allows all nodes. Having deployed this new config to 5 different clusters, I see a curious pattern: the first IP is assigned to the third node, while the second and third IPs both land on the last node (each cluster has between 3 and 9 nodes, named worker[10-19]-domain).
Please let me know if you need any further info, thanks!
