calico-node warning: "Configured MTU is larger than detected host interface MTU" seems misleading #8674

Open
bradbehle opened this issue Mar 29, 2024 · 2 comments

Comments

@bradbehle
Contributor

Expected Behavior

I expected this MTU check to compare the Calico MTU with the MTU of the host interface that Calico actually uses.

Current Behavior

It appears this check instead compares the Calico MTU with the MTU of a different host interface than the one being used for Calico. Possibly because our IP_AUTODETECTION_METHOD regex matches multiple interfaces on the node?

Possible Solution

Only compare the Calico MTU with the MTU of the host interface being used for Calico

Steps to Reproduce (for bugs)

Run Calico on a cluster where the node network interfaces are configured something like this (I replaced some information in here with X's):

sh-4.4# ip a s 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
...
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    inet 10.X.X.X/X brd X.X.X.X scope global bond0
       valid_lft forever preferred_lft forever
    inet6 XXXX::XXXX:XXXX:XXXX:XXXX/64 scope link 
       valid_lft forever preferred_lft forever
7: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:42:fc:5d:9c:f2 brd ff:ff:ff:ff:ff:ff
    inet 52.X.X.X/25 brd X.X.X.X scope global bond1
       valid_lft forever preferred_lft forever
    inet6 XXXX::XXXX:XXXX:XXXX:XXXX/64 scope link 
       valid_lft forever preferred_lft forever
...

bond0 is the private network interface (bonded from eth0 and eth2), and the one we want to use for Calico
bond1 is the public network interface (bonded from eth1 and eth3)

Set the calicoMTU to 8980 via the veth_mtu: "8980" line in the calico-config config map (this cluster uses a manifest Calico deploy)
Set the IP_AUTODETECTION_METHOD to interface=(^bond0$|^eth0$|^ens.*$|^enc.*$)

When Calico starts, the following warning message appears (I included a few other messages for context; note that the 10.X.X.X/X in these logs matches the bond0 IP above):

...
2024-03-01 16:41:13.711 [INFO][9] startup/autodetection_methods.go 117: Using autodetected IPv4 address 10.X.X.X/X on matching interface bond0
...
2024-03-01 16:41:15.380 [WARNING][85] felix/int_dataplane.go 1057: Configured MTU is larger than detected host interface MTU hostMTU=1500 mtu=8980
2024-03-01 16:41:15.380 [INFO][85] felix/int_dataplane.go 1059: Determined pod MTU mtu=8980

So I think we have the Calico MTU set to a reasonable value, and it seems to work fine: the host MTU for bond0 is 9000, and the Calico MTU is set to 8980 to leave 20 bytes for the IPinIP encapsulation packet header.
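
The arithmetic behind that choice can be sketched as follows. This is only an illustration: `maxPodMTU` and `fits` are hypothetical helper names, and the check is not claimed to be the one Felix performs. The only numbers used are the ones from this report: a 20-byte IPinIP header, host MTUs of 9000 and 1500, and a configured MTU of 8980.

```go
package main

import "fmt"

// ipipOverhead is the 20-byte outer IPv4 header added by IPinIP encapsulation.
const ipipOverhead = 20

// maxPodMTU returns the largest pod MTU that still fits in hostMTU
// once the encapsulation header is added.
func maxPodMTU(hostMTU, overhead int) int {
	return hostMTU - overhead
}

// fits reports whether a configured pod MTU leaves room for the
// encapsulation header on an interface with the given host MTU.
func fits(podMTU, hostMTU, overhead int) bool {
	return podMTU <= maxPodMTU(hostMTU, overhead)
}

func main() {
	fmt.Println(maxPodMTU(9000, ipipOverhead))  // 8980
	fmt.Println(fits(8980, 9000, ipipOverhead)) // true: bond0
	fmt.Println(fits(8980, 1500, ipipOverhead)) // false: a 1500-byte interface such as bond1
}
```

Against bond0 the configured value fits exactly; it only looks too large when compared with a 1500-byte interface.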

Context

This was not a huge problem for us, but it did delay our troubleshooting of a performance problem. The WARNING log message concerned us, making us think maybe we had set the MTUs incorrectly, or that the host MTU for bond0 wasn't really set to 9000 as we thought it should be. This was more of an annoyance that wasted a bit of our time, so I wrote it up here in case a fix can save others from wasting time on it too.

Your Environment

@coutinhop
Contributor

@bradbehle thanks for reporting this! Could you post full logs, preferably with debug logging enabled (by setting logSeverityScreen to Debug in the default FelixConfiguration)? What is going on is that findHostMTU() uses the smallest MTU among the autodetected interfaces, to be safe:

func findHostMTU(matchRegex *regexp.Regexp) (int, error) {

Though from the interface output you posted, it seems the regex you use only matches interfaces with MTU=9000. Was that different at some point for any reason? Could you also post the full output of ip addr s (redacted, of course; I'd be interested in seeing all the interfaces you have and their MTUs)?

Thanks!

@bradbehle
Contributor Author

@coutinhop Unfortunately I don't have access to this cluster any more, so I can't grab any logs or the full ip addr s output. I'll see if I can get access again.
