batman-adv: Multicast eventually not forwarded from/to clients on supernodes on v2023.x? #3059
Thanks for the report! I really appreciate these investigations. Especially as the next multicast mechanism is going to land in batman-adv soon, it is a good idea to double-check the current one.
tl;dr: I couldn't reproduce the issue yet and could use further guidance.
I've flashed the latest stable Freifunk Aachen firmware on a CPE210 (gluon-ffac-v2023.1.1-1-tp-link-cpe210-v1-sysupgrade.bin) and performed the following tests:
Bridge multicast wakeup call feature/workaround: This seems to work as expected. I see the additional packet exchange (a special ICMPv6 echo request/reply and a subsequent unicast MLD query) between the CPE210 and an Android 9 phone.
Multicast ICMPv6 echo requests: Over 25 minutes, I sent 300 ICMPv6 echo requests via multicast over WiFi from my Linux/Debian laptop to the IPv6 solicited-node multicast address of the (default) gateway I was assigned to. All received a valid response.
[Wireshark I/O graph; black line at 2: number of packets per 5 seconds]
pcapng capture file: icmpv6-mc-echo-1500s
So this seems to look fine, at least in this scenario?
ICMPv6 Neighbor Discovery / Solicitation: I then also sent 300 ICMPv6 Neighbor Solicitations via multicast over WiFi from my Linux/Debian laptop to the IPv6 solicited-node multicast address of the same gateway, again over 25 minutes, using the ipv6toolkit. All but one neighbor solicitation received a response in the form of an ICMPv6 Neighbor Advertisement.
[I/O graph; black line at 2: number of packets per 5 seconds]
pcapng capture file: icmpv6-nd-1500s
So this also seems to look fine, at least in this scenario? Hence, I could use a little more guidance on how to reproduce the issue.
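For anyone repeating these tests: the solicited-node multicast address of a host is the fixed prefix ff02::1:ff00:0/104 followed by the low-order 24 bits of its unicast address (RFC 4291). A minimal Python sketch of that derivation; the function name is hypothetical and fdac::1 is a placeholder, not an address from this issue.

```python
import ipaddress

# Solicited-node multicast prefix ff02::1:ff00:0/104 (RFC 4291, section 2.7.1)
_SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_address(unicast: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for an IPv6 unicast address.

    The group is the fixed /104 prefix combined with the low-order
    24 bits of the unicast address.
    """
    addr = ipaddress.IPv6Address(unicast)
    low24 = int(addr) & 0xFFFFFF  # keep only the last 24 bits
    return ipaddress.IPv6Address(_SOLICITED_NODE_PREFIX | low24)
```

For example, solicited_node_address("fdac::1") yields ff02::1:ff00:1, which could then be used as the multicast destination of the echo requests or neighbor solicitations.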
Last week, someone from our community reported this issue again, and my IPv6 was broken because of it too. I tried to debug it with a separate supernode but could not reproduce it for a few days, probably because I need a mesh cloud larger than 2 nodes (which is hard to set up without affecting people). I am glad that I now know a working workaround.
Bug report
In FFAC, I experienced broken IPv6 neighbor discovery after some time, predominantly for WiFi clients on Gluon v2023.x (I am not entirely sure about the affected versions, see below).
At first, I suspected a wrong supernode configuration.
But this issue also appeared for the local mesh IPv6 addresses (fdac::/64 in FFAC), within the mesh.
Pinging fe80: addresses from the supernode to the client did work fine, though. The IP addresses of the Gluon nodes themselves have not been affected so far.
I could also reproduce this issue on a client behind a Gluon v2022.1.x node connected to a supernode running batman-adv 2023.x (v2023.0 or v2023.2, I don't remember which).
It does not affect IPv4 traffic, only IPv6 neighbor discovery (neighbor advertisement/solicitation).
This results in broken IPv6 for clients, which leads to an unreliable web connection.
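One possible reason why only IPv6 is hit (an assumption, not confirmed in this issue): ARP requests are sent to the Ethernet broadcast address, which batman-adv generally floods, while IPv6 Neighbor Solicitations go to a solicited-node multicast MAC of the form 33:33:ff:xx:xx:xx, which falls under batman-adv's multicast optimizations. The sketch below only classifies destination MACs along those lines; the function name is hypothetical.

```python
def classify_dst_mac(mac: str) -> str:
    """Roughly classify an Ethernet destination MAC address."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"        # e.g. ARP requests
    if not octets[0] & 0x01:
        return "unicast"          # group (I/G) bit not set
    if octets[:2] == b"\x33\x33":
        return "ipv6-multicast"   # RFC 2464 mapping, e.g. neighbor solicitations
    if octets[:3] == b"\x01\x00\x5e":
        return "ipv4-multicast"   # RFC 1112 mapping
    return "other-multicast"
```

Under this assumption, a solicitation to 33:33:ff:78:9a:bc is classified as ipv6-multicast, while an ARP request to ff:ff:ff:ff:ff:ff is plain broadcast.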
A possibly related problem could be #2854.
I did not see neighbor discovery packets from the router to the client in bat0 on the Gluon node, though. So this looks different (or should I have looked into local-node/any?).
Fixes - Disabling multicast optimizations
This issue has been discussed on IRC. @T-X asked me to disable the multicast optimizations in batman-adv using
batctl multicast_forceflood 1
which did help. To get it working throughout the mesh domain, I had to set multicast_forceflood on all nodes in between to receive the multicast from my laptop (fdac::):
- from its nextnode (fdac::)
- from the mesh-vpn node, but not from my laptop, with multicast_forceflood set
- with multicast_forceflood on my nextnode too, I could ping the clients' fdac:: addresses
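To illustrate why forcing flooding can mask such a problem, here is a toy model under stated assumptions, not batman-adv's actual forwarding code: the node where a multicast frame enters the mesh either floods it, forwards it towards listeners it knows about, or drops it when it knows of none. If that node's view lacks the client's 33:33:ff:xx:xx:xx entry, solicitations vanish silently; with forceflood enabled, the frame is flooded regardless. The class, its fields and the return strings are hypothetical, and the model ignores many details of the real implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class IngressNode:
    """Toy model of the node where a multicast frame enters the mesh.

    Hypothetical simplification, not batman-adv's real decision logic.
    """
    name: str
    forceflood: bool = False
    # Multicast MACs this node believes have listeners somewhere in the mesh
    known_listeners: Set[str] = field(default_factory=set)

    def decide(self, group_mac: str) -> str:
        if self.forceflood:
            return "flood"    # optimizations off: handled like broadcast
        if group_mac in self.known_listeners:
            return "forward"  # send towards the known listeners
        return "drop"         # no listener known: frame is not sent at all
```

In this model, the setting matters on every node where the affected traffic enters the mesh, in either direction, which would be consistent with having to set multicast_forceflood on several nodes.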
Reproducing
To try to reproduce it, I would use a node running v2023.1 or later, connect to it via WiFi as a client, and ping IPv6 addresses or fdac addresses of other nodes in the same mesh domain/segment. If this keeps working for more than 3 days, I don't think you are affected by this issue.
It helps to use Freifunk as the main connection while working from home, to monitor this.
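The reproduction steps above amount to a long-running reachability check. A minimal sketch, assuming a Linux client with iputils ping (the -6, -c and -W options); the function names are hypothetical, and the probe and sleep functions are injectable so the loop can be tested without a network.

```python
import subprocess
import time
from typing import Callable, Iterable, Iterator, Tuple


def ping6(addr: str, timeout_s: int = 2) -> bool:
    """Send one ICMPv6 echo request via iputils ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-6", "-c", "1", "-W", str(timeout_s), addr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(
    targets: Iterable[str],
    rounds: int,
    interval_s: float,
    probe: Callable[[str], bool] = ping6,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[int, str]]:
    """Probe each target once per round, yielding (round, address) on failure.

    Failures are yielded as soon as they happen, so a multi-day run
    can be logged live instead of only at the end.
    """
    targets = list(targets)
    for rnd in range(rounds):
        for addr in targets:
            if not probe(addr):
                yield rnd, addr
        if rnd < rounds - 1:
            sleep(interval_s)  # no pause needed after the last round
```

For example, iterating over monitor(["fdac::1"], rounds=4320, interval_s=60) and printing each failure probes once per minute for three days (3 × 24 × 60 = 4320 rounds); fdac::1 is a placeholder for an address of another node in your domain.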
Affected Versions/Devices
I could not yet pin this down to an affected batman-adv version: the supernode might have been the cause when I reproduced this on Gluon v2022.x nodes, and the Gluon node with batman-adv v2023.x might have been the cause when the supernode ran v2022.
But I suspect it to be somewhat batman-adv v2023.x related 🤷
It has been reproduced on Supernodes running Debian 11/12, batman-adv (v2022.0)/v2023.0/v2023.2
It has been seen on gluon (v2022.x), v2023.1, v2023.1.1, master
It has been seen on FireTV stick, Debian 12 Laptop, Samsung phone and others.
So far, no other community has reported similar issues.
What is the expected behaviour?
IPv6 neighbor discovery should work for clients.
Gluon Version:
v2023.x
Site Configuration:
https://github.com/ffac/site
Next Steps - Further investigation
I use this issue to track the problem publicly and to document further progress.
Things to do when this issue can be seen:
- Run batctl tg -m on remote nodes and batctl tl -m on the node serving this client.
- Look for 33:33:ff:<last-3-bytes-of-unicast-MAC>.
- The bridge wakeup-call feature is disabled, as no special ICMPv6 echo requests or MLD unicast queries were seen in the pcap dump. (Then this would only affect clients on WiFi? I am not sure if this is plausible, as I had to set multicast_forceflood on all nodes in between.)
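To know which entry to look for in the batctl tg -m / batctl tl -m output, the solicited-node multicast MAC can be computed from the client's MAC, as above, or from a specific IPv6 address (RFC 2464 maps ff02::1:ffxx:xxxx to 33:33:ff:xx:xx:xx). The MAC-based form only matches addresses whose interface ID is EUI-64-derived from that MAC; privacy or stable-privacy addresses generally map to other groups. A sketch with hypothetical function names; the dump check is a plain case-insensitive substring search, not a parser for batctl's output format.

```python
import ipaddress


def solicited_node_mac_from_mac(mac: str) -> str:
    """33:33:ff followed by the last three bytes of an interface MAC.

    Matches the solicited-node group of addresses whose interface ID
    is EUI-64-derived from this MAC.
    """
    octets = mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "33:33:ff:" + ":".join(octets[3:])


def solicited_node_mac_from_ipv6(unicast: str) -> str:
    """33:33:ff followed by the last 24 bits of an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return "33:33:ff:" + ":".join(f"{b:02x}" for b in low24.to_bytes(3, "big"))


def mentions_mac(dump: str, mac: str) -> bool:
    """Case-insensitive check whether a MAC string appears in a text dump."""
    return mac.lower() in dump.lower()
```

For a client with MAC 12:34:56:78:9a:bc, both its EUI-64 link-local address fe80::1034:56ff:fe78:9abc and the MAC itself lead to 33:33:ff:78:9a:bc.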