Automatically enable requesting broadcast reply on ipvlan #32

Open
deliciouslytyped opened this issue Apr 21, 2021 · 23 comments · May be fixed by #33

@deliciouslytyped

deliciouslytyped commented Apr 21, 2021

The dhcpcd.conf manual states:

     broadcast
             Instructs the DHCP server to broadcast replies back to the client.  Normally this is only set for non-Ethernet interfaces, such as FireWire and InfiniBand.  In most cases, dhcpcd will set this automatically.

Would it be possible to do this for ipvlan interfaces as well?

Unicast packets don't make it through the device unless it's configured. I don't know if this can be worked around, but by design it only demuxes based on layer 3 data.

@rsmarples
Member

I don't see a reason why not.

We would need a way to detect if the interface is ipvlan or not.
Do you know how this could be achieved?

We would also need to adjust the default IAID for the interface, since the MAC address is shared.
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcpcd.c#L471

@deliciouslytyped
Author

I'm having a hell of a time figuring ipvlans out myself, so I don't really know. How do you detect the other stuff?

@rsmarples
Member

/sys/class/net might have something.
dhcpcd works out bridge and tap devices like that:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if-linux.c#L166
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if-linux.c#L328

@deliciouslytyped
Author

The ipvlan source is here if that helps any https://github.com/torvalds/linux/tree/master/drivers/net/ipvlan

@ido
Member

ido commented Apr 21, 2021

The specific info around whether broadcast via ipvlan is available might be in rtnetlink(7) or netdevice(7) in Linux.

I thought the (non-portable) way of determining whether an interface supports broadcast generically was to do SIOCGIFFLAGS on IFF_BROADCAST in netdevice. If memory serves, that's how ifconfig decides whether to show BROADCAST or not next to an interface's name, for example.

Bringing up an L2 ipvlan interface on my test box, looks promising:

$ sudo ip link add link eth0 name eth0ipvl1 type ipvlan mode l2
$ ifconfig eth0ipvl1 | head -1
eth0ipvl1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
$ cat /sys/class/net/eth0ipvl1/broadcast
ff:ff:ff:ff:ff:ff
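
As a rough illustration of the flag check described above, the following Python sketch decodes the numeric flags value from the ifconfig output (4163) and shows how the same value could be read on Linux with SIOCGIFFLAGS. The constants are taken from the Linux headers; the helper names `decode_flags` and `get_flags` are made up for this example and are not part of dhcpcd.

```python
import fcntl
import socket
import struct

SIOCGIFFLAGS = 0x8913  # from <linux/sockios.h>
IFF_BROADCAST = 0x2    # from <linux/if.h>

# Subset of the IFF_* bits from <linux/if.h>
IFF_NAMES = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}


def decode_flags(flags):
    """Return the names of the known IFF_* bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(IFF_NAMES.items()) if flags & bit]


def get_flags(ifname):
    """Read the interface flags via SIOCGIFFLAGS (Linux only)."""
    # struct ifreq: a 16-byte name followed by a union; the flags are a short.
    ifreq = struct.pack("16sH22x", ifname.encode(), 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        res = fcntl.ioctl(s.fileno(), SIOCGIFFLAGS, ifreq)
    return struct.unpack("16sH22x", res)[1]


print(decode_flags(4163))  # ['UP', 'BROADCAST', 'RUNNING', 'MULTICAST']
```

Note that IFF_BROADCAST only reports that the interface has a broadcast address; it says nothing about the ipvlan mode.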

While it has a lower_eth0 symlink, at least upon first glance, I don't see anything promising in /sys/class/net/eth0ipvl1 to identify the interface as ipvlan. iproute2 seems to use netlink to detect ipvlan. I'll probably have to do the same on Linux hosts.

@deliciouslytyped
Author

deliciouslytyped commented Apr 22, 2021

You have a good point in that broadcast isn't available in all ipvlan modes IIUC; I forgot about that because I've only been using the L2 mode. The initial problem was that I wasn't able to get the DHCP replies at all over unicast, because you can't receive packets sent to IPs that aren't yours (IIUC).

@rsmarples
Member

IFF_BROADCAST Valid broadcast address set.

Sadly, that doesn't tell us whether we need to set the broadcast bit in DHCP messages.

@rsmarples
Member

While it has a lower_eth0 symlink, at least upon first glance, I don't see anything promising in /sys/class/net/eth0ipvl1 to identify the interface as ipvlan. iproute2 seems to use netlink to detect ipvlan. I'll probably have to do the same on Linux hosts.

You can base new code on the netlink code to get the wireless SSID found here:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if-linux.c#L1405
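
For reference, the link kind that iproute2 prints comes from the IFLA_INFO_KIND attribute nested inside IFLA_LINKINFO in an RTM_NEWLINK message. The Python sketch below only parses a run of attribute bytes; opening the netlink socket and stripping the message headers are left out. The attribute numbers are from `<linux/if_link.h>`, while the function names and the fabricated input are assumptions for illustration, not dhcpcd code.

```python
import struct

IFLA_IFNAME = 3         # from <linux/if_link.h>
IFLA_LINKINFO = 18
IFLA_INFO_KIND = 1
NLA_TYPE_MASK = 0x3FFF  # strips NLA_F_NESTED / NLA_F_NET_BYTEORDER


def parse_rtattrs(data):
    """Split a run of rtattr structures into {type: payload}."""
    attrs = {}
    offset = 0
    while offset + 4 <= len(data):
        rta_len, rta_type = struct.unpack_from("=HH", data, offset)
        if rta_len < 4 or offset + rta_len > len(data):
            break  # malformed attribute: stop parsing
        attrs[rta_type & NLA_TYPE_MASK] = data[offset + 4:offset + rta_len]
        offset += (rta_len + 3) & ~3  # attributes are 4-byte aligned
    return attrs


def link_kind(link_attrs):
    """Return the link kind (e.g. 'ipvlan'), or None if it is not present."""
    linkinfo = parse_rtattrs(link_attrs).get(IFLA_LINKINFO)
    if linkinfo is None:
        return None
    kind = parse_rtattrs(linkinfo).get(IFLA_INFO_KIND)
    if kind is None:
        return None
    return kind.rstrip(b"\0").decode()


def make_rtattr(rta_type, payload):
    """Build one padded rtattr; used here only to fabricate example input."""
    rta_len = 4 + len(payload)
    return struct.pack("=HH", rta_len, rta_type) + payload + b"\0" * (-rta_len % 4)


example = make_rtattr(IFLA_IFNAME, b"eth0ipvl1\0") + make_rtattr(
    IFLA_LINKINFO, make_rtattr(IFLA_INFO_KIND, b"ipvlan\0"))
print(link_kind(example))  # ipvlan
```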

@ido
Member

ido commented Apr 22, 2021

The initial problem was that I wasn't able to get the DHCP replies at all over unicast, because you can't receive packets sent to IPs that aren't yours (IIUC).

I just want to make sure I understand your setup here, because your statement here is making me question whether this is solvable with DHCP broadcast replies.

Disclaimer just to get the obvious out of the way: ipvlan shares a MAC address with its parent interface. This makes DHCP tricky. macvlan does not have this problem, and may be a cleaner choice for you depending on your setup, since each interface would then have its own MAC address.

Help me replicate your setup, so that I can get a sense if this will work or not, can you share how your interfaces are configured?

Also:

  • Which ipvlan mode are you using? (Note: L2 responds to ARP for the child interface IP address(es) and can receive bcast. L3 and L3S modes do not forward broadcast traffic and rely on the default netns nftables rules to get traffic out or in/out, respectively, if I remember correctly - this might have changed in the intervening years.) You answered this already - L2 mode. Excellent, thanks.
  • The DHCP unicast replies are arriving at the host/parent interface because it shares a MAC address with the ipvlan interface(s) running the DHCP client(s), right? Can you see the DHCP unicast replies (e.g. offer) on the parent interface with tcpdump, but not on the container's ipvlan interface?
  • Is the problem that you don't have a good way to get those unicast DHCP replies to the correct container/netns interface that requested them in the first place? Furthermore, if you own the container host, is there a way to get the inbound DHCP packets from the parent interface to the ipvlan interfaces using the host firewall (iptables/nftables/bpf), for example by using the DHCP hostname (option 12) field to route them to the appropriate ipvlan interface? (One approach that looks promising is to use a different DHCP client-ID on each ipvlan interface to differentiate them and route the DHCP offer replies back to the correct interface (e.g. using netfilter rules) on that basis.)

@rsmarples
Member

rsmarples commented Apr 22, 2021

That is why we need to set the broadcast bit in the DHCP message we send, to instruct the server to broadcast the reply rather than unicasting it.
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcp.c#L790

We need to set the flag somewhere here for ipvlan interfaces:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcpcd.c#L435
OR just set the flag in the DHCP code linked above.
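
To make the broadcast bit concrete: RFC 2131 defines it as the most significant bit of the 16-bit flags field in the fixed BOOTP header. The Python sketch below builds only that fixed 236-byte header (no magic cookie or options) and checks the bit. The function names and the example MAC address are assumptions for illustration and do not mirror dhcpcd's implementation.

```python
import struct

BOOTREQUEST = 1
HTYPE_ETHERNET = 1
BROADCAST_FLAG = 0x8000  # RFC 2131: top bit of the flags field
FLAGS_OFFSET = 10        # op, htype, hlen, hops (4) + xid (4) + secs (2)


def bootp_header(xid, mac, broadcast):
    """Build the fixed BOOTP header of a client request."""
    flags = BROADCAST_FLAG if broadcast else 0
    zero_ip = b"\0" * 4
    return struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        BOOTREQUEST, HTYPE_ETHERNET, len(mac), 0,  # op, htype, hlen, hops
        xid, 0, flags,                             # xid, secs, flags
        zero_ip, zero_ip, zero_ip, zero_ip,        # ciaddr, yiaddr, siaddr, giaddr
        mac,                                       # chaddr, NUL-padded to 16 bytes
        b"", b"",                                  # sname, file
    )


def wants_broadcast(packet):
    """Return True if the packet asks the server to broadcast its reply."""
    (flags,) = struct.unpack_from("!H", packet, FLAGS_OFFSET)
    return bool(flags & BROADCAST_FLAG)


pkt = bootp_header(0x12345678, bytes.fromhex("020000000001"), broadcast=True)
print(len(pkt), wants_broadcast(pkt))  # 236 True
```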

@ido
Member

ido commented Apr 22, 2021

That is why we need to set the broadcast bit in the DHCP message we send, to instruct the server to broadcast the reply rather than unicasting it.
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcp.c#L790

I was able to get that working with L2 mode, but with L3/L3S mode ipvlan interfaces, it looks like broadcast packets are not copied from/to the ipvlan interfaces depending on configuration, so I want to make sure I understood @deliciouslytyped's specific interface configuration to replicate it. The outbound DHCP DISCOVER never makes it to the physical switch interface in ipvlan L3 mode, for example, despite the interface being labeled as supporting broadcast. (...which is a completely different problem.)

For example, I saw a container networking implementation a couple years ago that did stateful DHCP relaying between the container interfaces and external DHCP server by tracking DHCP XIDs or (where available) client-IDs or hostname (option 12), which seems like a strategy that'd work for all three modes (L2/L3/L3S) and wouldn't require broadcast replies, just a stateful relay (e.g. in netfilter).

We need to set the flag somewhere here for ipvlan interfaces:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcpcd.c#L435
OR just set the flag in the DHCP code linked above.

Thanks for the pointer, that saved me a couple of greps. 😄

@deliciouslytyped
Author

deliciouslytyped commented Apr 22, 2021

Seems like some of these replies aren't needed anymore, but here they are:

It looks like I failed to mention (since I didn't see a need at the time) that I do in fact have a working DHCP setup when the broadcast flag is set, either by passing -J or by writing broadcast in the config. (Which I have to say is more than dhclient lets you do! In dhclient it's hardcoded and you have to patch the source!)

I just figured it would be reasonable to detect ipvlan in addition to the other two cases, because unicast doesn't work with L2 mode ipvlan unless you already have the IP you're being assigned. (I don't understand how this works with other devices either though...)

I don't understand how ipvlan works, but I made a discovery yesterday (honestly, why isn't this stuff documented in the usual places as opposed to some obscure PDF somewhere?): https://people.netfilter.org/pablo/netdev0.1/slides/bandewar-IPvlan-presentation-Netdev01.pdf , which suggests that communication between the master device and slave device doesn't work at all, or at least only in one direction, which matches what I experienced with wireshark.

Similar to the macvlan devices, the traffic to and from the master device cannot be sent to and from slaves. In the macvlan setup the problem can be solved if the host is connected to a switch that allows hair-pin mode. However, not many switches support this mode. Packets from the slave interfaces will reach the master interface (mostly in the default-ns) but the replies can’t reach the slave interfaces since the bridge on the master is transparent in the TX mode. This causes TX replies from the master to leave the host and will be lost resulting in broken connectivity.

If you have any good suggestions how to work around this, that would be great, because I didn't understand the suggested workarounds yet, or they don't look very good. The document says macvlan has the same problem though?

The DHCP unicast replies are arriving at the host/parent interface because it shares a MAC address with the ipvlan interface(s) running the DHCP client(s), right? Can you see the DHCP unicast replies (e.g. offer) on the parent interface with tcpdump, but not on the container's ipvlan interface?

Yes the packets are visible on the host interface. (IIUC - they were definitely visible somewhere, which is part of how I figured out that I needed this in the first place)

Is the problem that you don't have a good way to get those unicast DHCP replies to the correct container/netns interface that requested them in the first place? Furthermore, if you own the container host, is there a way to get the inbound DHCP packets from the parent interface to the ipvlan interfaces using the host firewall (iptables/nftables/bpf), for example by using the DHCP hostname (option 12) field to route them to the appropriate ipvlan interface? (One approach that looks promising is to use a different DHCP client-ID on each ipvlan interface to differentiate them and route the DHCP offer replies back to the correct interface (e.g. using netfilter rules) on that basis.)

This is partially answered by the above, but also, I have no idea how to move packets around with iptables. If you have any recommended literature, I'd be happy to hear. It might help alleviate the "master can't talk to slave" problem. (I don't understand why they can receive "external" packets but not ones hitting the master interface from the same machine?)

I did manually set a client ID, but I don't know if it's strictly needed (or if dhcpcd sets a unique one automatically). containernetworking/cni#17 (comment) brought the setting to my attention, and it makes sense, so I set it. That guy seems to have a clue about how this works.

For example, I saw a container networking implementation a couple years ago that did stateful DHCP relaying between the container interfaces and external DHCP server by tracking DHCP XIDs or (where available) client-IDs or hostname (option 12), which seems like a strategy that'd work for all three modes (L2/L3/L3S) and wouldn't require broadcast replies, just a stateful relay (e.g. in netfilter).

I don't know how containers work internally, but it seems there are some basic features, which get combined into what ends up being called "containers". I use systemd-nspawn, because some infrastructure is already provided by my linux distro. I don't know any details but it seems thinner than stuff like docker, so probably pretty minimal. Any daemon or such, I would have to use in addition.

Once you're limited to L3 stuff, I think you start using dhcp relay daemons?

This whole project of mine came from trying to run a proxyDHCP (weird name for what it does?) server so I could run a PXE server in a container, and it works now :)

I actually had a pretty hard time finding my way around tooling in the dhcp space.

@ido
Member

ido commented Apr 22, 2021

I just figured it would be reasonable to detect ipvlan in addition to the other two cases, because unicast doesn't work with L2 mode ipvlan unless you already have the IP you're being assigned. (I don't understand how this works with other devices either though...)

For L2 ipvlan, setting the broadcast flag makes sense, and I'm planning to put that patch forward.

If you have any good suggestions how to work around this, that would be great, because I didn't understand the suggested workarounds yet, or they don't look very good. The document says macvlan has the same problem though?

I'm looking for alternatives that will work for unicast DHCP OFFERs if you know the container hostnames/client-IDs (not IPs), or are willing to be stateful (DHCP XIDs) and can inject the DHCP replies you're seeing on the host interface into the ipvlan interfaces. That alternative would work across all types of ipvlan interfaces, but requires further research.

The reason one might want to have unicast OFFERs where possible is that with broadcast, all neighboring containers and hosts in the broadcast domain would see the new container. Depending on one's threat model, that may not be desired. (For example, if you allow containers from different untrusted third parties on the same host, that could introduce crosstalk between parties that shouldn't exist.)

macvlan does not share a MAC address with the host interface, so it does not have the same problem - the DHCP OFFER packets destined for the macvlan interface MAC address get bridged over to the macvlan interface MAC address from the physical port. The host interface may need to be in promiscuous mode, or require some ebtables configuration, but it's possible without a DHCP relay or inspecting the DHCP packet beyond the destination MAC. Hope that clears it up.

In the case of an ipvlan, the destination MAC of the DHCP OFFER is the MAC address of the host interface and ipvlan interfaces, so the decision of which interface to send the packet to can't be made by the host based on the destination MAC alone. That'd be why you see it on the host interface, so the trick is just to forward those OFFER packets somehow to the right ipvlan interface (container/netns). And since the ipvlan interface doesn't yet have an IP address, as you point out, you'd need an alternative method to forward them into the container's ipvlan interface, for example based on deeper inspection of the DHCP OFFER - e.g. client ID, hostname, or XID... That's what I was investigating.
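
The XID-based idea can be sketched as a small lookup table: remember which interface sent each transaction ID, then use the ID in a reply to pick the interface to forward to. This Python example covers only the bookkeeping, assuming raw BOOTP payloads are already available; actually capturing and injecting packets is out of scope. The class name, method names, and interface names are all hypothetical.

```python
import struct

XID_OFFSET = 4  # op, htype, hlen, hops come first; xid is the next 4 bytes


def xid_of(packet):
    """Extract the 32-bit transaction ID from a BOOTP/DHCP payload."""
    (xid,) = struct.unpack_from("!I", packet, XID_OFFSET)
    return xid


class XidTable:
    """Remember which interface sent each DHCP transaction ID."""

    def __init__(self):
        self._by_xid = {}

    def note_request(self, ifname, packet):
        """Record an outgoing client request seen on ifname."""
        self._by_xid[xid_of(packet)] = ifname

    def route_reply(self, packet):
        """Return the interface a reply belongs to, or None if unknown."""
        return self._by_xid.get(xid_of(packet))


def fake_packet(op, xid):
    """Fabricate a minimal 236-byte BOOTP payload for the example."""
    return struct.pack("!BBBBI", op, 1, 6, 0, xid) + bytes(228)


table = XidTable()
table.note_request("ipvl0", fake_packet(1, 0xDEADBEEF))
print(table.route_reply(fake_packet(2, 0xDEADBEEF)))  # ipvl0
```

A real relay would also need to expire entries and cope with two clients choosing the same XID, which is why matching on the client ID as well may be more robust.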

The DHCP unicast replies are arriving at the host/parent interface because it shares a MAC address with the ipvlan interface(s) running the DHCP client(s), right? Can you see the DHCP unicast replies (e.g. offer) on the parent interface with tcpdump, but not on the container's ipvlan interface?

Yes the packets are visible on the host interface. (IIUC - they were definitely visible somewhere, which is part of how I figured out that I needed this in the first place)

Excellent, then there is some hope we can forward them into the container somehow... :-D

Is the problem that you don't have a good way to get those unicast DHCP replies to the correct container/netns interface that requested them in the first place? Furthermore, if you own the container host, is there a way to get the inbound DHCP packets from the parent interface to the ipvlan interfaces using the host firewall (iptables/nftables/bpf), for example by using the DHCP hostname (option 12) field to route them to the appropriate ipvlan interface? (One approach that looks promising is to use a different DHCP client-ID on each ipvlan interface to differentiate them and route the DHCP offer replies back to the correct interface (e.g. using netfilter rules) on that basis.)

This is partially answered by the above, but also, I have no idea how to move packets around with iptables. If you have any recommended literature, I'd be happy to hear. It might help alleviate the "master can't talk to slave" problem. (I don't understand why they can receive "external" packets but not ones hitting the master interface from the same machine?)

I did manually set a client ID, but I don't know if it's strictly needed (or if dhcpcd sets a unique one automatically). containernetworking/cni#17 (comment) brought the setting to my attention, and it makes sense, so I set it. That guy seems to have a clue about how this works.

It looks like we're reading more or less the same things. They're implementing more or less what I'm suggesting above with client IDs. If there were an iptables conntrack extension for DHCP, that would come in super useful here, since you could conntrack by DHCP XID or client ID across ipvlan interfaces, and everything would work out...seems like that's the missing piece for getting L3/L3S ipvlan working properly with unicast (and maybe even L2). (Obviously out of scope for dhcpcd, but if you're feeling pioneering...)

For example, I saw a container networking implementation a couple years ago that did stateful DHCP relaying between the container interfaces and external DHCP server by tracking DHCP XIDs or (where available) client-IDs or hostname (option 12), which seems like a strategy that'd work for all three modes (L2/L3/L3S) and wouldn't require broadcast replies, just a stateful relay (e.g. in netfilter).

I don't know how containers work internally, but it seems there are some basic features, which get combined into what ends up being called "containers". I use systemd-nspawn, because some infrastructure is already provided by my linux distro. I don't know any details but it seems thinner than stuff like docker, so probably pretty minimal. Any daemon or such, I would have to use in addition.

Once you're limited to L3 stuff, I think you start using dhcp relay daemons?

Or iptables-/eBPF-based options to forward DHCP packets into the ipvlan interfaces (from the host) based on DHCP client ID, etc.

This whole project of mine came from trying to run a proxyDHCP (weird name for what it does?) server so I could run a PXE server in a container, and it works now :)

I actually had a pretty hard time finding my way around tooling in the dhcp space.

For what it's worth, I also use systemd-nspawn or lxc, and prefer to avoid heavier container orchestration systems (e.g. Docker) because of their attack surface. You can replace my usage of "container" with "netns" everywhere above.

ido added a commit that referenced this issue Apr 23, 2021
@ido ido linked a pull request Apr 23, 2021 that will close this issue
2 tasks
@deliciouslytyped
Author

deliciouslytyped commented Apr 23, 2021

I'm not sure I understood this correctly, and it complicates setup (but sounds like it makes it possible at all); a slave device could be added in the "host" environment next to the master device, and then routing done?

Feels redundant with already having a master device....

Edit: Unless I'm confusing something with my existing setup again, that seems to work for communication between host(master-adjacent) and slave. Though half the point of all this containerization stuff is me trying to simplify the config of the host...
Edit2: meant to link https://people.netfilter.org/pablo/netdev0.1/papers/IPVLAN-The-beginning.pdf earlier

@ido ido self-assigned this Apr 24, 2021
@ido
Member

ido commented Apr 24, 2021

It should be doable without another child interface, using netfilter/XDP/eBPF, or possibly even just a relay agent of some sort listening on the master interface and injecting packets into the ipvlan interface.

Anyhow, a draft of the broadcast patch is available in #33. I'm still cleaning it up but it should be functional now, if you want to try it out. Note that you will need to explicitly set the IAID or client ID for ipvlan interfaces in the current version of the patch, because it will otherwise pick up the MAC address to generate the IAID, which for ipvlan will be the same as the IAID of the parent interface and therefore get you the same IP address in the DHCP OFFER as the parent interface...

Unfortunately, for properly namespaced ipvlan interfaces that have the same interface name (e.g. eth0) and MAC address as the parent interface, I think finding a way to generate a unique (and stable across reboots) IAID will be challenging.

I'm thinking about migrating to an approach that generically tracks "child" and "parent" interface relationships and enables broadcast + alternate default IAID when their MAC addresses are the same, but for now moving forward with ipvlan-specific code.

@deliciouslytyped
Author

deliciouslytyped commented Apr 24, 2021

If it detects ipvlan it could bail/leave a log message about needing an id set?

@ido
Member

ido commented Apr 24, 2021

If it detects ipvlan it could bail/leave a log message about needing an id set?

A log message seems fair. However, if we're going to bail when the ipvlan interface doesn't have an IAID configured, we might as well also suggest they configure broadcast for the ipvlan interface in the same message and not set it for the user... So, I'm looking into how I might solve this without any configuration required by the user, and without bailing on ipvlan interfaces by default for a little longer first... :-)

Also, thanks for the reminder, the current patch does not yet check if the interface is configured, and should probably only set broadcast if the interface isn't configured (i.e. if_noconf).

@deliciouslytyped
Author

Yeah, on second thought, I just realized it's kind of all-or-nothing, because if broadcast gets set automatically but the client ID isn't set, something weird could happen? So a sane default needs to be found, or you can't really do this automatically? Good luck.

ido added a commit that referenced this issue Apr 25, 2021
Linux ipvlan interfaces share a MAC address with their siblings and
parent physical interface.  Before they are assigned an IP address,
these virtual interfaces do not receive DHCP OFFER unicast messages
because the ipvlan driver does not know to pass them to the virtual
interface yet.  This chicken-and-egg problem is resolved with two
changes:

- Set broadcast flag for an interface if it belongs to the ipvlan
  driver, as detected via SIOCETHTOOL ETHTOOL_GDRVINFO. (closes #32)

A forthcoming patch will automatically modify the DHCP IAID for
ipvlan interfaces so that they do not conflict with the parent
(lower/physical) interface IAID.  For now, dhcpcd will display a warning
log message when conflicting IAID (same MAC address) interfaces are active.

(A minor grammar correction is included free of charge.)
ido added a commit that referenced this issue Apr 25, 2021
Linux ipvlan interfaces share a MAC address with their siblings and
parent physical interface.  Before they are assigned an IP address,
these virtual interfaces do not receive DHCP OFFER unicast messages
because the ipvlan driver does not know to pass them to the virtual
interface yet by IP.  This chicken-and-egg problem is resolved with
two changes:

In this patch, we set the broadcast flag for an interface if it
belongs to the ipvlan driver, as detected via SIOCETHTOOL ETHTOOL_GDRVINFO.
(closes #32)

A forthcoming patch will automatically modify the DHCP IAID for
ipvlan interfaces so that they do not conflict with the parent
(lower/physical) interface IAID.  For now, dhcpcd will display a warning
log message when conflicting IAID (same MAC address) interfaces are active.

(A minor grammar correction is included free of charge.)
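
For readers unfamiliar with the detection method named in the commit message, here is a minimal Python sketch of querying the driver name with SIOCETHTOOL / ETHTOOL_GDRVINFO on Linux. The constants and the struct layout come from the Linux headers; the function names are invented for this example, and the assumption that the ipvlan driver reports the string "ipvlan" should be verified against the kernel source linked earlier in the thread. This is not the code from the patch.

```python
import ctypes
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946           # from <linux/sockios.h>
ETHTOOL_GDRVINFO = 0x00000003  # from <linux/ethtool.h>
DRVINFO_SIZE = 196             # sizeof(struct ethtool_drvinfo)


def driver_from_drvinfo(buf):
    """Extract the NUL-terminated driver name from ethtool_drvinfo bytes."""
    # Layout: __u32 cmd, then char driver[32], followed by other fields.
    return bytes(buf[4:36]).split(b"\0", 1)[0].decode()


def get_driver(ifname):
    """Ask the kernel which driver backs ifname (Linux only)."""
    drvinfo = ctypes.create_string_buffer(DRVINFO_SIZE)
    struct.pack_into("=I", drvinfo, 0, ETHTOOL_GDRVINFO)
    # struct ifreq: 16-byte name, then a union holding the data pointer.
    ifreq = struct.pack("16sP16x", ifname.encode(), ctypes.addressof(drvinfo))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        fcntl.ioctl(s.fileno(), SIOCETHTOOL, ifreq)
    return driver_from_drvinfo(drvinfo.raw)


def is_ipvlan(ifname):
    """Assumption: the ipvlan driver identifies itself as 'ipvlan'."""
    try:
        return get_driver(ifname) == "ipvlan"
    except OSError:
        return False  # interface missing or driver info not supported
```

Calling `is_ipvlan("eth0ipvl1")` on the test interface created earlier in the thread would be the natural way to try it.
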
@rsmarples
Member

If it's ipvlan then default the IAID to the interface name if 4 chars or less, otherwise index.
It's the best we can do. No it's not stable across reboots without configuration and this is how dhcpcd configures other interfaces lacking hardware address.
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcpcd.c#L524
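
The fallback rule described above (interface name if it fits in the 4-byte IAID, otherwise the interface index) can be sketched in Python as follows. The function name is made up, and the padding and byte order are assumptions chosen for illustration rather than a statement of what dhcpcd does.

```python
import struct

IAID_LEN = 4  # an IAID is a 32-bit value


def fallback_iaid(ifname, ifindex):
    """Derive an IAID without using the hardware address."""
    name = ifname.encode()
    if len(name) <= IAID_LEN:
        # Short names are used directly, NUL-padded to 4 bytes (assumed).
        return name.ljust(IAID_LEN, b"\0")
    # Longer names fall back to the interface index in network byte order.
    return struct.pack("!I", ifindex)


print(fallback_iaid("eth0", 2))       # b'eth0'
print(fallback_iaid("eth0ipvl1", 7))  # b'\x00\x00\x00\x07'
```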

@ido
Member

ido commented Apr 25, 2021

If it's ipvlan then default the IAID to the interface name if 4 chars or less, otherwise index.
It's the best we can do. No it's not stable across reboots without configuration and this is how dhcpcd configures other interfaces lacking hardware address.
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/dhcpcd.c#L524

I was playing around with hashing the network namespace + interface name, but I think you're right. Unfortunately, the likely outcome here will be that a very common setup will result in a conflict: Suppose you have two "containers" (network namespaces) with an ipvlan interface in each one named eth0, and a physical/lower interface named eth0 in the init netns (outside the "containers"). The IAIDs of the ipvlan interfaces will conflict in this case. This seems too much like magic that will surprise the user in unexpected ways, so I'll output a warning message when we see an ipvlan interface to the effect of strongly recommending that they set a custom IAID.
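
The hashing experiment mentioned above might look roughly like this Python sketch, which mixes a network namespace identifier with the interface name and keeps four bytes of the digest. Everything here is hypothetical: the choice of SHA-256, the use of the inode of /proc/self/ns/net as the namespace identifier, and the function names. The inode number is assigned at runtime, so this would avoid the eth0-versus-eth0 clash between namespaces but would not by itself be stable across reboots.

```python
import hashlib
import os


def netns_inode(path="/proc/self/ns/net"):
    """Identify the current network namespace by its inode (Linux only)."""
    return os.stat(path).st_ino


def hashed_iaid(netns_id, ifname):
    """Derive a 4-byte IAID from a namespace identifier and interface name."""
    digest = hashlib.sha256(f"{netns_id}:{ifname}".encode()).digest()
    return digest[:4]


# Two namespaces (made-up inode numbers), both with an interface named eth0.
a = hashed_iaid(4026531992, "eth0")
b = hashed_iaid(4026532201, "eth0")
print(a.hex(), b.hex())
```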

@deliciouslytyped
Author

deliciouslytyped commented Apr 25, 2021

I don't know what's appropriate here, but if you want to avoid conflicts, wouldn't it be better to set a random suffix per boot and recommend setting an ID if the user wants stability? Or is exhausting the DHCP address space by rebooting a lot a problem too?
To prevent churn you could then save the suffix to a state file.

To generalize it a bit, you could instead save a seed in the file and then use that to derive further IDs unique to the host.
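
A minimal Python sketch of the seed-file idea, under stated assumptions: a 16-byte random seed is created once and stored in a state file, and each interface's ID is derived by hashing the seed with the interface name. The file location, seed length, hash function, and function names are all hypothetical and nothing here reflects existing dhcpcd behaviour.

```python
import hashlib
import os

SEED_LEN = 16


def load_or_create_seed(path):
    """Return the persisted seed, creating it on first use."""
    try:
        with open(path, "rb") as f:
            seed = f.read()
        if len(seed) == SEED_LEN:
            return seed
    except FileNotFoundError:
        pass
    # No usable seed yet: generate one and save it for later runs.
    seed = os.urandom(SEED_LEN)
    with open(path, "wb") as f:
        f.write(seed)
    return seed


def derive_iaid(seed, ifname):
    """Derive a 4-byte ID that is stable for a given seed and interface name."""
    return hashlib.sha256(seed + ifname.encode()).digest()[:4]
```

Because the seed is per host (or per container, if the state file lives inside it), two containers with an interface named eth0 would be very likely to end up with different IDs, while each keeps its own ID across reboots as long as the file survives.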

@rsmarples
Member

rsmarples commented Apr 26, 2021

Another option is to ignore the interface unless there is a configuration for it as we do for tap and bridge interfaces on Linux:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if.c#L555
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if-linux.c#L351

Although we might want to be more noisy about it and adjust if_ignore (in if-bsd.c and if-sun.c as well as if-linux.c) to return an int (probably as a reference argument rather than the returned value) which represents the log level for use in logmessage here:
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/if.c#L560

@ido
Member

ido commented Apr 29, 2021

Brief update - haven't forgotten about this - going to implement your suggestions @rsmarples.
