Request : usrSctp in Kubernetes --> silently discard OOTB HB-request #499
Comments
Isn't the scenario you are referring to the one being described in draft-ietf-tsvwg-natsup? Maybe such SCTP support can be implemented and then your setup is supported. Calling
after initialising the stack disables the sending of packets containing ABORT chunks in response to out of the blue packets. Does that work around your issue with the NAT instance? |
The scenario is similar, but not exactly the same. usrsctp_sysctl_set_sctp_blackhole(2); can be a workaround in the short term, but it requires the SCTP user to know details of the implementation. It would be better to have a compile-time configuration flag; still, I would prefer keeping the ABORT generation for other OOTB cases. |
But if the NAT would follow what is described in the ID, the problem would not be there. The HEARTBEAT would be delivered to the correct endpoint, since the remote address is not important.
Sure.
Please note that the sysctl variable is there to make an attacker's life harder. That is why it exists, not for working around limited NAT implementations. |
Please note that the server is free to use its secondary address for all packets it is sending. So there is no reason to limit your special handling to HEARTBEAT chunks, just because that is what you have observed up to now... |
The reason why I want special handling for the HB-request is that SCTP probes the path before using it for any traffic, and the NAT (Linux sctp_conntrack) doesn't recognize the association based on the vTag, but only cares about source/destination IP addresses and ports. |
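A minimal sketch of the lookup difference described in this comment, assuming a NAT table keyed on addresses and ports only. The names `flow_key`, `nat_entry`, `lookup_by_tuple` and `lookup_by_vtag` are hypothetical and are not taken from the Linux conntrack code; the verification-tag lookup only illustrates what a tag-aware table could do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the fields an address/port-based NAT matches on. */
struct flow_key {
    uint32_t remote_ip;    /* address of the external peer */
    uint16_t remote_port;  /* SCTP port of the external peer */
    uint16_t local_port;   /* SCTP port on the NAT's public address */
};

/* Hypothetical table entry created when the INIT passes the NAT. */
struct nat_entry {
    struct flow_key key;
    uint32_t vtag;         /* verification tag expected on incoming packets */
    int instance;          /* internal SCTP instance owning the association */
};

/* Match on addresses and ports only; returns -1 when no entry matches. */
static int lookup_by_tuple(const struct nat_entry *tbl, size_t n,
                           struct flow_key k)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].key.remote_ip == k.remote_ip &&
            tbl[i].key.remote_port == k.remote_port &&
            tbl[i].key.local_port == k.local_port)
            return tbl[i].instance;
    }
    return -1;
}

/* Match on ports and verification tag, ignoring the remote address. */
static int lookup_by_vtag(const struct nat_entry *tbl, size_t n,
                          struct flow_key k, uint32_t vtag)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].key.remote_port == k.remote_port &&
            tbl[i].key.local_port == k.local_port &&
            tbl[i].vtag == vtag)
            return tbl[i].instance;
    }
    return -1;
}
```

With one entry created for the server's primary address, a packet arriving from the secondary address misses in `lookup_by_tuple` but would still be found by `lookup_by_vtag`, which is the gap this comment points at.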
That is a limitation of the NAT you are using.
The local end-point does the path verification for the second address. This is only done if you don't provide both addresses in the The server does not do path verification, since it sees only a single address of the client. It is free to use any local IP address for any outgoing at any time.
Then work around this limited NAT by issuing a HEARTBEAT to the other IP-address right after the association has been established.
And why is your server multihomed? It doesn't really give you anything, or am I missing something? |
In the considered network scenario NAT is the default networking behavior of K8s and is based on linux iptables implementation and sctp_conntrack kernel module. |
And that is one implementation.
That is your decision how things operate. I have seen setups using multihoming where clients know multiple IP addresses of the peer, because that way they can already use both addresses for connection setup.
The server only has a single peer address. It does not do path verification, because it is verified by the handshake. The server is free to use any IP-address for sending its packets. It can do this, for example, for sending DATA chunks or SACK chunks. That is implementation dependent.
The RFC has the rules it has for good reasons. I do not see that it is necessary to change them. As I said, the problem is not limited to HEARTBEAT chunks. |
Let's consider the scenario once more. I think that Section 8.4 may be improved a little by explaining that the OOTB ABORT should be used only when the offending traffic is related to one of the Endpoints owned by the SCTP stack and not in general, so that parallel instances of SCTP Stack can exist, thus solving the well-known issue with LKSCTP. |
I understand. However, using this kind of NAT imposes some constraints on the SCTP usage when it comes to multihoming.
I think this is exactly the case. It seems you have multiple end-points behind the NAT and they are all sharing the same port number. If talking to the same peer, it would be a port number collision case.
Again, I think this is covered by the Internet Draft describing an SCTP aware NAT.
Just to be crystal clear: Host-A is behind the NAT and uses a single private address, Host-B has two public addresses B' and B'', Host-A is initiating the association towards Host-B. Host A is sending the INIT towards B'.
Please note that Host-A knows after connection setup two IP-Addresses of the peer: B' and B''. Host-B only knows a single IP-address of its peer: A (the public address of the NAT). It is confirmed by the second list entry in
There is also a good reason for HB-requests. Assume an attacker owns address A, wants to attack a victim owning address V, and is using for this some host B owning address B. The attacker sends an INIT to host B and lists IP address V. After the association setup, host B will send path verification heartbeats to V. If V did not respond, it would get a lot of them. This allows the attacker to run a packet amplification attack against V. In summary:
I think I'm suggesting to use a value of 2. This will disable OOTB handling completely, which is what you need. It is against RFC 4960, but some people prefer to make life harder for port scanners. That is what the sysctl variable is for.
As described above, it is important to reply to OOTB packets with an ABORT. The behaviour of LKSCTP is NOT REQUIRED by RFC 4960, but it is allowed. In my view you are using the NAT instance in a scenario which it doesn't support. So the best way out of the problem is to configure the peers to use only a single address. That way you can operate within what is supported by your configuration. Or, implement an SCTP aware NAT and use that. Then the peers can continue to use multihoming (although there is no use of it), and you still operate within what is supported by that config. If you don't want to do this, disable OOTB handling at all nodes behind the NATs (this also includes sending ICMP, destination unreachable / protocol unreachable). That is a work around to be able to operate in a scenario not supported by your setup. |
Correct, we need to add some special handling when the remote peer is multihomed.
It's not exactly like that. From a network perspective we see only one host with, and that host has one or more endpoints. The SCTP host is implemented as many instances of usrSctp for redundancy and scalability reasons. The users of SCTP will also see a single instance of the termination (by means of K8s), thus there will never be a duplicated Association, as uniqueness is preserved by the load-sharing mechanism when creating the Association itself.
What I have seen is that Host-B does path verification; at least LKSCTP does it, as do other protocol stacks. I think that this is not bad behavior: merely knowing the single IP address of a remote single-homed peer doesn't guarantee that a path exists at a given time between each local IP address and that remote IP address, since the network may be designed to provide path redundancy.
Yes, definitely K8s (Linux NAT and conntrack) doesn't support SCTP other than in a very basic way. |
Again: Why do you think LKSCTP is doing path verification? It only sees a single peer address and that address is already verified by the handshake. So there are no peer addresses to be verified. Just to be clear: Path verification is NOT used to test address pairs. It is only about verifying that the addresses reported by the peer actually belong to that peer. So LKSCTP does NOT do path verification. It just seems to send a HEARTBEAT using B'' as the source address. It can send any packet with that address. Only the node behind the NAT sees two addresses of the peer and needs to verify the second address. So if you want to suppress the OOTB handling, you have to suppress it for all packets, not only for HEARTBEAT chunks.
As I explained above, this wouldn't help you. Node running LKSCTP (or other stacks) does not need to verify any peer address and it can use any local address immediately after the association setup. So if you want to use the ABORT suppression method as a work around, you need to disable OOTB handling for all packets, not only for packets containing HEARTBEAT chunks.
|
That's the way it behaves. I guess LKSCTP does probe paths and not only addresses. Assuming that there are other SCTP implementations that do not probe the path, but only the address as in 5.4, then the only way forward is either to disable OOTB supervision totally, or to force the client installation to probe the whole set of remote IP addresses before the remote peer does any activity, for instance by sending multiple COOKIE ECHOs, or by sending the COOKIE ECHO towards the secondary IP address (in our case we have at most 2 IP addresses). |
And that is completely valid. The peer can send packets containing arbitrary chunks. That is why ignoring only HEARTBEATs doesn't work in general.
I think it should work with a generic compliant SCTP implementation as the peer, not with some version of some specific implementation.
Sure, the FreeBSD kernel implementation does it and therefore also the usrsctp stack.
You can't send a COOKIE-ECHO to unconfirmed addresses. If the upper layer on the client side provides both addresses of the peer, one could send packets containing the INIT chunk to both addresses. Right now we do this only for retransmissions. But again: This requires getting the addresses from the upper layer. Not sure why you want to change the implementation (which you are free to do, it is open source) rather than tweak the sysctl parameter? |
At the end of section 5.4, sending a COOKIE-ECHO to an unconfirmed address is permitted if it is bundled with an HB-request. The same section also states that probing shall be started when an association moves to the ESTABLISHED state. The reason for (slightly) changing the implementation rather than disabling features is an attempt to find a way that keeps RFC compatibility and at the same time shapes the traffic in a way that allows a generic conntrack-based NAT to work as wished. |
That is correct. However, the drawback of sending the COOKIE-ECHO to a different address than the one to which the INIT was sent is that you assume that both paths work when setting up the association.
No, you can do that. But it has the drawback mentioned above...
I see. |
@tuexen If I would like to contribute with a PR to implement this, what is the procedure? What is required to get it accepted? What tests would be needed, and how to implement them? I can test on Linux, and perhaps freeBSD if I install it in a kvm, but not Windows. Can your CI be used? Best Regards, Lars Ekman |
Hi @uablrek, when you open a PR, compile checks are executed automatically. We have an additional, independent Buildbot CI system, which executes multiple runtime checks on different platforms. If you open a PR, I can trigger our Buildbot system manually and provide you with the results. Best, |
Hi @uablrek, it might help to agree on what will be implemented first. This is not clear to me... Best regards |
I totally agree. To be honest it is not clear to me either 😄 But I believe I understand the problem: in a K8s environment usrsctp comes into conflict with the Linux kernel SCTP (using conntrack NAT), or even with other usrsctp instances belonging to other tenants in the same cluster. I will try to sort this out and come up with a concrete suggestion. |
Great. Please do not assume that the peer is LKSCTP. It can be any SCTP stack. So I think just not sending ABORTs for OOTB is the appropriate way to handle this. Using
gives you that... |
As I understand it, the requested change is the one described in #499 (comment): send a bundled COOKIE-ECHO+HB-request to the "other" address if multiple addresses are in the INIT msg. The intention is to set up multihoming through the NAT when the SCTP server executes in a K8s POD. I am not sure that this will work; I am not an SCTP guy, but I do know K8s networking. Anyway, here is what I will do: I will fork the usrsctp project (already done) and try to do the requested update, but not create a PR immediately. Then we do a PoC: @teiclap builds usrsctp from the fork and verifies that it really works in K8s and solves the multihoming/NAT problem. @weinrank @tuexen To test this I must set up SCTP multihoming. Is there some test program (in "usrsctp/programs/" or elsewhere) that can serve as an example? I read the code in "programs/" but there does not seem to be any program using multihoming. |
@uablrek I'm not sure about the architecture... I thought that you run multiple instances of usrsctp behind a NAT acting as a single homed client, all talking to a server which is dual-homed. You are stating that you want to run multiple servers. Can you clarify? Here is what I think are the drawbacks of the proposed solution:
Regarding the question of examples: |
@tuexen I assumed the server would be in K8s. First because the K8s load-balancing (that uses NAT) is for incoming requests only, and second because otherwise the requirement to respond on the other interface would be on a remote server that may not be usrsctp at all. But as I said I am not an SCTP guy so I might have missed something. @teiclap Can you please explain the architecture? I agree with the drawbacks. The feature must be configurable, but if it really enables a resilient SCTP solution in K8s without modifications (like MULTUS and friends) it is worth something. I would however like to see that verified, hence the PoC. The most important property of multi-homing in this case I think is resilience; scalability comes second. But again I must refer to @teiclap Thanks for the pointer to |
Please note that the client sends a packet with the INIT chunk, the server responds with a packet containing an INIT-ACK chunk, and then the client sends a packet with the COOKIE-ECHO chunk. Since you want to change the sending of the COOKIE-ECHO chunk, you want to change the client side. My understanding of the problem is that one single-homed client behind a NAT sends an INIT to an external server having two addresses. The server responds with an INIT-ACK and lists its addresses. The client sends a COOKIE-ECHO. Now it seems that LKSCTP, for whatever reason, sends a HB from its other address to the client. This is allowed, but implementation specific. Actually it can send any packet at any time using these addresses. The race is at the NAT: If the verification HEARTBEAT is seen first, everything is fine. However, if the packet from the server is seen first, it gets delivered to one of the clients (this is the load-sharing feature). If the client selected is not the right one, it will send an ABORT and the association is dead. The simplest solution would be to avoid sending the ABORT for OOTB messages. This is already implemented and could be activated.
My suggestion was: Just disable the OOTB handling.
If you don't locally bind to a specific address, SCTP uses all applicable addresses. |
@tuexen The architecture is as you described: there are a number of instances of the SCTP stack sharing the same SCTP EP behind the NAT. It is seen from the external world as a single SCTP stack.
Yes, this is correct: if the COOKIE-ECHO times out, it shall be retried towards the source IP address used by the peer for the INIT-ACK. At this point the NAT table has already been adjusted, so the Association setup can continue as in the legacy case.
This is also correct, but in the use case the server has at most two addresses.
That is true; we need to solve the case of the shared SCTP stack in Kubernetes networking based on NAT, running on Linux and using iptables and the sctp_conntrack module of the Linux kernel. The only advantage of the proposed behavior is that it covers the use case while still being fully RFC 4960 compliant. I think that the OOTB handling in the RFC is valid and that disabling it fully should be avoided. |
Thinking about it... The problem is that the NAT instance forwards incoming packets for which it has no entry in its tables to an arbitrary internal endpoint. I understand that this is used for load sharing. But this should only be applied to packets which contain an INIT chunk. So can't you just limit the forwarding to packets which contain an INIT chunk and drop all other packets at the NAT instance? Doing load sharing for packets which are not used for connection setup doesn't make sense to me. |
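The forwarding policy suggested in this comment can be sketched as a small decision function. This is only an illustration under stated assumptions: `nat_decide` and the `nat_action` values are hypothetical names, and the function looks only at the type of the first chunk. The chunk type value 1 for INIT is the one defined in RFC 4960.

```c
#include <stdint.h>

#define SCTP_CHUNK_INIT 1  /* INIT chunk type value from RFC 4960 */

/* Hypothetical actions a NAT could take for an incoming SCTP packet. */
enum nat_action {
    NAT_FORWARD_EXISTING,  /* a table entry exists: forward to its owner */
    NAT_LOAD_SHARE,        /* no entry, INIT: pick an internal endpoint */
    NAT_DROP               /* no entry, not an INIT: drop silently */
};

/* Decide what to do with an incoming packet.
 * has_entry: non-zero if the NAT table already has a matching entry.
 * first_chunk_type: type field of the first chunk in the packet. */
static enum nat_action nat_decide(int has_entry, uint8_t first_chunk_type)
{
    if (has_entry)
        return NAT_FORWARD_EXISTING;
    if (first_chunk_type == SCTP_CHUNK_INIT)
        return NAT_LOAD_SHARE;
    return NAT_DROP;
}
```

Under this policy a HEARTBEAT (chunk type 4) arriving from an unknown source address never reaches an unrelated internal instance, so no ABORT is generated for it.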
But you are compliant by not sending an ABORT. RFC 4960 says:
In RFC 4960bis this will be clarified as
There is no |
No, K8s only load-balances incoming traffic to virtual load-balancer IPs (or to NodePorts, which are not relevant now). Since the POD in K8s is acting as a client and making an outgoing request, K8s (and its load-balancing) is not involved at all. But there is a NAT from the internal POD address (usually some 192.168.x.x) to the K8s node (host) address where the POD is executing. Return packets from the server would have the K8s node address as destination and will be "re-NATed" and forwarded to the POD address if a conntrack entry exists. The problem would be if the server sends a packet with its "other" address as source. There is no conntrack entry and Linux will assume that the packet is really for the node itself (which is the destination), but it will not be load-balanced. @tuexen Thanks a lot for the explanation. I am sorry you had to explain SCTP fundamentals to me but I am grateful that you had the patience to do so. |
Thanks for the clarification. So the K8s node is responding... Does the node need to handle SCTP packets? If not, just don't load the SCTP module and make sure it does not send ICMP/ICMPv6 packets indicating that it does not support SCTP. Is that an option?
You are welcome... |
@tuexen I see your point. Fully disabling OOTB does the job and requires no modification. I am not sure what security issues can be created by fully disabling OOTB. Thanks. |
Yes, actually I think that is the current solution. When sctp support was introduced in K8s we (Ericsson) requested that the module should not be "auto-loaded" by K8s. I was not involved directly so I don't have the details but there is a long-running issue in K8s somewhere. (the module is not auto-loaded btw) I think the problem now is that other tenants are using LKSCTP in the same cluster. |
Comments about LKSCTP in K8s in; kubernetes/kubernetes#64973 |
Here is the reason why you might not want to respond with an ABORT: If you reply with an ABORT in response to an INIT, you allow an attacker to get an instant indication that the port is not listening. So you can do a fast portscan. If you don't, the attacker doesn't know how long to wait and has to deal with the possibility of packet loss. Some people even prefer that an end-point is in a "stealth-mode", only responding when it helps in communications it actually wants to do. This is served by not responding to OOTB packets. That explains why you have the different settings for usrsctp_sysctl_set_sctp_blackhole. FreeBSD has similar settings for TCP and SCTP (see man blackhole). If a host is not responding to OOTB packets with an ABORT:
Does this help?
|
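The different blackhole settings mentioned in this comment can be modelled as a small predicate. This is a sketch, assuming the semantics documented for the FreeBSD blackhole sysctl (0: ABORT for all such packets, 1: no ABORT in response to packets containing an INIT, 2: no ABORT at all); `ootb_abort_sent` is a hypothetical name and not part of the usrsctp API. It covers only OOTB packets that would otherwise trigger an ABORT.

```c
#include <stdbool.h>

/* Returns true if an ABORT would be sent in response to an OOTB packet
 * that is not already silently discarded by other rules.
 * blackhole: assumed setting (0, 1 or 2).
 * packet_has_init: true if the OOTB packet contains an INIT chunk. */
static bool ootb_abort_sent(int blackhole, bool packet_has_init)
{
    if (blackhole == 0)
        return true;              /* default: always answer with ABORT */
    if (blackhole == 1)
        return !packet_has_init;  /* hide closed ports from INIT scans */
    return false;                 /* 2 (or higher): full stealth mode */
}
```

With a value of 1 a port scanner sending INITs gets no answer, while a stray HEARTBEAT still triggers an ABORT; only a value of 2 also suppresses the ABORT that breaks the association in the NAT scenario of this thread.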
Let me learn something here: You can load the SCTP module on the K8s host and then you can use it in a container. Right? Can you still use a userland stack in another container? You are saying that some containers use the kernel stack and therefore the module is loaded. Couldn't you add a rule to the host so that it drops outgoing packets which contain an ABORT chunk with the T-bit set? That would drop ABORTs which are sent in response to OOTB packets that do not contain INIT chunks. That would mean that the containers using the kernel SCTP stack have the same problem. |
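What such a host rule would have to match can be sketched in C. This is a minimal illustration, assuming a raw SCTP packet starting at the 12-byte common header and inspecting only the first chunk; `first_chunk_is_abort_with_t` is a hypothetical helper, not an existing filter API. The ABORT chunk type (6) and the position of the T bit (lowest bit of the chunk flags) follow RFC 4960.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12   /* ports, verification tag, checksum */
#define SCTP_CHUNK_HDR_LEN  4    /* type, flags, length */
#define SCTP_CHUNK_ABORT    6    /* ABORT chunk type */
#define SCTP_ABORT_T_BIT    0x01 /* T bit in the ABORT chunk flags */

/* Returns true if the first chunk of the packet is an ABORT with the
 * T bit set, i.e. the kind of ABORT sent in response to an OOTB packet
 * that does not contain an INIT chunk. */
static bool first_chunk_is_abort_with_t(const uint8_t *pkt, size_t len)
{
    if (pkt == NULL || len < SCTP_COMMON_HDR_LEN + SCTP_CHUNK_HDR_LEN)
        return false;             /* too short to hold a chunk header */
    uint8_t type = pkt[SCTP_COMMON_HDR_LEN];
    uint8_t flags = pkt[SCTP_COMMON_HDR_LEN + 1];
    return type == SCTP_CHUNK_ABORT && (flags & SCTP_ABORT_T_BIT) != 0;
}
```

An ABORT without the T bit, for example one sent in response to an INIT that cannot be processed, would not match and would still leave the host.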
Then we come into something difficult. |
As it does if you are not sending ABORTs.
I would argue that if you want to deploy something which involves a NAT, use an appropriate one. The problem you are facing is, in my view, a consequence of the implementation you are using. So why are packets delivered to the K8s host, which are not sent to it?
I would say no. The
You need to discuss this on tsvwg@ietf.org. |
I normally do not comment on this list but I will state
right now that I would be strongly against changing the
SHOULD to a MUST. The current wording allows for discretion on
the part of the implementation as well as the application. Binding
it to a MUST forces the developer and where SCTP would be applied
to do that.
TCP has a similar “stealth” mode and it too is in the spec that way
on purpose!
R
… On Aug 25, 2020, at 10:12 AM, Michael Tüxen ***@***.***> wrote:
Sure. If you reply with an ABORT in response to an INIT, you allow an attacker to get an instant indication that the port is not listening. So you can do a fast portscan. If you don't, the attacker doesn't know how long to wait and has to deal with the possibility of packet loss. Some people even prefer that an end-point is in a "stealth-mode", only responding when it helps in communications it actually wants to do. This is served by not responding to OOTB packets. That explains why you have the different settings for usrsctp_sysctl_set_sctp_blackhole. FreeBSD has similar settings for TCP and SCTP (see man blackhole).
Thanks.
Then we come into something difficult.
The rfc suggests as preferred implementation that OOTB shall generate an ABORT, we see that the case of sending an ABORT would in some way help the malicious attacker.
As it does if you are not sending ABORTs.
Having the stealth-mode as default would help in a lot of cases where the SCTP Endpoint is shared among instances of the protocol stack, especially in the Cloud paradigm.
I would argue that if you want to deploy something which involves a NAT, use an appropriate one. The problem you are facing is, in my view, a consequence of the implementation you are using. So why are packets delivered to the K8s host, which are not sent to it?
If the new rfc4960bis changed the preferred OOTB handling to stealth-mode, it would also be much easier to run a number of SCTP protocol stacks from different implementations within the same Cloud-based environment.
I would say no. The SHOULD instead of a MUST allows you to do what you want. The FreeBSD implementation even supports the stealth mode. Since you are using Linux, you would need to implement this, if it doesn't support this yet. But it is fairly easy, I did it 8 years ago for FreeBSD in r229805. You just need to change a sysctl variable on boot (which is a single line in /etc/sysctl.conf, also on Linux) and you get what you want and you are compliant with the spec.
Is there any chance to get it?
You need to discuss this on ***@***.***
------
Randall Stewart
rrs@netflix.com
|
Yes, I assume so. I have not tested it personally, but I guess @teiclap has. The container has its own network namespace and a program should be able to open a raw socket inside it.
I guess so. There is some competition of "iptables" in K8s, both K8s and various CNI-plugins add and "sync" iptables rules rather uncontrollably so it may be so that our added rules are removed by others. But this is certainly a preferable option. I will check it.
Yes, I think they have. But as far as we know no users of LKSCTP in K8s are using multi-homing or have any plans to do so. |
I think @teiclap wants to change "SHOULD send an ABORT" to "SHOULD NOT send an ABORT". The default should be the "stealth mode" and the current default behaviour should be allowed. But as you can read from my answers above, I would also be against that change. At least for the reasoning given in this discussion (working around a limitation of some middle box implementation).
|
Normally (without docker or such containers) the problem is not opening a raw socket, but receiving packets on it. If there is a kernel stack for that protocol, packets are delivered to the kernel stack, not to the raw socket. This applies normally to UDP, TCP, and SCTP (and other transport protocols).
Please note that it is the peer which is multihomed, not the endpoint running on K8s or in its containers. |
@tuexen I didn't mean to force any change in the protocol, rather try to influence the implementors in some ways. |
That is up to the implementer and allows for some competition in implementations... If you want this to be changed in the Linux stack, you need to talk to the implementers of that implementation and convince them that it is a good feature: linux-sctp@vger.kernel.org. |
When deploying usrSctp in a K8s Pod with replicas, thus distributing an SCTP Endpoint among independent instances of the SCTP stack behind a NAT, strict compliance with RFC 4960 section 8.4 part 8 can cause an Association to be aborted wrongly.
The case is: the SCTP Client is in a K8s Pod and single-homed; the SCTP Server is remote and multihomed.
The SCTP Client sends an INIT to the primary IP address of the remote Server via the NAT; the NAT creates an entry in its NAT table mapping the Client to the primary address of the Server.
Once the association is up, the remote Server sends an HB-request to the Client from a secondary IP address.
Since the NAT doesn't know the secondary IP address, it randomly chooses an instance of the SCTP Client among the available replicas. If it selects a Client other than the one that originated the Association, the HB-request reaches an instance of the SCTP stack that doesn't know about the Association and therefore replies with an ABORT to the remote Server. The remote Server then closes the Association.
The proposed solution is to move HB-request handling from RFC 4960 section 8.4 part 8 (reply with ABORT) to part 7 (silently discard).
The multihomed Association is completed as soon as the Client sends an HB-request towards the secondary address(es), thus providing the NAT with the proper information.
Could you please consider that change in usrSctp (possibly under a selectable option)?
Thanks,
Claudio Porfiri
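The requested change can be sketched as a simplified classifier for the OOTB rules of RFC 4960 section 8.4. This is an illustration only, not the usrsctp implementation: `classify_ootb`, the enum names and the `discard_heartbeat` option are hypothetical, the function looks only at the first chunk, and rule 1 (non-unicast addresses) and the "Stale Cookie" ERROR case of rule 7 are omitted for brevity.

```c
#include <stdbool.h>

/* Chunk type values from RFC 4960 (subset used here). */
enum sctp_chunk {
    CHUNK_DATA = 0, CHUNK_INIT = 1, CHUNK_HEARTBEAT = 4, CHUNK_ABORT = 6,
    CHUNK_SHUTDOWN_ACK = 8, CHUNK_COOKIE_ECHO = 10, CHUNK_COOKIE_ACK = 11,
    CHUNK_SHUTDOWN_COMPLETE = 14
};

/* Hypothetical outcomes of OOTB handling. */
enum ootb_action {
    OOTB_DISCARD,                 /* silently discard the packet */
    OOTB_PROCESS_SETUP,           /* handle as association setup */
    OOTB_SEND_SHUTDOWN_COMPLETE,  /* answer with SHUTDOWN COMPLETE */
    OOTB_SEND_ABORT               /* answer with ABORT, T bit set */
};

/* Classify an OOTB packet by its first chunk.
 * discard_heartbeat models the option requested in this issue:
 * treat a HEARTBEAT like rule 7 instead of rule 8. */
static enum ootb_action classify_ootb(enum sctp_chunk first_chunk,
                                      bool discard_heartbeat)
{
    switch (first_chunk) {
    case CHUNK_ABORT:             return OOTB_DISCARD;        /* rule 2 */
    case CHUNK_INIT:                                          /* rule 3 */
    case CHUNK_COOKIE_ECHO:       return OOTB_PROCESS_SETUP;  /* rule 4 */
    case CHUNK_SHUTDOWN_ACK:      return OOTB_SEND_SHUTDOWN_COMPLETE; /* 5 */
    case CHUNK_SHUTDOWN_COMPLETE: return OOTB_DISCARD;        /* rule 6 */
    case CHUNK_COOKIE_ACK:        return OOTB_DISCARD;        /* rule 7 */
    case CHUNK_HEARTBEAT:
        if (discard_heartbeat)
            return OOTB_DISCARD;  /* requested option */
        break;
    default:
        break;
    }
    return OOTB_SEND_ABORT;                                   /* rule 8 */
}
```

Note that even with the option enabled, an OOTB packet starting with a DATA or SACK chunk still falls through to rule 8, which is the objection raised in the comments: the peer may use its other address for any packet, not only for HEARTBEATs.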