Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCTP support implementation for Kubernetes #64973

Merged
merged 9 commits into from Aug 28, 2018

Conversation

janosi
Copy link
Contributor

@janosi janosi commented Jun 11, 2018

What this PR does / why we need it: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #44485

Special notes for your reviewer:

Release note:


SCTP is now supported as additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.  

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jun 11, 2018
@janosi
Copy link
Contributor Author

janosi commented Jun 11, 2018

/assign @thockin

@thockin
Copy link
Member

thockin commented Jun 11, 2018

I know precious little about SCTP - I have no hands-on experience. @kubernetes/sig-network-feature-requests I do know, for example that GCP's load-balancer only supports SCTP in single-endpoint mode: https://cloud.google.com/compute/docs/load-balancing/network/forwarding-rules

So at a minimum, all of the cloud implementations need to be checked for compatibility and disabled when not compatible.

This should probably start as a proposal to sig-network to gather context (like the above GCP point) and then become a KEP for broader review.

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 11, 2018
@janosi
Copy link
Contributor Author

janosi commented Jun 11, 2018

@thockin Do I understand right, that you would like us to go back to point 0 and start with a sig-network proposal first, and then after some criteria are met we should create a KEP? In order to check that if a cloud provider API gets an unsupported protocol value then it answers with an error and it is handled right in the relevant cloud provider logic of k8s?

@janosi
Copy link
Contributor Author

janosi commented Jun 12, 2018

@thockin
I have checked now the EnsureLoadBalancer implementations of the different cloud providers (pkg/cloudprovider/providers)
AWS: it accepts only "TCP" as Protocol. Otherwise it returns with error.

if port.Protocol != v1.ProtocolTCP {

Azure: it accepts only "TCP" or "UDP". Otherwise it returns with error.
return &transportProto, &securityProto, &probeProto, fmt.Errorf("only TCP and UDP are supported for Azure LoadBalancers")

CloudStack: it accepts only "TCP" or "UDP". Otherwise it returns with error.
return nil, fmt.Errorf("unsupported load balancer protocol: %v", port.Protocol)

GCE: it allows only "TCP" or "UDP" for external LB. Otherwise it returns with error. For internal LB there is no explicit protocol check.
if ports[0].Protocol != v1.ProtocolTCP && ports[0].Protocol != v1.ProtocolUDP {

OpenStack: my understanding is, that OpenStack supports SCTP, so I enabled SCTP in the OpenStack cloudprovider code with this PR
oVirt, Photon, vSphere: these require the registration of a "cloud provider" and it is then up to the "LoadBalancer" implementation of the cloud provider how it handles the different protocols.

pkg/controller/service/service_controller.go also have a nice warning at the point where it calls the EnsureLoadBalancer() function of the relevant cloud provider:
" // - Not all cloud providers support all protocols and the next step is expected to return
// an error for unsupported protocols"

// - Not all cloud providers support all protocols and the next step is expected to return

As I can see all the supported cloud providers follows this principle, and they handle unsupported protocol values in their code.

Of course we are happy to receive comments from Network SIG. Please help in inviting the relevant people here if the "sig/network" label is not enough for a heads-up.

@MaximProshin
Copy link

MaximProshin commented Jun 13, 2018

LKSCTP is not loaded by default in Linux OS that allows developers to run user-land SCTP in Linux. With your change LKSCTP will automatically be loaded and it will intercept all incoming SCTP packets which will lead to mess and basically aborting of SCTP associations. In other words, this change will not allow developers to run user-land SCTP in K8s cluster.

@janosi
Copy link
Contributor Author

janosi commented Jun 13, 2018

@MaximProshin Is there the same effect when those are in different network namespaces? I.e. when the user space SCTP implementation is used inside a pod that does not use the host network namespace?

UPDATE: If the network interface of the pod is not a tun/tap-like one, and the user space SCTP stack uses raw socket with IPPROTO_SCTP, then namespace will not save us. On the other hand, considering the world without containers, one had to ensure that such user space SCTP apps are not deployed on the same nodes with such applications that would trigger the loading of lksctp. It drives me into the direction of thinking about a way how nodes could be reserved for such user-land SCTP applications. Assuming of course, that those user-land SCTP applications are deployed as containers/pods that share the same kernel with other pods - i.e. not as e.g. Kata Containers or similar.

@@ -25,6 +25,8 @@ import (

"k8s.io/api/core/v1"
utiliptables "k8s.io/kubernetes/pkg/util/iptables"

"github.com/nokia/sctp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to update the BUILD file in this directory to account for this import addition. run 'hack/update-bazel.sh' and I think it'll do that for you.

@@ -33,6 +33,7 @@ import (
"time"

"github.com/golang/glog"
"github.com/nokia/sctp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, bazel update.

@@ -1656,6 +1667,18 @@ func openLocalPort(lp *utilproxy.LocalPort) (utilproxy.Closeable, error) {
return nil, err
}
socket = conn
case "sctp":
//SCTP is not supported by golang/net, or any other built-in lib,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mis-indented whitespace; also there should be a single space after the // in comments.

@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jun 14, 2018
@janosi
Copy link
Contributor Author

janosi commented Jun 14, 2018

@dcbw Thank you for the comments! Updated with hack/update-gofmt.sh and hack/update-bazel.sh. Also did a rebase to the current master.

@MaximProshin
Copy link

Yes, namespaces don't help to isolate user-land SCTP from LKSCTP so there should be a way to not load/unload LKSCTP at least. If I compare K8s with native Linux OS, I would say the PR is non-backward compatible as previously LKSCTP wasn't loaded by default while with K8s it will be so.

@janosi
Copy link
Contributor Author

janosi commented Jun 15, 2018

@MaximProshin My understanding is that lksctp is not loaded as long as no SCTP Service (with Cluster IP or NodePort) is created, because of the on-demand loading. Once someone creates an SCTP Service the module is loaded, but it is an explicit request from the user. Just like when a user started a native Linux process that wanted to use SCTP.
If my understanding is wrong, please point to the place that triggers lksctp loading right at node startup, i.e. before there is an explicit request for that. Thank you!

UPDATE: we tested the module load case explicitly with the current code proposed here. My understanding is right: the current solution does not trigger the loading of the SCTP module by default(e.g. when kube-x is loaded at bootup). The SCTP module was loaded only when an SCTP Service was created. So, from this perspective if someone upgrades her k8s to a version that includes this feature, she will not experience any problems as long as she does not create SCTP Services explicitly.
This was about backward compatibility.
Starting from this stand we can discuss what we can do in order to have both SCTP Servies with ClusterIP/NodePort and userspace SCTP stack supported at the same time on the same cluster.
I think, we can agree, that these were not supported earlier either - i.e. if I had an application that used userspace SCTP and then I started another app that used kernel SCTP on the same node, then the result was the same problem. I.e. the solution, most probably, was to dedicate nodes to either of these SCTP users, i.e. to achieve that they are not deployed on the same nodes. Here we have the same challenge: to achieve that on the nodes where userspace SCTP apps are to be deployed we should not use the kernel level SCTP on any way.

@janosi
Copy link
Contributor Author

janosi commented Jun 16, 2018

@thockin I created a KEP, the PR is at kubernetes/community#2276

@thockin
Copy link
Member

thockin commented Jun 18, 2018

Thanks for the KEP. This also needs discussion on sig-network. A lot of significant issues in there.

@janosi
Copy link
Contributor Author

janosi commented Jun 19, 2018

Sure, we can take it to the next SIG Network meeting. Am I allowed to write it on the agenda?

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Aug 27, 2018
@janosi
Copy link
Contributor Author

janosi commented Aug 27, 2018

Hopefully this PR kubernetes/test-infra#9161 solved the GKE test issues on 27th August 2018, so
/retest

UPDATE (30 minutes later): that PR is not in effect yet. The retest failed.

@thockin thockin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2018
@thockin
Copy link
Member

thockin commented Aug 28, 2018

/lgtm
/approve
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janosi, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2018
@janosi
Copy link
Contributor Author

janosi commented Aug 28, 2018

/retest

2 similar comments
@janosi
Copy link
Contributor Author

janosi commented Aug 28, 2018

/retest

@janosi
Copy link
Contributor Author

janosi commented Aug 28, 2018

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 2eb14e3 into kubernetes:master Aug 28, 2018
@janosi
Copy link
Contributor Author

janosi commented Aug 28, 2018

@thockin Thank you for guiding me through this adventure, for the review, and for the approval!
@MaximProshin @danwinship @dcbw @KomorkinMikhail @matrohon @m1093782566 Thank you for your comments and reviews!

@justaugustus
Copy link
Member

Linking the features issue for this: kubernetes/enhancements#614

@yujuhong
Copy link
Contributor

I don't think this gets enough attention from the sig-node. This adds a new SCTP protocol to CRI and may require changes in the CRI shims.

@kubernetes/sig-node-api-reviews @kubernetes/sig-node-pr-reviews @mrunalp @Random-Liu @resouer @feiskyer

@feiskyer
Copy link
Member

On the Kubernetes side we solve this problem so, that we do not start listening on the SCTP ports defined for Servies with ClusterIP or NodePort or externalIP, neither in the case when containers with SCTP HostPort are defined. On this way we avoid the loading of the kernel module due to Kubernetes actions.

@yujuhong Should we skip SCTP port mapping in kuberuntime instead?

@janosi
Copy link
Contributor Author

janosi commented Sep 12, 2018

@feiskyer @yujuhong I wonder whether the same logic could be applied on the CRI side which was selected for the load balancers: k8s itself does not restrict the usage of SCTP as protocol for load balancer reuqests, and it is the task of the cloud provider's load balancer plugin to filter unsupported protocols.
For example, AFAIK, Docker supports SCTP for host port binding.
Also, I wonder if a clear and straightforward rejection of a request with an unsupported protocol would be better from the user's perspective than just skipping that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm area/kubectl area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubernetes service can't support SCTP protocol