Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EndpointSlice Conditions semantics: Ready, Serving and Terminating #24174

Merged
merged 2 commits into from
Mar 7, 2023

Conversation

aojea
Copy link
Contributor

@aojea aojea commented Mar 6, 2023

Fix the usage of the EndpointSlice Conditions semantics: Ready, Serving and Terminating.

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!

The EndpointSlice conditions describe 3 states for an Endpoint: Ready, Service and Terminating.

If the Endpoint is Ready it must leverage that value, independently of the other Conditions, this is the reason why the Kubernetes test mentioned in #18241 fails, since using service.Spec.publishNotReadyAddresses force this Ready state in the Endpoint.

In addition, Terminating endpoints must be included only of they are Serving, since that indicates that the Endpoint is valid, this state has to be introduced in the API for handling backward compatibility between versions adding an undesired complexity to consumers, so apologize for that 😄

In KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1669-proxy-terminating-endpoints we describe the behavior for Terminating Endpoints in case there are no existing Ready endpoints, to allow to implement zero downtime strategies during rolling updates.

Add a job to run Kubernetes sig-network tests, to guarantee that there is no regression and also benefit Cilium of these tests, that should guarantee a standard and conformance behavior with Kubernetes sig-network defined behaviors.

Fixes: #18241

Services backends with publishNotReadyAddresses are able to receive traffic independently if they are Terminating, since is the user intent to make them reachable despite its state.

@aojea aojea requested a review from a team as a code owner March 6, 2023 01:27
@aojea aojea requested a review from tommyp1ckles March 6, 2023 01:27
@maintainer-s-little-helper
Copy link

Commit a2ec7c754c1196638ad54a2a5dfc7539741d6674 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 6, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 6, 2023
@aojea
Copy link
Contributor Author

aojea commented Mar 6, 2023

/assign @aanm @squeed @aditighag

I've verified that this commits fixes #18241, the e2e passes with this patch.

However, I want to add some unit tests and a second commit to fix ProxyTerminatingEndpoints and implement https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1669-proxy-terminating-endpoints per Kubernetes KEP

I'm sending this PR just for testing the CI and for guidance and early feedback, since is my first contribution to cilium

Thanks

@sayboras
Copy link
Member

sayboras commented Mar 6, 2023

Thanks for your PR.

You probably already see the below page. Also, you can mark this PR as draft if you want.

PS: Glad to see you here contributing to Cilium, we are heavily using Kind cluster as part our tests ;). The IPv6 related workflow was possible, thanks to your time helping me a few years back 👍.

https://docs.cilium.io/en/v1.13/contributing/development/contributing_guide/

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 6, 2023
@aojea aojea changed the title [WIP] EndpointSlice Conditions semantics: Ready, Serving and Terminating EndpointSlice Conditions semantics: Ready, Serving and Terminating Mar 6, 2023
@aojea aojea requested review from a team as code owners March 6, 2023 09:18
@aojea aojea requested review from brlbil and nebril March 6, 2023 09:18
@aanm aanm requested review from aditighag and squeed March 6, 2023 10:20
@aanm aanm added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 6, 2023
@aanm aanm added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch labels Mar 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 6, 2023
@aanm
Copy link
Member

aanm commented Mar 6, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 10.0.1.2:80 from testclient-host-bcqqr

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

@squeed
Copy link
Contributor

squeed commented Mar 6, 2023

The travis failure is legit:

    testcase.go:300: lbmap mismatch:
        --- /home/travis/gopath/src/github.com/cilium/cilium/test/controlplane/services/graceful-termination/lbmap2.golden
        +++ <actual>
        @@ -1,15 +1,15 @@
          - Services ---------------------------------------------------------------------
         | ID |                   Name |      Type |               Frontend | Backend IDs |
         |----+------------------------+-----------+------------------------+-------------
         |  0 |     default/kubernetes | ClusterIP |     10.96.0.1:443/NONE |           1 |
         |  1 | test/graceful-term-svc | ClusterIP | 10.96.116.33:8081/NONE |           0 |
          --------------------------------------------------------------------------------
         
        - - Backends -------------------------------------------------
        -| ID |              L3n4Addr |       State | Linked Services |
        -|----+-----------------------+-------------+-----------------
        -|  0 | 10.244.0.112:8081/TCP | terminating |               1 |
        -|  1 |   172.18.0.2:6443/TCP |      active |               0 |
        - ------------------------------------------------------------
        + - Backends --------------------------------------------
        +| ID |              L3n4Addr |  State | Linked Services |
        +|----+-----------------------+--------+-----------------
        +|  0 | 10.244.0.112:8081/TCP | active |               1 |
        +|  1 |   172.18.0.2:6443/TCP | active |               0 |
        + -------------------------------------------------------

(You can run these with go test ./test/controlplane) You might need to do a make -C test/controlplane update-golden

@squeed
Copy link
Contributor

squeed commented Mar 6, 2023

The affinity test is failing; probably need to do --helm-set sessionAffinity=true in the cilium install command.

@squeed
Copy link
Contributor

squeed commented Mar 29, 2023

I've flagged this PR to be backported; it will happen automatically unless it needs manual intervention.

@aditighag
Copy link
Member

I agree that we can backport this to 1.12. @aojea AFAIK, the delta between the master and 1.12 implementation might not be too big. But in case there are conflicts, will you be able to help in resolving the conflicts?

@aojea
Copy link
Contributor Author

aojea commented Mar 29, 2023

will you be able to help in resolving the conflicts?

absolutely, I joined cilium slack so I can be reachable, but in case I have slow response Casey knows how to contact me

@xyll
Copy link

xyll commented Mar 31, 2023

Hello!
Do you have maybe the ETA for the backport to the 1.12?
It is possible to get maybe nightly builds of 1.12 after the backport has happened?
Thanks!

@joestringer
Copy link
Member

@xyll typically backporters will look at all PRs each week and start the backport. They will tag it so that it shows up in the breadcrumbs links here on this PR, so you should be able to find it by monitoring this PR. Typically after a couple of days it will be reviewed & merged, so a rough estimation would be late next week assuming something doesn't come up. At that point, it should be part of quay.io/cilium/cilium-ci:v1.12 and you could test out the image from there.

@jibi jibi added the backport/author The backport will be carried out by the author of the PR. label Apr 3, 2023
@jibi jibi mentioned this pull request Apr 3, 2023
3 tasks
@jibi jibi added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch backport-done/1.3 backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. affects/v1.12 This issue affects v1.12 branch needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 7, 2023
@gandro
Copy link
Member

gandro commented Apr 11, 2023

Unfortunately, this PR had some had some non-trivial conflicts against v1.12. This means that this needs PR needs to be backported manually to v1.12 from someone with expertise in the affected area.

@aojea Would you be able to open a PR against the v1.12 branch? Otherwise, we'll have to find a Cilium committer who is knowledgeable in the code to backport it.

@aojea
Copy link
Contributor Author

aojea commented Apr 11, 2023

@aojea Would you be able to open a PR against the v1.12 branch? Otherwise, we'll have to find a Cilium committer who is knowledgeable in the code to backport it.

I can do it, just need some guidance if there are some additional steps, or is just a normal cherry-pick and dealing with the conflicts?

@aojea
Copy link
Contributor Author

aojea commented Apr 11, 2023

hmm, it interferes with #21161, it seems that backporting requires to bring some code that is not necessary ... it seems that users will have to update to 1.13 :/

@gandro
Copy link
Member

gandro commented Apr 12, 2023

hmm, it interferes with #21161, it seems that backporting requires to bring some code that is not necessary ... it seems that users will have to update to 1.13 :/

Thanks for investigating. Yeah, let's remove the backport label for now then.

/cc @squeed

@gandro gandro added affects/v1.12 This issue affects v1.12 branch and removed needs-backport/1.12 labels Apr 12, 2023
@ElvinEfendi
Copy link

ElvinEfendi commented Apr 13, 2023

Does this also fix #23744? It seems like it could.

@aojea
Copy link
Contributor Author

aojea commented Apr 14, 2023

Does this also fix #23744? It seems like it could.

I expect it, but I don't have time to test it so I can only guess 😄

@gentoo-root gentoo-root moved this from Needs backport from master to Backport done to v1.13 in 1.13.2 Apr 14, 2023
aojea added a commit to aojea/community-1 that referenced this pull request May 17, 2023
Per the contributor ladder document https://github.com/cilium/community/blob/main/CONTRIBUTOR-LADDER.md#organization-member I'd like to request membership in the org for myself, since I plan to contribute regularly, mainly in the areas related to Kubernetes where I'm sig-network and sig-testing member and Kind maintainer.

I've already contributed adding more test coverage for the Kubernetes integration to the project
- cilium/cilium#25258
- cilium/cilium#24209
fixing some bugs:
- cilium/cilium#24202
- cilium/cilium#24174
and helping to triage issues

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
pchaigno pushed a commit to cilium/community that referenced this pull request May 20, 2023
Per the contributor ladder document https://github.com/cilium/community/blob/main/CONTRIBUTOR-LADDER.md#organization-member I'd like to request membership in the org for myself, since I plan to contribute regularly, mainly in the areas related to Kubernetes where I'm sig-network and sig-testing member and Kind maintainer.

I've already contributed adding more test coverage for the Kubernetes integration to the project
- cilium/cilium#25258
- cilium/cilium#24209
fixing some bugs:
- cilium/cilium#24202
- cilium/cilium#24174
and helping to triage issues

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch backport/author The backport will be carried out by the author of the PR. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.13.2
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

Cilium 1.11 fails the kubernetes e2e test "[sig-network] Services should create endpoints for unready pods"