-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.14 Backports 2023-09-26 #28282
v1.14 Backports 2023-09-26 #28282
Conversation
[ upstream commit 48023b9 ] This commit makes conn-disrupt-test a github action, so upgrade test and IPsec key rotation test don't have to copy and paste everywhere. The idea is to allow caller workflow to specify the commands to execute, then this action will follow the steps: 1. Run "cilium-cli connectivity test --conn-disrupt-test-setup"; 2. Run whatever caller workflow passes: could be upgrade operation or IPsec key rotation; 3. Run "cilium-cli connectivity test --include-conn-disrupt-test"; Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 85481c4 ] When executing `cilium encrypt status`, cilium-agent lists xfrm states and counts number of different AEAD keys. However, cilium-agent panics if there is any xfrm state using non-AEAD algorithm. These unexpected xfrm states could be installed by other applications. To reproduce the panic, we can manually install one by running command: ``` ip x s a src 1.1.1.1 dst 1.1.1.2 proto esp spi 0x3 reqid 1 mode tunnel enc aes 0xf0e1d2c3b4a5f60708090a0b0c0d0e0f ``` Then `cilium encrypt status` crashes. This patch fixes it. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 415e48f ] One dump is enough. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit fb1c074 ] Collect the tail-call map for bpf_xdp. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 708e9c1 ] For the sake of ease of development, we test in CI that the Ginkgo suite found in /test can compile on multiple platforms, including darwin. This test suite includes a test of the Hive framework, which initializes an instance of the daemon Cell. This Cell requires the MetricsRegistry, and so these tests depend on the MetricsRegistry. Some metrics may soon depend upon Linux feature probes, and so make the Linux probes package an indirect dependency of the Ginkgo tests. In order to support this build dependency, we allow the probes package to compile on darwin by adding stub values for netlink constants used by the package, and a linux-specific declaration of the constants using the netlink package. Signed-off-by: Andrew Sauber <2046750+asauber@users.noreply.github.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 53b9ccf ] To avoid a circular import, move testutils/ipam.go into its own package. The circular import was observed when attempting to use datapath kernel feature probes in the metrics package. It was affecting only the probe package's tests. import cycle: github.com/cilium/cilium/pkg/datapath/linux/probes [github.com/cilium/cilium/pkg/datapath/linux/probes.test] github.com/cilium/cilium/pkg/testutils [github.com/cilium/cilium/pkg/datapath/linux/probes.test] github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2 [github.com/cilium/cilium/pkg/datapath/linux/probes.test] github.com/cilium/cilium/pkg/policy/api [github.com/cilium/cilium/pkg/datapath/linux/probes.test] github.com/cilium/cilium/pkg/metrics [github.com/cilium/cilium/pkg/datapath/linux/probes.test] Moving this package avoids the import cycle Signed-off-by: Andrew Sauber <andrew.sauber@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 3d964de ] [ backporter's notes: fixed trivial conflict in spelling_wordlist.txt ] Add a metric for the current maximum observed interface index Adds a metric `endpoint_max_ifindex`, which is the current maximum interface index for all endpoints. The metric is updated during the periodic invocation of `syncHostIPs`. This metric can be used to determine if the interface index for the next Pod will exceed Cilium's limit of max(uint16) on older kernels. On kernels which do not provide ifindex via the FIB, Cilium must store the ifindex in the CT map, and 16 bits are reserved per entry for this purpose. This presents a limit of 65535 lifetime interfaces, which can be approached quickly with sufficient pod churn. Users who are subject to this limitation (typically a kernel version less than 5.10), are advised to reboot or roll the host in this case. Its default-enabled status is dynamic. On kernels which provide ifindex via the FIB, the metric is disabled (since Cilium's ifindex limits are not subject to max(uint16) in this case). Addresses: #17259 Addresses: #16260 Signed-off-by: Andrew Sauber <2046750+asauber@users.noreply.github.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit d9c9337 ] [ backporter's notes: re-applied hunk to the v1.14 version of the upgrade notes. ] Signed-off-by: Andrew Sauber <andrew.sauber@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 3ea5e8c ] [ backporter's notes: * dropped changes in install/kubernetes/Makefile as the given line was not present in v1.14; * fixed trivial conflict in spelling_wordlist.txt; * re-generated the cilium/values.yaml and cmdref/cilium-agent.md files from scratch. ] When ToFQDN policies are in use (CIDR) identities are created for each IP in a DNS response that matches a ToFQDN policy. These identities are garbage collected when 1) all endpoints with ToFQDN policies are removed 2) the maximum number of IPs per host is reached (tofqdn-max-ips-per-host) 3) when the identity has been unused and not refreshed by conntrack map GC Problems arise when he conntrack GC interval becomes very long. If it's backed by a LRU BPF map, the maximum is set to 12 hours (defaults.ConntrackGCMaxLRUInterval), meaning it will take 12 hours before unused identities are marked dead and collected (unless tofqdn-max-ips-per-host limit is reached for the FQDN entry). To allow user to have more control over this add the conntrackGCMaxInterval option that will allow limiting the maximum interval to something less than the 12 hours. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit ce51167 ] Previously, the CiliumExecContext() function would retry the cilium command up to 5 times (limitTimes) without verifying the success of previous executions. This pull request introduces a logic enhancement that validates the return code of each executed command and exits upon success. Additionally, this change optimizes test execution time by reducing unnecessary retries. With a 200ms delay between retries, the overall test execution time is expected to improve by at least 200ms multiplied by the number of cilium commands and the number of cilium pods involved. Local Ginkgo tests, specifically those focused on 'K8sDatapathBGPTests' 'K8sDatapathCustomCalls' and 'K8sDatapathLRPTests' have shown significant improvements: Before: `Ran 2 of 106 Specs in 233.348 seconds` After: `Ran 2 of 106 Specs in 168.563 seconds` Signed-off-by: Boris Petrovic <carnerito.b@gmail.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 071c2af ] In the past, we have merged automated updates to the coccicheck image version (for example, [0]) without realising they may break the workflow, because the workflow wouldn't run on the PR, which contained no BPF-related changes. Let's make the workflow run when its definition file is updated, to make sure we can catch similar issues in the future. [0] #27947 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 02c17d5 ] CB_SRC_LABEL is cleared just a few lines above. So to get the source's actual security identity, we need to use our local variable. Fixes: e2829a0 ("bpf: lxc: support Pod->Service->Pod hairpinning with endpoint routes") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit aff16b2 ] This commit sets the routing-mode and tunnel-protocol based on .Values.tunnel or .Values.routingMode and .Values.tunnelProtocol, defaulting to the former behavior but honoring new values if set. Signed-off-by: Marco Aurelio Caldas Miranda <17923899+macmiranda@users.noreply.github.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit fdeb6f3 ] Signed-off-by: Marco Aurelio Caldas Miranda <17923899+macmiranda@users.noreply.github.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit d987c37 ] The Makefile for the documentation makes sure that the docs-builder image is always present and up-to-date before building the docs. To do so, it simply rebuilds the image, every time. This is efficient to make sure the image is up-to-date, but not that much in terms of build duration. Let's make it possible to skip that step if the user is confident their image is present and up-to-date. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit e58d673 ] Allow users to set the INCREMENTAL environment variable to avoid passing the "-E" option to Sphinx when skipping the linters. This way, we can have Sphinx do incremental builds instead of re-reading and re-writing all files. This can be used for quick iterations on a local setup. There is no need to make "-E" conditional for the spell checker run, given that the spell checker seems to always re-read all files anyway. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 38d5669 ] The documentation build targets, including html, have a dependency on update-helm-values. There is no clear reason for that: this seems to be for historical reasons. - Initially, the dependency was on $(HELM_VALUES), because the file with Helm values had not been committed to the Git repo at the time, so we needed to ensure that it was generated first. - When reorganising checks for Helm values in 6167f8a ("docs: check updates for the Helm reference"), we replaced it with "update-helm-values" which looked cleaner; but we don't need to run the _check_ at this stage, only to make sure the _file_ itself is available. Now that the file is part of the repository, there's no need to check for its existence, and we can simply remove the target dependency (to align it with other files that receive automatic updates, such as codeowners.rst or crdlist.rst). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit cf3dc21 ] Let's make sure all Makefile targets are properly documented, and nicely displayed when running "make help". Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4d699d8 ] Move around targets inside of Documentation/Makefile to have them displayed in a more sensible order on "make help". $ make help Usage: make <target> Targets (default: "html") Development Images base-image Build the docs-base image for updating the requirements.txt file. builder-image Build the docs-builder image for rendering and checking the documentation. Auto-generated Contents Updates and Validation api-flaggen Update the table of API flags restrictions. update-cmdref Update the command reference documents (agent, bugtool, operators, etc.). update-codeowners Update the description of the code owner teams. update-crdlist Update the list of CRDs. update-helm-values Update the Helm reference documentation. check Validate command and Helm references, policy examples, and others. Build html Check documentation and render it under the specified format. epub Check documentation and render it under the specified format. latex Check documentation and render it under the specified format. live-preview Build and serve the documentation locally. Development update-requirements Regenerate the requirements.txt file from requirements-min/requirements.txt. clean Clean up all artefacts from documentation. help Display help for the Makefile. Also add target update-cmdref to .PHONY. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 5b77a54 ] Since commit 944dddf ("docs: Symlink (instead of copying) the API reference"), we symlink the _api/ directory (from ../api) instead of copying it. So there's nothing to clean on that side on "make clean", otherwise we lose the link. Fixes: 944dddf ("docs: Symlink (instead of copying) the API reference") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 2b4f6df ] Rather than having this information hidden in a parenthesis in the troubleshooting section, let's have a full paragraph in the key rotation section that explains what needs to be done when using Cluster Mesh. Signed-off-by: Marga Manterola <marga@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4cb0940 ] As k8s UID are unique, we can index the policies by their UId. This will decrease Cilium's boot-up time for clusters with a large number of network policies. For a cluster with a lot of network policies that was taking ~4m20s to boot Cilium, with this change it will take ~20 seconds. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit c36548c ] Function is_html5_writer_available() is deprecated, its invocation is unnecessary (the writer _is_ available in the versions of Sphinx we're using), and it has been removed in more recent versions of Sphinx. Let's get rid of it. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 94da061 ] [ backporter's notes: picked Documentation/requirements.txt verbatim from the main branch to keep it consistent across versions, as we nonetheless use the same image in CI. ] Now that we have rebased our theme onto the upstream version, we can use newer versions of docutils, which allows us to update Sphinx and a bunch of dependencies. We're still limited to docutils 0.18 because of sphinx-tabs, though. One immediate improvement is that we get working links for the value types in the gRPC reference. There are newer versions of Sphinx, websupport, or some second-level dependencies available (currently 7.2.6 and 1.2.6, respectively), but Netlify doesn't know about them yet and fails to build the previews. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 30c4df9 ] In the past, we disabled Sphinx's output in check-build.sh. This was because of the way we handle warnings: in order to have them printed at the end of the logs, we print them to a file, we filter them, and print the outstanding ones at the end of the output. Keeping the standard output meant that warnings or errors wouldn't appear in the middle of this output, and warnings that we filtered out wouldn't appear at all, which could be confusing. With the recent Sphinx update, we are able to filter the warnings from MyST more efficiently, directly from conf.py, and to get rid of the filters. Let's take advantage of this to re-enable the output from Sphinx. While it's not necessary to build the docs, it's occasionally helpful to debug the setup when things don't go as expected. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 2f7fbfe ] Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit f73e1c5 ] [ backporter's notes: switched cilium/dns import to miekg/dns, as v1.14 was relying on a replace directive instead of pointing to the fork. ] PR #25309 introduced a data race by sharing the sessionUDPFactory between the DNS server instances for the different IP families (IPv4 & IPv6). This has been detected by #27979. This commit fixes the issue for the TCP servers too. It not set explicitly, these are initialized with the default udp session factory to prevent nil pointer exceptions. Therefore, an explicit noop udp session factory is set. Fixes: #28156 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit c6fbeb1 ] This target should completely ignore the existing contents of `cilium/values.yaml` since we always generate from the template files. Co-authored-by: Donia Chaiehloudj <donia.cld@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
9fb108a
to
28f2f5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good. Thank you!
/test-backport-1.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backports for my doc PRs look good, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good
The build commits check failed due to timeout. Manually run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config changes LGTM.
Thanks for the backports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mine are good, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR 28163 and 28219 LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for HTTPRoute commit ✅
ℹ️ Re-applied hunk to the v1.14 version of the upgrade notes.
ℹ️ Dropped changes in install/kubernetes/Makefile as the given line was not present in v1.14, and re-generated the cilium/values.yaml and cmdref/cilium-agent.md files from scratch.
ℹ️ Picked Documentation/requirements.txt verbatim from the main branch to keep it consistent across versions, as we nonetheless use the same image in CI.
ℹ️ Dropped the CODEOWNERS changes, as not propagated to stable branches.
ℹ️ Switched cilium/dns import to miekg/dns, as v1.14 was relying on a replace directive instead of pointing to the fork.
cilium/values.yaml
target to.PHONY
#28225 (@nbusseneau)Once this PR is merged, you can update the PR labels via:
or with