Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.13 Backports 2023-03-22 #24547

Merged
merged 35 commits into from
Apr 6, 2023
Merged

v1.13 Backports 2023-03-22 #24547

merged 35 commits into from
Apr 6, 2023

Conversation

NikAleksandrov
Copy link

@NikAleksandrov NikAleksandrov commented Mar 23, 2023

PRs skipped due to conflicts:

Once this PR is merged, you can update the PR labels via:

for pr in 22848 24205 24174 24171 24255 24144 24140 24164 24271 24275 24244 24134 24163 24304 24223 24189 24046 24373 24267 24407 24400 24398 23764 24413 24460 24428 24477 24435 23297; do contrib/backporting/set-labels.py $pr done 1.13; done

or with

make add-labels BRANCH=v1.13 ISSUES=22848,24205,24174,24171,24255,24144,24140,24164,24271,24275,24244,24134,24163,24304,24223,24189,24046,24373,24267,24407,24400,24398,23764,24413,24460,24428,24477,24435,23297

@NikAleksandrov NikAleksandrov requested a review from a team as a code owner March 23, 2023 15:24
@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. label Mar 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the kind/backports This PR provides functionality previously merged into master. label Mar 23, 2023
@NikAleksandrov
Copy link
Author

Few notes:

  • PR 24350 was skipped because the commits are missing from v1.13
  • PR 24271 had to be changed (changes went into the respective v4 and v6 fib functions instead of a central one)
  • PR 24296 will be handled by @brb
  • PR 23764 the labels code was pushed to injectLabels() instead of resolveIdentity() since that is missing

@aojea
Copy link
Contributor

aojea commented Mar 23, 2023

what about #24202 @NikAleksandrov , was it already backported?

Copy link
Contributor

@PhilipSchmid PhilipSchmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@pchaigno
Copy link
Member

pchaigno commented Mar 23, 2023

Few notes:

@NikAleksandrov I'd encourage putting that information in the OP otherwise many people will miss it. GitHub may even end up hiding your comment.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PRs look good. Thanks!

@oblazek
Copy link
Contributor

oblazek commented Mar 23, 2023

my PR looks good as well :)

@joestringer
Copy link
Member

Few notes:

Paul suggested to put these in the PR description, which I think is the minimum for informing the original authors - we rely on this to identify when there are changes we need to take a closer look at. I would suggest taking this one step further and also including this information in the commit messages. Then if it turns out there is a bug related to this issue and we bisect to that commit, we will have the context that the backport made some specific changes vs. the original proposed PR.

Note that this is standard process that is expected from backporters per step 3 in the backporting guide.

@NikAleksandrov
Copy link
Author

Few notes:

Paul suggested to put these in the PR description, which I think is the minimum for informing the original authors - we rely on this to identify when there are changes we need to take a closer look at. I would suggest taking this one step further and also including this information in the commit messages. Then if it turns out there is a bug related to this issue and we bisect to that commit, we will have the context that the backport made some specific changes vs. the original proposed PR.

Note that this is standard process that is expected from backporters per step 3 in the backporting guide.

Alright, point taken. I'll update the descriptions.

@hemanthmalla
Copy link
Member

My PR looks good. Thank you.

forgems and others added 16 commits April 6, 2023 13:23
[ upstream commit 04fe6a3 ]

Empty VirtualMachineScaleSets returns 404 on ListVirtualMachineScaleSetNetworkInterfaces if there are no nodes in it.
Continue with next VirtualMachineScaleSet in that case.

Signed-off-by: Marcin Swiderski <marcin@arangodb.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 94b0ee1 ]

Previously when etcd client context was not set and client
created with client.New(*config) it would use context.TODO,
without any way to cancel the ctx.

In some cases (e.g. wrong endpoints configuration) it would
 be necessary to cancel the ctx, otherwise the client might
be stucked in a retry loop and in case of remoteConnectionController
there wouldn't be any way stop that.

This commit passes a parent context to the client itself.

This commit also fixes the closing of stopWatch channel and rest
of the resources when ctx is done/cancelled.

If ctx was closed and select statement choosed handling of that
case (<-ctx.Done()) the code would quit the for loop and handling
of <-w.stopWatch would never be possible which means calling
w.Stop() from outside (e.g. releaseOldConnection()) would hang
forever as the wg counter wouldn't be decremented.
Now when the context gets closed, it correctly closes all channels
and decrements the wg counter, just like when handled by
releaseOldConnection func which releases everything.

Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 86aa873 ]

This commit adds a new ControllerParam CancelDoFuncOnUpdate which
allows cancelling the controller context passed to DoFunc.

This way we can terminate the controller (if stuck in e.g. etcd
client retry loop) and still preserve the existing logic.

Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 9ac5b53 ]

When using checker.ExportedEqual(), it was using the standard Equals
checker under-the-hood, but this is incorrect. Fix it to use the correct
checker.

In the commit introducing the bug, there was no direct usage of
checker.ExportedEqual(), but rather checker.ExportedEqual (note the
plural). Subtle!

Discovered while working on improving policy unit tests.

Fixes: f4407e7 ("checker: Add ExportedEquals checker")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 3bd923c ]

Fixes: #24266

Signed-off-by: Alex Katsman <alexkats@google.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 477b43b ]

Signed-off-by: Philip Schmid <philip.schmid@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 93ebeb3 ]

The current documentation is incorrect as the default value is 0 and not
5 minutes. 0 implies a dynamic interval value, which this commit now
documents.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit ee7572b ]

We are currently maintaining two lists of drivers supporting XDP, one in
the kube-proxy-free guide and another one in the XDP documentation.
Let's keep only the XDP documentation one, but with the kube-proxy-free
one's format (it's cleaner).

The list in the kube-proxy-free guide is replaced with a reference to
the now-unique list.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 8d28f16 ]

We upstreamed XDP support for the bond devices in Linux v5.15. Let's
document it.

1 - https://isovalent.com/blog/post/2021-12-release-111/#transparent-xdp-bonding-support
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 1c01d51 ]

An IP / CIDR can be associated with the reserved:world + CIDR labels in
the IPcache, for example if a user applies a ToCIDR. Later, if this CIDR
is then associated with an entity within the cluster, then the IPcache
subsystem will not correctly handle this scenario. It will append the
reserved:remote-node or reserved:host label, while keeping the
reserved:world + CIDR labels.

This doesn't make sense as reserved:world + CIDR labels are meant to
represent entities outside of the cluster.

This scenario can happen when a user wants to allow traffic to an IP
that is firstly outside of the cluster, and then later joins the
cluster.

For example, a user has a K8s cluster and an external VM with an IP of
x.y.z.w. The user wants to allow traffic to x.y.z.w so they create a
ToCIDR rule allowing it. Later, the user joins the VM to the cluster and
it becomes a worker node. Now x.y.z.w is an entity within the cluster
and is therefore associated with either the reserved:remote-node or
reserved:host label.

Fix this by disassociating the reserved:world + CIDR labels from an
entity that's now within the cluster.

Related: #17962
Fixes: #23750

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 0557a2f ]

Earlier versions of these flags had a prefix of ENI and were later
renamed with a prefix of IPAM to be consistent across cloud providers.
While removing the deprecated old flags, #12676 also removed lines
needed for setting OperatorConfig struct fields from the values read in.
This resulted in fields having golang defaults, which caused the rate
limiter to be completely bypassed.

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 986daf4 ]

contrack -> conntrack

Fixes: 93ebeb3 ("docs: Update the documentation for the conntrack-gc-interval flag")
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 9396bde ]

While reviewing a new Isovalent Instruqt Cilium IPAM lab, we've found
that the current Cilium documentation about Cluster-Pool IPAM mode isn't
up to date. If refers to fields (e.g.
`Spec.IPAM.PodCIDRAllocationThreshold`) which never even existed. At the
same time, it uses capital letters for actual field names which isn't
how the CiliumNode CRD actually defines it.

Signed-off-by: Philip Schmid <philip.schmid@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 30a783f ]

Renovate v35 had a breaking change in how it computes the digest of
GitHub releses. Previously, it would use the digest of the release
attachements, but now it's just using a git sha as the "digest". This
commit changes the data source for the Hubble artifacts to use the data
source which preserves the old behavior.

Ref: renovatebot/renovate#20178

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit b2bc42a ]

Signed-off-by: Liz Rice <liz@lizrice.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
[ upstream commit 885cffc ]

We need to replace the t.Fatal calls with t.Error when within the
await callbacks. These callbacks are called by the fake client but from
the goroutine of the LBIPAM controller. This means that we panic in
the controller causing us to not call `.Done` on resource events.

Not calling `.Done` on resource events triggers the finalizer based
checks to enforce `.Done` is always called. This masks the original
error.

By using t.Error we do mark the test as failed but only ever panic in
the test suite itself.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
@jibi jibi force-pushed the pr/v1.13-backport-2023-03-22 branch from 8d89c3f to ae078fd Compare April 6, 2023 11:23
@jibi
Copy link
Member

jibi commented Apr 6, 2023

rebased on top of v1.13

@jibi
Copy link
Member

jibi commented Apr 6, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.18-kernel-4.9' hit: #22578 (95.87% similarity)

Job 'Cilium-PR-K8s-1.16-kernel-4.9' has 1 failure but they might be new flake since it also hit 1 known flake: #22578 (97.53% similarity)

@jibi
Copy link
Member

jibi commented Apr 6, 2023

/test-1.16-4.9

@jibi
Copy link
Member

jibi commented Apr 6, 2023

test-1.18-4.9 is failing due to a known flake, everything else is green, reviews are in

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 6, 2023
@squeed squeed merged commit 8ebfb18 into v1.13 Apr 6, 2023
36 of 37 checks passed
@squeed squeed deleted the pr/v1.13-backport-2023-03-22 branch April 6, 2023 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet