Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address ipcache startup perfomance regression #25007

Merged

Conversation

bimmlerd
Copy link
Member

@bimmlerd bimmlerd commented Apr 20, 2023

The commit messages should be descriptive, here's the summary line for each.

  1. ipcache: add benchmark for Upsert (regression.txt below)
  2. ipcache: make NamedPortMultiMap an interface (interface.txt below)
  3. ipcache: switch named ports to reference counting (fix.txt below)
  4. ipcache: simplify the named ports update code (cleanup.txt below)

The following is a comparison of running the benchmark added in commit 1 over the course of the four commits.

goos: linux
goarch: arm64
pkg: github.com/cilium/cilium/pkg/ipcache
                      │ regression.txt │             interface.txt             │               fix.txt               │             cleanup.txt             │
                      │     sec/op     │     sec/op      vs base               │   sec/op     vs base                │   sec/op     vs base                │
IPCacheUpsert10-10        97.10µ ±  3%     94.58µ ±  3%       ~ (p=0.052 n=10)   48.84µ ± 5%  -49.70% (p=0.000 n=10)   47.59µ ± 5%  -50.98% (p=0.000 n=10)
IPCacheUpsert100-10      2672.3µ ±  5%    2718.4µ ±  8%       ~ (p=0.481 n=10)   508.3µ ± 5%  -80.98% (p=0.000 n=10)   485.0µ ± 2%  -81.85% (p=0.000 n=10)
IPCacheUpsert1000-10     69.543m ± 13%    66.515m ± 17%       ~ (p=0.481 n=10)   5.282m ± 5%  -92.40% (p=0.000 n=10)   5.056m ± 2%  -92.73% (p=0.000 n=10)
IPCacheUpsert10000-10   4166.59m ±  1%   4299.95m ±  2%  +3.20% (p=0.000 n=10)   48.63m ± 5%  -98.83% (p=0.000 n=10)   44.87m ± 5%  -98.92% (p=0.000 n=10)
geomean                   16.56m           16.47m        -0.55%                  1.589m       -90.40%                  1.513m       -90.86%

The last commit, cleanup.txt removes some now redundant work, and hence gets a mild speedup. If preferred, it can be squashed into the third commit, but like this the algorithmic changes are easier to understand.

goos: linux
goarch: arm64
pkg: github.com/cilium/cilium/pkg/ipcache
                      │   fix.txt   │            cleanup.txt             │
                      │   sec/op    │   sec/op     vs base               │
IPCacheUpsert10-10      48.84µ ± 5%   47.59µ ± 5%       ~ (p=0.165 n=10)
IPCacheUpsert100-10     508.3µ ± 5%   485.0µ ± 2%  -4.58% (p=0.005 n=10)
IPCacheUpsert1000-10    5.282m ± 5%   5.056m ± 2%  -4.28% (p=0.011 n=10)
IPCacheUpsert10000-10   48.63m ± 5%   44.87m ± 5%  -7.74% (p=0.009 n=10)
geomean                 1.589m        1.513m       -4.81%

Note that simply reverting 93cd67f is not actually a fix, as the quadratic behaviour still lurks. This can be observed by forcefully setting ipcache.needNamedPorts = true when benchmarking:

                      │ main-head.txt  │    revert-with-neednamedports.txt     │ 
                      │     sec/op     │     sec/op      vs base               │  
IPCacheUpsert10-10        95.00µ ±  5%     92.12µ ±  3%  -3.04% (p=0.000 n=10)   
IPCacheUpsert100-10      2729.7µ ±  4%    2617.5µ ± 12%       ~ (p=0.165 n=10)   
IPCacheUpsert1000-10     70.192m ± 11%    68.974m ± 10%       ~ (p=0.579 n=10)   
IPCacheUpsert10000-10   4247.52m ±  3%   4233.62m ±  1%       ~ (p=0.853 n=10)   
geomean                   16.68m           16.29m        -2.31%                 

Fixes: #24987

Address cilium-agent startup performance regression.

@bimmlerd bimmlerd added release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. sig/agent Cilium agent related. labels Apr 20, 2023
@bimmlerd bimmlerd requested a review from gandro April 20, 2023 15:50
@bimmlerd bimmlerd changed the title adress ipcache startup perf regression adress ipcache startup perfomance regression Apr 20, 2023
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, nice work! The fix looks good to me, I have a few minor questions below.

Just a nit on the commit msg: technically the namedport map is not the only lock being held while it's locked; the ipcache lock is always held before the namedport map lock is taken. Actually, it seems like we should "promote" this relationship to say that the ipcache lock must be taken before the namedport map.

pkg/ipcache/ipcache_test.go Outdated Show resolved Hide resolved
pkg/ipcache/ipcache_test.go Outdated Show resolved Hide resolved
pkg/types/portmap.go Show resolved Hide resolved
@christarazi christarazi added kind/bug This is a bug in the Cilium logic. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 20, 2023
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/ipcache-startup-perf-regression branch 2 times, most recently from 123f3e2 to cbcb6f8 Compare April 20, 2023 16:39
@bimmlerd bimmlerd marked this pull request as ready for review April 20, 2023 16:48
@bimmlerd bimmlerd requested review from a team as code owners April 20, 2023 16:48
@bimmlerd bimmlerd requested a review from ldelossa April 20, 2023 16:48
@christarazi christarazi added the backport/author The backport will be carried out by the author of the PR. label Apr 20, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 Apr 20, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 Apr 20, 2023
@christarazi christarazi changed the title adress ipcache startup perfomance regression Address ipcache startup perfomance regression Apr 20, 2023
@joestringer
Copy link
Member

/test

@bimmlerd bimmlerd force-pushed the pr/bimmlerd/ipcache-startup-perf-regression branch from cbcb6f8 to 23d9b0d Compare April 21, 2023 07:35
@bimmlerd
Copy link
Member Author

bimmlerd commented Apr 21, 2023

technically the namedport map is not the only lock being held while it's locked; the ipcache lock is always held before the namedport map lock is taken. Actually, it seems like we should "promote" this relationship to say that the ipcache lock must be taken before the namedport map.

What I was trying to express in the commit message was that we don't acquire a new lock while holding the namedport mutex (M), and hence don't establish any relations of the form M > X, for any X. I think that's correct. But yes, we do hold the ipcache mutex while acquiring the namedport mutex, and establish IPCache > M. This is precisely why it's crucial to not acquire any other mutex while also holding M.

Tests were green before pushing to clarify the commitmsg and fix a cosmetic issue in the benchmark, but more reruns can't hurt.

I've also dropped the Get interface from the NamedPortsMultiMap since it was needed only for tests, which I modified slightly to use GetNamedPorts instead. This also has the benefit that consumers of the API now don't have access to the raw PortProtoSet anymore, and thus are truly read-only by API.

@bimmlerd
Copy link
Member Author

bimmlerd commented Apr 21, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-5.4' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With native routing

Failure Output

FAIL: Pods are not ready in time: timed out waiting for pods with filter  to be ready: 4m0s timeout expired

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-5.4/66/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-5.4 so I can create one.

@bimmlerd
Copy link
Member Author

bimmlerd commented Apr 21, 2023

/mlh new-flake Cilium-PR-K8s-1.25-kernel-5.4

👍 created #25042

@bimmlerd
Copy link
Member Author

/test-1.25-5.4

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thanks for taking care of this! 🙏

There is one issue around ipc.needNamedPorts and the externally visible behavior that I'm not sure how to address, see inline comment.

pkg/ipcache/ipcache.go Show resolved Hide resolved
pkg/ipcache/ipcache_test.go Outdated Show resolved Hide resolved
pkg/types/portmap.go Outdated Show resolved Hide resolved
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/ipcache-startup-perf-regression branch from 23d9b0d to 56a943c Compare April 24, 2023 11:29
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/ipcache-startup-perf-regression branch from 56a943c to cb4cec8 Compare April 24, 2023 14:13
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 26, 2023
@bimmlerd
Copy link
Member Author

bimmlerd commented Apr 27, 2023

/test-1.24-5.4

(Required tests have changed, running the new ones now)

@bimmlerd
Copy link
Member Author

/test-1.25-4.19

@bimmlerd
Copy link
Member Author

/test-1.26-net-next

@bimmlerd
Copy link
Member Author

CI triage:

I'll gamble one rerun and otherwise rebase and run it all again

@bimmlerd
Copy link
Member Author

bimmlerd commented Apr 27, 2023

/test-1.26-net-next

No bueno. Rebasing and let's run it all.

This patch adds a benchmark for ipcache.Upsert, with the intention of
exercising the logic for named ports. This exposes a performance
regression when upserting many pods, as is the case on agent startup.

The startup regression was due to losing an optimization in 93cd67f,
namely that the named ports bookkeeping is done only once named ports
are present in policies. Without the lazy bookkeeping, we have
accidentally quadratic behavior, as we look at every pod whenever a
new pod is added.

Follow up commits will address the issue, this is merely a convenient
reproduction of the regression.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
In preparation of introducing reference counting for named ports, change
the NamedPortMultiMap from a concrete type to an interface. This
produces a fair bit of churn in the test code (as it assumes the type is
a map). There are no functional changes, however.

The benefit of having this type be an interface is that we can switch
it's implementing type to use reference counting with a smaller patch.
That, in turn, will allow us to avoid looking at all other pods when a
new pod is added or deleted, just for the named port bookkeeping.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
This commit introduces reference counting for named ports. Using the
reference counting, we know when to add or remove named ports in our
bookkeeping, without having to look through all other pods. This leads
to a significant speedup when inserting many pods, as is the case on
agent startup.

Note that this patch does _not_ restore the original optimization of not
keeping track of named ports until used in a policy. We did not find a
good way of doing so while also avoiding re-introducing the deadlock
fixed in the commit referenced below. Instead, we always do the
bookkeeping, but in a more efficient manner.

The reference counting requires the use of a RWLock, since read and
write accesses to a map must be protected. Thus, there is potential for
introducing yet another deadlock. However, we argue that this is not the
case: We do _not_ acquire any other lock while holding our mutex M. A
wait cycle/deadlock involving M must include holding M when acquiring
some other lock, however. Therefore this patch cannot introduce a
deadlock.

Fixes: 93cd67f (ipcache: fix potential deadlock in GetNamedPorts)
Fixes: cilium#24987

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
With the reference counting in place, we can simplify the involved code
paths in the ipcache. There is now little point to conditionally perform
the update, as the update itself is about as expensive as checking
whether it is necessary. Therefore, opt for simpler code and always call
into NamedPortMultiMapUpdater.Update.

Indeed, this even gives a mild speedup:

goos: linux
goarch: arm64
pkg: github.com/cilium/cilium/pkg/ipcache
                      │   fix.txt   │            cleanup.txt             │
                      │   sec/op    │   sec/op     vs base               │
IPCacheUpsert10-10      48.84µ ± 5%   47.59µ ± 5%       ~ (p=0.165 n=10)
IPCacheUpsert100-10     508.3µ ± 5%   485.0µ ± 2%  -4.58% (p=0.005 n=10)
IPCacheUpsert1000-10    5.282m ± 5%   5.056m ± 2%  -4.28% (p=0.011 n=10)
IPCacheUpsert10000-10   48.63m ± 5%   44.87m ± 5%  -7.74% (p=0.009 n=10)
geomean                 1.589m        1.513m       -4.81%

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/ipcache-startup-perf-regression branch from eb1b776 to df332c9 Compare April 27, 2023 14:38
@bimmlerd
Copy link
Member Author

/test

@rolinh rolinh merged commit 191b672 into cilium:main Apr 28, 2023
56 checks passed
@bimmlerd bimmlerd added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Apr 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 Apr 28, 2023
@bimmlerd bimmlerd deleted the pr/bimmlerd/ipcache-startup-perf-regression branch April 28, 2023 09:49
@bimmlerd bimmlerd added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels May 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 May 2, 2023
@bimmlerd bimmlerd added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels May 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/author The backport will be carried out by the author of the PR. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

Extremely slow agent startup
6 participants