chore(cache): increased the default lock timeout to 10s that mlcache waits for when running L3 callbacks #12956

ms2008 · 2024-04-29T08:25:19Z

Summary

Currently, this lock timeout is hardcoded and cannot be configured out of the box. Multiple users have reported the following error in their logs:

failed to get from node cache: could not acquire callback lock: timeout

So this PR doubles the default lock timeout from 5s to 10s.
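The failure mode can be illustrated with a small model. The sketch below is hypothetical Python, not Kong's or lua-resty-mlcache's code (those are Lua running in OpenResty). It only assumes, as the error message suggests, that concurrent misses for the same key wait on a lock held by the caller that is running the callback, and that a waiter gives up after a fixed timeout. The names `CallbackLockCache` and `run_two_callers` are made up for illustration, and times are scaled down: 0.2s and 0.4s stand in for the current 5s and the proposed 10s.

```python
import threading
import time


class CallbackLockCache:
    """Hypothetical, simplified model (not Kong or lua-resty-mlcache code):
    on a miss, callers for the same key serialize on a per-key lock, and a
    caller that cannot get the lock within `lock_timeout` seconds gives up."""

    def __init__(self, lock_timeout):
        self.lock_timeout = lock_timeout
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, callback):
        if key in self._data:  # hit: no lock needed
            return self._data[key]
        lock = self._lock_for(key)
        if not lock.acquire(timeout=self.lock_timeout):
            # Same wording as the error reported in this PR's logs.
            raise TimeoutError("could not acquire callback lock: timeout")
        try:
            if key in self._data:  # filled by the previous lock holder
                return self._data[key]
            value = callback()  # the slow lookup run on a miss
            self._data[key] = value
            return value
        finally:
            lock.release()


def run_two_callers(lock_timeout, callback_seconds):
    """Two callers miss on the same key; the second arrives while the first
    is inside the slow callback. Returns both outcomes and the second
    caller's wait time in seconds."""
    cache = CallbackLockCache(lock_timeout)
    in_callback = threading.Event()
    outcomes, waited = {}, {}

    def slow_callback():
        in_callback.set()  # the first caller now holds the lock
        time.sleep(callback_seconds)
        return "value"

    def caller(name):
        start = time.monotonic()
        try:
            outcomes[name] = cache.get("some-key", slow_callback)
        except TimeoutError as err:
            outcomes[name] = str(err)
        waited[name] = time.monotonic() - start

    first = threading.Thread(target=caller, args=("first",))
    second = threading.Thread(target=caller, args=("second",))
    first.start()
    in_callback.wait()  # guarantee the second caller has to wait
    second.start()
    first.join()
    second.join()
    return outcomes, waited["second"]


if __name__ == "__main__":
    # 0.2s and 0.4s stand in for the old 5s and the proposed 10s timeout.
    for timeout in (0.2, 0.4):
        for callback_seconds in (0.3, 0.6):
            outcomes, waited = run_two_callers(timeout, callback_seconds)
            print(f"timeout={timeout}s callback={callback_seconds}s "
                  f"second={outcomes['second']!r} waited={waited:.2f}s")
```

Under these assumptions, a longer timeout only helps when the callback finishes inside the extra window (the 0.3s case, where the second caller now gets the cached value). When the callback is slower still (the 0.6s case), the second caller fails anyway, after being held about twice as long, which is the latency and pile-up concern raised later in this thread.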

Checklist

The Pull Request has tests
A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary (see README.md).
There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

Fix FTI-4299

bungle · 2024-04-29T10:24:48Z

How feasible is it to expect that, if "the thing" doesn't happen in 5 secs, it will happen in 10 secs?

dubuqingfeng · 2024-05-07T02:57:38Z

Are there any updates?

ms2008 · 2024-05-11T08:32:36Z

> How feasible is it to expect that, if "the thing" doesn't happen in 5 secs, it will happen in 10 secs?

I don't have any quantifiable data to provide strong support for the fix here, much like setting a TCP timeout, which depends on the specific environment of each individual. However, in practice, increasing the value here to 10s doesn't seem to pose significant risks. Additionally, we have several users who, after independently increasing this limit, no longer encounter issues.

So, if we prefer not to expose this configuration to users, increasing this limit could indeed solve some users' problems.

bungle · 2024-05-13T09:27:59Z

> I don't have any quantifiable data to provide strong support for the fix here, much like setting a TCP timeout, which depends on the specific environment of each individual. However, in practice, increasing the value here to 10s doesn't seem to pose significant risks.

Yes, but where is the limit? Next time do we increase it to 30 secs? Is there a risk that this starts showing up in max or p99 latencies? More things piling up? More memory used because of the pile-up? I know the current limit is a magic number, just like this proposal is. I would rather we fix the root cause: why does retrieving a certificate take more than 5 secs? Why does it take time at all, and why isn't it already cached?

> Additionally, we have several users who, after independently increasing this limit, no longer encounter issues.
> So, if we prefer not to expose this configuration to users, increasing this limit could indeed solve some users' problems.

Yes, but at the same time it hides problems that we probably want to fix. The same question could be asked about any of our timeouts: why not just increase every timeout?

chore(cache): increased the default lock timeout to 10s that mlcache waits for when running L3 callbacks (f6c4ead)

pull-request-size bot added the size/XS label Apr 29, 2024

ms2008 added the skip-changelog label Apr 29, 2024

github-actions bot added the cherry-pick kong-ee label (schedule this PR for cherry-picking to kong/kong-ee) Apr 29, 2024

github-actions bot assigned ms2008 Apr 29, 2024

ms2008 marked this pull request as ready for review April 29, 2024 08:57