OOM error when using the Kubernetes client to query and operate the API server. #5970
Comments
You need to see what is holding references to the Http2Connections.
@shawkins Sorry for the delayed reply; I just got back from vacation.
Are there any other references to the Http2Connection instances besides ReaderRunnable? Are you using an OkHttp ConnectionPool for example? If there is nothing else obviously holding on to the references, then you'll need to provide more of your code or a reproducer so we can see what code path might be leaving a connection open. We had something like this in the past with http2 #4665 - but have not encountered anything like that in a while.
Based on the heap, there aren't any other references pointing to
In this environment, it should be using the
Use the client to execute some commands.
Use the client to query all pods via pagination.
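Paging through all pods follows the standard Kubernetes limit/continue protocol: request a page, then resubmit the returned continue token until it comes back empty. A minimal stdlib-only sketch of that loop, where `fetchPage` is a hypothetical stand-in for the actual client call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PagedList {
    // One page of results plus the continue token (empty when exhausted),
    // mirroring the semantics of ListMeta.continue in the Kubernetes API.
    record Page(List<String> items, String continueToken) {}

    // Drains all pages by resubmitting the continue token until it is empty.
    static List<String> listAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = "";
        do {
            Page page = fetchPage.apply(token);
            all.addAll(page.items());
            token = page.continueToken();
        } while (!token.isEmpty());
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "API server" serving 5 pods, 2 per page;
        // the continue token here is just the next start index as a string.
        List<String> pods = List.of("pod-0", "pod-1", "pod-2", "pod-3", "pod-4");
        int limit = 2;
        Function<String, Page> fetchPage = token -> {
            int start = token.isEmpty() ? 0 : Integer.parseInt(token);
            int end = Math.min(start + limit, pods.size());
            String next = end < pods.size() ? String.valueOf(end) : "";
            return new Page(pods.subList(start, end), next);
        };
        System.out.println(listAll(fetchPage).size()); // prints 5
    }
}
```

The important property for leak hunting is that each page request is a complete, closed HTTP exchange; the loop itself should not keep a stream open between pages.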
A couple of thoughts:
Previously, I used a global client and found that after many queries I also encountered OOM errors. After analysis, I discovered that the
It's impossible to say from just this description. It could range from:
If it's not a usage error, then you can try one of the other client types to see if the behavior changes.
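For reference, trying one of the other client types generally means swapping the underlying HTTP client dependency. A hedged Maven sketch (artifact names as published by Fabric8 for the 6.x line; adjust the version to match your build):

```xml
<!-- Exclude the default OkHttp implementation... -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-client</artifactId>
  <version>6.10.0</version>
  <exclusions>
    <exclusion>
      <groupId>io.fabric8</groupId>
      <artifactId>kubernetes-httpclient-okhttp</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- ...and add an alternative, e.g. the JDK HttpClient implementation -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-httpclient-jdk</artifactId>
  <version>6.10.0</version>
</dependency>
```

If the OOM disappears with a different implementation, that points at something OkHttp-specific; if it persists, the leak is more likely in how streams from the client are being used.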
Okay, maybe I should use a client pool in some form.
You don't need a pool of KubernetesClients - just 1 for a given configuration / cluster. All of the http clients underneath the kubernetesclient use connection pooling. Please double check that any InputStreams and Readers you obtain from the KubernetesClient are getting closed. Testing this out locally seems to confirm some of what you are observing - these connections survive eviction from the pool because they have active allocations (open streams). However the ConnectionPool should not get garbage collected and should still have a reference to the connection. This is because there should be a thread called "OkHttp ConnectionPool" running holding a reference to it - and at 5 minute intervals it should check for orphaned allocations and emit messages like "Did you forget to close a response body?". One thing we can consider is adding these streams to our internal closure list to ensure they are cleaned up sooner than 5 minutes.
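The advice above amounts to wrapping anything stream-like you get back from the client in try-with-resources, so the underlying HTTP/2 stream is released on every code path. A minimal stdlib-only illustration of the pattern (the `TrackedStream` here is a stand-in for e.g. a pod-log InputStream, not a real client type):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CloseDemo {
    // A stand-in for a client-returned stream that records whether it was
    // closed - the way an unclosed response body keeps an HTTP/2 stream
    // allocated and pins the connection in memory.
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackedStream(byte[] data) { super(data); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads everything and guarantees the stream is released, even on error.
    static String readAll(TrackedStream in) {
        try (in) { // try-with-resources closes the stream on all paths
            return new String(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackedStream s = new TrackedStream("log line".getBytes());
        System.out.println(readAll(s)); // prints: log line
        System.out.println(s.closed);   // prints: true
    }
}
```

If a stream like this is read but never closed, nothing in the heap dump points back to your code - only the connection machinery holding the still-open allocation, which matches the Http2Connection buildup described in this issue.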
When I only used a single global client, the scenarios were limited to querying a list of Pods (without using pagination) and
Correct. Neither of those operations maintains an ongoing stream with the api server.
Marking as closed until there is more information. |
Hi,
@yan-v That should not fully resolve your issue if you are experiencing the same behavior as @tinystorm, and without a further reproducer it's hard to say exactly what is going on. If you are able to provide one that would be great.
The examples do not stress client reuse, that is correct - that should be covered in other parts of the docs and is certainly handled for you when you use the client as part of a platform, like quarkus. If you see a place where additional comments / docs would help, please open an issue.
The client implements Closeable because it exposes a close method - it does not require you to use it in a try-with-resources block. I'm assuming that the examples were written the way they are so that they read as free-standing rather than showing injection of, or separate lifecycle handling for, the client.
I am not sure - from a client perspective I'd hope that the connection at least returns to the pool, and that the job to clean up orphaned allocations / streams works regardless of what is fronting the api server. The other things to keep in mind are: what you are seeing could be http2 specific - are you able to use http1 instead? And what you are seeing could be okhttp specific - using a different httpclient with the kubernetes client might clarify this, or could highlight more clearly what is being left open.
This is the error I got (a different trace than in the initial post):
The simplified code I used before making k8sClient a singleton:

```java
try (KubernetesClient k8sClient = new KubernetesClientBuilder().build()) {
    String k8sNamespace = "MY_NAMESPACE";
    k8sClient.apps().deployments().inNamespace(k8sNamespace).withName(deploymentName).scale(someMyNumber);
} catch (Exception e) {
    ...
}
```

The error does not appear immediately, but after a few executions and about an hour or two, even if the service was idle during those hours. Thank you for your support!
Unfortunately just the stacktrace is not enough - at least a heap dump to confirm what is being held in memory, and then if needed more reproduction steps.
Start with a heap dump and see what is being held - if possible also try the alternatives mentioned in #5970 (comment) - that should narrow things down as much as possible to where the problem lies.
Luckily, the client as a singleton works for now, so I'll keep it as is. Thank you again for your quick responses!
Describe the bug
I'm using the Kubernetes client to query and operate on pods, and I'm not sure why I'm experiencing OOM (out-of-memory) errors.
I am executing commands and querying the complete list of pods (with caching) from inside a pod, on a schedule.
It seems that the OOM issue is not directly caused by the frequency of queries, as I haven't encountered the problem in larger environments with higher query and operation frequencies.
Based on the memory analysis, it appears that a large number of Http2Connection objects are not being released, causing them to occupy a significant portion of the memory. But I am confident that I am closing each client immediately after using it.
Note that my Kubernetes service is proxied through HAProxy and distributed to three API servers.
Fabric8 Kubernetes Client version
6.10.0
Steps to reproduce
The logic can be simplified into a loop.
Expected behavior
No OOM
Runtime
Kubernetes (vanilla)
Kubernetes API Server version
other (please specify in additional context)
Environment
Linux
Fabric8 Kubernetes Client Logs
Additional context
k8s version is 1.16.1.
If you need more information please let me know.