Memory consumption keeps increasing over time. #1291

We have multiple instances of Skupper v1.4.2 and, although the setup of the instances is very similar (if not the same), one of them runs the skupper-router container v2.4.2-2 with a memory consumption that keeps increasing over time and reaches multiple GBs (approx. 22 GiB in less than 24h). The number of edges (5 for the bad environment) doesn't justify this high memory consumption.
Any idea what the problem could be?
Could it be an edge that is misconfigured?
Below is the report of `skstat -m` for the "bad" instance and another, normal instance:
"bad":
normal:
|
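
Growth of this kind is easiest to see sampled over time. Below is a minimal sketch of such a sampler, assuming `kubectl` access to the namespace and a running metrics-server; the namespace and pod prefix are placeholders, not values from this report:

```python
#!/usr/bin/env python3
# Sample the router pod's memory usage once a minute via `kubectl top pod`.
# Assumes kubectl is configured and metrics-server is installed; the
# namespace and pod prefix below are placeholders.
import subprocess
import time

NAMESPACE = "my-skupper-namespace"   # placeholder
POD_PREFIX = "skupper-router"        # placeholder

def router_memory():
    out = subprocess.check_output(
        ["kubectl", "top", "pod", "-n", NAMESPACE, "--no-headers"],
        text=True,
    )
    for line in out.splitlines():
        name, _cpu, mem = line.split()[:3]
        if name.startswith(POD_PREFIX):
            return name, mem         # e.g. ("skupper-router-abc123", "512Mi")
    return None, None

while True:
    name, mem = router_memory()
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {name} {mem}", flush=True)
    time.sleep(60)
```
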
It's clear that we have too many connections open, but the number of edges (5 for the bad environment) doesn't justify this number.

Kept digging a bit, and with log level set to `info` we see a set of 3 connections opened every 10 seconds. Apparently these are the connections that stay open; hence the memory consumption:
```
2023-11-04 14:03:52.421565 +0000 SERVER (info) [C74] Accepted connection to :5671 from 172.16.73.96:54016
2023-11-04 14:03:52.447510 +0000 ROUTER_CORE (info) [C74] Connection Opened: dir=in host=172.16.73.96:54016 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=eKfLLvQChbDn_Xg1judMer01t8KpHzS8lI5X8ULIqNSJOw7KsWYehg props=
2023-11-04 14:03:52.448530 +0000 ROUTER_CORE (info) [C74][L332] Link attached: dir=out source={mc/sfe.5k5vq:0 expire:sess} target={<none> expire:sess}
2023-11-04 14:03:52.477837 +0000 SERVER (info) [C73] Accepted connection to :5671 from 172.16.73.96:54008
2023-11-04 14:03:52.480861 +0000 SERVER (info) [C75] Accepted connection to :5671 from 172.16.73.96:54028
2023-11-04 14:03:52.493783 +0000 ROUTER_CORE (info) [C73] Connection Opened: dir=in host=172.16.73.96:54008 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=opo6spKrgHVYObYnaRdWQCNuSToOCev4FIL1nlxuynIjqU40f11_VA props=
2023-11-04 14:03:52.495941 +0000 ROUTER_CORE (info) [C73][L333] Link attached: dir=out source={mc/sfe.5k5vq:0.flows expire:sess} target={<none> expire:sess}
2023-11-04 14:03:52.498905 +0000 ROUTER_CORE (info) [C75] Connection Opened: dir=in host=172.16.73.96:54028 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=MK2Fr0KGh6NDSK_6CseRKqOmcjSZ-DMn7F_UjkOmnZxFQWHuUb2X9A props=
2023-11-04 14:03:52.499577 +0000 ROUTER_CORE (info) [C75][L334] Link attached: dir=in source={<none> expire:sess} target={sfe.5k5vq:0 expire:sess}
2023-11-04 14:04:02.445625 +0000 SERVER (info) [C77] Accepted connection to :5671 from 172.16.73.96:49188
2023-11-04 14:04:02.456607 +0000 ROUTER_CORE (info) [C77] Connection Opened: dir=in host=172.16.73.96:49188 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=nOp39uX_kaiyBYqHJT-UWB1SCfW8k2ZAUZxiIVeY2UDapRBW-cmvFA props=
2023-11-04 14:04:02.456874 +0000 SERVER (info) [C76] Accepted connection to :5671 from 172.16.73.96:49180
2023-11-04 14:04:02.457993 +0000 ROUTER_CORE (info) [C77][L335] Link attached: dir=out source={mc/sfe.5k5vq:0 expire:sess} target={<none> expire:sess}
2023-11-04 14:04:02.464047 +0000 SERVER (info) [C78] Accepted connection to :5671 from 172.16.73.96:49194
2023-11-04 14:04:02.465785 +0000 ROUTER_CORE (info) [C76] Connection Opened: dir=in host=172.16.73.96:49180 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=1KyToNmJ7XEgh0gJRw2CI4Mssfbla5Z5Re8ek0fc-eBrdFk2aYO1mQ props=
2023-11-04 14:04:02.466459 +0000 ROUTER_CORE (info) [C76][L336] Link attached: dir=out source={mc/sfe.5k5vq:0.flows expire:sess} target={<none> expire:sess}
2023-11-04 14:04:02.470814 +0000 ROUTER_CORE (info) [C78] Connection Opened: dir=in host=172.16.73.96:49194 encrypted=TLSv1.3 auth=EXTERNAL user=CN=skupper-router-local container_id=TYhuzBUENuZ8HkfGFRZa_COagLMoPsmaoZ8OJvZqqC8GeYqLzS6BqA props=
2023-11-04 14:04:02.471996 +0000 ROUTER_CORE (info) [C78][L337] Link attached: dir=in source={<none> expire:sess} target={sfe.5k5vq:0 expire:sess}
```
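
The three-per-ten-seconds pattern is easy to confirm mechanically from a log like this. A minimal sketch that buckets the ROUTER_CORE "Connection Opened" events into 10-second windows; it assumes the timestamp format shown above and reads the log from stdin:

```python
#!/usr/bin/env python3
# Count ROUTER_CORE "Connection Opened" events per 10-second window in a
# skupper-router log read from stdin. Assumes the timestamp format shown
# in the snippet above.
import re
import sys
from collections import Counter
from datetime import datetime

OPENED = re.compile(
    r"^(\S+ \S+) \+0000 ROUTER_CORE \(info\) \[C\d+\] Connection Opened"
)

buckets = Counter()
for line in sys.stdin:
    m = OPENED.match(line)
    if not m:
        continue
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    # Bucket by 10-second window; a naive local-time round-trip is fine
    # here because we only compare events from the same log.
    buckets[int(ts.timestamp()) // 10] += 1

for bucket in sorted(buckets):
    window = datetime.fromtimestamp(bucket * 10)
    print(f"{window:%H:%M:%S}  {buckets[bucket]} connection(s) opened")
```

Run against the snippet above (e.g. `python3 count_opens.py < router.log`, file names hypothetical), it would report 3 opens in the 14:03:50 window and 3 more in the 14:04:00 window.
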

Were you seeing the same memory growth issue in the previous versions? What is the last version of the router where this kind of memory growth did not happen?
Also, what is the traffic pattern your test is generating? There seem to be 3 connections opened every 10 seconds; do the connections disconnect at any point, or do you simply keep adding more and more connections in your test?
Thanks.

No, we didn't see this issue before. Or at least it wasn't so problematic.
We moved from 1.4.1 to 1.4.2 on September the 5th. When we did this we also set …
We are not testing any traffic pattern per se. And I've tried to pinpoint these 3 connections, but I'm not sure where they come from. We have different services using Skupper to reach each other.
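
One way to pinpoint where connections like these originate, sketched under the assumption that you can run a script on the host that opens them (172.16.73.96 in the logs above) and that the third-party psutil package is installed:

```python
#!/usr/bin/env python3
# Map a local TCP source port (e.g. 54016 from the router log) to the
# process that owns it. Requires psutil and, on most systems, root
# privileges to see other users' sockets. Pass the port as an argument.
import sys

import psutil

port = int(sys.argv[1])
for conn in psutil.net_connections(kind="tcp"):
    if conn.laddr and conn.laddr.port == port and conn.pid:
        proc = psutil.Process(conn.pid)
        print(f"port {port} owned by pid {conn.pid}: {' '.join(proc.cmdline())}")
        break
else:
    print(f"no live TCP socket found with local port {port}")
```

Since the source ports in the log are ephemeral, this has to run while a connection is still open; in a Kubernetes setup the owning "process" will typically be a pod's container, so it may need to run in the node's network namespace.
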
It is not really clear to me what you are saying. Are you able to run the same reproducer against two versions of the router and confirm that you see the memory growth in one version but not in the other? That would give us a good starting point for analyzing this issue.

Hi @nicolacdnll - thanks for the detailed output dumps. From the look of it, the skupper controller seems to be opening an absurd number of connections and leaving them up. If you look at the log output you'll see references to "sfe.XXX". I'm going to ask our resident skupper controller expert to take a look at this.

Hi @nicolacdnll, the log reference to "sfe.XXX" indicates you are running the flow collector in your deployment. In the 1.4.2 release, there was an issue where the connection to a purged event source was not closed. This will be resolved in the 1.5.0 release, which should be available next week. Disabling the collector should inhibit the memory growth you are seeing in 1.4.2. That said, it would be worth understanding why the collector is losing connectivity with the router indicated by "sfe.5k5vq:0". Can you provide a snippet of the log from the flow collector container that is running as a sidecar to the service-controller pod? Where have you placed your collector in relation to that router?

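For illustration only, the bug class described here (a purged event source whose connection is never closed) has a simple shape; the sketch below is hypothetical Python, not the actual skupper flow-collector code:

```python
# Hypothetical illustration of the leak described above; NOT the actual
# skupper flow-collector code.
class Connection:
    def __init__(self, address: str):
        self.address = address
        self.open = True

    def close(self) -> None:
        # Tells the peer (the router) it can release the socket and its
        # per-connection buffers.
        self.open = False


class EventSourceRegistry:
    def __init__(self) -> None:
        self._connections: dict[str, Connection] = {}

    def add(self, source_id: str) -> None:
        self._connections[source_id] = Connection(source_id)

    def purge_leaky(self, source_id: str) -> None:
        # Bug shape: the local record is dropped, but close() is never
        # called, so the router still holds an open connection for a
        # source nobody cares about anymore.
        self._connections.pop(source_id, None)

    def purge_fixed(self, source_id: str) -> None:
        # Fix shape: close the connection before forgetting it.
        conn = self._connections.pop(source_id, None)
        if conn is not None:
            conn.close()
```

If the collector repeatedly loses and re-establishes its link to "sfe.5k5vq:0", each purge-and-reconnect cycle in the leaky variant leaves one more connection behind on the router, which matches the steady growth seen in the logs above.
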
Yes, we were using the flow collector and release 1.4.2. The router and flow collector are in different pods that belong to the same namespace, but the pods are those created by the site-controller. The snippets I have are from the past days (in the meantime we had already moved to 1.4.3 and did many redeploys):

Also, here is a snippet from the other "good" deployment, also using 1.4.2, where the issue doesn't manifest: