Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topics hono.command_internal.* are not being cleaned up on Docker versions >20.10 #3537

Closed
snorrihann opened this issue Aug 24, 2023 · 7 comments · Fixed by #3546
Closed
Assignees
Labels
bug C&C Command and Control
Milestone

Comments

@snorrihann
Copy link

The KubernetesContainerUtil.getContainerId() method doesn't find the containerId since Docker version 20.10, see:
https://stackoverflow.com/questions/68816329/how-to-get-docker-container-id-from-within-the-container-with-cgroup-v2

As a consequence the adapter instance id will be as for non-k8s, which will eventually result in hono.command_internal.* topics are not cleaned up when the adapter goes down.

@sophokles73
Copy link
Contributor

@calohmn would mind taking a look?

@calohmn calohmn self-assigned this Aug 26, 2023
@calohmn calohmn added the bug label Aug 26, 2023
@calohmn
Copy link
Contributor

calohmn commented Sep 3, 2023

The solution mentioned in the Stackoverflow link above - obtaining the container id from /proc/self/mountinfo if cgroup v2 is used - has for example been implemented in the opentelemetry-java-instrumentation module (see open-telemetry/opentelemetry-java-instrumentation#6694). It would be straightforward to use the corresponding ContainerResource class from that module in Hono.

But, the general issue here is, that /proc/self/mountinfo doesn't always contain the Kubernetes container id. The line that is being matched in there could contain another id in certain Kubernetes environments.

Example case on an AKS cluster, using containerd

Container id from kubectl describe pod: containerd://95154dedf5cfb2eac[...]

Line in /proc/self/mountinfo:

2994 2982 8:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/fc07d36ca7150d525[...]/hostname [...]

Running crictl inspect 95154dedf5cfb with the K8s container id on the K8s node returns:

{
  "status": {
    "id": "95154dedf5cfb2eac39167897e471373dfba3014ca2d883862811a2c71ae095e",
    [...]
  "info": {
    "sandboxID": "fc07d36ca7150d525ea42bfcfb8b5b7161865585007d1e6bdfff151da88de67a",

This issue is also mentioned in

There are initiatives to have a universal mechanism for obtaining the container ID, see

For the time being, I think the most general kind of solution in Hono would be to use the Kubernetes API client in the protocol adapter code, querying the adapter pod container status entry to get the container id from there.
(We are already using the Kubernetes API client in the command router pods to query the state of the adapter pods.)
This will require a small change in the Hono Helm chart, using a service account in the adapter pods which has the corresponding role assignment to get pod information.
For cgroup v1, we could still use the current approach, I think.

@sophokles73 sophokles73 added this to To do in 2.5.0 via automation Sep 4, 2023
@calohmn
Copy link
Contributor

calohmn commented Sep 18, 2023

KubernetesContainerUtil.getContainerId() is currently used in 2 places:

  • AbstractKafkaConfigProperties.getComponentUIdFromEnv()
    creating a Kafka client id containing K8s pod name and container id as unique identifiers
  • CommandConstants.getNewAdapterInstanceId()
    creating the adapter instance id used for routing command messages from the Command Router to the protocol adapters

For the Kafka client id, including the container id isn't strictly necessary. It could also be some other id that is unique for all containers of a pod (maybe containing a timestamp for better readability).

Suppose we keep the location of KubernetesContainerUtil.getContainerId() in hono-core, and keep both the above usages of the method. That means that all Hono protocol adapters, as well as Device Registry and Command Router (all containing Kafka clients) will need the K8s service account and role assignment to allow "get pods" operations and will have a few seconds added delay on startup for getting the container ID via the K8s API.

The alternative would be refactoring the method to create the Kafka client id, using some other id there. Then we could move the KubernetesContainerUtil.getContainerId() method to the hono-client-command module (that's where getNewAdapterInstanceId is used).
Then only the protocol adapters will need the K8s service account/role assignments and startup-times of Device Registry and Command Router will stay the same.

So, a question here would be if we want to keep KubernetesContainerUtil.getContainerId() as core functionality in hono-core (maybe also useful for other features in the future), or move it to the only place where it is really needed currently.
I tend to favor the latter approach, doing the small Kafka client id refactoring (the id there isn't that important anyway).

@sophokles73 WDYT?

@calohmn
Copy link
Contributor

calohmn commented Sep 20, 2023

I tend to favor the latter approach, doing the small Kafka client id refactoring (the id there isn't that important anyway).

I've created #3545 for that refactoring. In a subsequent PR, I would add the new K8s-API based getContainerId implementation then.

@sophokles73
Copy link
Contributor

sophokles73 commented Sep 21, 2023

For the Kafka client id, including the container id isn't strictly necessary. It could also be some other id that is unique for all containers of a pod (maybe containing a timestamp for better readability).

My understanding of the Kafka client ID is that it is used to determine membership in a consumer group, right?

The alternative would be refactoring the method to create the Kafka client id, using some other id there.

I guess using a simple UUID could also work here, because the consumer groups always consist of only one single consumer (a container in a pod), or am I mistaken?

@sophokles73
Copy link
Contributor

I tend to favor the latter approach, doing the small Kafka client id refactoring (the id there isn't that important anyway).

Yes, I agree.

@sophokles73 sophokles73 added this to the 2.5.0 milestone Sep 21, 2023
@calohmn
Copy link
Contributor

calohmn commented Sep 21, 2023

For the Kafka client id, including the container id isn't strictly necessary. It could also be some other id that is unique for all containers of a pod (maybe containing a timestamp for better readability).

My understanding of the Kafka client ID is that it is used to determine membership in a consumer group, right?

No, the client.id here is just a label to identify the client. We set it to a value containing the pod name, so as to make it obvious in which pod the client is used.
In general, the client id doesn't have to be unique. In fact, if client-quotas are used, you would use the same client id for each consumer in a consumer group for example.
IIRC, our objective to use unique client ids here was to make the clients distinguishable in logging/tracing data.

In terms of ids in association with a consumer group, there is the group.instance.id. This is used for static consumer group membership. This isn't used in Hono AFAIK.

The alternative would be refactoring the method to create the Kafka client id, using some other id there.

I guess using a simple UUID could also work here, because the consumer groups always consist of only one single consumer (a container in a pod), or am I mistaken?

Yes, I've used a UUID (portion) in connection with the pod name in #3545.

calohmn added a commit to calohmn/hono that referenced this issue Sep 22, 2023
Signed-off-by: Carsten Lohmann <carsten.lohmann@bosch.io>
calohmn added a commit to calohmn/hono that referenced this issue Sep 28, 2023
Signed-off-by: Carsten Lohmann <carsten.lohmann@bosch.io>
@sophokles73 sophokles73 moved this from To do to In progress in 2.5.0 Oct 7, 2023
@sophokles73 sophokles73 added this to To do in 2.3.2 via automation Oct 7, 2023
@sophokles73 sophokles73 added this to To do in 2.4.1 via automation Oct 7, 2023
calohmn added a commit to calohmn/hono that referenced this issue Oct 8, 2023
calohmn added a commit to calohmn/hono that referenced this issue Oct 9, 2023
2.5.0 automation moved this from In progress to Done Oct 9, 2023
2.3.2 automation moved this from To do to Done Oct 9, 2023
2.4.1 automation moved this from To do to Done Oct 9, 2023
@calohmn calohmn moved this from Done to In progress in 2.4.1 Oct 9, 2023
@calohmn calohmn moved this from Done to In progress in 2.3.2 Oct 9, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
calohmn added a commit that referenced this issue Oct 14, 2023
@calohmn calohmn moved this from In progress to Done in 2.3.2 Oct 14, 2023
@calohmn calohmn moved this from In progress to Done in 2.4.1 Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug C&C Command and Control
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants