Forensic Container Checkpointing #2008
Comments
/sig node |
We recommend actively socializing your KEP with the appropriate sig to gain visibility, consensus and also for scheduling. Also as you are not sure of what SIG will sponsor this, reaching out to the SIGs to get clarity on that will be helpful to move your KEP forward. |
Hi @adrianreber, any updates on whether this will be included in 1.20? Enhancements Freeze is October 6th, and by that time we require: the KEP must be merged in an implementable state. Best,
Hello @kikisdeliveryservice
Sorry, but how would I decide this? There has not been a lot of feedback on the corresponding KEP which makes it really difficult for me to answer that question. On the other hand, maybe the missing feedback is a good sign that it will take some more time. So probably this will not be included in 1.20. |
Normally the SIG would give a clear signal that it would be included, for example by reviewing the KEP and agreeing to the milestone proposals in the KEP. I'd encourage you to keep in touch with them and start the 1.21 conversation early if this does not end up getting reviewed and merged properly by October 6th. Best,
@kikisdeliveryservice Thanks for the guidance. Will do. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community. |
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@adrianreber: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale Still working on it. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
Hi @adrianreber, I created a simple microservice pod and tried to migrate it. Right after startup, when the pod only ran a counter-like function, I was able to checkpoint it. Once it had connected to the message broker and sent a message, checkpointing failed with the following error. Is there any way to solve it? checkpointed: checkpointing of default/order-service-7c69b4d88b-n56xq/order-service failed running "/usr/local/bin/runc" failed: failed: time="2023-10-24T16:40:11Z"
@Tobeabellwether Please open a bug at CRI-O with the |
@adrianreber Thanks for the tip. I checked that and found: (00.134562) Error (criu/sk-inet.c:191): inet: Connected TCP socket, consider using --tcp-established option. So I forcefully interrupted the TCP connection between the pod and the message broker, and the checkpoint was created successfully. Should this still be considered a bug?
Ah, good to know. No, this is not a real bug. We probably at some point need the ability to pass down different parameters from Kubernetes to CRIU. But this is something for the far future. You can also control CRIU options with a CRIU configuration file. Handling TCP connections that are established could be configured there. |
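A minimal sketch of the configuration-file approach mentioned above, written in Python. It assumes CRIU's documented configuration syntax (one option per line, written as on the command line but without the leading `--`) and assumes the container runtime is runc, which asks CRIU to read `/etc/criu/runc.conf`. The path constant and the helper names `render_criu_config` and `write_criu_config` are illustrative and do not come from this thread; verify the path for your runtime before relying on it.

```python
from pathlib import Path

# Assumption: runc asks CRIU to read this file; other runtimes may differ.
ASSUMED_CONFIG_PATH = Path("/etc/criu/runc.conf")


def render_criu_config(flags):
    """Render bare CRIU flags as configuration-file text.

    Each flag is a long option name; a leading "--" is stripped because
    the configuration file expects names without it.
    """
    lines = [flag.lstrip("-") for flag in flags]
    return "\n".join(lines) + "\n"


def write_criu_config(flags, path=ASSUMED_CONFIG_PATH):
    """Write the rendered flags to `path` (writing under /etc needs root)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_criu_config(flags))


if __name__ == "__main__":
    # The option name is taken from the CRIU error message quoted above.
    print(render_criu_config(["--tcp-established"]), end="")
```

Only the rendering step runs by default; `write_criu_config` is kept separate so the file is touched only when called explicitly on the node.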
Hi @adrianreber again, when I try to restore the checkpoint of the pod with TCP connection on a new pod, I encounter a new problem: So I try to check the log file under and the restoration for pods without TCP connections works fine. |
@Tobeabellwether You can redirect the CRIU log file to another file using the CRIU configuration file: |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
/stage beta |
Hello @adrianreber 👋, v1.30 Enhancements team here. Just checking in as we approach enhancements freeze on 02:00 UTC Friday 9th February 2024. This enhancement is targeting for stage Here's where this enhancement currently stands:
Everything is done in #4288 and #4305. Please make sure that these PRs are merged before the enhancements freeze. The status of this enhancement is marked as |
Hello 👋, v1.30 Enhancements team here. Unfortunately, this enhancement did not meet the requirements for enhancements freeze. #4288 is merged, but #4305 is still open. If you still wish to progress this enhancement in v1.30, please file an exception request to get this PR merged. Thanks!
/milestone clear |
I have the same problem, have you solved it? |
I've read CRIU's docs. I think checkpointing a container with data flowing in and out is tricky in general and not safe, so I always close its connections before checkpointing.
Kubernetes 1.25 introduced the possibility to checkpoint a container. For details, please see KEP 2008, Forensic Container Checkpointing: kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to trigger a checkpoint. The main reason for not extending it to the API server and kubectl was that checkpointing is a completely new concept. Although the result of the checkpointing, the checkpoint archive, is only accessible by root, it is important to remember that it contains all memory pages and thus all possible passwords, private keys and random numbers. Because the checkpoint archive is only accessible by root, it does not directly make it easier to access this potentially confidential information, as root would be able to retrieve that information anyway.

Now, at least three Kubernetes releases later, we have not heard any negative feedback about the checkpoint archive and its data. There were, however, many questions about being able to create a checkpoint via kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint' command is heavily influenced by the code of the 'exec' and 'logs' commands. The checkpoint command is only available behind the 'alpha' sub-command as the "Forensic Container Checkpointing" KEP is still marked as Alpha.

Example output:

    $ kubectl alpha checkpoint test-pod -c container-2
    Node: 127.0.0.1/127.0.0.1
    Namespace: default
    Pod: test-pod-1
    Container: container-2
    Checkpoint Archive: /var/lib/kubelet/checkpoints/checkpoint-archive.tar

The tests are written so that they handle a CRI implementation with and without an implementation of the CRI RPC call 'ContainerCheckpoint'.

Signed-off-by: Adrian Reber <areber@redhat.com>
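For comparison with the kubectl command, here is a hedged Python sketch of triggering a checkpoint through the kubelet API endpoint directly. It assumes the endpoint shape from the Kubernetes kubelet checkpoint API documentation (`POST /checkpoint/{namespace}/{pod}/{container}` on the kubelet's HTTPS port, 10250 by default). The function names and the certificate parameters are hypothetical; the namespace, pod, and container names are the ones from the error report earlier in this thread.

```python
import ssl
import urllib.request
from urllib.parse import quote

KUBELET_PORT = 10250  # default kubelet HTTPS port; adjust if your cluster differs


def checkpoint_url(namespace, pod, container, host="localhost", port=KUBELET_PORT):
    """Build the kubelet checkpoint URL for one container."""
    parts = "/".join(quote(p, safe="") for p in (namespace, pod, container))
    return f"https://{host}:{port}/checkpoint/{parts}"


def request_checkpoint(url, cert_file, key_file, ca_file=None):
    """Send the POST request with a client certificate and return the raw body.

    cert_file, key_file, and ca_file are hypothetical paths to credentials
    that the kubelet accepts; they are not defined anywhere in this thread.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, context=context) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent.
    print(checkpoint_url("default", "order-service-7c69b4d88b-n56xq", "order-service"))
```

The URL builder is separated from the request so it can be checked without a running kubelet; `request_checkpoint` has to run somewhere that can reach the kubelet and holds suitable credentials.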
Hi @adrianreber 👋, 1.31 Enhancements Lead here. If you wish to progress this enhancement in v1.31, please have the SIG lead opt-in your enhancement by adding the lead-opted-in label and set the milestone to v1.31 before the Production Readiness Review Freeze. /remove-label lead-opted-in |