-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reloading CRI-O configuration using SIGHUP overrides settings passed through command-line arguments #7586
Comments
/assign kwilczynski |
@kwilczynski thank you for the report. I don't think we need this to be part of the feature roadmap since it's a bug. |
@saschagrunert, ah oops. Sorry, I did it out of habit - to track work. 😄 |
A friendly reminder that this issue had no activity for 30 days. |
/remove-lifecycle stale |
/help |
@kwilczynski: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I would like to try to fix it |
@SD-13, thank you! Let me know if you need more information or if something is not clear. Happy to help! 😄 |
Hey @kwilczynski can you help me in reproducing the issue. Currently, I am trying to understand where Crio fits in the whole picture. From my understanding it seems like Crio works as a runtime interface (similar to docker daemon?) and on top of Crio we can run any OCI based runtimes like runc. So, first we need to create pods in Kubernetes using crio and a runtime for e.g runc. Then we can follow your mentioned steps. Please feel free to correct me if I am wrong and share any resource to learn more about it. Thanks! |
@SD-13 , the issue is very easy to reproduce. You also don't need to deploy an entire Kubernetes cluster—all that is required is a CRI-O binary that you can build using this repository. There will be some build-time dependencies required that need to be installed (this is documented here: Build and install CRI-O from source). When you can build a binary locally (or inside a container or a virtual machine), following the steps to reproduce the problem should be straightforward. But do let me know if you need help and such. But first, as you are contributing for the first time, have a look at the following, as these might help to get you up to speed with several concepts: Also, about containers, should you need a more deep-dive refresher: |
@kwilczynski I have built and installed CRI-O binary. But still having problem to reproduce the issue
Does this last line mean, I need to remove the existing connection before creating one? Please tell what it is trying to mean.
Am I doing this step correctly? What you mean by starting a new CRI-O process, and how should I do that? |
@SD-13, I am glad you were able to build CRI-O locally, and got it running! By the way, you don't need to use Going back to your issues:
This means that there is a CRI-O (as
It means exactly that 😄 Start a new process manually. So, the reproduction steps assume that you don't have anything else running at the moment, and you will launch CRI-O locally (manually) to test things. Have a look at the outlined steps again. You are very close to being able to reproduce the issue. |
@kwilczynski Thanks! I was able to reproduce the issue. But I needed to use sudo to start crio and to check status. Running without sudo privilege was giving errors like |
@SD-13, does it make sense about what is going on? |
@kwilczynski I would say yes but that is a bit high level, I want to understand more closely. Right now I am going through the codes you pointed out and understanding them. Initially it seems like I need to find where to look for already passed cmd line args and if they were passed, load them instead of the values from the config. |
@kwilczynski Just want to make sure I understand it correctly, the tasks we have here are
I think the last two tasks can be similar. I think printing the raw config even when the |
@SD-13, the main focus here would be to fix the problem with how runtime configuration is reloaded so that options are applied correctly per the suggested precedence order. If the user passes Everything else would be nice to have and can be added later once the main problem has been fixed. |
A friendly reminder that this issue had no activity for 30 days. |
A friendly reminder that this issue had no activity for 30 days. |
/remove-lifecycle stale |
A friendly reminder that this issue had no activity for 30 days. |
I would be interested in taking this issue. |
/assign |
@LenkaSeg, thank you! Let me know if there is anything that would need to be clarified. Happy to help 😄 |
What happened?
When the SIGHUP signal was sent to a currently running CRI-O process, it reloaded its configuration.
However, the CRI-O process was started with several command-line arguments, which were used to override default configuration options such as the log level or the pause image reference.
Following the request to CRI-O to reload its runtime configuration, the options specific via the command-line arguments were overridden with either a default or an item from the /etc/crio/crio.conf (or a drop-in from /etc/crio/crio.conf.d) file.
From this point onwards, the log level was changed, and the reference for the pause container image was updated to a hardcoded default.
What did you expect to happen?
The configuration to be reloaded with the options set through the applied command-line arguments to be preserved.
The precedence of configuration parsing should remain unchanged, and everything should also be merged accordingly, whether it was a fresh start of a runtime configuration reload request via the associated Linux signal delivered to the running process.
Additionally, the message that notifies the user about configuration reload should always be printed - this is not the case currently, as the log line itself is subjected to the presently set log level.
Preferably, it would also be nice if CRI-O had printed its runtime configuration on startup or even following a configuration reload, mainly to capture this type of detail in the log file for later reference (which can be helpful during troubleshooting). However, if the amount of logs this would produce is substantial, then perhaps the entire configuration will only be printed at the debug log level, and otherwise, only (so at the "info" level) difference to the default options would be printed - not sure if this would be desirable, though.
How can we reproduce it (as minimally and precisely as possible)?
Only a few steps are needed to reproduce this behaviour reliably:
# crio --log-level debug
INFO[2023-12-12 22:13:11.794601654Z] Starting CRI-O, version: 1.29.0, git: 7380d8ff553b9b2ead8bf3173e8c1db0bc9004db(clean)
(... log lines that follow omitted for brevity ...)
At this point, you can observe CRI-O process log level being set to "debug", and a rather substantial number of log lines being produced. Aside from the logs themselves, you can use the
crio
binary itself (assuming a modern version of CRI-O; otherwise, thecrio-status
can be used) to verify the current runtime configuration log level and what it's being set to, per:Please return your attention to the terminal session running the CRI-O process in the foreground. At this point, you can observe the following lines appearing amongst other messages:
(... a great many other log lines omitted for brevity ...)
Note the confirmation of signal reception and configuration reload. There is also a log line corresponding to a change in the log level, which is crucial to note.
Similarly to before, you can use the
crio
binary itself to obtain the current runtime configuration, per:This observed behaviour following a runtime configuration reload is not necessarily desirable.
Anything else we need to know?
Personally, I believe we should aim to preserve the order of precedence when reloading the runtime configuration. One could argue that "but, what when the user makes a change to the configuration file, and would like or override option set prior via a command-line arguments?" - well, it's a fair point. Nonetheless, what was set via the configuration arguments should remain unchanged and be the default behaviour. Otherwise, we risk violating the principle of least surprise for the user, which we should avoid if it can be helped.
The suggested order of preference when merging configuration would be (higher to lower):
Now, the "conf.d" directory also has its own rules of how files from there are to be read and in what order - a lot of users respect files to be sorted and read in some order, most likely lexicographic (sort with the "C" locale to remove ambiguity), such that file called "01-foo.conf" would be loaded before files such as "zzz_bar.conf" or "99-baz.conf", I suppose.
Much of this is already working as intended, of course.
The following code is relevant:
cri-o/pkg/config/config.go
Lines 65 to 80 in 7380d8f
cri-o/pkg/config/config.go
Lines 236 to 477 in 7380d8f
cri-o/server/server.go
Lines 583 to 600 in 7380d8f
cri-o/pkg/config/reload.go
Lines 17 to 77 in 7380d8f
cri-o/pkg/config/reload.go
Lines 88 to 102 in 7380d8f
CRI-O and Kubernetes version
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)
The text was updated successfully, but these errors were encountered: