Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reloading CRI-O configuration using SIGHUP overrides settings passed through command-line arguments #7586

Open
kwilczynski opened this issue Dec 12, 2023 · 25 comments · May be fixed by #7771
Open
Assignees
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.

Comments

@kwilczynski
Copy link
Member

What happened?

When the SIGHUP signal was sent to a currently running CRI-O process, it reloaded its configuration.

However, the CRI-O process was started with several command-line arguments, which were used to override default configuration options such as the log level or the pause image reference.

Following the request to CRI-O to reload its runtime configuration, the options specific via the command-line arguments were overridden with either a default or an item from the /etc/crio/crio.conf (or a drop-in from /etc/crio/crio.conf.d) file.

From this point onwards, the log level was changed, and the reference for the pause container image was updated to a hardcoded default.

What did you expect to happen?

The configuration to be reloaded with the options set through the applied command-line arguments to be preserved.

The precedence of configuration parsing should remain unchanged, and everything should also be merged accordingly, whether it was a fresh start of a runtime configuration reload request via the associated Linux signal delivered to the running process.

Additionally, the message that notifies the user about configuration reload should always be printed - this is not the case currently, as the log line itself is subjected to the presently set log level.

Preferably, it would also be nice if CRI-O had printed its runtime configuration on startup or even following a configuration reload, mainly to capture this type of detail in the log file for later reference (which can be helpful during troubleshooting). However, if the amount of logs this would produce is substantial, then perhaps the entire configuration will only be printed at the debug log level, and otherwise, only (so at the "info" level) difference to the default options would be printed - not sure if this would be desirable, though.

How can we reproduce it (as minimally and precisely as possible)?

Only a few steps are needed to reproduce this behaviour reliably:

  1. Prepare a configuration file drop-in to override the default log level (which is "info") to the level of warning (or "warn")
# cat > /etc/crio/crio.conf.d/99-log-level.conf
[crio.runtime]
log_level = "warn"
^D
  1. Start a new CRI-O process manually overriding the log level and setting it to the "debug" level
# crio --log-level debug
INFO[2023-12-12 22:13:11.794601654Z] Starting CRI-O, version: 1.29.0, git: 7380d8ff553b9b2ead8bf3173e8c1db0bc9004db(clean) 

(... log lines that follow omitted for brevity ...)

At this point, you can observe CRI-O process log level being set to "debug", and a rather substantial number of log lines being produced. Aside from the logs themselves, you can use the crio binary itself (assuming a modern version of CRI-O; otherwise, the crio-status can be used) to verify the current runtime configuration log level and what it's being set to, per:

# crio status config print | grep log_level
INFO[2023-12-12 22:08:57.099952659Z] Starting CRI-O, version: 1.28.1, git: unknown(clean) 
    log_level = "debug"
  1. Send the SIGHUP signal to the running CRI-O process in a different terminal session
$ kill -HUP $(pgrep crio) # Or using pkill -HUP crio, whichever is more convenient.

Please return your attention to the terminal session running the CRI-O process in the foreground. At this point, you can observe the following lines appearing amongst other messages:

(... a great many other log lines omitted for brevity ...)

DEBU[2023-12-12 22:13:13.762844491Z] received signal                               file="crio/main.go:57" signal=hangup
INFO[2023-12-12 22:13:13.763082980Z] Reloading configuration                       file="config/reload.go:18"
DEBU[2023-12-12 22:13:13.763491295Z] Using /sbin/apparmor_parser binary            file="supported/supported.go:61"
DEBU[2023-12-12 22:13:13.763702316Z] Using /sbin/apparmor_parser binary            file="supported/supported.go:61"
INFO[2023-12-12 22:13:13.764018888Z] Updating config from file /etc/crio/crio.conf  file="config/reload.go:27"
INFO[2023-12-12 22:13:13.764863975Z] Updating config from path /etc/crio/crio.conf.d  file="config/reload.go:36"
INFO[2023-12-12 22:13:13.765185045Z] Set config log_level to "warn"                file="config/reload.go:83"

Note the confirmation of signal reception and configuration reload. There is also a log line corresponding to a change in the log level, which is crucial to note.

Similarly to before, you can use the crio binary itself to obtain the current runtime configuration, per:

# crio status config print | grep log_level
INFO[2023-12-12 22:15:27.997584925Z] Starting CRI-O, version: 1.28.1, git: unknown(clean) 
    log_level = "warn"

This observed behaviour following a runtime configuration reload is not necessarily desirable.

Anything else we need to know?

Personally, I believe we should aim to preserve the order of precedence when reloading the runtime configuration. One could argue that "but, what when the user makes a change to the configuration file, and would like or override option set prior via a command-line arguments?" - well, it's a fair point. Nonetheless, what was set via the configuration arguments should remain unchanged and be the default behaviour. Otherwise, we risk violating the principle of least surprise for the user, which we should avoid if it can be helped.

The suggested order of preference when merging configuration would be (higher to lower):

  • Any command-line arguments that were passed
  • Environment variables set
  • Configuration files (including any files from the so-called "conf.d" directory where the drop-ins are to be stored)
  • Default configuration

Now, the "conf.d" directory also has its own rules of how files from there are to be read and in what order - a lot of users respect files to be sorted and read in some order, most likely lexicographic (sort with the "C" locale to remove ambiguity), such that file called "01-foo.conf" would be loaded before files such as "zzz_bar.conf" or "99-baz.conf", I suppose.

Much of this is already working as intended, of course.


The following code is relevant:

  • Main and runtime configuration types

type Config struct {
Comment string
singleConfigPath string // Path to the single config file
dropInConfigDir string // Path to the drop-in config files
RootConfig
APIConfig
RuntimeConfig
ImageConfig
NetworkConfig
MetricsConfig
TracingConfig
StatsConfig
NRI *nri.Config
SystemContext *types.SystemContext
}

cri-o/pkg/config/config.go

Lines 236 to 477 in 7380d8f

type RuntimeConfig struct {
// SeccompUseDefaultWhenEmpty specifies whether the default profile
// should be used when an empty one is specified.
// This option is currently deprecated, and will be replaced by the SeccompDefault FeatureGate in Kubernetes.
SeccompUseDefaultWhenEmpty bool `toml:"seccomp_use_default_when_empty"`
// NoPivot instructs the runtime to not use `pivot_root`, but instead use `MS_MOVE`
NoPivot bool `toml:"no_pivot"`
// SELinux determines whether or not SELinux is used for pod separation.
SELinux bool `toml:"selinux"`
// Whether container output should be logged to journald in addition
// to the kubernetes log file
LogToJournald bool `toml:"log_to_journald"`
// DropInfraCtr determines whether the infra container is dropped when appropriate.
DropInfraCtr bool `toml:"drop_infra_ctr"`
// ReadOnly run all pods/containers in read-only mode.
// This mode will mount tmpfs on /run, /tmp and /var/tmp, if those are not mountpoints
// Will also set the readonly flag in the OCI Runtime Spec. In this mode containers
// will only be able to write to volumes mounted into them
ReadOnly bool `toml:"read_only"`
// ConmonEnv is the environment variable list for conmon process.
// This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorEnv.
ConmonEnv []string `toml:"conmon_env"`
// HooksDir holds paths to the directories containing hooks
// configuration files. When the same filename is present in
// multiple directories, the file in the directory listed last in
// this slice takes precedence.
HooksDir []string `toml:"hooks_dir"`
// Capabilities to add to all containers.
DefaultCapabilities capabilities.Capabilities `toml:"default_capabilities"`
// AddInheritableCapabilities can be set to add inheritable capabilities. They were pre-1.23 by default, and were dropped in 1.24.
// This can cause a regression with non-root users not getting capabilities as they previously did.
AddInheritableCapabilities bool `toml:"add_inheritable_capabilities"`
// Additional environment variables to set for all the
// containers. These are overridden if set in the
// container image spec or in the container runtime configuration.
DefaultEnv []string `toml:"default_env"`
// Sysctls to add to all containers.
DefaultSysctls []string `toml:"default_sysctls"`
// DefaultUlimits specifies the default ulimits to apply to containers
DefaultUlimits []string `toml:"default_ulimits"`
// Devices that are allowed to be configured.
AllowedDevices []string `toml:"allowed_devices"`
// Devices to add to containers
AdditionalDevices []string `toml:"additional_devices"`
// CDISpecDirs specifies the directories CRI-O/CDI will scan for CDI Spec files.
CDISpecDirs []string `toml:"cdi_spec_dirs"`
// DeviceOwnershipFromSecurityContext changes the default behavior of setting container devices uid/gid
// from CRI's SecurityContext (RunAsUser/RunAsGroup) instead of taking host's uid/gid. Defaults to false.
DeviceOwnershipFromSecurityContext bool `toml:"device_ownership_from_security_context"`
// DefaultRuntime is the _name_ of the OCI runtime to be used as the default.
// The name is matched against the Runtimes map below.
DefaultRuntime string `toml:"default_runtime"`
// DecryptionKeysPath is the path where keys for image decryption are stored.
DecryptionKeysPath string `toml:"decryption_keys_path"`
// Conmon is the path to conmon binary, used for managing the runtime.
// This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorConfig.Path.
Conmon string `toml:"conmon"`
// ConmonCgroup is the cgroup setting used for conmon.
// This option is currently deprecated, and will be replaced with RuntimeHandler.MonitorConfig.Cgroup.
ConmonCgroup string `toml:"conmon_cgroup"`
// SeccompProfile is the seccomp.json profile path which is used as the
// default for the runtime.
SeccompProfile string `toml:"seccomp_profile"`
// ApparmorProfile is the apparmor profile name which is used as the
// default for the runtime.
ApparmorProfile string `toml:"apparmor_profile"`
// BlockIOConfigFile is the path to the blockio class configuration
// file for configuring the cgroup blockio controller.
BlockIOConfigFile string `toml:"blockio_config_file"`
// BlockIOReload instructs the runtime to reload blockio configuration
// rescan block devices in the system before assigning blockio parameters.
BlockIOReload bool `toml:"blockio_reload"`
// IrqBalanceConfigFile is the irqbalance service config file which is used
// for configuring irqbalance daemon.
IrqBalanceConfigFile string `toml:"irqbalance_config_file"`
// RdtConfigFile is the RDT config file used for configuring resctrl fs
RdtConfigFile string `toml:"rdt_config_file"`
// CgroupManagerName is the manager implementation name which is used to
// handle cgroups for containers.
CgroupManagerName string `toml:"cgroup_manager"`
// DefaultMountsFile is the file path for the default mounts to be mounted for the container
// Note, for testing purposes mainly
DefaultMountsFile string `toml:"default_mounts_file"`
// ContainerExitsDir is the directory in which container exit files are
// written to by conmon.
ContainerExitsDir string `toml:"container_exits_dir"`
// ContainerAttachSocketDir is the location for container attach sockets.
ContainerAttachSocketDir string `toml:"container_attach_socket_dir"`
// BindMountPrefix is the prefix to use for the source of the bind mounts.
BindMountPrefix string `toml:"bind_mount_prefix"`
// UIDMappings specifies the UID mappings to have in the user namespace.
// A range is specified in the form containerUID:HostUID:Size. Multiple
// ranges are separated by comma.
UIDMappings string `toml:"uid_mappings"`
// MinimumMappableUID specifies the minimum UID value which can be
// specified in a uid_mappings value, whether configured here or sent
// to us via CRI, for a pod that isn't to be run as UID 0.
MinimumMappableUID int64 `toml:"minimum_mappable_uid"`
// GIDMappings specifies the GID mappings to have in the user namespace.
// A range is specified in the form containerUID:HostUID:Size. Multiple
// ranges are separated by comma.
GIDMappings string `toml:"gid_mappings"`
// MinimumMappableGID specifies the minimum GID value which can be
// specified in a gid_mappings value, whether configured here or sent
// to us via CRI, for a pod that isn't to be run as UID 0.
MinimumMappableGID int64 `toml:"minimum_mappable_gid"`
// LogLevel determines the verbosity of the logs based on the level it is set to.
// Options are fatal, panic, error (default), warn, info, debug, and trace.
LogLevel string `toml:"log_level"`
// LogFilter specifies a regular expression to filter the log messages
LogFilter string `toml:"log_filter"`
// NamespacesDir is the directory where the state of the managed namespaces
// gets tracked
NamespacesDir string `toml:"namespaces_dir"`
// PinNSPath is the path to find the pinns binary, which is needed
// to manage namespace lifecycle
PinnsPath string `toml:"pinns_path"`
// CriuPath is the path to find the criu binary, which is needed
// to checkpoint and restore containers
EnableCriuSupport bool `toml:"enable_criu_support"`
// Runtimes defines a list of OCI compatible runtimes. The runtime to
// use is picked based on the runtime_handler provided by the CRI. If
// no runtime_handler is provided, the runtime will be picked based on
// the level of trust of the workload.
Runtimes Runtimes `toml:"runtimes"`
// Workloads defines a list of workloads types that are have grouped settings
// that will be applied to containers.
Workloads Workloads `toml:"workloads"`
// PidsLimit is the number of processes each container is restricted to
// by the cgroup process number controller.
PidsLimit int64 `toml:"pids_limit"`
// LogSizeMax is the maximum number of bytes after which the log file
// will be truncated. It can be expressed as a human-friendly string
// that is parsed to bytes.
// Negative values indicate that the log file won't be truncated.
LogSizeMax int64 `toml:"log_size_max"`
// CtrStopTimeout specifies the time to wait before to generate an
// error because the container state is still tagged as "running".
CtrStopTimeout int64 `toml:"ctr_stop_timeout"`
// SeparatePullCgroup specifies whether an image pull must be performed in a separate cgroup
SeparatePullCgroup string `toml:"separate_pull_cgroup"`
// InfraCtrCPUSet is the CPUs set that will be used to run infra containers
InfraCtrCPUSet string `toml:"infra_ctr_cpuset"`
// SharedCPUSet is the CPUs set that will be used for guaranteed containers that
// want access to shared cpus.
SharedCPUSet string `toml:"shared_cpuset"`
// AbsentMountSourcesToReject is a list of paths that, when absent from the host,
// will cause a container creation to fail (as opposed to the current behavior of creating a directory).
AbsentMountSourcesToReject []string `toml:"absent_mount_sources_to_reject"`
// EnablePodEvents specifies if the container pod-level events should be generated to optimize the PLEG at Kubelet.
EnablePodEvents bool `toml:"enable_pod_events"`
// IrqBalanceConfigRestoreFile is the irqbalance service banned CPU list to restore.
// If empty, no restoration attempt will be done.
IrqBalanceConfigRestoreFile string `toml:"irqbalance_config_restore_file"`
// seccompConfig is the internal seccomp configuration
seccompConfig *seccomp.Config
// apparmorConfig is the internal AppArmor configuration
apparmorConfig *apparmor.Config
// blockioConfig is the internal blockio configuration
blockioConfig *blockio.Config
// rdtConfig is the internal Rdt configuration
rdtConfig *rdt.Config
// ulimitConfig is the internal ulimit configuration
ulimitsConfig *ulimits.Config
// deviceConfig is the internal additional devices configuration
deviceConfig *device.Config
// cgroupManager is the internal CgroupManager configuration
cgroupManager cgmgr.CgroupManager
// conmonManager is the internal ConmonManager configuration
conmonManager *conmonmgr.ConmonManager
// namespaceManager is the internal NamespaceManager configuration
namespaceManager *nsmgr.NamespaceManager
// Whether SELinux should be disabled within a pod,
// when it is running in the host network namespace
// https://github.com/cri-o/cri-o/issues/5501
HostNetworkDisableSELinux bool `toml:"hostnetwork_disable_selinux"`
// Option to disable hostport mapping in CRI-O
// Default value is 'false'
DisableHostPortMapping bool `toml:"disable_hostport_mapping"`
}

  • Configuration reload handler

cri-o/server/server.go

Lines 583 to 600 in 7380d8f

func (s *Server) startReloadWatcher(ctx context.Context) {
// Setup the signal notifier
ch := make(chan os.Signal, 1)
signal.Notify(ch, signals.Hup)
go func() {
for {
// Block until the signal is received
<-ch
if err := s.config.Reload(); err != nil {
logrus.Errorf("Unable to reload configuration: %v", err)
continue
}
}
}()
log.Infof(ctx, "Registered SIGHUP reload watcher")
}

  • The Config type concrete Reload() function

func (c *Config) Reload() error {
logrus.Infof("Reloading configuration")
// Reload the config
newConfig, err := DefaultConfig()
if err != nil {
return fmt.Errorf("unable to create default config")
}
if _, err := os.Stat(c.singleConfigPath); !os.IsNotExist(err) {
logrus.Infof("Updating config from file %s", c.singleConfigPath)
if err := newConfig.UpdateFromFile(c.singleConfigPath); err != nil {
return err
}
} else {
logrus.Infof("Skipping not-existing config file %q", c.singleConfigPath)
}
if _, err := os.Stat(c.dropInConfigDir); !os.IsNotExist(err) {
logrus.Infof("Updating config from path %s", c.dropInConfigDir)
if err := newConfig.UpdateFromPath(c.dropInConfigDir); err != nil {
return err
}
} else {
logrus.Infof("Skipping not-existing config path %q", c.dropInConfigDir)
}
// Reload all available options
if err := c.ReloadLogLevel(newConfig); err != nil {
return err
}
if err := c.ReloadLogFilter(newConfig); err != nil {
return err
}
if err := c.ReloadPauseImage(newConfig); err != nil {
return err
}
c.ReloadPinnedImages(newConfig)
if err := c.ReloadRegistries(); err != nil {
return err
}
c.ReloadDecryptionKeyConfig(newConfig)
if err := c.ReloadSeccompProfile(newConfig); err != nil {
return err
}
if err := c.ReloadAppArmorProfile(newConfig); err != nil {
return err
}
if err := c.ReloadBlockIOConfig(newConfig); err != nil {
return err
}
if err := c.ReloadRdtConfig(newConfig); err != nil {
return err
}
if err := c.ReloadRuntimes(newConfig); err != nil {
return err
}
cdi.GetRegistry(cdi.WithSpecDirs(newConfig.CDISpecDirs...))
return nil
}

  • The function that handles log level reload and set-up

cri-o/pkg/config/reload.go

Lines 88 to 102 in 7380d8f

func (c *Config) ReloadLogLevel(newConfig *Config) error {
if c.LogLevel != newConfig.LogLevel {
level, err := logrus.ParseLevel(newConfig.LogLevel)
if err != nil {
return err
}
// Always log this message without considering the current
logrus.SetLevel(logrus.InfoLevel)
logConfig("log_level", newConfig.LogLevel)
logrus.SetLevel(level)
c.LogLevel = newConfig.LogLevel
}
return nil
}

CRI-O and Kubernetes version

$ crio --version
crio version 1.29.0
Version:        1.29.0
GitCommit:      7380d8ff553b9b2ead8bf3173e8c1db0bc9004db
GitCommitDate:  2023-12-12T19:22:16Z
GitTreeState:   clean
BuildDate:      2023-12-12T21:54:46Z
GoVersion:      go1.21.5
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:      
  containers_image_ostree_stub
  libdm_no_deferred_remove
  containers_image_openpgp
  seccomp
  selinux
  apparmor
  static
  netgo
  exclude_graphdriver_btrfs
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  true

OS version

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux worker-node01 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

None.
@kwilczynski kwilczynski added the kind/bug Categorizes issue or PR as related to a bug. label Dec 12, 2023
@kwilczynski
Copy link
Member Author

/assign kwilczynski

@saschagrunert
Copy link
Member

@kwilczynski thank you for the report. I don't think we need this to be part of the feature roadmap since it's a bug.

@kwilczynski
Copy link
Member Author

@kwilczynski thank you for the report. I don't think we need this to be part of the feature roadmap since it's a bug.

@saschagrunert, ah oops. Sorry, I did it out of habit - to track work. 😄

Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2024
@kwilczynski
Copy link
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 15, 2024
@kwilczynski
Copy link
Member Author

/help
/good-first-issue

Copy link
Contributor

openshift-ci bot commented Jan 22, 2024

@kwilczynski:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/help
/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jan 22, 2024
@kwilczynski kwilczynski removed their assignment Jan 22, 2024
@SD-13
Copy link

SD-13 commented Jan 30, 2024

I would like to try to fix it
/assign

@kwilczynski
Copy link
Member Author

@SD-13, thank you!

Let me know if you need more information or if something is not clear. Happy to help! 😄

@SD-13
Copy link

SD-13 commented Feb 1, 2024

Hey @kwilczynski can you help me in reproducing the issue. Currently, I am trying to understand where Crio fits in the whole picture. From my understanding it seems like Crio works as a runtime interface (similar to docker daemon?) and on top of Crio we can run any OCI based runtimes like runc. So, first we need to create pods in Kubernetes using crio and a runtime for e.g runc. Then we can follow your mentioned steps. Please feel free to correct me if I am wrong and share any resource to learn more about it. Thanks!

@kwilczynski
Copy link
Member Author

@SD-13 , the issue is very easy to reproduce.

You also don't need to deploy an entire Kubernetes cluster—all that is required is a CRI-O binary that you can build using this repository. There will be some build-time dependencies required that need to be installed (this is documented here: Build and install CRI-O from source).

When you can build a binary locally (or inside a container or a virtual machine), following the steps to reproduce the problem should be straightforward. But do let me know if you need help and such.

But first, as you are contributing for the first time, have a look at the following, as these might help to get you up to speed with several concepts:

Also, about containers, should you need a more deep-dive refresher:

@SD-13
Copy link

SD-13 commented Feb 4, 2024

@kwilczynski I have built and installed CRI-O binary. But still having problem to reproduce the issue
I added a config file with "warn" log-level and I can see that it is using the log-level value from the config, not the default (info log-level). But when I am passing the flag as a command line argument, the value is not changing.
Here are the logs.

$ sudo crio --log-level debug
[sudo] password for jack:
INFO[2024-02-05 02:58:38.110081631+05:30] Starting CRI-O, version: 1.29.0, git: 2d9351f(clean)
INFO[2024-02-05 02:58:38.115197197+05:30] Node configuration value for hugetlb cgroup is true
INFO[2024-02-05 02:58:38.115207497+05:30] Node configuration value for pid cgroup is true
INFO[2024-02-05 02:58:38.115265728+05:30] Node configuration value for memoryswap cgroup is true
INFO[2024-02-05 02:58:38.115274425+05:30] Node configuration value for cgroup v2 is true
INFO[2024-02-05 02:58:38.125697284+05:30] Node configuration value for systemd AllowedCPUs is true
DEBU[2024-02-05 02:58:38.125824949+05:30] [graphdriver] trying provided driver "overlay" file="drivers/driver.go:362"
DEBU[2024-02-05 02:58:38.126014954+05:30] Cached value indicated that overlay is supported file="overlay/overlay.go:247"
DEBU[2024-02-05 02:58:38.126078245+05:30] Cached value indicated that overlay is supported file="overlay/overlay.go:247"
DEBU[2024-02-05 02:58:38.126124084+05:30] Cached value indicated that metacopy is being used file="overlay/overlay.go:406"
DEBU[2024-02-05 02:58:38.126273951+05:30] Cached value indicated that native-diff is not being used file="overlay/overlay.go:800"
INFO[2024-02-05 02:58:38.126305843+05:30] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled file="overlay/overlay.go:801"
DEBU[2024-02-05 02:58:38.126344837+05:30] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true file="overlay/overlay.go:467"
INFO[2024-02-05 02:58:38.126478614+05:30] Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL file="capabilities/capabilities_linux.go:38"
DEBU[2024-02-05 02:58:38.126549220+05:30] Using runtime executable from $PATH "/usr/bin/runc" file="config/config.go:1516"
DEBU[2024-02-05 02:58:38.126596240+05:30] Found valid runtime "runc" for runtime_path "/usr/bin/runc" file="config/config.go:1528"
DEBU[2024-02-05 02:58:38.126637981+05:30] Allowed annotations for runtime: [io.containers.trace-syscall io.kubernetes.cri-o.Devices] file="config/config.go:1563"
DEBU[2024-02-05 02:58:38.136412815+05:30] Loading registries configuration "/etc/containers/registries.conf" file="sysregistriesv2/system_registries_v2.go:926"
DEBU[2024-02-05 02:58:38.136678405+05:30] Loading registries configuration "/etc/containers/registries.conf.d/00-shortnames.conf" file="sysregistriesv2/system_registries_v2.go:926"
DEBU[2024-02-05 02:58:38.139211750+05:30] Using hooks directory: /usr/share/containers/oci/hooks.d file="config/config.go:1129"
DEBU[2024-02-05 02:58:38.139348012+05:30] Using pinns from $PATH: /usr/local/bin/pinns file="config/config.go:1395"
INFO[2024-02-05 02:58:38.139405252+05:30] Checkpoint/restore support disabled file="config/config.go:1154"
INFO[2024-02-05 02:58:38.139446571+05:30] Using seccomp default profile when unspecified: true file="seccomp/seccomp.go:99"
INFO[2024-02-05 02:58:38.139487229+05:30] Using the internal default seccomp profile file="seccomp/seccomp.go:152"
INFO[2024-02-05 02:58:38.139510223+05:30] AppArmor is disabled by the system or at CRI-O build-time file="apparmor/apparmor_linux.go:34"
INFO[2024-02-05 02:58:38.139538618+05:30] No blockio config file specified, blockio not configured file="blockio/blockio.go:74"
INFO[2024-02-05 02:58:38.139563857+05:30] RDT not available in the host system file="rdt/rdt.go:56"
DEBU[2024-02-05 02:58:38.139603483+05:30] Using conmon from $PATH: /usr/bin/conmon file="config/config.go:1395"
INFO[2024-02-05 02:58:38.143933790+05:30] Conmon does support the --sync option file="conmonmgr/conmonmgr.go:85"
INFO[2024-02-05 02:58:38.143990549+05:30] Conmon does support the --log-global-size-max option file="conmonmgr/conmonmgr.go:71"
INFO[2024-02-05 02:58:38.144095370+05:30] Updated default CNI network name to file="ocicni/ocicni.go:375"
FATA[2024-02-05 02:58:38.144237824+05:30] validating api config: already existing connection on /var/run/crio/crio.sock file="crio/main.go:458"

Does this last line mean, I need to remove the existing connection before creating one? Please tell what it is trying to mean.

Start a new CRI-O process manually overriding the log level and setting it to the "debug" level

Am I doing this step correctly? What you mean by starting a new CRI-O process, and how should I do that?

@kwilczynski
Copy link
Member Author

@SD-13, I am glad you were able to build CRI-O locally, and got it running!

By the way, you don't need to use sudo to start CRI-O for a brief moment locally–you do need pinns in either in the current PATH (environment variable) or pointed at a location where the binaries are via the configuration file. The PATH route is easier as it only required you to temporarily prepend the local "bin" directory from within CRI-O code base to the beginning of your current PATH (to update the search path), since building Go binaries using the included Makefile (so, make binaries) would also trigger building C binaries (pinns etc.), so export PATH=${PWD}/bin:${PATH} executed from within CRI-O repository would be sufficient–no need for the sudo make install and such.

Going back to your issues:

FATA[2024-02-05 02:58:38.144237824+05:30] validating api config: already existing connection on /var/run/crio/crio.sock > file="crio/main.go:458"

Does this last line mean, I need to remove the existing connection before creating one? Please tell what it is trying to mean.

This means that there is a CRI-O (as crio) process already running and listening on this Unix socket. You might have gotten it started elsewhere.

Start a new CRI-O process manually overriding the log level and setting it to the "debug" level

Am I doing this step correctly? What you mean by starting a new CRI-O process, and how should I do that?

It means exactly that 😄 Start a new process manually. So, the reproduction steps assume that you don't have anything else running at the moment, and you will launch CRI-O locally (manually) to test things.

Have a look at the outlined steps again. You are very close to being able to reproduce the issue.

@SD-13
Copy link

SD-13 commented Feb 7, 2024

@kwilczynski Thanks! I was able to reproduce the issue. But I needed to use sudo to start crio and to check status. Running without sudo privilege was giving errors like Operation not permitted.

@kwilczynski
Copy link
Member Author

@kwilczynski Thanks! I was able to reproduce the issue.

@SD-13, does it make sense about what is going on?

@SD-13
Copy link

SD-13 commented Feb 7, 2024

@kwilczynski I would say yes but that is a bit high level, I want to understand more closely. Right now I am going through the codes you pointed out and understanding them. Initially it seems like I need to find where to look for already passed cmd line args and if they were passed, load them instead of the values from the config.

@SD-13
Copy link

SD-13 commented Feb 8, 2024

@kwilczynski Just want to make sure I understand it correctly, the tasks we have here are

  • Follow the below order from top to bottom before reloading configuration in pkg/config/reload.go reload function.
  1. Any command-line arguments that were
  2. Environment variables set
  3. Configuration files (including any files from the so-called "conf.d" directory where the drop-ins are to be stored)
  4. Default configuration
  • Print message about the change. Maybe just a diff of old and new value and highlight that. Currently, we are printing messages like Set config log_level to "warn", what do you think about highlight this message.
  • Printing config when fresh start or reload.

I think the last two tasks can be similar. I think printing the raw config even when the debug log level is set make will make it less readable. What do you think about printing the config by filtering out the comments and only when the debug level is set?
Please feel free to correct me. Thanks!

@kwilczynski
Copy link
Member Author

@SD-13, the main focus here would be to fix the problem with how runtime configuration is reloaded so that options are applied correctly per the suggested precedence order.

If the user passes --log-level info, even though the configuration file says that it should be set to debug, then the final runtime configuration should have the log level set to info since command-line arguments have the highest precedence.

Everything else would be nice to have and can be added later once the main problem has been fixed.

@SD-13 SD-13 linked a pull request Feb 12, 2024 that will close this issue
Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2024
@SD-13 SD-13 removed their assignment Mar 14, 2024
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2024
@kwilczynski
Copy link
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 15, 2024
@kwilczynski kwilczynski removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 15, 2024
@LenkaSeg
Copy link
Contributor

I would be interested in taking this issue.

@LenkaSeg
Copy link
Contributor

/assign

@kwilczynski
Copy link
Member Author

@LenkaSeg, thank you! Let me know if there is anything that would need to be clarified. Happy to help 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants