
[tests] FullCleanTest.test_private_map_was_generated times out when running in a container #3283

Open
pmoravec opened this issue Jun 22, 2023 · 5 comments


@pmoravec
Contributor

When running avocado tests in a container(*), this test easily times out despite having a 10-minute timeout (https://github.com/sosreport/sos/blob/main/tests/cleaner_tests/full_report/full_report_run.py#L25).

The main cause is that sos report alone takes 8 minutes (while the subsequent clean is expected to take a few times longer, so even a 20-minute timeout might not be sufficient). We can increase the timeout as a defensive fix, but to what value? And does it make sense to optimise the run itself? The most time-consuming plugins are:

[stdlog] 2023-06-21 10:36:36,750 avocado.utils.process DEBUG| [stdout] [plugin:process] collected plugin 'process' in 79.25696086883545
[stdlog] 2023-06-21 10:39:20,913 avocado.utils.process DEBUG| [stdout] [plugin:system] collected plugin 'system' in 97.09700441360474
[stdlog] 2023-06-21 10:37:35,111 avocado.utils.process DEBUG| [stdout] [plugin:processor] collected plugin 'processor' in 123.39306426048279
[stdlog] 2023-06-21 10:39:20,920 avocado.utils.process DEBUG| [stdout] [plugin:selinux] collected plugin 'selinux' in 148.71357417106628
[stdlog] 2023-06-21 10:38:41,883 avocado.utils.process DEBUG| [stdout] [plugin:cgroups] collected plugin 'cgroups' in 341.3810544013977

(*) I think the fact that sos runs in a container contributes heavily to the duration of all those plugins (esp. cgroups).

Does it make sense to call this sos with an option such as --plugin-timeout 60 (or maybe 90)? For the sake of cleaner testing, we are not much interested in files like /sys/fs/cgroup/cpuacct/system.slice/sys-kernel-config.mount/tasks (collecting that one file alone took over 2 seconds).
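
As a rough sketch of what that could look like in the test itself (the base class, the import path, and the exact command-line values below are assumptions for illustration, not a confirmed fix):

# Hypothetical tweak to tests/cleaner_tests/full_report/full_report_run.py;
# base class and the existing sos_cmd contents are assumed here.
from sos_tests import StageTwoReportTest

class FullCleanTest(StageTwoReportTest):
    """Full report plus clean run, with per-plugin runtime capped."""

    # --plugin-timeout caps each plugin's runtime; 90s would keep the
    # slow-in-container plugins (cgroups, selinux, processor, ...) from
    # consuming most of the 10-minute test budget.
    sos_cmd = '--clean --plugin-timeout 90'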

@pmoravec
Contributor Author

Optionally, we might have an environment variable (defaulting to the current value) to customize the sos_timeout per avocado run? (But that does not answer my "too lengthy plugins" point.)
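
A minimal sketch of that idea, assuming the test harness reads the variable before launching sos; the variable name SOS_TEST_TIMEOUT and the 600-second default are invented for illustration:

import os

# Hypothetical override: let an environment variable set the per-test sos
# timeout, falling back to the current 10-minute default when it is unset.
DEFAULT_SOS_TIMEOUT = 600  # seconds

def get_sos_timeout():
    """Return the timeout (in seconds) to apply to a single sos run."""
    return int(os.environ.get('SOS_TEST_TIMEOUT', DEFAULT_SOS_TIMEOUT))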

@arif-ali
Member

That's interesting; most of my testing happens in a container on my laptop, and I have not seen any timeout issues like this so far. Albeit it's an LXD container and not podman/buildah/docker.

@pmoravec
Contributor Author

The "container blame" is just a theory as I dont exactly know the full environment where we noticed such timeouts. The lengthy plugins usually run much faster (esp. cgroups) and their execution time "scale up" with number of containers on the system, afaik.

@TurboTurtle
Member

How are these potentially problematic containers launched, exactly? Containers are in most respects the same as running on bare metal, so this kind of performance drop is surprising.

That being said, cgroups taking longer makes sense if there are dozens or even hundreds of containers running, as each container creates a lot of new collections in the cgroups plugin - the same goes for openshift, crio, etc. if container logs are requested. But ones like system, selinux, and process are surprising to see.

@pmoravec
Contributor Author

pmoravec commented Jun 23, 2023

We are still investigating this, but we can make tests/report_tests/options_tests/options_tests.py:OptionsFromConfigTest much faster in general by skipping many plugins (or enabling just those for which we have a particular test case).

#3288 raised for it.
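
As a rough sketch of what that narrowing could look like (the base class and the plugin list below are illustrative only; -o/--only-plugins is an existing sos option, but the actual plugin choice belongs in #3288):

# Hypothetical narrowing of OptionsFromConfigTest: enable only the plugins
# the test actually asserts on instead of the full default set.
from sos_tests import StageTwoReportTest

class OptionsFromConfigTest(StageTwoReportTest):

    # '-o' (--only-plugins) restricts the run to the named plugins.
    sos_cmd = '-o host,kernel,networking'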
