prune: limit concurrency based on file descriptors #3842

Open

wants to merge 1 commit into main
Conversation

@bk2204 bk2204 (Member) commented Sep 30, 2019

When there's a small number of file descriptors, we can exhaust them all during a prune, causing it to fail. This is common on macOS, which has a limit of 256 file descriptors by default. Use a heuristic to allocate FDs based on the limit, and on Windows, which does not support getrlimit(2), use a heuristic based on the number of CPUs, as before.

Fixes #3752.
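For illustration only (this is not code from the patch), a minimal sketch of the heuristic described above, assuming the soft RLIMIT_NOFILE value is read with getrlimit(2) on Unix and the old CPU-based fallback is used when the limit can't be read:

```go
//go:build !windows

// Minimal sketch, not the actual git-lfs implementation: derive a
// concurrency limit from the soft RLIMIT_NOFILE value, falling back to
// the old CPU-based heuristic when the limit can't be read.
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

func maxConcurrency() int64 {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		// No usable limit: keep the old heuristic of twice the CPU count.
		return int64(runtime.NumCPU() * 2)
	}
	// Assume each concurrent operation may use up to 64 descriptors and
	// reserve 64 for everything else.
	maxThreads := (int64(rlim.Cur) - 64) / 64
	if maxThreads < 1 {
		return 1
	}
	return maxThreads
}

func main() {
	fmt.Println("concurrency limit:", maxConcurrency())
}
```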

@bk2204 bk2204 marked this pull request as ready for review October 3, 2019 16:55
@bk2204 bk2204 requested a review from a team October 3, 2019 16:56
@PastelMobileSuit PastelMobileSuit (Contributor) left a comment


✨ Looks good!

if maxThreads < 1 {
	return 1
}
return maxThreads
Contributor
Shouldn't this be min(maxThreads, runtime.NumCPU() * 2)?

Member Author
No, the old heuristic was to run twice as many goroutines as CPUs. We can spawn arbitrarily many goroutines and the Go scheduler will only run as many as there are threads to run them (which is usually the number of CPUs).

The only limiting factor that's required here is file descriptors; the actual CPU usage is handled by the scheduler. We retain the old check just to handle cases where we can't get a good value.
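To make that concrete, here is a hypothetical sketch (not the code in this PR) of the pattern being described: a goroutine is spawned for every object, and a buffered-channel semaphore sized by the FD-derived limit bounds how many of them hold file descriptors at once, while the Go scheduler handles CPU usage.

```go
// Hypothetical sketch of the pattern described above, not git-lfs code:
// spawn a goroutine per object, but let only maxInFlight of them hold
// file descriptors at any one time.
package main

import (
	"fmt"
	"sync"
)

func pruneObjects(objects []string, maxInFlight int) {
	sem := make(chan struct{}, maxInFlight) // capacity = FD-based limit
	var wg sync.WaitGroup
	for _, obj := range objects {
		wg.Add(1)
		go func(obj string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot before touching files
			defer func() { <-sem }() // release it when done
			fmt.Println("pruning", obj) // real code would open/delete files here
		}(obj)
	}
	wg.Wait()
}

func main() {
	pruneObjects([]string{"a", "b", "c", "d"}, 2)
}
```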

	return int64(runtime.NumCPU() * 2)
}
// Assume 64 FDs for each operation and leave 64 for other purposes.
maxThreads := (int64(fds) - 64) / 64
Contributor
Is it possible to set unlimited file descriptors? If yes, will this math behave properly?

Member Author
It is indeed possible to set the value to RLIM_INFINITY. Assuming that's a large positive number, the math will work fine. If it's a negative number, it will limit us to 1 thread, which, while not ideal, will also work. I believe it's a large positive number in most places.
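As a rough illustration of that arithmetic (hypothetical values, not code from the patch): the macOS default of 256 descriptors yields 3 threads, an "infinite" limit that converts to a large positive int64 yields a huge but harmless cap, and a limit that converts to a negative value trips the `maxThreads < 1` guard and falls back to a single thread.

```go
// Hypothetical illustration of the clamp discussed above, not git-lfs code.
package main

import "fmt"

func threadsFor(fds int64) int64 {
	maxThreads := (fds - 64) / 64
	if maxThreads < 1 {
		return 1
	}
	return maxThreads
}

func main() {
	fmt.Println(threadsFor(256))   // macOS default limit: 3
	fmt.Println(threadsFor(1<<62)) // "infinite" limit as a large positive value: very large cap
	fmt.Println(threadsFor(-1))    // "infinite" limit that converts to a negative value: clamps to 1
}
```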

@bk2204 bk2204 (Member, Author) commented Oct 17, 2019

The reason this hasn't merged is that there's some evidence that we're not actually limiting the number of processes to the number of items mentioned in the pull request, and I need to look more into why that is. Adjusting the threshold isn't going to be effective if it isn't an effective control on the number of processes we spawn.

Base automatically changed from master to main February 2, 2021 16:36
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Prune fails: too many files open
3 participants