
Consolidate concurrency limits #3493

Merged: 3 commits from `concurrency-limits` into `astral-sh:main` on May 10, 2024

Conversation

@ibraheemdev (Member) commented on May 9, 2024:

Summary

This PR consolidates the concurrency limits used throughout uv and exposes two limits, `UV_CONCURRENT_DOWNLOADS` and `UV_CONCURRENT_BUILDS`, as environment variables.

Currently, uv has a number of concurrent streams that it buffers using relatively arbitrary limits for backpressure. However, many of these limits are conflated. We run a relatively small number of tasks overall and should start most things as soon as possible. What we really want to limit are three separate operations:

  • File I/O. This is managed by `tokio`'s blocking pool, and we should not really have to worry about it.
  • Network I/O.
  • Python build processes.

Because the current limits span a broad range of tasks, it's possible that a limit meant for network I/O is occupied by tasks performing builds, reading from the file system, or even waiting on a `OnceMap`. We also don't limit build processes that end up being required to perform a download. While this may not pose a performance problem because our limits are relatively high, it does mean that the limits do not do what we want, making it tricky to expose them to users (#1205, #3311).

After this change, the limits on network I/O and build processes are centralized and managed by semaphores. All other tasks are unbuffered (note that these tasks are still bounded, so backpressure should not be a problem).
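As a rough sketch of that pattern (a minimal illustration, not the PR's actual code: the struct layout, field names, and default values are assumptions, while the environment variable names come from this PR):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Central concurrency limits, each backed by a semaphore.
/// Field names and defaults here are illustrative assumptions.
struct Concurrency {
    downloads: Arc<Semaphore>,
    builds: Arc<Semaphore>,
}

impl Concurrency {
    fn from_env() -> Self {
        // Read a limit from an environment variable, falling back to a default.
        let limit = |var: &str, default: usize| {
            std::env::var(var)
                .ok()
                .and_then(|value| value.parse().ok())
                .unwrap_or(default)
        };
        Self {
            downloads: Arc::new(Semaphore::new(limit("UV_CONCURRENT_DOWNLOADS", 50))),
            builds: Arc::new(Semaphore::new(limit("UV_CONCURRENT_BUILDS", 4))),
        }
    }
}

async fn download(limits: &Concurrency, url: &str) {
    // Acquire a permit before starting network I/O; the permit is released
    // when `_permit` is dropped at the end of this scope.
    let _permit = limits.downloads.acquire().await.expect("semaphore closed");
    println!("downloading {url}");
}
```

Every download then funnels through one shared permit pool, which is the centralization described above.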


The terminology here is a little weird. From #3311, what we want is to expose a way to limit the overall concurrency of downloads. However, for builds, we're really only limiting the number of parallel Python processes we spawn. Technically those two are different, but I think sticking to one term, concurrency or parallelism, would be less confusing. Maybe these should be `UV_PARALLEL_*` instead?

Test Plan

@ibraheemdev (Member, Author) commented:

There is one seemingly important limit left in https://github.com/astral-sh/uv/blob/main/crates/uv-resolver/src/resolver/mod.rs#L323, from #1163. In practice this bound is quite large compared to our concurrent download limit and should not be a source of contention; it's mainly there to let us use a bounded channel and avoid starving the prefetcher.
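As an illustration of that trade-off, a minimal sketch (the `Request` type, the message, and the capacity of 1000 are hypothetical; only the "bounded channel with a generous capacity" shape reflects the comment above):

```rust
use tokio::sync::mpsc;

/// Hypothetical message type standing in for the resolver's requests.
enum Request {
    Prefetch(String),
}

#[tokio::main]
async fn main() {
    // A bounded channel keeps memory use finite, while a capacity well
    // above the download limit means `send` rarely has to wait, so the
    // prefetcher is not starved. The capacity of 1000 is illustrative.
    let (tx, mut rx) = mpsc::channel::<Request>(1000);

    tokio::spawn(async move {
        // `send` only awaits (applies backpressure) when the channel is full.
        let _ = tx.send(Request::Prefetch("flask".to_string())).await;
    });

    while let Some(Request::Prefetch(name)) = rx.recv().await {
        println!("prefetching metadata for {name}");
    }
}
```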

@zanieb (Member) commented on May 9, 2024:

Regarding parallel vs concurrent: I don't think that's important as a user-facing detail. "CONCURRENT" is fine.

@ibraheemdev (Member, Author) commented:

I agree, I was just wondering which one would be more intuitive, regardless of which is technically correct.

@ibraheemdev force-pushed the `concurrency-limits` branch 2 times, most recently from `3efeab5` to `21f83db` on May 9, 2024 at 18:47
@zanieb (Member) commented on May 9, 2024:

👍 I think people have a broader understanding of "concurrency limits" than parallelism.

@charliermarsh (Member) commented:

I've gotten comfortable just using the word "concurrency" because of this: https://doc.rust-lang.org/book/ch16-00-concurrency.html

@konstin (Member) left a comment:

nice work

@charliermarsh (Member) left a comment:

LGTM! There's maybe one other thing we care about here: the number of concurrent installs (which, today, is controllable by `RAYON_NUM_THREADS`). Once we've downloaded and built all the wheels, we then have to install them into the virtual environment, and we do this with Rayon. I think it's fine not to add a setting for that, but we could.
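For context, Rayon sizes its global pool from `RAYON_NUM_THREADS` when that variable is set; a dedicated, explicitly sized pool is the alternative. A minimal sketch under those assumptions (the function, wheel names, and thread count are illustrative, not uv's installer code):

```rust
use rayon::prelude::*;

fn install_wheels(wheels: &[String]) {
    // Build a dedicated pool rather than relying on RAYON_NUM_THREADS;
    // the limit of 4 threads is illustrative.
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .expect("failed to build thread pool");

    // Install wheels in parallel on the bounded pool.
    pool.install(|| {
        wheels.par_iter().for_each(|wheel| {
            println!("installing {wheel}");
        });
    });
}

fn main() {
    let wheels = vec!["flask-3.0.0".to_string(), "jinja2-3.1.0".to_string()];
    install_wheels(&wheels);
}
```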

@charliermarsh added the `configuration` (Settings and such) label on May 9, 2024
@zanieb (Member) commented on May 9, 2024:

I have a minor preference for our own setting over a RAYON variable if it's easy.

@ibraheemdev (Member, Author) commented:

> LGTM! There's maybe one other thing we care about here: the number of concurrent installs (which, today, is controllable by `RAYON_NUM_THREADS`). Once we've downloaded and built all the wheels, we then have to install them into the virtual environment, and we do this with Rayon. I think it's fine not to add a setting for that, but we could.

We also might want a way to limit the size of tokio's blocking threadpool for fs operations (#3311). Ideally those options could be combined. Installs are fs-bound, so maybe we should just use tokio's blocking pool instead of rayon for that as well?

Either way I think I'll keep this PR limited to network and build limits.
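For reference, a minimal sketch of what capping the blocking pool could look like (`max_blocking_threads` is tokio's actual runtime-builder method, but the limit value and the file read are illustrative, and this is not something this PR implements):

```rust
use tokio::runtime::Builder;

fn main() {
    // Cap the pool used by spawn_blocking (and hence most file I/O);
    // the limit of 8 threads is illustrative.
    let runtime = Builder::new_multi_thread()
        .max_blocking_threads(8)
        .enable_all()
        .build()
        .expect("failed to build runtime");

    runtime.block_on(async {
        // Filesystem work dispatched here shares the bounded blocking pool.
        let contents = tokio::task::spawn_blocking(|| {
            std::fs::read_to_string("pyproject.toml")
        })
        .await;
        println!("{contents:?}");
    });
}
```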

@ibraheemdev merged commit `783df8f` into `astral-sh:main` on May 10, 2024 — 43 checks passed
ibraheemdev added a commit that referenced this pull request on May 18, 2024:

…#3646)

## Summary

Continuation of #3493. This gives us more flexibility in case we decide to move away from `rayon` in the future.
Labels: `configuration` (Settings and such) · 4 participants