Dependency resolution with a large number of artifacts (500 or more) takes 20+ minutes #1120

Open
oliviernotteghem opened this issue Apr 30, 2024 · 4 comments

Comments

@oliviernotteghem

oliviernotteghem commented Apr 30, 2024

When bumping a library version, we need to repin, i.e. dependency resolution runs again. In our case this can take more than 20 minutes (on a relatively fast connection, about 800 Mbps) and seems very network intensive.

Is there anything that can be done to speed up this operation?
It looks like each JAR/AAR is being downloaded, but we might only need the POM files. Could this step be skipped?

@shs96c
Collaborator

shs96c commented Apr 30, 2024

The actual jar files are required because they're used to calculate the sha256 attributes that allow the repository cache to work. Fortunately, there are some things you can do to make dependency resolution faster:

  1. Always use a lock file, and set the fail_if_repin_required attribute to True. This will mean that the only time you need to do the dependency resolution is when a dependency is changed.
  2. Take a look at the env vars you can pass to the resolvers. The recently added Maven-based resolver is a little more configurable, but you can also set things like the COURSIER_CACHE to speed up the default resolver.
  3. Temporarily add m2local as a repository for your maven_install to use. You'll need to hand-edit the resulting lock file to remove the flag telling rules_jvm_external to look at the local maven repo at build time, but that should be an easy change to make.

How many deps are you resolving? And which resolver are you using? 20 minutes seems longer than I'd expect on the kind of connection you're using.
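
To illustrate why the POM files alone are not enough: a SHA-256 digest is computed over the bytes of the artifact itself, so the jar has to be present locally before it can be hashed. A minimal Python sketch of that step follows. The helper name `sha256_of_file` is hypothetical and is not part of rules_jvm_external; it only shows the general technique, not how the resolver implements it.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large jars are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file(Path("some-artifact.jar"))` on a downloaded jar would return the hex string that a lock file entry could be compared against.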

@oliviernotteghem
Author

Thanks @shs96c

  1. We are using pinning, so we already run dependency resolution only when strictly necessary.
  2. Thanks for making the Maven-based resolver more configurable. We will look into replicating your RJE_UNSAFE_CACHE with the existing resolver. This sounds relatively easy, as the cache folder already seems to be configurable (via the COURSIER_CACHE env variable or a command-line argument).
  3. A bit too manual / hacky I guess :) But thanks for bringing this up.

@jbarr21

jbarr21 commented May 2, 2024

@shs96c rules_jvm_external already sets the --cache v1 flag, so the COURSIER_CACHE env var does not apply.

We ended up adding an env var to our fork of RJE that controls coursier's --parallel flag, which sets the number of parallel downloads. The default is 6, but when building on cloud machines we can raise this to 64 and bring our dependency resolution time down to 2-3 min.
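
As a toy model of why the parallelism limit matters, the Python sketch below fetches a list of artifacts with a bounded worker pool. Everything here is an assumption made for illustration: `fake_download` just sleeps instead of touching the network, and `fetch_all` and `timed_fetch` are hypothetical names. It is not how coursier schedules downloads, and real gains depend on bandwidth and the remote repository.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(artifact: str) -> str:
    """Stand-in for a network fetch: sleeps briefly instead of downloading."""
    time.sleep(0.02)
    return artifact


def fetch_all(artifacts: list[str], parallel: int) -> list[str]:
    """Fetch every artifact using at most `parallel` concurrent workers."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fake_download, artifacts))


def timed_fetch(artifacts: list[str], parallel: int) -> float:
    """Return the wall-clock seconds taken to fetch all artifacts."""
    start = time.perf_counter()
    fetch_all(artifacts, parallel)
    return time.perf_counter() - start
```

With latency-bound fetches like these, `timed_fetch(artifacts, 64)` should come out well below `timed_fetch(artifacts, 6)` for a list of a few dozen artifacts, which is the same effect as raising the limit from 6 to 64.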

@honnix
Contributor

honnix commented May 20, 2024

I believe #1137 should be able to help. You can pass --parallel 64 from maven_install to coursier.
