Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 'multidep' (RFC-3176) #10061

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

Byron
Copy link
Member

@Byron Byron commented Nov 9, 2021

Tracking issue: #10030
Depends on: #9992

Tasks

  • 'multidep' feature toggle
  • spread out multi-name dependencies to same crate
    • represent these in cargo-metadata
  • multiple targets may be specified in renamed artifact dependencies
  • 🧹cleanup and finalization
    • assure no TODO(ST) markers are left in code
    • assure no tests are ignored

Implementation and review notes

  • cargo metadata is ignored and was adapted to work exactly as before, ignoring duplicate packages for now. This is due to the suggestion made in this review comment.

  • 🧪 tests, possibly for more subtle RFC constraints

    • no-multidep toggle resorts to previous behaviour

Review Progress

Sponsored by Profian

@rust-highfive
Copy link

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 9, 2021
@Byron Byron marked this pull request as draft November 9, 2021 02:04
@Byron Byron force-pushed the rfc-3176 branch 2 times, most recently from 1205b3c to 7030a2b Compare November 14, 2021 04:10
@Byron
Copy link
Member Author

Byron commented Nov 14, 2021

I have created a draft of simple renames in resolve metadata, which result in some ambiguity in the deps.name field as it now possibly corresponds to multiple names which are also repeated in the "extern_name" field of the respective "dep_kinds". For now I have set the deps.name to an empty string to be somewhat backwards compatible, but since feature is gated I could imagine it to become an array of names instead as well.

CC @joshtriplett

@bors
Copy link
Collaborator

bors commented Nov 14, 2021

☔ The latest upstream changes (presumably #10081) made this pull request unmergeable. Please resolve the merge conflicts.

@Byron Byron force-pushed the rfc-3176 branch 6 times, most recently from 8b0b17a to 4fbf0fe Compare November 17, 2021 04:23
@Byron
Copy link
Member Author

Byron commented Nov 17, 2021

@ehuss I believe the RFC is implemented from all I can see and has a few tests to support that. Maybe that qualifies it for a first review round. What makes this more difficult is that this PR depends on another one which isn't merged yet, and @joshtriplett as reviewer of that one and you might align to find an approach that minimizes work. My preference clearly is to focus on this PR and mostly forget about the other one, implementing both RFCs at once when this one is merged, and avoiding changing #9992 entirely.

Both PRs have reviewer notes, some of which describe limitations.

Thanks for your time, I am looking forward to hearing from you :).

@Byron Byron marked this pull request as ready for review November 17, 2021 10:48
@bors
Copy link
Collaborator

bors commented Dec 13, 2021

☔ The latest upstream changes (presumably #10172) made this pull request unmergeable. Please resolve the merge conflicts.

@Byron Byron force-pushed the rfc-3176 branch 3 times, most recently from 54c9aa6 to ae44640 Compare December 14, 2021 02:55
@ehuss ehuss assigned joshtriplett and unassigned ehuss Dec 14, 2021
@bors
Copy link
Collaborator

bors commented Dec 16, 2021

☔ The latest upstream changes (presumably #10201) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Collaborator

bors commented Jan 12, 2022

☔ The latest upstream changes (presumably #10265) made this pull request unmergeable. Please resolve the merge conflicts.

@Byron Byron force-pushed the rfc-3176 branch 3 times, most recently from f69bbb3 to 4f35bd0 Compare January 14, 2022 01:06
@Byron Byron force-pushed the rfc-3176 branch 2 times, most recently from 2a57c7a to a0c50db Compare February 5, 2022 08:13
@bors
Copy link
Collaborator

bors commented Feb 22, 2022

☔ The latest upstream changes (presumably #9992) made this pull request unmergeable. Please resolve the merge conflicts.

@Byron
Copy link
Member Author

Byron commented Feb 23, 2022

Now that the PR for RFC-3028 has been merged (🎉) I am in the process of readying this one for review.

When rebasing against master, the filter-madness that was generously fixed by @ehuss returned and I 'just made it work' for now and will try to simplify it along with finishing some other tasks as tracked in the Leftover from RFC 3028 headline.

@Byron
Copy link
Member Author

Byron commented Feb 24, 2022

After trying to address the left overs from RFC-3028 I have arrived in a state where I'd need input to learn how to proceed from here.

Besides the question around metadata tests I tried to change file placement by uplifting them instead of adjusting the units output directory as a whole as suggested by @joshtriplett. The intermediate results can be found in this branch - some tests are failing but they are fixable with more time.

Even though uplifting is generally working, it also has to tests indicating filename collisions. This probably stems from the fact that the library is both depended on by the main manifest as well as by the binary. Several attempts to deduplicate this failed and one reason probably is that it is the job server who would have to avoid scheduling a duplicate job while allowing the unit to be used as dependency (when building the rustc command-line) as it normally would, while waiting for similar jobs to finish before proceeding. Doing it like this would have the benefit of avoiding duplicate work while further reducing the chance for collisions. This approach however seems like a major time investment both into the job server as well as into the collision check.
This commit shows the only way I could make it work with uplifting, effectively separating the bulk of files produced into its own directory (here deps/artifact_deps) while uplifting the desirable files into the folder described in the RFC, i.e. deps/artifacts.

Thus I wonder how @joshtriplett feels about this. Is there something obvious I am missing to make uplifting the way you had in mind? I'd be glad if you had directions on how to proceed here.

Thanks for your help.

@joshtriplett
Copy link
Member

@Byron I answered the metadata test question in the corresponding linked issue.

On further consideration, I don't think you need to put any effort at all into reducing the number of rustc invocations. Trying to unify builds across artifact and non-artifact dependencies in the uncommon case where they even can be unified seems like unnecessary complexity in an already complex implementation.

(You do need to deduplicate multiple artifact dependencies for the same target, in multiple places in the tree. But don't worry as much about trying to deduplicate between host and same-target-as-host.)

You should not need to modify the job server at all here; any deduplication should be happening in the crate graph before it gets built.

I would focus on correctness, and on placing files in the desired locations, rather than on deduplication.

I would have expected that you could build each dependency in a subdirectory of the same directory, even multiple copies of the same dependency, since they should have different hashes and those hashes are included in the subdirectory name. Am I missing something that's leading to the conflict?

@Byron
Copy link
Member Author

Byron commented Mar 15, 2022

Thanks @joshtriplett for sharing, for a moment I thought I knew how to proceed but then the moment vanished, and here is why 😅.

I think the following parts are key to understanding what I think is going on (this goes with the usual disclaimer of me probably not really knowing what's happening, but here we go anyway).

I would have expected that you could build each dependency in a subdirectory of the same directory, even multiple copies of the same dependency, since they should have different hashes and those hashes are included in the subdirectory name. Am I missing something that's leading to the conflict?

This is something that was surprising to me at first, but is probably exactly why we see windows failure like this recent one occasionally. The binary artifact is built multiple times into the same directory as they are exactly the same, but in multiple positions in the dependency graph with different names in the manifest. The package name is used instead of the name in the manifest, the hashes end up being the same. The easiest way out would be to change the name of the target directory to contain the name in toml if it differs. Would that be OK for you?

(You do need to deduplicate multiple artifact dependencies for the same target, in multiple places in the tree. But don't worry as much about trying to deduplicate between host and same-target-as-host.)

This is something I consider very difficult as it's an entirely new concept for the jobs server, I believe. Previously building the same library multiple times was only possible if they would differ in version, which yields different outputs. What I don't understand is how it could ever work if that a library (not even an artifact, in a single version) is used multiple times in the same dependency graph while only building it once (and into the same location), as the unit-dependencies don't seem to do any kind of deduplication either which means a build would be triggered multiple times. The reason this works is probably that builds of the same library aren't usually done concurrently (even though that could happen), so the job server detects they are fresh and won't rebuild because of that. Otherwise the current system seems quite able to build the same library multiple times and concurrently, causing races around writing files on windows.
Actual deduplication could happen on unit-dependency level, but it would require to know which instance of the crate will be built first, while making other instance of it more like 'proxies' which aren't built/scheduled but somehow wait until the actual instance is done building so it can be used. (I hope this makes sense at all). The alternative to such a system would probably be to assure each instance of a crate gets its own unique name so they never overlap.

I hope that I am missing something essential, but until then I am more or less on the same state as I was when I wrote this comment right above the test that tends to fail due to races.

Fixing this could probably partially be accomplished by putting dependencies into a more uniquely named directory that takes into consideration their name in toml, but that wouldn't fix the possibility of races which I think has existed before but is now exacerbated with the introduction of artifact dependencies.

All of the above is probably more like a brain dump but I hope it helps to point me towards the solution which I seem to be unable to see without complicating things tremendously (and I would want to do better).

@joshtriplett
Copy link
Member

joshtriplett commented Mar 15, 2022

The package name is used instead of the name in the manifest, the hashes end up being the same.

Ah, I see! I think, for artifact dependencies, it may make sense to feed whether the build is an artifact build or not into both the metadata calculation and the fingerprint calculation, so that artifact and non-artifact builds get different hashes. (The target name is already fed in.) That should eliminate collisions between artifact and non-artifact builds. (If, in the future, it becomes feasible to deduplicate between artifact and non-artifact builds, we can always drop that change.)

See the documentation at the top of https://github.com/rust-lang/cargo/blob/master/src/cargo/core/compiler/fingerprint.rs for more details there.

What I don't understand is how it could ever work if that a library (not even an artifact, in a single version) is used multiple times in the same dependency graph while only building it once (and into the same location), as the unit-dependencies don't seem to do any kind of deduplication either which means a build would be triggered multiple times. The reason this works is probably that builds of the same library aren't usually done concurrently (even though that could happen), so the job server detects they are fresh and won't rebuild because of that. Otherwise the current system seems quite able to build the same library multiple times and concurrently, causing races around writing files on windows.

I may be missing something myself, but as far as I can tell, the dependency graph does already get deduplicated. I just did a test with a crate toplevel depending on b and c, both of which depend on a. Cargo built a and waited for it to be done before building either b or c.

Perhaps this deduplication is happening as part of unification, and since you're not supporting unification, you're getting duplicates in the dependency graph? You might try a test with a similar diamond-shaped dependency graph like the one I described above, without any artifact dependencies, and look at how stock Cargo processes that to see where a gets deduplicated. I would explore that first, before making any changes related to the metadata or fingerprint hashes.

In any case, the deduplication should definitely not happen in the job server; it should just run the jobs it's handed. Any deduplication should happen in the graph before ever handing it to the job server.

@Byron
Copy link
Member Author

Byron commented Mar 16, 2022

Perhaps this deduplication is happening as part of unification, and since you're not supporting unification, you're getting duplicates in the dependency graph? You might try a test with a similar diamond-shaped dependency graph like the one I described above, without any artifact dependencies, and look at how stock Cargo processes that to see where a gets deduplicated. I would explore that first, before making any changes related to the metadata or fingerprint hashes.

In any case, the deduplication should definitely not happen in the job server; it should just run the jobs it's handed. Any deduplication should happen in the graph before ever handing it to the job server.

Thanks so much, I regained hope :), and what you say makes sense. Indeed, there must be some mechanism happening that avoids duplicate work, and probably it's more than just freshness of fingerprint files as the example you mentioned wouldn't have worked otherwise. Once the mechanism is discovered, it should be possible to either adjust fingerprinting to know about artifacts and/or make sure the unification handles artifact dependencies.

@Byron
Copy link
Member Author

Byron commented Mar 16, 2022

Alright, I think I am onto something here and should be able to eventually get to the bottom of this.

The recent failure on windows is indeed because…

  • units aren't deduplicated (after all, they use different profiles in different dependency types) - (deduplication happens in the unit-interner)
  • The extra-filename hash that disambiguates both outputs on unix isn't present here, and maybe that's the reason both binaries are placed into the same directory as well - they lack the extra-filename information which does affect the build generated build dir.

Here is the log lines produced on windows for completeness.

  Running `rustc --crate-name bar bar\src\main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=1 -C embed-bitcode=no -C debuginfo=2 -C debug-assertions=on -C metadata=5d165022df23eea2 --out-dir D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps\artifact\bar-8f668ecb076f65c2\bin -L dependency=D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps --extern bar=D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps\libbar-43b90b8924029221.rlib`
[2684](https://github.com/rust-lang/cargo/runs/5547963529?check_suite_focus=true#step:9:2684)
     Running `rustc --crate-name bar bar\src\main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C embed-bitcode=no -C debuginfo=2 -C debug-assertions=on -C metadata=83de8e9fff94a191 --out-dir D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps\artifact\bar-8f668ecb076f65c2\bin -L dependency=D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps --extern bar=D:\a\cargo\cargo\target\tmp\cit\t56\foo\target\debug\deps\libbar-bfc0e72712f573b8.rlib`
[2685](https://github.com/rust-lang/cargo/runs/5547963529?check_suite_focus=true#step:9:2685)
error: linking with `link.exe` failed: exit code: 1104

The target_short_hash() would probably have to learn about the name in the manifest to further disambiguate, and I will try that something to disambiguate.

Edit: The disambiguation must be independent of the name in the manifest, this test only renames to avoid clashes on MSVC apparently and that doesn't work as the name doesn't matter. It looks more like a general problem on systems with missing filename extensions whose fix contradicts the purpose of using a stable hash in the first place. The meta-data hash should be reproducible so maybe it's OK to use in conjunction with the stable hash, but why would we not want to use the metadata hash in place of the stable hash? The only way to fix this without changing the way stable hashes work fundamentally would be to detect that case in a post-processing step and adjust the stable hash only when needed.

Once that is working though, I think the uplifting shouldn't suffer from collisions anymore either.

@bors
Copy link
Collaborator

bors commented Mar 16, 2022

☔ The latest upstream changes (presumably #10433) made this pull request unmergeable. Please resolve the merge conflicts.

This is a squashed commit comprised of the following:
-----------------------------------------------------

Add metadata test to see how it would respond to multiple names for a crate with the same version

Add all multidep specific artifact tests

They are ignored for now until the actual implementation comes in for
validation.

Add support for multiple names for the same crate

It's worth mentioning that crates, called packages, are identified
by their name and their semver version. The resolver handles
figuring out which versions are allowed, but typically separates
crates of the same name by breaking semver version. Thus a:0.1 and
a:0.2 are treated as two different dependencies as far as the code
is concerned. a:0.1.0 is a package id.

In order to allow multiple names to the same package id code that used
to be fine with a single id now has to deal with multiple pairs of
package-id/dependency-name pairs.

The key to understanding why so many functions are affected is
Resolver::extern_crate_names_and_dep_names() which trickles up
into unit-dependencies.

Re-activate all tests validating 'multidep' thus far

Remove adjustments to `cargo metadata` test with bindeps

It's there to show what currently happens if there are multideps,
which basically means these are ignored entirely, showing only
the first available name of the package.

Fix build errors due to dep-filter signature changes
It's really only important when main dependencies are handled, but
and otherwise is always letting dependencies pass.
…0407)

That way, we know that the case without non-artifact dependencies keeps
working, see rust-lang#10407 (comment)
for the source of the change.
Collisions in binaries or dylibs are quite easy to get into when
compiling artifacts deps in conjunction with build overrides.

The latter will cause multiple builds and prevent deduplication,
while certain platform rules prevent unique filename to be generated
while dropping files into the same folder.

Now we recomputate outputs after detecting certain avoidable collisions
to be more usable in this specific situation.
@Byron
Copy link
Member Author

Byron commented Mar 16, 2022

Just to follow up on the previous question about deduplication and in short: the existing deduplication mechanism works exactly as it should. Issues occur due to uplifting when there is a lib = true artifact dependency like in this test.

Since the artifact units are built separately for bin, cdylib and staticlib kinds, one will get duplication when the library itself is built which is all the three kinds in one. This is where the output filename collision occours, it actually builds certain artifacts multiple times. A simple fix doesn't actually fix the duplication, but the collision by placing artifact libraries into their own directory.

Doing the actual deduplication appears like it would complicate unit dependency generation more, along with the outstanding changes needed to get the uplifting itself to work consistently.

Now I wonder if uplifting is a feature really worth having for the cost of such deduplication.
Can I proceed with what's there and place duplicate artifact library dependencies into its own folder to avoid collisions, or is deduplication on unit-dependency level to avoid collisions the way to go?
Thanks again for guidance.


On another note, I have added 05a76cb to fix a flaky test in a way that I think it can be fixed if file name extensions aren't supported. These kinds of collisions we will see a lot of with artifact dependencies and multi-dep at least on certain platforms, so I think it's valuable to correct them if we can. The implementation feels a bit forced as it rides on the back of the collision check which may now try to fix output destinations. I'd be happy to learn about better ways :).

@joshtriplett
Copy link
Member

joshtriplett commented Apr 26, 2022

@Byron I think it's perfectly fine to skip uplifting for the initial PR, and we can re-evaluate it before stabilization. And if that allows skipping changes to the scheduler, that sounds like a great simplification. (Update: based on our conversation, it sounds like uplifting will actually end up working after this PR. Thank you!)

It's also fine to skip de-duplication for now; having the same dependency for both an artifact dependency and a regular dependency will be an unusual case, and if taking a bit more time to build that twice substantially simplifies the code, that's not a problem for now.

The one case I want to make sure this does address correctly, though, is feature unification. If you build something twice, once for the regular dependency and once for the artifact dependency, that's fine, as long as feature unification occurs for builds targeting the same target, as specified in the RFC.

Can I proceed with what's there and place duplicate artifact library dependencies into its own folder to avoid collisions, or is deduplication on unit-dependency level to avoid collisions the way to go?

Go ahead and place them in their own folders to avoid collisions, so that you don't have to poke at the scheduler for now.

@bstrie
Copy link
Contributor

bstrie commented Apr 27, 2022

having the same dependency for both an artifact dependency and a regular dependency will be an unusual case

We actually do have a use for this, so perhaps not that unusual. :P But in our case our bindep and our direct dep are built for different targets so you'd have to compile their dependencies twice anyway (though I suppose it would be nice if the proc macro crates only get built once, for the host).

The one case I want to make sure this does address correctly, though, is feature unification. If you build something twice, once for the regular dependency and once for the artifact dependency, that's fine, as long as feature unification occurs for builds targeting the same target, as specified in the RFC.

For our use case, we may have some general objection to the idea of aggressively unifying features; there's an argument to be made that artifact dependencies might want to select distinctly different features from the main crate, even when built for the same target. However, this is just me registering our future concern, and I don't think that we need to block this PR on that, especially if it contradicts the RFC.

@npmccallum
Copy link
Contributor

The one case I want to make sure this does address correctly, though, is feature unification. If you build something twice, once for the regular dependency and once for the artifact dependency, that's fine, as long as feature unification occurs for builds targeting the same target, as specified in the RFC.

We actually have a case where this behavior is problematic. Our project is complicated, but imagine we are creating a VM hypervisor. We have host code and guest code. The guest code is built for -musl. The host code is currently built for -gnu. This currently allows us to have independent dependency graphs. However, we are now unable to build a static binary for the host code because this would unify the dependency graphs when we need them independent.

I understand that in the typical case, unification probably is the right decision. But I would love the option to disable unification in bindeps even on the same target.

@bors
Copy link
Collaborator

bors commented Jul 16, 2022

☔ The latest upstream changes (presumably #10868) made this pull request unmergeable. Please resolve the merge conflicts.

@jedbrown
Copy link

Hi, I'm curious if this is being abandoned or needs some specific help. My use case is building crates with --target nvptx64-nvidia-cuda and also as a regular lib dependency on the host. I'm currently doing this with a build.rs, but this RFC would offer a much better user experience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants