Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional Constraints #120

Open
baszalmstra opened this issue Oct 13, 2022 · 13 comments
Open

Additional Constraints #120

baszalmstra opened this issue Oct 13, 2022 · 13 comments

Comments

@baszalmstra
Copy link
Contributor

Im (still) working on using PubGrub to solve Conda packages. I've made a lot of progress but currently run into the problem of "constraints".

Conda packages can have two types of dependencies: regular dependencies and constraints. Regular dependencies are packages that need to be included in the solution but constraints are dependencies that don't have to be included, but if they are (via the dependency of another package), have to adhere to the constraints. I think this is similar to peer dependencies in NodeJS.

I initially modeled these as regular dependencies which I could strip from the solution after solving. However, sometimes these constraints are used to ensure a certain other package is not selected. For instance, python packages that are incompatible with pypi often have the constraint: pypy <0a0. There is no version compatible with this since it denotes the lowest possible version. Using this as a dependency results in an error DependencyOnTheEmptySet which makes sense: that's exactly what the user intends.

Is this something that could be supported with pubgrub? Is this something I could potentially add?

@Eh2406
Copy link
Member

Eh2406 commented Oct 13, 2022

For some context #36 especially this comment and the PR #55.

Our surface level interface does not have a way to express this. It can only say "A depends on B" meaning "do not build the solution where a version of A is selected and a version of B is not also selected". However this is just a limitation of our interface. The internal Incompatibility data structure can happily express more complicated constraints like "do not build the solution where a version of A is selected and a version of B is selected". (Along with arbitrarily more exotic things.)

So far we've done a pretty good job of having a user-friendly public interface and documenting it at the module level, package level, and book level. I would be open to designing a more flexible interface as long as it continued to be easy to use. Alternatively we could design a lower level interface as long as we have a good pattern for how straightforward use cases can be done easily.
As you've noticed no one has had time to work on this project given its existing level of polish, so maybe we should lower our standards to let you get unstuck.

@mpizenberg
Copy link
Member

mpizenberg commented Oct 13, 2022 via email

@mpizenberg
Copy link
Member

This comment and the few following it are what I had in mind in my previous comment:
https://rust-lang.zulipchat.com/#narrow/stream/260232-t-cargo.2FPubGrub/topic/partial_solution/near/240019814

Basically, we make the hypothesis that a package term first appearance is necessarily coming from a dependency from a previously selected package in the partial solution. If we change the expressiveness of what users can put in dependencies, we just need to make sure that it doesn't contradict the shortcut we take when identifying the previous satisfier and the id of the step to backtrack.

If there is no issue, it's fine. And if there is an issue, we probably can just revert to the previous stricter approach of backtracking.

@baszalmstra
Copy link
Contributor Author

@Eh2406 I would very much appreciate getting unstuck! But I would also love to help keep up the level of polish for this project if at all possible!

Would it be an idea to modify the interface of the DependencyProvider to also return how a dependency should be used? Whether it should be included in the solution or not? I'm not really sure what other kinds of dependencies one might want to express.

@Eh2406
Copy link
Member

Eh2406 commented Oct 14, 2022

Yes we will have to change the return type of get_dependencies from Result<Dependencies<P, VS>, Box<dyn Error>>.

We could add in Enum for whether each dependency is required or constrained (wordsmithing required), something like Result<Dependencies<P, Requirement<VS>>, Box<dyn Error>>. This lets us add bespoke syntax for each kind of dependency we have found a use for, but this is beginning to get a little complicated.

The other kind of dependency we may want to add in the future is for modeling "week dependencies". (Cargo currently handles this is a post resolution pass.) Where fundamentally we want to say "if A and B are selected then C must also be selected".

While that's all I can think of at the moment, I wouldn't be surprised to discover there are more things people need. At some point we will probably need to add some kind of raw direct access for particularly weird cases. Possibly as a direct modification like #121.

Extra credit: There are two minor performance issues with the current API. Both examples of things that I wouldn't even know or care about if this were written in another language, but bug me because it's obvious in Rust. (Previous discussion about the same problems with an older version of the API at #18.) The only thing we do with the Dependencies returned from get_dependencies is look at each item one at a time and convert it to an Incompatibility. (The code looks a little more complicated because of interning.)

  1. We don't need to own the Dependencies to iterate over it. A caching dependency provider ends up having to clone this hash map just for us to iterate over it and throw it out.
  2. We don't need a hash map per se, we could almost take any iterator. The big thing a hash map ensures is that our dependency provider doesn't give us duplicate dependencies on the same package. But I don't know how bad that would actually be in practice.

The direct solution to the first problem is to take a reference Result<&Dependencies<P, VS>, Box<dyn Error>>, but that does not work for non-caching dependency providers. We could solve it by making everyone's life more complicated with something like Result<Cow<Dependencies<P, VS>>, Box<dyn Error>>, but that is a lot of complexity budget given the size of the problem.
Solving the second by returning Result<impl Into<Incompatibilitie<P, VS>>, Box<dyn Error>>, also blows the complexity budget even before we realize that impl trait is not supported in traits. So it actually involves in associated type, or two. And last time I tried it requires GATs. (lending iterators.)

I don't mean to say that these extra credit problems need to be solved (now or at all). Just saying if we end up with a high level "just return a hash map of dependencies" and a low level "give us the incompatibilities to add" it would be nice if the low level at least did not have these issues.

@baszalmstra
Copy link
Contributor Author

Thanks for the info, I will give it a go.

Ill start simple by implementing your suggestion: Result<Dependencies<P, Requirement<VS>>, Box<dyn Error>> where Requirement is an enum of Required, and Constrainted.

Im trying to figure out how to start:

  • You pointed to the Incompatibility struct, should a "constraint" be added as a new "Kind" but act the same as a dependency? Or should the package_terms be (in terms of how it is documented) "not A = 1, B = 2` instead of "A = 1, not B = 2"?
  • What exactly constitutes adding a package to the bill?

@mpizenberg
Copy link
Member

Honestly, I think the most important thing to start with would be improving the property test generator strategies to be able to add more general constraints (like any incompatibilities). I'd feel much less safe not knowing we have property tests to check such a change in the expressiveness of user constraints.

@baszalmstra
Copy link
Contributor Author

Could you provide some hints as to what you had in mind? Im still a bit in over my head with this project but Im happy to give it a go!

@mpizenberg
Copy link
Member

mpizenberg commented Oct 15, 2022

@Eh2406 has written in the guide a bit about how our property based strategies are done to generate sets of packages and dependencies https://pubgrub-rs-guide.netlify.app/testing/property.html#end-to-end-property-testing
I haven't looked at that for a long time. What would be awesome would be trying to find strategies to generate valid sets with more expressive dependencies, up to the expressiveness we want to allow in the API. Maximum expressiveness would be to allow any kind of valid incompatibility. (Incompatibilities are explained here https://pubgrub-rs-guide.netlify.app/internals/incompatibilities.html).
Currently the only kind of incompatibilities allowed coming from users are of the shape:

{ a : range_a, b : not(range_b) }

meaning versions within range_a of package a depends on versions within range_b of package b.
Actually even stricter, not range_a but exactly one version v1 since we ask dependencies of a specific package version.

If we want for example to allow users to specify constraints directly, such as (1) "this package a at version v1 is incompatible with all versions of package b", or (2) "this package a cannot be installed simultaneously with both b and c", these would be represented by the following incompatibilities (notice the absence of not compared to previously):

(1) { a : v1, b : all }
(2) { a : v1, b : all, c : all }

And if we allow users to provide these kind of incompatibilities via pubgrub API, we want to have property based strategies able to check situations with similar expressiveness. So this may not be trivial, but it's an important step to stay confident that an implementation is correct.
We have had the current property tests finding issues with pubgrub when it was v0.1 so it's important that we try to stay bug-free.

@Eh2406
Copy link
Member

Eh2406 commented Oct 15, 2022

Sorry I did not get time to respond to you yesterday.

What exactly constitutes adding a package to the bill?

"bill" was a typo for "build". You kind of have to expect typos when communicating with me, I understand that this makes things difficult and appreciate everyone's consideration. Thank you for asking for clarification.

You pointed to the Incompatibility struct, should a "constraint" be added as a new "Kind" but act the same as a dependency?

"Kind" has no influence on the algorithm, it is only used for building an error tree after the fact. I think it is reasonable for kind to develop a new variant, but I would lean toward FromDependencyOf having a field for storing the kind of Requirement. It will take experimenting with the code to figure out which ones cleaner.

Or should the package_terms be (in terms of how it is documented) "not A = 1, B = 2` instead of "A = 1, not B = 2"?

package_terms is the critical part in terms of the algorithm. The documentation you referred to is correct, but skips over the relevant subtleties. Specifically, the difference between a version set and a Term. A version set is a thing like < 3.0.0, you can take its "complement" and get like >= 3.0.0. For any concrete version v vs.contains(v) == !vs.complement().contains(v). But this ends up being insufficient for pubgrubs algorithm, because it is ambiguous about what happens if no version is selected. (Exactly relevance to our problem.) A Term add this bit of information. Term::Positive does not kick in unless there is a version selected, but Term::Negative is triggered by a lack of selected version.

A traditional dependency is expressed as { a : Term::Positive(set_a), b : Term::Negative(set_b) } where as your new constraint wants to be { a : Term::Positive(set_a), b : Term::Positive(set_b.complement()) }.

I will post this comment and work on the proptest side in the next comment.

@Eh2406
Copy link
Member

Eh2406 commented Oct 15, 2022

As to testing...

You should definitely add a couple normal tests here, that check for the basic semantics of your new API. If a package is only mentioned in constraints, then it is not selected. However if the package is required by a dependency and conflict with the constraint then the build errors. This will be very helpful for making sure we've added the correct number of complement calls in the code I just described. But any such examples are going to be superficial, not really touching the core of the algorithm. To make interesting bigger examples we will need to move on to the next kind of test.

We use proptest here to generate large interesting examples. And the only thing you need to add is one bit of information about each dependency, is it required or constrained. raw_dependency can get a new any::<bool>() for it, and then make the corresponding kind of dep here. Ok, so we have a random input how do we know it's right? There are a couple of proptests, but the big important one is that we compare the result to an SAT solver as defined here. Specifically converting deps into SAT var here, if the dep is a constraint then it could also be satisfied by there being no other version selected (not just by a matching version being selected or by the dependeror not being selected).

@baszalmstra
Copy link
Contributor Author

Thanks for the detailed explanation, I implemented the above and got it working for the simple tests (see this commit: baszalmstra@46def28). 🥳

I understand what to do with the SAT test but Im struggling with how to express "and no other versions are selected". Looking at the varisat crate also didn't help me. Would you be able to point me in the right direction?

I modified the Dependencies enum to contain:

/// Container for all available package versions.
Known(DependencyConstraints<P, Requirement<VS>>), 

a Requirement is:

pub struct Requirement<VS: VersionSet> {
    pub range: VS,
    pub kind: RequirementKind,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum RequirementKind {
    /// A dependency that is resolved as part of the solution.
    Required,

    /// Constrains the version of package but does not require inclusion of the package.
    Constrained,
}

I used the term "requirement" instead of "dependency" (wordsmithing required as you said) but I didn't apply it throughout the code yet.

Any feedback at this point is already very welcome.

@Eh2406
Copy link
Member

Eh2406 commented Oct 17, 2022

I understand what to do with the SAT test but Im struggling with how to express "and no other versions are selected". Looking at the varisat crate also didn't help me. Would you be able to point me in the right direction?

I've been thinking about this since you asked. I keep writing up simple solutions, only to realize they don't work. The best I have is 12355f3 and that was easier to write than it was to explain. Basically I added a var represents no version of the package being selected.

Looking at your Requirement + RequirementKind, why not all in one:

pub enum Requirement<VS: VersionSet> {
    /// A dependency that is resolved as part of the solution.
    Required<VS>,

    /// Constrains the version of package but does not require inclusion of the package.
    Constrained<VS>,
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants