Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils/translator: produce merged cycles #216

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wmertens
Copy link
Contributor

Fixes #198

  • refactor with speed up via skipping visited nodes
  • merge cycles that have members in common
  • pick shortest name to be head of cycle set, so that between projects,
    the cycles remain the same

Question: Since this seems to be fast enough, can the python cycle finder in the executable translator be removed? I didn't look at it but based on the comment it does the same thing?

@DavHau
Copy link
Member

DavHau commented Jul 31, 2022

Did you do any performance benchmarks to see how the performance compares against the existing implementation?

Some newer pure translators use the simpleTranslate2 helper instead of simpleTranslate. The latter should be deprecated at some point.
Could you factor out the cycle logic to /src/lib/cycles.nix or something similar and use it in both translate helpers? Then all example projects will actually use it and we can see if the tests pass.

Question: Since this seems to be fast enough, can the python cycle finder in the executable translator be removed? I didn't look at it but based on the comment it does the same thing?

The python logic does not exist anymore. If it's still mentioned somewhere, that mention should be removed.

@wmertens
Copy link
Contributor Author

wmertens commented Aug 1, 2022

Ok, I moved the logic to getCycles.nix. The tests still pass but I couldn't really find anything that tests cycles.

When I test it on es-abstract it correctly generates the cyclicDependencies

{
  "cyclicDependencies": {
    "es-abstract": {
      "1.20.1": [
        [ "function.prototype.name", "1.1.5" ],
        [ "string.prototype.trimend", "1.0.5" ],
        [ "string.prototype.trimstart", "1.0.5" ]
      ]
    }
  }
}

The resulting build doesn't have es-abstract, but that is a different issue, being handled in #195.

The python comment is at

# don't use nix to detect cycles, this will be more efficient in python

As for benchmarking, it should be at least as fast as the previous version since it keeps track of cyclic nodes in the same way (without needing backlinks), but it also keeps track of visited nodes. I didn't really notice a speed difference between the two, since there's no separate unit test for the cyclic deps generation, and it's a fraction of the interpretation time for the whole flake.

@DavHau
Copy link
Member

DavHau commented Aug 4, 2022

There now seems to be an infinite recursion in the haskell example for some reason. BTW, you can run all tests locally via nix run .#tests-all. Or just the examples via nix run .#tests-examples.

The resulting build doesn't have es-abstract, but that is a different issue, being handled in #195.

Due to this, the current builder will remove direct dependencies function.prototype.name, string.prototype.trimend, and string.prototype.trimstart.

AFAICT the old algo was optimized to remove dependency edges as deep in the tree as possible. That guaranteed that top-level dependencies were never removed and that any removed dependencies could always be found within the ancestors.

The new algo now removes direct top-level deps (as seen in the es-abstract example). This is incompatible to the current build logic which speculates on nodejs searching ancestors to fulfill missing deps. These ancestors might be removed with the new algo therefore potentially breaking builds.

Anyways, it's good to have the logic separated now. But I think if this new logic only goes together with your new builder, it should be moved inside your builder itself, so it doesn't affect the other existing ecosystems or other existing builders. This would enable a smooth transition to your builder without breaking existing stuff.

Your changes to the cycle finder are optimized towards the idea of grouping dependencies together. That is specific for your build logic and not necessarily the best fit for other ecosystems or other builders. Therefore it makes sense to keep the logic inside your builder for now. If that works, and proves to be better, then we can still expand it to other ecosystems later.

@wmertens
Copy link
Contributor Author

wmertens commented Aug 4, 2022

I can let the builder call it directly, but it doesn't make sense to me that the algo is wrong.

A cycle is transitive: if a->b, b->a, a->c, c->a then a,b,c all belong in a cycle. b does not directly depend on c, but if you want b you will always get c as well.

Since the Nix store cannot have circular references, all dependencies must not directly depend on their circular paths (this includes a->b->c->a). This means either resolving at runtime via a separate list of references (the grouped node_modules + symlink resolving works this way), or co-locating all cycle members in the same store path.

The previous logic would not allow you to have e.g. both node-sp-auth and node-sp-auth-config visible as modules unless you use the non-standard symlink resolving.

IMHO, the cycles should be presented fully merged, and then each builder can decide what to do with them. The fact that picking a different cycle head breaks the build means that it was only accidentally working.

@DavHau
Copy link
Member

DavHau commented Aug 4, 2022

I can let the builder call it directly, but it doesn't make sense to me that the algo is wrong.

I'm not saying it is wrong, but your algorithm is specifically developed to support your strategy of bundling several packages into a single store path. But this is not the only possible approach to solve the problem.

Since recently in the nodejs builder installMethod=copy is enabled by default on all top-level packages. This solves the problem for nearly all packages including the node-sp-auth example. The only exception would be if a transitive dependency has an install script that requires exactly one of the removed deps. But this is quite unlikely I'd claim and could be fixed easily by just including the source of the missing dep into node_modules.

The installMethod=copy method works, because the current cycle remover only removes edges. All nodes are still present, and therefore transforming the graph into a flat copy will result in an installation that doesn't have missing deps. installMethod=copy can only work if no nodes are missing. This is not the case with the current PR's algo. It removes some nodes from the graph completely and this will lead to breakages in the current builder.

Now, I'm not saying that my current method is the best in terms of efficiency. Your builder might be better. But as of now, merging the current PR will break builds.Therefore I'd like to see it shipped together with your builder. I would like to see that the new build logic is in fact more reliable. We could test that using approaches like dream2nix-npm to test it on a massive amount of packages. Though that project is still work in progress. But I hope we will get it into a state soon that makes it usable for this purpose.

Also I don't think it's easy to determine a best way of dealing with cyclic deps. This is a highly ecosystem specific problem and the solutions might look completely different depending on the subsystem. Therefore maybe we shouldn't even have a default at all.

@wmertens
Copy link
Contributor Author

wmertens commented Aug 8, 2022

Right, that all makes sense. I'll set it up so it's a prototype.

However, I must note that this PR does not remove nodes at all. It only merges cycles.

Maybe I'm misunderstanding the previous code? It should behave the same as the previous code if mergeCycles is a simple unique concatenation.

@wmertens
Copy link
Contributor Author

wmertens commented Aug 8, 2022

Ah - found an issue with the code, in case of a->b->c->a, b should be part of the cycle but isn't right now. That probably causes the issue with Haskell. Will fix.

- refactor with speed up via skipping visited nodes
- merge cycles that have members in common
- pick shortest name to be head of cycle set, so that between projects,
  the cycles remain the same
@wmertens
Copy link
Contributor Author

wmertens commented Aug 10, 2022

@DavHau the cycles it creates are now correct but the haskell example still goes infinite, which means it didn't snip a cyclic dependency. To me this points to a bug in the haskell builder maybe.

Before I look at that code, can you perhaps tell me how haskell deps work?

I'm wondering if it wouldn't be better to store the cycles as a list of lists in the dream.lock and then have helpers produce the desired format for the builder.

So suppose you have a->b->c->a, and b->d->b, I think the current code produces the cycles [c, a] and [d, b], which really means "don't add a to c" and "don't add b to d", right? But that means that when you import c or d by themselves, they won't have a or b, and the language must cope with that.

Instead, this PR creates the single cycle [a, b, c, d] since any of these packages will result in importing the others. This is the basic information, and then the builder can decide how to handle that. In #195 the builder co-locates all members under a single store path, but another way would be to build each member separately and then have a way to import them at runtime, from a separate list (not possible with nodejs).

But if that latter is an option with Haskell, then each cycle member must simply exclude each other cycle member as a dependency when building. And while currently dream.lock allows {a = [b c];} as a cycle, it seems to me that the Haskell builder can't handle that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cyclic dependencies need grouping
2 participants