Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pnpm dedupe command #5958

Merged
merged 3 commits into from Jan 23, 2023
Merged

feat: add pnpm dedupe command #5958

merged 3 commits into from Jan 23, 2023

Conversation

gluxon
Copy link
Member

@gluxon gluxon commented Jan 20, 2023

Changes

This PR adds a pnpm dedupe command that works similarly to yarn dedupe --strategy highest.

Motivation

Today, pnpm is already very good at deduplicating on pnpm install. If the specifier of a new dependency matches an existing dependency already present in pnpm-lock.yaml, that existing version will be reused. βœ…

However, the pnpm-lock.yaml file can get into a state that requires cleanup. Suppose multiple workspace packages depend on ajv@^6.10.2.

# pnpm-lock.yaml
lockfileVersion: 5.4

importers:

  .:
    specifiers: {}

  packages/bar:
    specifiers:
      ajv: ^6.10.2
    dependencies:
      ajv: 6.10.2

  packages/foo:
    specifiers:
      ajv: ^6.10.2
    dependencies:
      ajv: 6.10.2

packages:

  /ajv/6.10.2:
    # ...

  # ...

If a user introduces a new version specifier for ajv that doesn't overlap with any existing versions in pnpm-lock.yaml, pnpm will install a new version of ajv alongside the old one. This makes sense and is generally desirable.

# pnpm-lock.yaml
lockfileVersion: 5.4

importers:

  .:
    specifiers: {}

  packages/bar:
    specifiers:
      ajv: ^6.10.2
    dependencies:
      ajv: 6.10.2

  packages/foo:
    specifiers:
      ajv: ^6.12.5  # πŸ‘ˆ A user changed only this, without changing it in bar.
    dependencies:
      ajv: 6.12.6

packages:

  /ajv/6.10.2:
    # ...

  /ajv/6.12.6: # πŸ‘ˆ A new version of ajv is added to the lockfile.
    # ...

  # ...

To avoid unexpected surprises, pnpm only updates the foo workspace package's wanted version of ajv since it's the project that changed. It does not rewrite the bar package's wanted version, despite ^6.10.2 satisfying the newly installed 6.12.6 version.

While the above makes sense for normal pnpm install behavior, from time to time it's desirable slim down the pnpm-lock.yaml file by re-resolving. This was added as a new command since upgrading existing resolutions should be explicitly intentional and generally requires testing afterwards.

In a large monorepo on my team, this new command was able to remove hundreds of dependencies correctly.

Other Discussions

Questions

  • Should pnpm dedupe fail if the wanted lockfile isn't the same as the current lockfile? This is a scenario that would cause new dependencies to be added.

Testing

The above example was added as a fixture named workspace-with-lockfile-dupes. Running pnpm dedupe on the fixture creates the following pnpm-lock.yaml diff.

diff --git a/__fixtures__/workspace-with-lockfile-dupes/pnpm-lock.yaml b/__fixtures__/workspace-with-lockfile-dupes/pnpm-lock.yaml
index a82a185fd..f8de69cc0 100644
--- a/__fixtures__/workspace-with-lockfile-dupes/pnpm-lock.yaml
+++ b/__fixtures__/workspace-with-lockfile-dupes/pnpm-lock.yaml
@@ -9,7 +9,7 @@ importers:
     specifiers:
       ajv: ^6.10.2
     dependencies:
-      ajv: 6.10.2
+      ajv: 6.12.6
 
   packages/foo:
     specifiers:
@@ -19,15 +19,6 @@ importers:
 
 packages:
 
-  /ajv/6.10.2:
-    resolution: {integrity: sha512-TXtUUEYHuaTEbLZWIKUr5pmBuhDLy+8KYtPYdcV8qC+pOZL+NKqYwvWSRrVXHn+ZmRRAu8vJTAznH7Oag6RVRw==}
-    dependencies:
-      fast-deep-equal: 2.0.1
-      fast-json-stable-stringify: 2.0.0
-      json-schema-traverse: 0.4.1
-      uri-js: 4.2.2
-    dev: false
-
   /ajv/6.12.6:
     resolution: {integrity: sha512-j3fVLgvTo527anyYyJOGTYJbG+vnnQYvE0m5mmkc1TK+nxAppkCLMIL0aZ4dblVCNoGShhm+kzE4ZUykBoMg4g==}
     dependencies:
@@ -37,18 +28,10 @@ packages:
       uri-js: 4.4.1
     dev: false
 
-  /fast-deep-equal/2.0.1:
-    resolution: {integrity: sha512-bCK/2Z4zLidyB4ReuIsvALH6w31YfAQDmXMqMx6FyfHqvBxtjC0eRumeSu4Bs3XtXwpyIywtSTrVT99BxY1f9w==}
-    dev: false
-
   /fast-deep-equal/3.1.3:
     resolution: {integrity: sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==}
     dev: false
 
-  /fast-json-stable-stringify/2.0.0:
-    resolution: {integrity: sha1-1RQsDK7msRifh9OnYREGT4bIu/I=}
-    dev: false
-
   /fast-json-stable-stringify/2.1.0:
     resolution: {integrity: sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw==}
     dev: false
@@ -57,22 +40,11 @@ packages:
     resolution: {integrity: sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg==}
     dev: false
 
-  /punycode/2.1.1:
-    resolution: {integrity: sha512-XRsRjdf+j5ml+y/6GKHPZbrF/8p2Yga0JPtdqTIY2Xe5ohJPD9saDJJLPvp9+NSBprVvevdXZybnj2cv8OEd0A==}
-    engines: {node: '>=6'}
-    dev: false
-
   /punycode/2.3.0:
     resolution: {integrity: sha512-rRV+zQD8tVFys26lAGR9WUuS4iUAngJScM+ZRSKtvl5tKeZ2t5bvdNFdNHBW9FWR4guGHlgmsZ1G7BSm2wTbuA==}
     engines: {node: '>=6'}
     dev: false
 
-  /uri-js/4.2.2:
-    resolution: {integrity: sha512-KY9Frmirql91X2Qgjry0Wd4Y+YTdrdZheS8TFwvkbLWf/G5KNJDCh6pKL5OZctEW4+0Baa5idK2ZQuELRwPznQ==}
-    dependencies:
-      punycode: 2.1.1
-    dev: false
-
   /uri-js/4.4.1:
     resolution: {integrity: sha512-7rKUyy33Q1yc98pQ1DAmLtwX109F7TIfWlW1Ydo8Wl1ii1SeHieeh0HHfPeL2fMXK6z0s8ecKs9frCuLJvndBg==}
     dependencies:

@gluxon gluxon force-pushed the dedupe branch 3 times, most recently from f4f4bc9 to cca24e2 Compare January 20, 2023 09:10
@gluxon gluxon marked this pull request as ready for review January 20, 2023 09:40
@gluxon gluxon requested a review from zkochan as a code owner January 20, 2023 09:40
@zkochan
Copy link
Member

zkochan commented Jan 22, 2023

This solution feels a bit hacky.

Have you tested this package: pnpm-deduplicate?

I guess we could integrate it to our CLI if it works good (or better than the suggested solution).

cc @ocavue

@gluxon
Copy link
Member Author

gluxon commented Jan 22, 2023

Have you tested this package: pnpm-deduplicate?

Yup! I've seen pnpm-deduplicate and linked to the pnpm issue that @ocavue filed originally in the description. pnpm-deduplicate was noted to be a temporary tool until an official pnpm dedupe exists. See ocavue/pnpm-deduplicate#9 (comment)

I guess we could integrate it to our CLI if it works good (or better than the suggested solution).

I'm appreciative of the work @ocavue made to create pnpm-deduplicate, but do feel it won't ever work as well as something built into pnpm simply because it can't benefit from knowing some pnpm internals.

At the moment, the pnpm-deduplicate tool works by:

  1. Running pnpm install on start.
  2. Noting what version specifiers can be bumped by analyzing dependencies and package.json files.
  3. Generates a dictionary of pnpm.overrides to update those versions, and adds them to the root package.json. It runs pnpm install again for those new overrides.
  4. Removes the pnpm.overrides it added and runs pnpm install a third time to regenerate pnpm-lock.yaml.

The pnpm-deduplicate tool needing to run pnpm install 3 times was what made my teammates ask if something more performant could be built into pnpm itself.

There's also an edge case with pnpm-deduplicate since it doesn't take existing pnpm.overrides or readPackage hooks into account. These are pnpm internals a third party CLI probably shouldn't ever know about. Running pnpm-deduplicate on a monorepo I work on left it in a state where it kept trying to deduplicate a package that was declared in pnpm.overrides.

This solution feels a bit hacky.

This approach relies on the existing broken lockfile code to re-resolve dependencies. Was that the part that looked a bit fishy to you? Admittedly this assumption should be more explicit.

I can think of more ways to guarantee the deleted dependencies are regenerated if that's what blocks this from merging, but settled for the test case to start.

@zkochan
Copy link
Member

zkochan commented Jan 22, 2023

This approach relies on the existing broken lockfile code to re-resolve dependencies. Was that the part that looked a bit fishy to you? Admittedly this assumption should be more explicit.

ok, add a comment about it to the code

@@ -37,6 +37,7 @@ export interface StrictInstallOptions {
linkWorkspacePackagesDepth: number
lockfileOnly: boolean
fixLockfile: boolean
resetResolutionsWithPreferredVersions: boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just name it dedupe?

Suggested change
resetResolutionsWithPreferredVersions: boolean
dedupe: boolean

Also, you're saying that this works because of the fixLockfile logic but I don't see the fixLockfile set to true anywhere

Copy link
Member Author

@gluxon gluxon Jan 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just name it dedupe?

Can do! Wasn't sure if the extra verbose name here was that helpful. Thanks for the check.

Also, you're saying that this works because of the fixLockfile logic but I don't see the fixLockfile set to true anywhere

I think I could have been more clear in the last comment.

My understanding of fixLockfile is that it can regenerate parts of pnpm-lock.yaml if there are any dangling resolutions. For example, a package refers to foo@1.0.0, but that version doesn't exist in the lockfile anymore.

Since there's no dangling resolutions, fixLockfile isn't needed explicitly since pnpm in general rebuilds the wanted lockfile as it resolves the dependency graph. This should be using the same assumptions the existing forgetResolutionsOfPrevWantedDeps function does for importers. Admittedly, there could be stronger guarantees of this assumption for packages.

I think I could extend the test to make sure a pnpm install after pnpm dedupe doesn't change the lockfile to better capture the assumption packages is regenerated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, there could be stronger guarantees of this assumption for packages.

Here are the parts of resolve-dependencies that regenerate packages. The toLockfileDependency function is fine with prevSnapshot.dependencies being a partial. This runs regardless of fixLockfile.

const newResolvedDeps = updateResolvedDeps(
opts.prevSnapshot?.dependencies ?? {},

return mergeRight(
prevResolvedDeps,
newResolvedDeps

I'll comment that this is happening in resolve-dependencies.

@gluxon
Copy link
Member Author

gluxon commented Jan 22, 2023

This approach relies on the existing broken lockfile code to re-resolve dependencies. Was that the part that looked a bit fishy to you? Admittedly this assumption should be more explicit.

ok, add a comment about it to the code

Can do! I can make that more clear.

I also wanted to note that we do something similar here already:

// If the specifier is new, the old resolution probably does not satisfy it anymore.
// By removing these resolutions we ensure that they are resolved again using the new specs.
function forgetResolutionsOfPrevWantedDeps (importer: ProjectSnapshot, wantedDeps: WantedDependency[]) {

Thinking through this again, I think it might be more clear if I move the existing logic into a forgetResolutionsOfAllPrevWantedDeps function below the one above?

@zkochan
Copy link
Member

zkochan commented Jan 22, 2023

a new function sounds good

@gluxon gluxon force-pushed the dedupe branch 2 times, most recently from 2ce3394 to 5539069 Compare January 22, 2023 01:53
@gluxon
Copy link
Member Author

gluxon commented Jan 22, 2023

ok, add a comment about it to the code

For packages, I added the following comment:

// The resolveDependencies function takes previous PackageSnapshot
// dependencies/optionalDependencies, and merges them with new resolved
// deps. Clear the previous PackageSnapshot fields so the newly resolved
// deps are always used.

For importers, I made this more clear with a new forgetResolutionsOfAllPrevWantedDeps function, as discussed. This function was added below the existing forgetResolutionsOfPrevWantedDeps, which uses the same assumptions.

// Reset all previously recorded resolutions for the importer to force them to
// be resolved again.
function forgetResolutionsOfAllPrevWantedDeps (importer: ProjectSnapshot): ProjectSnapshot {

@gluxon
Copy link
Member Author

gluxon commented Jan 22, 2023

Latest push has the following changes:

  • Moved logic deleting importer resolutions into forgetResolutionsOfAllPrevWantedDeps.
  • Updated comments around resolveDependencies assumptions.
  • Renamed resetResolutionsWithPreferredVersions to dedupe.

@zkochan
Copy link
Member

zkochan commented Jan 22, 2023

Overall it looks OK. I'll review the tests later.

@gluxon
Copy link
Member Author

gluxon commented Jan 22, 2023

Latest push:

  • Moves all the forget resolutions logic into the forgetResolutionsOfAllPrevWantedDeps function. There might be a better name for this since the new function operates on the entire lockfile while the existing forgetResolutionsOfPrevWantedDeps function just operates on a single importer.
  • Adds another pnpm install to make sure the pnpm dedupe'ed lockfile is correct in tests.

// Run pnpm install one last time to ensure the deduped lockfile is in a good
// state. If so, the "pnpm install" command should pass successfully and not
// make any further edits to the lockfile.
await install.handler(opts)
expect(await readProjectLockfile()).toEqual(dedupedLockfile)

Overall it looks OK. I'll review the tests later.

Thanks for reviewing!

---
"@pnpm/plugin-commands-installation": minor
pnpm: minor
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnpm/core should also be bumped.

Copy link
Member Author

@gluxon gluxon Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"integrity": "sha512-XRsRjdf+j5ml+y/6GKHPZbrF/8p2Yga0JPtdqTIY2Xe5ohJPD9saDJJLPvp9+NSBprVvevdXZybnj2cv8OEd0A==",
},
},
"/punycode/2.3.0": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that two versions of punycode are present? And both are v2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the other version comes in from the earlier version of uri-js@4.2.2, which also appears twice in the lockfile before deduping.

❯ pnpm view uri-js@4.2.2

# Omitted for brevity

dependencies:
punycode: ^2.1.0 

The earlier version of uri-js@4.2.2 is brought in by the earlier version of ajv.

❯ pnpm view ajv@6.10.2  

# Omitted for brevity

dependencies:
fast-deep-equal: ^2.0.1
fast-json-stable-stringify: ^2.0.0
json-schema-traverse: ^0.4.1
uri-js: ^4.2.2

Fortunately there's only one version of uri-js and punycode after deduping. πŸ™‚

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to test that if uri-js@4.2.2 and uri-js@4.4.1 are present in the lockfile, that uri-js@4.2.2 dedupes to use punycode@2.3.0. I'll add that test now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could clarify one more thing here: The earlier versions of ajv's dependencies appear because I copied the with-peer fixture.

"name": "with-peer",
"version": "1.0.0",
"dependencies": {
"ajv": "^6.10.2",

I then added a new workspace package that depends on the later version of ajv of that same major version. That should simulate what happens in a lot of monorepos, where a developer adds a later version of a dependency that already existed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to test that if uri-js@4.2.2 and uri-js@4.4.1 are present in the lockfile, that uri-js@4.2.2 dedupes to use punycode@2.3.0. I'll add that test now.

Done in 8010d61

@gluxon
Copy link
Member Author

gluxon commented Jan 23, 2023

Latest push

  • Refactors the test to use jest-diff rather than serializing the lockfile before/after. I think this is much clearer. 832e6f8
  • Adds a smaller scale test when no edits to importers are needed: 8010d61

@zkochan zkochan merged commit e8f6ab6 into pnpm:main Jan 23, 2023
@gluxon gluxon deleted the dedupe branch January 23, 2023 17:05
@gluxon
Copy link
Member Author

gluxon commented Jan 23, 2023

Thanks! Please feel free to ping me if any issues related to this new command come up. I can take responsibility and troubleshoot in any threads.

@zkochan zkochan added this to the v7.26 milestone Jan 24, 2023
@ocavue
Copy link

ocavue commented Feb 1, 2023

@gluxon Thank you for implement this! I'm very happy to see that pnpm supports this feature natively. I will archive pnpm-deduplicate.

@gluxon
Copy link
Member Author

gluxon commented Feb 3, 2023

Thanks again for making pnpm-deduplicate @ocavue. It was really helpful for cleaning up one of my project's lockfile a few months ago. πŸš€

@kachkaev
Copy link

Cool stuff! I wonder if we can also run this command in --check mode for linting. Feature request here: #6105

@gluxon
Copy link
Member Author

gluxon commented Feb 19, 2023

@kachkaev I think something in that direction makes sense! Will comment on your issue with some of my thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants