
Time-based resolution mode #2

Merged
merged 3 commits into main from time-base-resolution-mode on Sep 2, 2022

Conversation

zkochan
Member

@zkochan zkochan commented Aug 16, 2022

To read it in rendered view: https://github.com/pnpm/rfcs/blob/time-base-resolution-mode/text/0002-time-based-resolution-mode.md

Created a discussion in Verdaccio to support a new metadata format: verdaccio/verdaccio#3315

@d3lm d3lm left a comment

LGTM. We should implement a PoC and see how it goes and if we run into issues during development.


Resolution will be divided into two stages. The first stage will resolve all the direct dependencies of all the workspace projects. This stage may work the same way as it works now (the highest version is picked). When all the direct dependencies are resolved, we check when the picked versions were released. This information is present in the package metadata in the "time" field. For example, if we install webpack and eslint, we get webpack resolved to v5.74.0 and eslint resolved to v8.22.0. `webpack@5.74.0` was released at "2022-07-25T08:00:33.823Z". `eslint@8.22.0` was released at "2022-08-14T01:23:41.730Z". Now we compare the release dates of the picked packages and take the most recent one. In this case, that is the date eslint was released: "2022-08-14T01:23:41.730Z". Let's call this date T.

At the second stage, we resolve all the subdependencies. At this stage, instead of resolving a range to the highest available version, we filter out any versions that were released after T and pick the highest version from the remaining ones. Let's say we need to resolve `ajv@^6.10.0`. We already have the metadata of ajv in the cache and it was saved after T, so we don't need to redownload it. These are the versions of ajv that match `^6.10.0`:
zkochan (Member, Author)

In some monorepos, packages are published in random order, so it might happen that a subdep was released later than its parent dep. Maybe we should add 10 minutes to T.

@d3lm d3lm Aug 17, 2022

Yeah, adding some delta might be a good idea. In the most common cases, it will still not require fetching fresh metadata for the sub-deps.
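
As an illustration of the mechanism discussed in this thread, here is a minimal sketch of the cutoff logic in TypeScript (the type and function names are hypothetical, not pnpm's actual implementation; it assumes the `semver` package):

// Sketch: compute the cutoff T from the publish times of the resolved direct
// dependencies, add a small delta for monorepos that publish out of order,
// and resolve subdependency ranges only against versions published before T.
import semver from 'semver'

interface PackageMetadata {
  name: string
  versions: Record<string, unknown>
  time: Record<string, string> // version -> ISO publish date
}

const PUBLISH_DELTA_MS = 10 * 60 * 1000 // the "10 minutes" suggested above

function computeCutoff (directDeps: Array<{ meta: PackageMetadata, version: string }>): number {
  // T is the most recent publish date among the resolved direct dependencies
  const latest = Math.max(...directDeps.map(({ meta, version }) => Date.parse(meta.time[version])))
  return latest + PUBLISH_DELTA_MS
}

function resolveSubdep (meta: PackageMetadata, range: string, cutoff: number): string | null {
  // Keep only versions that satisfy the range AND were published before the cutoff,
  // then pick the highest of the remaining versions.
  const candidates = Object.keys(meta.versions)
    .filter((v) => semver.satisfies(v, range) && Date.parse(meta.time[v]) <= cutoff)
  return candidates.sort(semver.rcompare)[0] ?? null
}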

(another commenter)

Can we get a config-level setting to prefer the minimum version that matches the set of merged specifiers, rather than the maximum, for the sake of stability? With how most monorepos operate, this would also provide the same guarantee, because it is common to publish with `other-workspace-package: ^current` in dependencies.

zkochan (Member, Author)

I tried this. It doesn't work as you would expect. There are many projects that update the dependencies but keep the old ranges. So the "current" doesn't really work anymore with the project.

zkochan (Member, Author)

Also, it is hard or even impossible to implement in pnpm, as I have stated in the alternatives section.

@zkochan
Member Author

zkochan commented Aug 20, 2022

I have implemented a PoC for this resolution mode: pnpm/pnpm#5238

It works well but there is a bottleneck. The npm registry currently allows fetching either the full metadata of a package or a compact (abbreviated) metadata document. The compact metadata doesn't contain the "time" field, so we can only implement this resolution mode using the full metadata. But that makes installation almost twice as slow.
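
For context, a sketch of the two metadata requests (the Accept header values are the ones the npm registry documents for full and abbreviated metadata; the helpers are hypothetical and assume a runtime with a global fetch):

// Sketch: abbreviated metadata is much smaller but currently lacks
// the "time" field, so time-based resolution has to request the full document.
const REGISTRY = 'https://registry.npmjs.org'

async function fetchFullMetadata (pkg: string) {
  // Full metadata: includes "time", per-version details, readme, etc.
  const res = await fetch(`${REGISTRY}/${pkg}`, {
    headers: { accept: 'application/json' },
  })
  return res.json()
}

async function fetchAbbreviatedMetadata (pkg: string) {
  // Abbreviated metadata: enough to install, but no "time" field today.
  const res = await fetch(`${REGISTRY}/${pkg}`, {
    headers: { accept: 'application/vnd.npm.install-v1+json' },
  })
  return res.json()
}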

I have tested this resolution mode with a hot cache (a cache already populated with full metadata from a previous download), and in that scenario it turns out to be faster than the current resolution mode used by pnpm:

before

Benchmark 1: pnpm i --lockfile-only
  Time (mean ± σ):     12.794 s ±  3.560 s    [User: 12.259 s, System: 1.660 s]
  Range (min … max):   11.092 s … 22.805 s    10 runs

after

Benchmark 1: pnpm i --lockfile-only
  Time (mean ± σ):     11.489 s ±  0.124 s    [User: 13.898 s, System: 1.858 s]
  Range (min … max):   11.320 s … 11.663 s    10 runs

So it definitely makes sense to ship such an option, but in order to make it the default, we will have to cooperate with a registry to include "time" in the compact metadata.

@zkochan
Member Author

zkochan commented Aug 20, 2022

npm has a `before` option, and when it is used, the npm CLI requests the full metadata, making installation slower. So it may be in npm's interest to support a new metadata format that includes "time".

@zkochan
Member Author

zkochan commented Aug 20, 2022

cc @pnpm/collaborators

@zkochan
Member Author

zkochan commented Aug 25, 2022

@juanpicado has created a branch in the Verdaccio repository where the "abbreviated" metadata (the non-full metadata) contains the "time" field. This has allowed me to test the new resolution mode and run some benchmarks.

With the new resolution mode, installation is a bit slower with a cold cache:

new resolution mode:

❯ hyperfine --prepare="rm -rf .pnpm-cache pnpm-lock.yaml node_modules" "/home/zoli/src/pnpm/pnpm/packages/artifacts/linux-x64/pnpm i --ignore-scripts"
  Time (mean ± σ):      8.457 s ±  2.596 s    [User: 5.094 s, System: 1.644 s]
  Range (min … max):    6.821 s … 15.578 s    10 runs

old resolution mode:

❯ hyperfine --prepare="rm -rf .pnpm-cache pnpm-lock.yaml node_modules" "pnpm i --ignore-scripts"
  Time (mean ± σ):      8.293 s ±  3.024 s    [User: 5.086 s, System: 1.579 s]
  Range (min … max):    6.608 s … 16.753 s    10 runs

However, installation is a LOT faster with a hot cache:

New resolution mode:

❯ hyperfine --prepare="rm -rf pnpm-lock.yaml node_modules" "/home/zoli/src/pnpm/pnpm/packages/artifacts/linux-x64/pnpm i --ignore-scripts"
  Time (mean ± σ):      4.249 s ±  0.421 s    [User: 3.854 s, System: 1.422 s]
  Range (min … max):    3.755 s …  4.785 s    10 runs

Old resolution mode:

❯ hyperfine --prepare="rm -rf pnpm-lock.yaml node_modules" "pnpm i --ignore-scripts"
  Time (mean ± σ):      7.236 s ±  1.114 s    [User: 4.909 s, System: 1.587 s]
  Range (min … max):    6.548 s …  9.746 s    10 runs

@octogonz
Member

Suppose a company is concerned about NPM versions that might contain malware, maybe because the publishing token was compromised. In such an incident, eventually someone from the community will discover the malware and report it, and then the bad version will get deprecated/removed from npmjs.com. In other words, if a version still has NOT been deprecated after a week or two, the risk of malware is much lower than on the day when it was first published.

This suggests a simple strategy to reduce exposure to malware, by introducing a "waiting period" before new versions are installed. An NPM registry proxy can implement such a policy. But could PNPM provide this feature using similar logic as time-based resolution?

@zkochan
Member Author

zkochan commented Aug 26, 2022

@octogonz we may add something like a `minAge` setting (it won't be hard to do). But we still need a change on the registry side: we need "time" to be included in the non-full metadata.
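
For illustration, such a check could slot into the same version filtering; a hypothetical sketch (a `minAge` setting does not exist in pnpm at this point):

// Sketch: in addition to the cutoff T, exclude versions younger than minAge.
function passesMinAge (publishDate: string, minAgeMs: number, now = Date.now()): boolean {
  return now - Date.parse(publishDate) >= minAgeMs
}

// Example: only consider versions that are at least 7 days old.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000
const oldEnough = passesMinAge('2022-08-14T01:23:41.730Z', SEVEN_DAYS_MS)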

@zkochan zkochan merged commit 75f49cc into main Sep 2, 2022
@zkochan zkochan deleted the time-base-resolution-mode branch September 2, 2022 11:54
@zkochan
Member Author

zkochan commented Sep 4, 2022

This was released in v7.10.

@nathanhammond

nathanhammond commented Sep 6, 2022

Worth mentioning that the resolution strategy here (resolving transitives to be "under" the time of the most recent direct) will constrain the ability to fix forward a release by updating a package that is a transitive dependency of the installing project.

// my-app/package.json
{
  "dependencies": {
    "ember-cli": "^3.1.0",
  }
}
// ember-cli/package.json
{
  "version": "3.1.0",
  "dependencies": {
    "ember-cli-broccoli": "^1.1.0"
  }
}

Use-case:

  1. I publish ember-cli@3.1.0. It depends on ember-cli-broccoli@^1.1.0.
  2. One week later it is discovered that ember-cli@3.1.0 has an issue with integrating ember-cli-broccoli@1.1.0.
  3. The issue is resolved by releasing ember-cli-broccoli@1.1.1.
  4. pnpm install of my-app, which depends on ember-cli@3.1.0, will continue to use ember-cli-broccoli@1.1.0 in time mode, because (without further updates to my-app/package.json) ember-cli-broccoli@1.1.1 was published after the time window cutoff and is filtered out.

This type of issue matters most particularly for generators like ember-cli, create-next-app, and anything that repeatedly instantiates projects with a specified package.json.

Workarounds:

  • Bump the direct dependency version number. It's unrealistic to require a version number update any time a transitive dependency updates (there can be hundreds of them, dozens of levels deep), so this is a nonstarter.
  • Don't use time mode.
  • ???

@zkochan
Member Author

zkochan commented Sep 6, 2022

Bump the direct dependency version number. It's unrealistic to require a version number update any time a transitive dependency updates (there can be hundreds of them, dozens of levels deep) so this is a nonstarter.

Why? If a fix is important, that is what should happen. Even now there is no guarantee that a transitive dependency will be updated as the deduplication algorithm might pick a lower version.

@nathanhammond

The way in which this issue will manifest itself is particularly problematic for code generators (which is why my examples are of a code generator). Extending the previous example: as the author of ember-cli, I can plainly know how things will resolve, even in the context of the deduplication algorithm. It will be 100% reproducible in CI tests.

  1. ember new my-app (&& cd my-app && pnpm install) after the release of ember-cli-broccoli@1.1.1 will function identically on CI and end-user's projects when not using time mode.
  2. If fixing ember-cli-broccoli and releasing it as v1.1.1 resolves CI, it will 100% resolve all issues for end-users' generated projects. It is not dependent upon the deduplication algorithm.

However, if a user runs:

pnpx create-react-app@latest my-app
cd my-app
pnpm install --use-time-mode

The local cache will create an "accidentally locked all projects created on this box to an old version of a transitive" situation until one of the direct dependencies updates, triggering an update cascade. This is especially problematic for something like create-react-app, which goes to great lengths to push as many dependencies as possible into transitives:

  "dependencies": {
    "@testing-library/jest-dom": "^5.16.5",
    "@testing-library/react": "^13.4.0",
    "@testing-library/user-event": "^13.5.0",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "react-scripts": "5.0.1",
    "web-vitals": "^2.1.4"
  },

This would be bonus "fun" in CI for the authors of code generators; we'd need to opt out of the metadata cache in order to ensure that CI pulls up-to-date versions of everything.


Setting aside whether or not you believe the strategy is reasonable, bumping just the transitive has been an explicit strategy for fix-forward releases within the Ember community for nearly a decade—especially as it doesn't require any user action. The left-pad debacle was also resolved this same way. This breaks that strategy.

@zkochan
Member Author

zkochan commented Sep 6, 2022

I have been using this feature for a week in several projects. I had no issues yet. It is opt-in for now.

I understand your points, but I think you don't realize how rarely dependencies are updated in enterprise applications. The workflow you describe was already "broken" with the introduction of lockfiles.

If this resolution mode causes issues, we may add some minimum freshness threshold, for example that metadata files should not be older than a week (or even a day; we can make it configurable).

@larixer

larixer commented Sep 6, 2022

The workflow you describe was already "broken" with the introduction of lockfiles.

It would be nice to see lockfiles abandoned altogether and superseded by the time-based resolution feature. Does that sound realistic?

@zkochan
Member Author

zkochan commented Sep 6, 2022

I think a lockfile is still useful for a few reasons.

A lockfile makes installation faster.

A lockfile also stores the checksum of each package. So if someone publishes new content to the same version, you will get an error. This is more secure than no lockfile.

This resolution mode doesn't cover all dependency sources. For instance, a lockfile also locks git-hosted dependencies.
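
As an aside on the checksum point above, a rough sketch of the kind of integrity check a lockfile makes possible (a simplified, hypothetical helper; pnpm's real verification happens in its store/fetcher code):

import { createHash } from 'node:crypto'

// Sketch: verify a downloaded tarball against the "sha512-<base64>" integrity
// string recorded in the lockfile. If the registry serves different content
// for the same version, the digests no longer match and installation fails.
function verifyIntegrity (tarball: Buffer, lockfileIntegrity: string): boolean {
  const sep = lockfileIntegrity.indexOf('-')
  const algorithm = lockfileIntegrity.slice(0, sep) // e.g. "sha512"
  const expected = lockfileIntegrity.slice(sep + 1) // base64 digest
  const actual = createHash(algorithm).update(tarball).digest('base64')
  return actual === expected
}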

@nathanhammond

nathanhammond commented Sep 6, 2022

This workflow is also not broken via lockfiles. For example, create-react-app, ember-cli, create-turbo, and the other code generators I'm aware of emit a package.json, but they explicitly do not emit a lockfile. The first install inside of the generated application will generate a new, fresh lockfile for that installation.

Subsequent installations on the same machine using this time-based resolution strategy will "lock" those transitive dependencies via the metadata cache—even though these generators explicitly opted against a lockfile and specified floating versions. Yes, the package resolution is still valid, but it does not allow users to get security/bugfix updates when generating a new project and breaks established workflows.

This issue is explicitly not with dependency updates, or enterprise usage patterns (though a failure scenario could be constructed for these). The issue presents most significantly with project generators that emit a package.json.

It could become the responsibility of every single generator application to help people avoid this trap—but first installs are where the metadata cache gets you the largest wins. If the entire community of generators is guiding against using this (two that I maintain, create-turbo and ember-cli, would be), the performance advantages of this strategy are reduced tremendously.

Template repositories that do not include lockfiles also suffer from this same problem, and they generally do not execute installation for their users, so they don't have a hook to bypass the metadata cache—which increases the teaching costs for pnpm.


Alright, left-pad issue is resolved, all newly-generated projects will work with no changes. Existing broken projects need to re-run npm install. pnpm users: make sure you clear your metadata cache or your newly-generated projects will still be broken.

@zkochan
Member Author

zkochan commented Sep 6, 2022

make sure you clear your metadata cache or your newly-generated projects will still be broken.

Clearing the metadata cache is not enough. One of the direct dependencies should be updated, and if the new version of the direct dependency has a newer publish date, then the transitive dependency might be updated (the transitive dep should still have an older publish date than the publish date of the newest direct dependency).

I understand the disadvantages but I still think the advantages outweigh them. Speed is not the only reason for this change. Decentralization and security are also two important points. With good caching, a third-party registry like Verdaccio can work more effectively. We can also implement effective server-side resolution.

And from a security perspective, as you may have noticed, in the past year there were several incidents where the maintainers of transitive dependencies published malware in their packages. This doesn't protect users completely from such updates, but it reduces the chance by delaying the update.

We could make an exception for pnpm create but I would personally vote for more security.

@nathanhammond

Oh, this is stricter than I originally understood. It's not just a cache; it's an extension to the resolution algorithm.

This resolution mode entirely prevents floating versions of transitive dependencies from being upgraded beyond the most recent direct dependency's release date.

It prevents leaf-node packages from having any hope of keeping their package up-to-date and relies on direct dependencies (e.g. one of those seven dependencies installed by create-react-app) to bump a version every single time there is a security vulnerability or bugfix anywhere in their dependency graph. (As a generator maintainer I assure you that I don't have time to keep up with this.)

pnpx create-react-app ./my-app
> added 1394 packages in 26s

Current state is actually decent, only two days out of date:

npm view "@testing-library/jest-dom" time.modified --json
npm view "@testing-library/react" time.modified --json
npm view "@testing-library/user-event" time.modified --json
npm view "react" time.modified --json
npm view "react-dom" time.modified --json
npm view "react-scripts" time.modified --json
npm view "web-vitals" time.modified --json

"2022-08-04T21:47:42.603Z"
"2022-09-04T16:52:40.753Z"
"2022-08-09T15:25:09.793Z"
"2022-09-02T16:14:03.823Z"
"2022-09-02T16:14:12.918Z"
"2022-06-26T08:04:11.784Z"
"2022-08-31T20:10:10.678Z"

But, if you dive into the full set of release times, you'll find that none of the directs updated between August 4 and August 23, or between June 26 and July 14.

No package I know of releases a new version every time a transitive dependency updates, and with 1394 packages in a standard create-react-app installation that would probably be daily. So, if security is the angle...

We need to weigh:

  • The frequency of security issues and bugfixes in extant packages.
  • The ability to get security and bugfix updates to end-users.

Against:

  • The frequency of attacks by releasing a new version of a package. (e.g. registry account compromise, PR with not-fully-understood behavior, untrustworthy maintainer)
  • Alternative methods to reduce exposure to "newly-released version" supply chain attacks. (Note that I don't believe that "just wait longer" is a viable answer; if everybody waits 7 days then the attack just takes longer. A real solution, shared auditing of packages, is too expensive.)

Without data I can't be certain which of these is more-likely to be worse.
