feat: cache npm metadata (#5491)
**What's the problem this PR addresses?**

Resolving package metadata is slower than it has to be: most of the
time, Yarn has already fetched the metadata in the past, so much of it
can be cached and reused.

This should improve performance in various cases (ranging from creating
new projects and cache-only-but-no-lockfile installs to `yarn up` when
no new versions are available), since the server can avoid resending the
response body if nothing has changed.

**How did you fix it?**

This PR makes Yarn cache npm package metadata inside
`<globalFolder>/metadata/npm/<cacheKey>/<registry>/<package>.json` when
`getPackageMetadata` is used.
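
The layout can be illustrated with a minimal sketch. The helper names (`makeCacheKey`, `getMetadataPath`), the SHA-256 hash, and the shortened field list are assumptions made for illustration and are not Yarn's actual implementation; the folder names follow the `metadata/npm` path used in the diff further down.

```typescript
import {createHash} from 'node:crypto';
import * as path from 'node:path';

// Hypothetical, shortened list of cached fields (assumption for this sketch).
const SKETCH_CACHED_FIELDS = [`name`, `dist.tarball`, `dependencies`];

// Derives a short key from the field names, so that changing the list of
// cached fields points the cache at a fresh folder.
function makeCacheKey(fields: Array<string>): string {
  const hash = createHash(`sha256`);
  for (const field of fields)
    hash.update(field);

  return hash.digest(`hex`).slice(0, 6);
}

// Builds <globalFolder>/metadata/npm/<cacheKey>/<registry hostname>/<package>.json
function getMetadataPath(globalFolder: string, registry: string, packageSlug: string): string {
  const {hostname} = new URL(registry);

  return path.posix.join(globalFolder, `metadata/npm`, makeCacheKey(SKETCH_CACHED_FIELDS), hostname, `${packageSlug}.json`);
}
```

Because the key is derived from the field list, entries written with a different set of cached fields end up in a different folder and are simply never read again.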

If an exact version is requested and the cached metadata already
contains it, Yarn returns the metadata from disk directly and avoids
hitting the network altogether.
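
A minimal sketch of that exact-version shortcut, assuming the cache file has already been read into a string (or `null` when it does not exist). `lookupCachedVersion` is a hypothetical name; the two types mirror the `CachedMetadata` and `PackageMetadata` shapes from the diff.

```typescript
type PackageMetadata = {
  'dist-tags': Record<string, string>;
  versions: Record<string, unknown>;
};

type CachedMetadata = {
  metadata: PackageMetadata;
  etag?: string;
  lastModified?: string;
};

// Returns the cached metadata when it already lists the exact version,
// or null when the caller has to fall back to a network request.
function lookupCachedVersion(rawCacheFile: string | null, version: string): PackageMetadata | null {
  if (rawCacheFile === null)
    return null;

  try {
    const cached = JSON.parse(rawCacheFile) as CachedMetadata;
    if (Object.prototype.hasOwnProperty.call(cached.metadata.versions, version))
      return cached.metadata;
  } catch {
    // A corrupted or unexpected cache entry is treated as a cache miss.
  }

  return null;
}
```

Note that a hit returns the whole cached document, so fields other than the requested version (for example `dist-tags`) may be stale.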

Otherwise, Yarn sets the `If-None-Match` and `If-Modified-Since`
headers using the `etag` and `last-modified` values that were cached
during previous requests. If the metadata hasn't changed, the server can
skip sending the response body and just respond with `304`, in which
case Yarn reuses the cached metadata.
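
The revalidation flow can be sketched as follows. This is an illustration under stated assumptions rather than the code from this PR: `fetchWithRevalidation`, `CacheEntry`, and the injected `send` function are hypothetical names, and the transport is passed in so that the sketch does not depend on any particular HTTP client.

```typescript
type CacheEntry<T> = {body: T, etag?: string, lastModified?: string};
type HttpResponse = {statusCode: number, headers: Record<string, string | undefined>, body: string};
type Send = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

async function fetchWithRevalidation<T>(url: string, cached: CacheEntry<T> | null, send: Send): Promise<CacheEntry<T>> {
  // Both validators are sent when available, in case the server ignores one of them.
  const headers: Record<string, string> = {};
  if (cached?.etag)
    headers[`If-None-Match`] = cached.etag;
  if (cached?.lastModified)
    headers[`If-Modified-Since`] = cached.lastModified;

  const response = await send(url, headers);

  // 304 Not Modified: the body is empty, so the cached entry is reused as-is.
  if (response.statusCode === 304) {
    if (cached === null)
      throw new Error(`Received a 304 response without a cached entry`);

    return cached;
  }

  // Anything else is treated as a fresh document; keep the new validators.
  return {
    body: JSON.parse(response.body) as T,
    etag: response.headers.etag,
    lastModified: response.headers[`last-modified`],
  };
}
```

The returned entry is what a caller would persist, so that the next request can send the validators again.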

TODO:
- [x] Trim the cached metadata of unnecessary fields to decrease cache
size
- [x] Update benchmark scripts to make sure that they take the metadata
cache into account
- [x] Run more benchmarks
- [ ] Make `yarn cache clean` clean the npm metadata cache (different
PR)
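
As an illustration of the first item, trimming can be sketched as keeping only a list of dot-separated paths from each version manifest. The diff uses lodash's `pick` for this; `pickPaths` and `trimVersions` below are hypothetical, dependency-free stand-ins written for this sketch.

```typescript
type JsonObject = Record<string, any>;

// Copies only the listed paths (dot-separated, e.g. `dist.tarball`) from source.
function pickPaths(source: JsonObject, paths: Array<string>): JsonObject {
  const result: JsonObject = {};

  for (const fieldPath of paths) {
    const segments = fieldPath.split(`.`);

    // Walk down the source object; give up if a segment is missing.
    let value: any = source;
    for (const segment of segments) {
      if (value === null || typeof value !== `object` || !(segment in value)) {
        value = undefined;
        break;
      }
      value = value[segment];
    }

    if (typeof value === `undefined`)
      continue;

    // Recreate the intermediate objects in the result, then assign the leaf.
    let target = result;
    for (const segment of segments.slice(0, -1)) {
      if (typeof target[segment] === `undefined`)
        target[segment] = {};
      target = target[segment];
    }
    target[segments[segments.length - 1] as string] = value;
  }

  return result;
}

// Applies the same field list to every version of a package.
function trimVersions(versions: Record<string, JsonObject>, fields: Array<string>): Record<string, JsonObject> {
  return Object.fromEntries(Object.entries(versions).map(([version, manifest]) => [version, pickPaths(manifest, fields)]));
}
```

Large fields that are not listed, such as a README, are dropped before the document is written to disk.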

**Checklist**
- [X] I have read the [Contributing
Guide](https://yarnpkg.com/advanced/contributing).

- [X] I have set the packages that need to be released for my changes to
be effective.

- [X] I will check that all automated PR checks pass before the PR gets
reviewed.

---------

Co-authored-by: Maël Nison <nison.mael@gmail.com>
paul-soporan and arcanis committed Jun 24, 2023
1 parent 3eedcba commit 4897712
Showing 13 changed files with 352 additions and 84 deletions.
44 changes: 44 additions & 0 deletions .pnp.cjs

Large diffs are not rendered by default.

39 changes: 39 additions & 0 deletions .yarn/versions/07d34d58.yml
@@ -0,0 +1,39 @@
+releases:
+  "@yarnpkg/cli": minor
+  "@yarnpkg/core": minor
+  "@yarnpkg/fslib": minor
+  "@yarnpkg/plugin-npm": minor
+
+declined:
+  - "@yarnpkg/plugin-compat"
+  - "@yarnpkg/plugin-constraints"
+  - "@yarnpkg/plugin-dlx"
+  - "@yarnpkg/plugin-essentials"
+  - "@yarnpkg/plugin-exec"
+  - "@yarnpkg/plugin-file"
+  - "@yarnpkg/plugin-git"
+  - "@yarnpkg/plugin-github"
+  - "@yarnpkg/plugin-http"
+  - "@yarnpkg/plugin-init"
+  - "@yarnpkg/plugin-interactive-tools"
+  - "@yarnpkg/plugin-link"
+  - "@yarnpkg/plugin-nm"
+  - "@yarnpkg/plugin-npm-cli"
+  - "@yarnpkg/plugin-pack"
+  - "@yarnpkg/plugin-patch"
+  - "@yarnpkg/plugin-pnp"
+  - "@yarnpkg/plugin-pnpm"
+  - "@yarnpkg/plugin-stage"
+  - "@yarnpkg/plugin-typescript"
+  - "@yarnpkg/plugin-version"
+  - "@yarnpkg/plugin-workspace-tools"
+  - vscode-zipfs
+  - "@yarnpkg/builder"
+  - "@yarnpkg/doctor"
+  - "@yarnpkg/extensions"
+  - "@yarnpkg/libzip"
+  - "@yarnpkg/nm"
+  - "@yarnpkg/pnp"
+  - "@yarnpkg/pnpify"
+  - "@yarnpkg/sdks"
+  - "@yarnpkg/shell"
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -84,6 +84,7 @@ The following changes only affect people writing Yarn plugins:

 ### Installs

+- Yarn now caches npm version metadata, leading to faster resolution steps and decreased network data usage.
 - The `pnpm` linker avoids creating symlinks that lead to loops on the file system, by moving them higher up in the directory structure.
 - The `pnpm` linker no longer reports duplicate "incompatible virtual" warnings.

@@ -48,6 +48,7 @@ describe(`Commands`, () => {
 await expect(run(`stage`, `-n`, {cwd: path})).resolves.toMatchObject({
   stdout: [
     `${npath.fromPortablePath(`${path}/.pnp.cjs`)}\n`,
+    `${npath.fromPortablePath(`${path}/.yarn/global/metadata/npm/b98544/localhost/no-deps.json`)}\n`,
     `${npath.fromPortablePath(`${path}/.yarn/global/cache/no-deps-npm-1.0.0-cf533b267a-0.zip`)}\n`,
     `${npath.fromPortablePath(`${path}/.yarn/cache/.gitignore`)}\n`,
     `${npath.fromPortablePath(`${path}/.yarn/cache/no-deps-npm-1.0.0-cf533b267a-e0e60294c2.zip`)}\n`,
2 changes: 2 additions & 0 deletions packages/plugin-npm/package.json
@@ -11,6 +11,7 @@
   "dependencies": {
     "@yarnpkg/fslib": "workspace:^",
     "enquirer": "^2.3.6",
+    "lodash": "^4.17.15",
     "semver": "^7.1.2",
     "ssri": "^6.0.1",
     "tslib": "^2.4.0"
@@ -20,6 +21,7 @@
     "@yarnpkg/plugin-pack": "workspace:^"
   },
   "devDependencies": {
+    "@types/lodash": "^4.14.136",
     "@types/semver": "^7.1.0",
     "@types/ssri": "^6.0.1",
     "@yarnpkg/core": "workspace:^",
16 changes: 6 additions & 10 deletions packages/plugin-npm/sources/NpmSemverResolver.ts
@@ -47,11 +47,9 @@ export class NpmSemverResolver implements Resolver {
     if (range === null)
       throw new Error(`Expected a valid range, got ${descriptor.range.slice(PROTOCOL.length)}`);

-    const registryData = await npmHttpUtils.get(npmHttpUtils.getIdentUrl(descriptor), {
-      customErrorMessage: npmHttpUtils.customPackageError,
-      configuration: opts.project.configuration,
-      ident: descriptor,
-      jsonResponse: true,
+    const registryData = await npmHttpUtils.getPackageMetadata(descriptor, {
+      project: opts.project,
+      version: semver.valid(range.raw) ? range.raw : undefined,
     });

     const candidates = miscUtils.mapAndFilter(Object.keys(registryData.versions), version => {
@@ -127,11 +125,9 @@ export class NpmSemverResolver implements Resolver {
     if (version === null)
       throw new ReportError(MessageName.RESOLVER_NOT_FOUND, `The npm semver resolver got selected, but the version isn't semver`);

-    const registryData = await npmHttpUtils.get(npmHttpUtils.getIdentUrl(locator), {
-      customErrorMessage: npmHttpUtils.customPackageError,
-      configuration: opts.project.configuration,
-      ident: locator,
-      jsonResponse: true,
+    const registryData = await npmHttpUtils.getPackageMetadata(locator, {
+      project: opts.project,
+      version,
     });

     if (!Object.prototype.hasOwnProperty.call(registryData, `versions`))
6 changes: 2 additions & 4 deletions packages/plugin-npm/sources/NpmTagResolver.ts
@@ -39,10 +39,8 @@ export class NpmTagResolver implements Resolver {
   async getCandidates(descriptor: Descriptor, dependencies: unknown, opts: ResolveOptions) {
     const tag = descriptor.range.slice(PROTOCOL.length);

-    const registryData = await npmHttpUtils.get(npmHttpUtils.getIdentUrl(descriptor), {
-      configuration: opts.project.configuration,
-      ident: descriptor,
-      jsonResponse: true,
+    const registryData = await npmHttpUtils.getPackageMetadata(descriptor, {
+      project: opts.project,
     });

     if (!Object.prototype.hasOwnProperty.call(registryData, `dist-tags`))
212 changes: 183 additions & 29 deletions packages/plugin-npm/sources/npmHttpUtils.ts
@@ -1,11 +1,13 @@
-import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport} from '@yarnpkg/core';
-import {MessageName, ReportError} from '@yarnpkg/core';
-import {prompt} from 'enquirer';
-import {URL} from 'url';
+import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport, structUtils, IdentHash, hashUtils, Project, miscUtils} from '@yarnpkg/core';
+import {MessageName, ReportError} from '@yarnpkg/core';
+import {Filename, PortablePath, ppath, toFilename, xfs} from '@yarnpkg/fslib';
+import {prompt} from 'enquirer';
+import pick from 'lodash/pick';
+import {URL} from 'url';

-import {Hooks} from './index';
-import * as npmConfigUtils from './npmConfigUtils';
-import {MapLike} from './npmConfigUtils';
+import {Hooks} from './index';
+import * as npmConfigUtils from './npmConfigUtils';
+import {MapLike} from './npmConfigUtils';

 export enum AuthType {
   NO_AUTH,
@@ -33,7 +35,7 @@ export type Options = httpUtils.Options & RegistryOptions & {
  * It doesn't handle 403 Forbidden, as the npm registry uses it when the user attempts
  * a prohibited action, such as publishing a package with a similar name to an existing package.
  */
-export async function handleInvalidAuthenticationError(error: any, {attemptedAs, registry, headers, configuration}: {attemptedAs?: string, registry: string, headers: {[key: string]: string} | undefined, configuration: Configuration}) {
+export async function handleInvalidAuthenticationError(error: any, {attemptedAs, registry, headers, configuration}: {attemptedAs?: string, registry: string, headers: {[key: string]: string | undefined} | undefined, configuration: Configuration}) {
   if (isOtpError(error))
     throw new ReportError(MessageName.AUTHENTICATION_INVALID, `Invalid OTP token`);

@@ -64,15 +66,169 @@ export function getIdentUrl(ident: Ident) {
   }
 }

+export type GetPackageMetadataOptions = Omit<Options, 'ident' | 'configuration'> & {
+  project: Project;
+
+  /**
+   * Warning: This option will return all cached metadata if the version is found, but the rest of the metadata can be stale.
+   */
+  version?: string;
+};
+
+// We use 2 different caches:
+// - an in-memory cache, to avoid hitting the disk and the network more than once per process for each package
+// - an on-disk cache, for exact version matches and to avoid refetching the metadata if the resource hasn't changed on the server
+
+const PACKAGE_METADATA_CACHE = new Map<IdentHash, Promise<PackageMetadata> | PackageMetadata>();

+/**
+ * Caches and returns the package metadata for the given ident.
+ *
+ * Note: This function only caches and returns specific fields from the metadata.
+ * If you need other fields, use the uncached {@link get} or consider whether it would make more sense to extract
+ * the fields from the on-disk packages using the linkers or from the fetch results using the fetchers.
+ */
+export async function getPackageMetadata(ident: Ident, {project, registry, headers, version, ...rest}: GetPackageMetadataOptions): Promise<PackageMetadata> {
+  return await miscUtils.getFactoryWithDefault(PACKAGE_METADATA_CACHE, ident.identHash, async () => {
+    const {configuration} = project;
+
+    registry = normalizeRegistry(configuration, {ident, registry});
+
+    const registryFolder = getRegistryFolder(configuration, registry);
+    const identPath = ppath.join(registryFolder, `${structUtils.slugifyIdent(ident)}.json`);
+
+    let cached: CachedMetadata | null = null;
+
+    // We bypass the on-disk cache for security reasons if the lockfile needs to be refreshed,
+    // since most likely the user is trying to validate the metadata using hardened mode.
+    if (!project.lockfileNeedsRefresh) {
+      try {
+        cached = await xfs.readJsonPromise(identPath) as CachedMetadata;
+
+        if (typeof version !== `undefined` && typeof cached.metadata.versions[version] !== `undefined`) {
+          return cached.metadata;
+        }
+      } catch {}
+    }
+
+    return await get(getIdentUrl(ident), {
+      ...rest,
+      customErrorMessage: customPackageError,
+      configuration,
+      registry,
+      ident,
+      headers: {
+        ...headers,
+        // We set both headers in case a registry doesn't support ETags
+        [`If-None-Match`]: cached?.etag,
+        [`If-Modified-Since`]: cached?.lastModified,
+      },
+      wrapNetworkRequest: async executor => async () => {
+        const response = await executor();
+
+        if (response.statusCode === 304) {
+          if (cached === null)
+            throw new Error(`Assertion failed: cachedMetadata should not be null`);
+
+          return {
+            ...response,
+            body: cached.metadata,
+          };
+        }
+
+        const packageMetadata = pickPackageMetadata(JSON.parse(response.body.toString()));
+
+        PACKAGE_METADATA_CACHE.set(ident.identHash, packageMetadata);
+
+        const metadata: CachedMetadata = {
+          metadata: packageMetadata,
+          etag: response.headers.etag,
+          lastModified: response.headers[`last-modified`],
+        };
+
+        // We append the PID because it is guaranteed that this code is only run once per process for a given ident
+        const identPathTemp = `${identPath}-${process.pid}.tmp` as PortablePath;
+
+        await xfs.mkdirPromise(registryFolder, {recursive: true});
+        await xfs.writeJsonPromise(identPathTemp, metadata, {compact: true});
+
+        // Doing a rename is important to ensure the cache is atomic
+        await xfs.renamePromise(identPathTemp, identPath);
+
+        return {
+          ...response,
+          body: packageMetadata,
+        };
+      },
+    });
+  });
+}

+type CachedMetadata = {
+  metadata: PackageMetadata;
+  etag?: string;
+  lastModified?: string;
+};
+
+export type PackageMetadata = {
+  'dist-tags': Record<string, string>;
+  versions: Record<string, any>;
+};
+
+const CACHED_FIELDS = [
+  `name`,
+
+  `dist.tarball`,
+
+  `bin`,
+  `scripts`,
+
+  `os`,
+  `cpu`,
+  `libc`,
+
+  `dependencies`,
+  `dependenciesMeta`,
+  `optionalDependencies`,
+
+  `peerDependencies`,
+  `peerDependenciesMeta`,
+];
+
+function pickPackageMetadata(metadata: PackageMetadata): PackageMetadata {
+  return {
+    'dist-tags': metadata[`dist-tags`],
+    versions: Object.fromEntries(Object.entries(metadata.versions).map(([key, value]) => [
+      key,
+      pick(value, CACHED_FIELDS),
+    ])),
+  };
+}
+
+/**
+ * Used to invalidate the on-disk cache when the format changes.
+ */
+const CACHE_KEY = hashUtils.makeHash(...CACHED_FIELDS).slice(0, 6);

+function getRegistryFolder(configuration: Configuration, registry: string) {
+  const metadataFolder = getMetadataFolder(configuration);
+
+  const parsed = new URL(registry);
+  const registryFilename = toFilename(parsed.hostname);
+
+  return ppath.join(metadataFolder, CACHE_KEY as Filename, registryFilename);
+}
+
+function getMetadataFolder(configuration: Configuration) {
+  return ppath.join(configuration.get(`globalFolder`), `metadata/npm`);
+}
+
 export async function get(path: string, {configuration, headers, ident, authType, registry, ...rest}: Options) {
-  if (ident && typeof registry === `undefined`)
-    registry = npmConfigUtils.getScopeRegistry(ident.scope, {configuration});
+  registry = normalizeRegistry(configuration, {ident, registry});

   if (ident && ident.scope && typeof authType === `undefined`)
     authType = AuthType.BEST_EFFORT;

-  if (typeof registry !== `string`)
-    throw new Error(`Assertion failed: The registry should be a string`);
-
   const auth = await getAuthenticationHeader(registry, {authType, configuration, ident});
   if (auth)
     headers = {...headers, authorization: auth};
@@ -87,11 +243,7 @@ export async function get(path: string, {configuration, headers, ident, authType
 }

 export async function post(path: string, body: httpUtils.Body, {attemptedAs, configuration, headers, ident, authType = AuthType.ALWAYS_AUTH, registry, otp, ...rest}: Options & {attemptedAs?: string}) {
-  if (ident && typeof registry === `undefined`)
-    registry = npmConfigUtils.getScopeRegistry(ident.scope, {configuration});
-
-  if (typeof registry !== `string`)
-    throw new Error(`Assertion failed: The registry should be a string`);
+  registry = normalizeRegistry(configuration, {ident, registry});

   const auth = await getAuthenticationHeader(registry, {authType, configuration, ident});
   if (auth)
@@ -123,11 +275,7 @@ export async function post(path: string, body: httpUtils.Body, {attemptedAs, con
 }

 export async function put(path: string, body: httpUtils.Body, {attemptedAs, configuration, headers, ident, authType = AuthType.ALWAYS_AUTH, registry, otp, ...rest}: Options & {attemptedAs?: string}) {
-  if (ident && typeof registry === `undefined`)
-    registry = npmConfigUtils.getScopeRegistry(ident.scope, {configuration});
-
-  if (typeof registry !== `string`)
-    throw new Error(`Assertion failed: The registry should be a string`);
+  registry = normalizeRegistry(configuration, {ident, registry});

   const auth = await getAuthenticationHeader(registry, {authType, configuration, ident});
   if (auth)
@@ -159,11 +307,7 @@ export async function put(path: string, body: httpUtils.Body, {attemptedAs, conf
 }

 export async function del(path: string, {attemptedAs, configuration, headers, ident, authType = AuthType.ALWAYS_AUTH, registry, otp, ...rest}: Options & {attemptedAs?: string}) {
-  if (ident && typeof registry === `undefined`)
-    registry = npmConfigUtils.getScopeRegistry(ident.scope, {configuration});
-
-  if (typeof registry !== `string`)
-    throw new Error(`Assertion failed: The registry should be a string`);
+  registry = normalizeRegistry(configuration, {ident, registry});

   const auth = await getAuthenticationHeader(registry, {authType, configuration, ident});
   if (auth)
@@ -194,6 +338,16 @@ export async function del(path: string, {attemptedAs, configuration, headers, id
   }
 }

+function normalizeRegistry(configuration: Configuration, {ident, registry}: Partial<RegistryOptions>): string {
+  if (typeof registry === `undefined` && ident)
+    return npmConfigUtils.getScopeRegistry(ident.scope, {configuration});
+
+  if (typeof registry !== `string`)
+    throw new Error(`Assertion failed: The registry should be a string`);
+
+  return registry;
+}
+
 async function getAuthenticationHeader(registry: string, {authType = AuthType.CONFIGURATION, configuration, ident}: {authType?: AuthType, configuration: Configuration, ident: RegistryOptions['ident']}) {
   const effectiveConfiguration = npmConfigUtils.getAuthConfiguration(registry, {configuration, ident});
   const mustAuthenticate = shouldAuthenticate(effectiveConfiguration, authType);
@@ -242,7 +396,7 @@ function shouldAuthenticate(authConfiguration: MapLike, authType: AuthType) {
   }
 }

-async function whoami(registry: string, headers: {[key: string]: string} | undefined, {configuration}: {configuration: Configuration}) {
+async function whoami(registry: string, headers: {[key: string]: string | undefined} | undefined, {configuration}: {configuration: Configuration}) {
   if (typeof headers === `undefined` || typeof headers.authorization === `undefined`)
     return `an anonymous user`;
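
The in-memory layer mentioned in the `npmHttpUtils.ts` comments ("avoid hitting the disk and the network more than once per process for each package") can be sketched as a map of in-flight promises. This is a simplified stand-in written for illustration, not the `miscUtils.getFactoryWithDefault` implementation; `inFlightMetadata` and `getOrCreate` are hypothetical names.

```typescript
// One entry per package; concurrent callers share the same in-flight promise.
const inFlightMetadata = new Map<string, Promise<unknown>>();

function getOrCreate<T>(key: string, factory: () => Promise<T>): Promise<T> {
  const existing = inFlightMetadata.get(key) as Promise<T> | undefined;
  if (typeof existing !== `undefined`)
    return existing;

  // The promise is stored before it settles, so a second caller arriving
  // while the first request is still pending does not start another one.
  const created = factory();
  inFlightMetadata.set(key, created);

  return created;
}
```

In this sketch a rejected promise stays in the map, so a failed lookup is not retried within the same process; whether that is desirable depends on the caller.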

4 changes: 2 additions & 2 deletions packages/yarnpkg-core/sources/Plugin.ts
@@ -91,9 +91,9 @@ export interface Hooks {
    * add some logging.
    */
   wrapNetworkRequest?: (
-    executor: () => Promise<any>,
+    executor: () => Promise<httpUtils.Response>,
     extra: WrapNetworkRequestInfo
-  ) => Promise<() => Promise<any>>;
+  ) => Promise<() => Promise<httpUtils.Response>>;

   /**
    * Called before the build, to compute a global hash key that we will use
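
To illustrate the shape of the `wrapNetworkRequest` signature shown in this hunk, here is a minimal sketch of a wrapper that receives an executor and resolves to a new executor. `NetworkResponse`, `NetworkExecutor`, and `wrapWithStatusLog` are local stand-ins invented for this example (the real hook uses `httpUtils.Response` and also receives an `extra` argument, which is omitted here).

```typescript
// Local stand-ins for the types referenced by the hook (assumptions for this sketch).
type NetworkResponse = {statusCode: number, body: unknown};
type NetworkExecutor = () => Promise<NetworkResponse>;

// Receives the original executor and resolves to a new executor that wraps it,
// here recording the status code of every request that goes through.
async function wrapWithStatusLog(executor: NetworkExecutor, statusLog: Array<number>): Promise<NetworkExecutor> {
  return async () => {
    const response = await executor();
    statusLog.push(response.statusCode);

    return response;
  };
}
```

The same two-step shape is what allows a wrapper to replace the response body, as `getPackageMetadata` does when it receives a `304`.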