Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] Improving Jest performance with a custom resolver #11034

Closed
lencioni opened this issue Jan 27, 2021 · 20 comments
Closed

[discussion] Improving Jest performance with a custom resolver #11034

lencioni opened this issue Jan 27, 2021 · 20 comments

Comments

@lencioni
Copy link
Contributor

lencioni commented Jan 27, 2021

I work in a fairly large codebase (~80,000 TypeScript files plus ~250,000 files in node_modules), and I recently improved a performance issue we were seeing with Jest and wanted to share my findings and code somewhere that others may find useful. Please feel free to close this issue.

We are currently using Jest features like --findRelatedTests or --changedSince in some places in our CI setup. I wanted to see the list of test files, so I tried this out locally with the --listTests flag and noticed that it was very slow (~150 seconds).

Here's a CPU profile I took in devtools:

image

Zooming in, that very large block all looks basically just like this:

image

I discovered that almost all of the time was spent in the loadpkg function of the resolve package, which Jest uses as part of its default resolver: https://github.com/facebook/jest/blob/baf9f9937720e87d2b2bd09f4c053fa4f16424ec/packages/jest-resolve/src/defaultResolver.ts#L44-L54

image

Unfortunately, the resolve package is not very efficient for this. This package will attempt to look for package.json files in every directory, and repeatedly read and parse package.json files, for every file that it needs to resolve: https://github.com/browserify/resolve/blob/4bece07740878577be9570efe47fde66d289b5ff/lib/sync.js#L123-L147

We have some guarantees in our repo, e.g. we only have one package.json file outside of node_modules and we don't rely on any of its fields for module resolution, and we also don't expect any of these files to change within a single run. Therefore, we can make this a lot more efficient by avoiding the resolve package for files outside of node_modules (i.e. doing a custom resolution) and caching the results. Since we have ~80,000 TypeScript files in our repo, and only one package.json file that happens to be irrelevant to how these are resolved, this means we might be reading that same package.json file off disk and calling JSON.parse on its contents ~80,000 times more than is actually necessary.

To ensure that the caching doesn't make watch mode confusing when the file system changes while watch mode is running, I added a watch plugin to clear the caches when the filesystem changes. This replicates the default resolver's integration with watch mode: https://github.com/facebook/jest/blob/7edfb105f3f377434d30e143a4dbcc86e541b361/packages/jest-core/src/watch.ts#L288-L289

This resolver reduces my original --listTests --findRelatedTests command from ~150s to ~15s.

Here's my code. Most of the speed improvement is due to avoiding the resolve package for our files outside of node_modules, with the additional full resolution caching adding a little bit of speed improvement on top.

// custom-resolver.js
const fs = require('graceful-fs');
const path = require('path');

const IPathType = {
  FILE: 1,
  DIRECTORY: 2,
  OTHER: 3,
};

const checkedPaths = new Map();

function statSyncCached(filePath) {
  const result = checkedPaths.get(filePath);
  if (result !== undefined) {
    return result;
  }

  let stat;
  try {
    stat = fs.statSync(filePath);
  } catch (e) {
    if (!(e && (e.code === 'ENOENT' || e.code === 'ENOTDIR'))) {
      throw e;
    }
  }

  if (stat) {
    if (stat.isFile() || stat.isFIFO()) {
      checkedPaths.set(filePath, IPathType.FILE);
      return IPathType.FILE;
    }

    if (stat.isDirectory()) {
      checkedPaths.set(filePath, IPathType.DIRECTORY);
      return IPathType.DIRECTORY;
    }
  }

  checkedPaths.set(filePath, IPathType.OTHER);
  return IPathType.OTHER;
}

function isFile(filePath) {
  return statSyncCached(filePath) === IPathType.FILE;
}

function isDirectory(dir) {
  return statSyncCached(dir) === IPathType.DIRECTORY;
}

const resolverCache = new Map();

const defaultModuleDirectory = ['node_modules'];

// Jest's default resolver uses the `resolve` npm package, which is woefully
// inefficient. The resolve package will attempt to look for package.json files
// in every directory, and repeatedly read and parse package.json files, for
// every file that it needs to resolve. Since we have some guarantees here, like
// we only have one package.json file for our repo, and we don't expect these
// files to change within a single run, we can make this a lot more efficient by
// doing a custom resolution and caching the results.
module.exports = function customResolver(request, options) {
  const isRequestAbs = request.startsWith('/');

  const cacheKey = isRequestAbs
    ? // If the request is an absolute path, we don't need to include the
      // basedir in the cache key, since it will not affect the resolution
      request
    : `${options.basedir};${request}`;

  const resolverCachedResult = resolverCache.get(cacheKey);
  if (resolverCachedResult !== undefined) {
    return resolverCachedResult;
  }

  const moduleDirectory = options.moduleDirectory || defaultModuleDirectory;
  const isInModuleDirectory = moduleDirectory.some((dir) =>
    isRequestAbs
      ? // If the request is an absolute path, we don't need to check the
        // basedir
        request.includes(`/${dir}/`)
      : options.basedir.includes(`/${dir}/`) || request.includes(`/${dir}/`),
  );

  // Local request paths can be of the form like "../foo" or
  // "/Users/foo/repo/bar", so we need to make sure that all of our bases
  // are covered.
  const isLocalFile = !isInModuleDirectory && (request.startsWith('.') || isRequestAbs);

  if (isLocalFile) {
    // This is a local file, so we want to do a custom fast resolution.

    const absPath = isRequestAbs ? request : path.resolve(options.basedir, request);

    if (isFile(absPath)) {
      // Path exists and is a file, so we are done here
      resolverCache.set(cacheKey, absPath);
      return absPath;
    }

    // If the file was not resolved using the extensions, and the absolute path is a directory,
    // we also want to check the index in the directory.
    const pathsToCheck = isDirectory(absPath) ? [absPath, `${absPath}/index`] : [absPath];

    // Prevent resolve from trying to look up, read, and parse the package.json
    // for this file, to improve performance.
    for (let i = 0; i < pathsToCheck.length; i++) {
      for (let j = 0; j < options.extensions.length; j++) {
        const resolvedPathWithExtension = `${pathsToCheck[i]}${options.extensions[j]}`;

        if (isFile(resolvedPathWithExtension)) {
          resolverCache.set(cacheKey, resolvedPathWithExtension);
          return resolvedPathWithExtension;
        }
      }
    }

    throw new Error(`Could not resolve module ${request} from ${options.basedir}`);
  }

  const defaultResolverResult = options.defaultResolver(request, options);
  resolverCache.set(cacheKey, defaultResolverResult);
  return defaultResolverResult;
};

module.exports.clearCache = function clearCache() {
  checkedPaths.clear();
  resolverCache.clear();
};
// custom-resolver-watch-plugin.js
const { clearCache } = require('./custom-resolver');

module.exports = class CustomResolverWatchPlugin {
  apply(jestHooks) {
    let isFirstRun = true;

    jestHooks.onFileChange(() => {
      // Clear the resolver cache whenever the filesystem changes so that files
      // will be correctly resolved again.

      if (isFirstRun) {
        // This is triggered when the watcher first starts up, but we don't
        // actually need to clear the cache at this time. We can get a small
        // speed benefit by skipping this on the first run.
        isFirstRun = false;
      } else {
        clearCache();
      }
    });
  }
};
@lencioni
Copy link
Contributor Author

Oh, also I think there may be an opportunity here to improve the performance of Jest's default resolver.

The resolver is currently synchronous and therefore uses resolve's synchronous method. However, I noticed that resolve's async method has a package option that allows you to provide package.json data applicable to the module being resolved. I assume that resolve will skip reading and running JSON.parse on the package.json files if this data is provided up front.

If this is the case, Jest's resolver could be made asynchronous to take advantage of this, and add some package.json lookup logic with a bit of caching built in so that the package data can be provided to resolve. I think this will matter most for packages that have multiple files that need to be resolved, and may obviate the need for my custom resolver entirely.

@ahnpnl
Copy link
Contributor

ahnpnl commented Jan 29, 2021

I'm curious if TypeScript module resolution is better than resolve npm package. Since TypeScript module resolution has 2 modes: classic and node, it can also be used as an option in combination with the improvement above.

The downside is that, users are forced to install TypeScript unless another resolver package is introduced separately from current package.

@SimenB
Copy link
Member

SimenB commented Jan 29, 2021

I'm surprised there's not more caching here. E.g. reading and parsing the same json file multiple times seems unnecessary. Also, crawling the file system each time resolution happens seems overkill - some short circuiting based on paths and parent paths seems like it'd solve it to at least only read/parse package.json once per directory in the project.

We can implement readFile ourselves and cache the result, but I'm not sure we can avoid calling that function every time?

/cc @ljharb

@ljharb
Copy link
Contributor

ljharb commented Jan 29, 2021

resolve, just like node (and tbh jest also), can not safely assume that the filesystem is unmodified between resolutions.

An individual app, however, certainly can assume this.

Perhaps resolve could add an option that does assume this, and jest could as well, and then an individual app could pass that option through?

@SimenB
Copy link
Member

SimenB commented Jan 29, 2021

Within a single test run I think we can make that assumption. But maybe not... Might make sense to allow users to modify this.

Adding that capability to resolve so we can pass an option to enable it would be pretty sweet 👍

@ljharb
Copy link
Contributor

ljharb commented Jan 29, 2021

I’d be happy to accept a PR :-) any time i have to work on resolve directly will be focused on exports support.

@lencioni
Copy link
Contributor Author

This might be a relevant starting point if someone wants to pick it up: browserify/resolve#186

@SimenB
Copy link
Member

SimenB commented Jan 29, 2021

One last comment - any API in resolve that does caching should have a "clear cache" API. resolve itself cannot know when it can (should?) clear caches, so it needs an API for it. (in Jest's case it'll probably clear between test runs by default, but others probably have other requirements)

@ljharb
Copy link
Contributor

ljharb commented Jan 29, 2021

The other alternative would be a wrapper package that uses readFile and handles the caching, which I might like better than making resolve stateful.

@SimenB
Copy link
Member

SimenB commented Jan 30, 2021

I opened up browserify/resolve#236 which adds readPkg which should allow consumers (like Jest) to cache the JSON themselves.

A further improvement might be adding a findClosestPackageFile or something. By default recursing up the file system (like resolve does today), but in theory somebody could add some caching on top of that. isFile and isDirectory can already be cached though, so I'm not sure how much impact that would have in practice

@SimenB
Copy link
Member

SimenB commented Feb 11, 2021

Above PR has been released, here's a PR using that: #11076

@SimenB
Copy link
Member

SimenB commented Feb 19, 2021

^ released in 27.0.0-next.3

@SimenB
Copy link
Member

SimenB commented Feb 19, 2021

I'd like to investigate the aforementioned findClosestPackageFile before closing. I'm not sure how much of a difference it'll make for Jest by default though as we cannot skip the stat calls, and they are already cached

@lencioni
Copy link
Contributor Author

Agreed, I think it would only be faster if the paths could be determined quickly ahead of time and then looked up in a Set or something in the resolver. Do you happen to know if the haste map knows about package.json files and if that is accessible by the resolver? If so, maybe that information could be used here and would give a speed boost.

And speaking of haste map, if we can use that for package.json perhaps we could also use it for isFile?

@SimenB
Copy link
Member

SimenB commented Feb 19, 2021

Haste map has all user files, but not files in node_modules

@lencioni
Copy link
Contributor Author

Gotcha. That might still be worth doing though for user files.

@ahnpnl
Copy link
Contributor

ahnpnl commented Mar 3, 2021

Interesting, it seems like #11076 in next.3 made module resolution smarter for Angular package format that Jest no longer asks to transform Angular umd files. I haven’t checked yet with ESM mode but hopefully Jest doesn’t ask to transform Angular esm files either.

next.3 did help something with performance for Angular projects since umd files are not processed by transformer anymore. Thanks !

@github-actions
Copy link

github-actions bot commented Mar 3, 2022

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the Stale label Mar 3, 2022
@github-actions
Copy link

github-actions bot commented Apr 2, 2022

This issue was closed because it has been stalled for 7 days with no activity. Please open a new issue if the issue is still relevant, linking to this one.

@github-actions github-actions bot closed this as completed Apr 2, 2022
@github-actions
Copy link

github-actions bot commented May 3, 2022

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants