Question: improve loader performance #641

Open
Acring opened this issue Dec 1, 2022 · 2 comments
@Acring

Acring commented Dec 1, 2022

I know the goal of this library is to wrap graphql code generator's output in a webpack loader, but in our project we do feel that the webpack loader makes dev startup slow. The graphql-let CLI takes nearly 80s to generate, and then when dev starts, the webpack loader still spends more than a minute in graphql-let, which feels unacceptable. As far as I can tell, graphql-let has already generated the corresponding graphql.d.ts files and the code that calls Apollo Client, so the loader shouldn't need that much time. Maybe I'm using it incorrectly, or do you have any suggestions for improving performance? Alternatively, could we abandon HMR and let the graphql-let CLI do all the work, which would reduce our project's startup time?
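For reference, the "let the CLI do all the work" idea can be sketched as an npm pre-script that runs generation before the dev server boots (the dev command below is just a placeholder for whatever a given project uses):

```json
{
  "scripts": {
    "predev": "graphql-let",
    "dev": "webpack serve"
  }
}
```

With the .tsx/.d.ts artifacts already generated, the loader's "cache was fresh" skip path should be hit for every document on the first build.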

@DeyLak

DeyLak commented Jan 16, 2023

I spent some time trying to improve the loader, but couldn't come up with an elegant solution.

I found that the longest operation in the loader is the glob search via globby. The problem is that webpack loaders don't share any common context during the build, so the globby search is executed on every loader run. I couldn't find a way to fix the loader itself, but I managed to create my own cache of paths to the graphql documents, which I update with a node script, and patch the loader using patch-package. Build time dropped from more than half an hour to a couple of minutes. Maybe my solution can help somebody:

Patch file I used:

diff --git a/node_modules/graphql-let/dist/lib/exec-context.js b/node_modules/graphql-let/dist/lib/exec-context.js
index 0991c67..ebf380f 100644
--- a/node_modules/graphql-let/dist/lib/exec-context.js
+++ b/node_modules/graphql-let/dist/lib/exec-context.js
@@ -5,41 +5,21 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
 Object.defineProperty(exports, "__esModule", { value: true });
 exports.createExecContextSync = exports.createExecContext = void 0;
 const fs_1 = require("fs");
+const path_1 = require("path");
 const gensync_1 = __importDefault(require("gensync"));
 const slash_1 = __importDefault(require("slash"));
-const gensynced_1 = require("./gensynced");
 const hash_1 = require("./hash");
 const paths_1 = require("./paths");
-function getSchemaPointers(schema, _acc = []) {
-    if (typeof schema === 'string') {
-        _acc.push(schema);
-    }
-    else if (Array.isArray(schema)) {
-        for (const s of schema)
-            getSchemaPointers(s, _acc);
-    }
-    else if (typeof schema === 'object') {
-        for (const s of Object.keys(schema))
-            getSchemaPointers(s, _acc);
-    }
-    return _acc;
-}
-function prepareCreateSchemaHashArgs(execContext) {
-    const { config, configHash, cwd } = execContext;
-    const schemaPointers = getSchemaPointers(config.schema);
-    // TODO: How can we detect update of remote GraphQL Schema? ETag?
-    // It caches the remote introspection forever in the current implementation.
-    const filePointers = schemaPointers.filter((p) => !paths_1.isURL(p));
-    return { configHash, cwd, filePointers };
-}
+
 const createSchemaHashGenerator = gensync_1.default(function* (execContext) {
-    const { configHash, cwd, filePointers } = prepareCreateSchemaHashArgs(execContext);
-    const files = yield* gensynced_1.globby(filePointers, { cwd, absolute: true });
-    const contents = files
-        .map(slash_1.default)
-        .sort()
-        .map((file) => fs_1.readFileSync(file, 'utf-8'));
-    return hash_1.createHashFromBuffers([configHash, ...contents]);
+  const { configHash, cwd } = execContext;
+  // We use our own cache here to speed up the generation and avoid running glob search on each loader run
+  const files = JSON.parse(fs_1.readFileSync(path_1.resolve(cwd, '.graphql-cache/graphql-documents.json'), 'utf-8'))
+  const contents = files
+    .map(slash_1.default)
+    .sort()
+    .map((file) => fs_1.readFileSync(file, 'utf-8'));
+  return hash_1.createHashFromBuffers([configHash, ...contents]);
 });
 const appendSchemaImportContextGenerator = gensync_1.default(function* (execContext, codegenContext) {
     const createdPaths = paths_1.createSchemaPaths(execContext);
diff --git a/node_modules/graphql-let/dist/loader.js b/node_modules/graphql-let/dist/loader.js
index 3e6c98c..4e4eaaf 100644
--- a/node_modules/graphql-let/dist/loader.js
+++ b/node_modules/graphql-let/dist/loader.js
@@ -86,7 +86,8 @@ const processLoaderForSources = memoize_1.default(async (sourceFullPath, sourceC
     return code;
 }, (gqlFullPath) => gqlFullPath);
 const processLoaderForDocuments = memoize_1.default(async (gqlFullPath, gqlContent, addDependency, cwd, options) => {
-    const [config, configHash] = await config_1.default(cwd, options.configFile);
+    // Sync is significantly faster
+    const [config, configHash] = await config_1.loadConfigSync(cwd, options.configFile);
     const { silent } = config;
     const graphqlRelPath = path_1.relative(cwd, gqlFullPath);
     if (!silent)
@@ -105,11 +106,12 @@ const processLoaderForDocuments = memoize_1.default(async (gqlFullPath, gqlConte
     const [, fileContext] = codegenContext;
     if (!fileContext)
         throw new Error('Never');
-    const { skip, tsxFullPath } = fileContext;
+    const { skip, tsxFullPath, tsxRelPath } = fileContext;
     if (skip) {
         if (!silent)
-            print_1.updateLog(`Nothing to do. Cache was fresh.`);
-        return await file_1.readFile(tsxFullPath, 'utf-8');
+            print_1.updateLog(`Nothing to do. Cache was fresh. Reading ${tsxRelPath}...`);
+        // Sync is significantly faster
+        return file_1.readFileSync(tsxFullPath, 'utf-8');
     }
     if (!silent)
         print_1.updateLog(`Processing codegen for ${graphqlRelPath}...`);

Script file:

import * as fs from 'fs';
import { glob } from 'glob';
import * as os from 'os';
import * as path from 'path';

// We use require because this module doesn't support import syntax,
// and we don't want to use fs.mkdir, because it throws if the directory exists
const makeDir = require('make-dir');

const GRAPHQL_CACHE_PATH = '../.graphql-cache/graphql-documents.json';
const GRAPHQL_FILES_PATTERN = '**/*.graphql';

const generateCacheFile = () => {
  return new Promise((resolve, reject) => {
    let pattern = path.resolve(__dirname, `../${GRAPHQL_FILES_PATTERN}`);

    if (os.type().includes('Windows')) {
      pattern = pattern.replace(/\\/g, '/');
    }

    glob(
      pattern,
      {
        absolute: true,
      },
      async (err, files) => {
        if (err) {
          return reject(err);
        }

        if (!files.length) {
          return reject(new Error('No graphql documents found'));
        }

        const fileName = path.resolve(__dirname, GRAPHQL_CACHE_PATH);
        await makeDir(path.dirname(fileName));
        return fs.writeFile(fileName, JSON.stringify(files), error => {
          if (error) {
            return reject(error);
          }

          return resolve(true);
        });
      },
    );
  });
};

generateCacheFile().then(() => {
  console.log('*.graphql documents cache file created');
});

@comp615
Contributor

comp615 commented Jan 25, 2023

I've looked into this a bit as well, and wanted to share my findings. For our config we use something like:

schema:
  - '../server/graphql/**/*.graphql'

Because we use # import statements to split up our graphql files, this requires globbing all of them to get a proper hash (i.e. so HMR updates when any one of them changes). @DeyLak, maybe you can share a bit more about your setup? I'm also curious how you determined that this was so slow, and whether your trick has some other downstream benefit I'm missing.
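For context, the # import mechanism referred to here lets one .graphql document pull fragments from another, which is why the hash has to cover every file the glob matches; file names below are made up:

```graphql
#import "./UserFields.graphql"

query GetUser($id: ID!) {
  user(id: $id) {
    ...UserFields
  }
}
```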

The performance issues I saw: first, the one pointed out above in exec-context. You can think of every file that needs processing as coming through the loader in its own pass. Instead of the changes above, the fastest thing I can think of is to:

  • Simply save the execContext in the loader and reuse across files (i.e. memoize it based on configHash)
  • Optionally, you could use the gql-let memoize (not real memoize) function to ensure it runs once across parallel runs
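The first bullet could look roughly like this. This is only a sketch: buildExecContext stands in for graphql-let's real, expensive context creation (glob + hashing) and is not an actual graphql-let API.

```javascript
// Sketch: memoize the expensive exec-context creation by configHash, so each
// of the many per-file loader passes reuses the first result instead of
// re-running the globby search. Names here are illustrative.
const execContextCache = new Map();

function getExecContext(configHash, buildExecContext) {
  if (!execContextCache.has(configHash)) {
    // Only the first loader pass pays for glob + hashing.
    execContextCache.set(configHash, buildExecContext());
  }
  return execContextCache.get(configHash);
}

// The builder runs once, no matter how many files come through the loader.
let builds = 0;
const ctx1 = getExecContext('hash-abc', () => ({ id: ++builds }));
const ctx2 = getExecContext('hash-abc', () => ({ id: ++builds }));
console.log(builds, ctx1 === ctx2); // 1 true
```

A real implementation would also need the second bullet's guard so parallel loader runs don't race on the first build.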

Nice! However, in my measurements, this portion only took a few hundred ms at worst. What takes far more time is the call out to { generate } from '@graphql-codegen/cli', which takes around 35s per file in my setup. I'm still trying to figure out what's causing that. But my hunch is that if we were to compile the schema once and cache it in the same way I mentioned around execContext above, that might speed it up.

EDIT 1: Loading the schema into gql-let once and then passing it as a string to the CLI took per-file processing from 35s to 100ms for me. So I think that may be the way to go.

EDIT 2: After fixing all of that, the slowest part becomes writing the .d.ts files. I can't find a way to speed this up; it's roughly the same speed whether batching hundreds of files or processing a single file. So this likely comes down to the architecture in general. It might be faster to run a gql-gen before server boot, then load the server and rely on HMR only for edits.
