allMarkdownRemark graphql query in createPages intermittently hangs when excerpt or timeToRead fields are included #38855

Open
2 tasks done
tbantle22 opened this issue Feb 16, 2024 · 0 comments
Labels
status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer type: bug An issue or pull request relating to a bug in Gatsby

Preliminary Checks

Description

We use Gatsby for our blog, which has about 650 markdown files, along with the gatsby-transformer-remark plugin. The GraphQL query below, in createPages, often hangs when running gatsby build or gatsby develop: it never completes, never errors, and gives no useful information about what went wrong. This is not a recent issue for us; it has been around since at least Gatsby v4, maybe earlier. When the query does complete, createPages takes about 90s.

 const result = await graphql(`
    query CreatePagesQuery {
      allMarkdownRemark(sort: { frontmatter: { date: DESC } }) {
        edges {
          node {
            id
            excerpt(pruneLength: 250)
            timeToRead
            fields {
              slug
            }
            frontmatter {
              title
              author
              authorHref
              date
              tags
            }
          }
        }
      }
    }
  `);

These are the gatsby build --verbose logs where it hangs:

blog % yarn build
verbose set gatsby_log_level: "verbose"
verbose set gatsby_executing_command: "build"
verbose loading local command from: [path redacted]/web/node_modules/gatsby/dist/commands/build.js
verbose running command: build
verbose Running build in "production" environment
success compile gatsby files - 0.586s
success load gatsby config - 0.034s
verbose No adapter was found for the current environment. Skipping adapter initialization.
warn Plugin gatsby-plugin-hubspot is not compatible with your gatsby version 5.13.3 - It requires gatsby@^4.0.0-next
warn Plugin gatsby-plugin-hubspot is not compatible with your gatsby version 5.13.3 - It requires gatsby@^4.0.0-next
success load plugins - 3.531s
success onPreInit - 0.002s
success initialize cache - 0.032s
success copy gatsby files - 0.038s
success Compiling Gatsby Functions - 0.130s
success onPreBootstrap - 0.141s
verbose Creating 11 worker
success createSchemaCustomization - 0.012s
verbose Checking for deleted pages
verbose Deleted 0 pages
verbose Found 0 changed pages
success Checking for changed pages - 0.001s
success source and transform nodes - 4.462s
info Writing GraphQL type definitions to [path redacted]/web/packages/blog/.cache/schema.gql
success building schema - 0.140s
warn unable to find prism language 'yacc' for highlighting. applying generic code block
warn unable to find prism language 'c++' for highlighting. applying generic code block
warn unable to find prism language 'terraform' for highlighting. applying generic code block
warn unable to find prism language 'plan9_x86' for highlighting. applying generic code block
warn unable to find prism language 'make' for highlighting. applying generic code block
⠹ createPages
[                            ]   0.000 s 0/1 0% Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs

This step never moves past 0% and never errors. The logs look similar for gatsby develop, but without the last "Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs" line.

I recently discovered that removing the excerpt and timeToRead fields fixes the hanging problem and reduces createPages to <1s. If I add a limit of up to 300 on allMarkdownRemark, it also doesn't hang. This is reproducible using the Gatsby CLI on my local computer and in GitHub Actions. With the two fields removed, the verbose logs look like this:

blog % yarn build      
verbose set gatsby_log_level: "verbose"
verbose set gatsby_executing_command: "build"
verbose loading local command from: [path redacted]/web/node_modules/gatsby/dist/commands/build.js
verbose running command: build
verbose Running build in "production" environment
success compile gatsby files - 0.574s
success load gatsby config - 0.025s
verbose No adapter was found for the current environment. Skipping adapter initialization.
warn Plugin gatsby-plugin-hubspot is not compatible with your gatsby version 5.13.3 - It requires gatsby@^4.0.0-next
warn Plugin gatsby-plugin-hubspot is not compatible with your gatsby version 5.13.3 - It requires gatsby@^4.0.0-next
success load plugins - 0.343s
success onPreInit - 0.003s
success initialize cache - 0.031s
success copy gatsby files - 0.038s
success Compiling Gatsby Functions - 0.123s
success onPreBootstrap - 0.131s
verbose Creating 11 worker
success createSchemaCustomization - 0.019s
verbose Checking for deleted pages
verbose Deleted 0 pages
verbose Found 0 changed pages
success Checking for changed pages - 0.001s
success source and transform nodes - 4.685s
info Writing GraphQL type definitions to [path redacted]/web/packages/blog/.cache/schema.gql
success building schema - 0.141s
success createPages - 0.176s
success createPagesStatefully - 0.031s
info Total nodes: 5429, SitePage nodes: 668 (use --verbose for breakdown)
verbose Number of node types: 8. Nodes per type: Directory: 65, File: 2386, ImageSharp: 1627, MarkdownRemark: 634, Site: 1, SiteBuildMetadata: 1, SitePage: 668, SitePlugin: 47
verbose Checking for deleted pages
verbose Deleted 0 pages
verbose Found 668 changed pages
success Checking for changed pages - 0.001s
[...]
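
Since a limit of up to 300 does not hang, one thing to try is splitting the query into fixed-size batches. The sketch below is a generic helper and not part of Gatsby: `fetchInBatches` and the `FetchPage` type are hypothetical names, and whether batching actually avoids the hang is untested.

```typescript
// Hypothetical callback: fetches one page of results for the given limit/skip.
type FetchPage<T> = (limit: number, skip: number) => Promise<T[]>;

// Collects all items by requesting fixed-size batches until a short
// (or empty) batch signals the end of the data.
async function fetchInBatches<T>(
  fetchPage: FetchPage<T>,
  batchSize: number,
): Promise<T[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const all: T[] = [];
  for (let skip = 0; ; skip += batchSize) {
    const batch = await fetchPage(batchSize, skip);
    all.push(...batch);
    if (batch.length < batchSize) break; // last page reached
  }
  return all;
}
```

In createPages, `fetchPage` could run the same query with `limit: $limit, skip: $skip` arguments on allMarkdownRemark, pass `{ limit, skip }` as the variables argument to `graphql`, and return `result.data.allMarkdownRemark.edges`. A batch size of 300 or less matches the range reported above as not hanging.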

Is there a better way to debug why this query is hanging when we include fields that require parsing our markdown files?
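
One way to turn a silent hang into a reportable error is to race the query against a timer. This is a minimal sketch, not a Gatsby API: `withTimeout` and its parameters are hypothetical names. It also has a limitation: the timer can only fire while the event loop is free, so if it never fires, that suggests the hang is synchronous, CPU-bound work (for example, markdown parsing) and not a promise that never resolves.

```typescript
// Rejects with a descriptive error if `work` has not settled within `ms`
// milliseconds; otherwise settles with the outcome of `work`.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not finish within ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

In createPages, the `graphql(...)` call could be wrapped as `await withTimeout(graphql(...), 120000, "CreatePagesQuery")` inside a try/catch, and the caught error passed to `reporter.panicOnBuild`. The 120000 ms value is an arbitrary choice slightly above the 90s that a successful run takes.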

It is difficult to reproduce without our repository, which is private. But here are our gatsby config and package.json.

This is our package.json:

{
  "scripts": {
    "build": "gatsby build --prefix-paths --verbose",
    "develop": "gatsby develop"
  },
  "dependencies": {
    "@dolthub/react-components": "^0.1.1",
    "@dolthub/shared-components": "^0.1.0",
    "@dolthub/tailwind-config": "^0.1.0",
    "@react-icons/all-files": "^4.1.0",
    "classnames": "^2.5.1",
    "dotenv": "^16.4.1",
    "gatsby": "^5.13.3",
    "gatsby-plugin-canonical-urls": "^5.13.1",
    "gatsby-plugin-feed": "^5.13.1",
    "gatsby-plugin-google-gtag": "^5.13.1",
    "gatsby-plugin-hubspot": "^2.0.0",
    "gatsby-plugin-image": "^3.13.1",
    "gatsby-plugin-manifest": "^5.13.1",
    "gatsby-plugin-offline": "^6.13.1",
    "gatsby-plugin-postcss": "^6.13.1",
    "gatsby-plugin-sharp": "^5.13.1",
    "gatsby-plugin-twitter": "^5.13.1",
    "gatsby-remark-autolink-headers": "^6.13.1",
    "gatsby-remark-copy-linked-files": "^6.13.1",
    "gatsby-remark-images": "^7.13.1",
    "gatsby-remark-prismjs": "^7.13.1",
    "gatsby-remark-responsive-iframe": "^6.13.1",
    "gatsby-source-filesystem": "^5.13.1",
    "gatsby-transformer-remark": "^6.13.1",
    "gatsby-transformer-sharp": "^5.13.1",
    "graphql": "^16.8.1",
    "js-search": "^2.0.1",
    "prismjs": "^1.29.0",
    "prop-types": "^15.8.1",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "react-share": "^5.1.0"
  },
  "devDependencies": {
    "@babel/core": "^7.23.9",
    "@types/js-search": "^1.4.4",
    "@types/prismjs": "^1.26.3",
    "@typescript-eslint/eslint-plugin": "^6.21.0",
    "@typescript-eslint/parser": "^6.18.0",
    "babel-loader": "^9.1.3",
    "eslint": "^8.49.0",
    "eslint-plugin-jest": "^27.6.0",
    "eslint-plugin-react": "^7.33.2",
    "eslint-plugin-testing-library": "^6.2.0",
    "postcss": "^8.4.32",
    "prettier": "^3.2.5",
    "stylelint": "^16.2.1",
    "stylelint-config-recommended": "^14.0.0",
    "typescript": "^5.3.3",
    "webpack": "^5.89.0",
    "yalc": "^1.0.0-pre.53"
  }
}

And our gatsby-config.ts:

import dotenv from "dotenv";
import type { GatsbyConfig } from "gatsby";

dotenv.config({
  path: `.env.${process.env.NODE_ENV}`,
});

const siteUrl = "url";
const title = "title";

const config: GatsbyConfig = {
  pathPrefix: "/blog",
  siteMetadata: {
    title,
    description: `redacted`,
    author: `redacted`,
    siteUrl,
  },
  graphqlTypegen: {
    generateOnBuild: true,
    typesOutputPath: `gatsby-types.d.ts`,
    documentSearchPaths: [
      `./src/**/*.{ts,tsx}`,
      `../../node_modules/gatsby-*/**/*.js`,
      "./gatsby-node.ts",
    ],
  },
  plugins: [
    `gatsby-plugin-postcss`,
    `gatsby-plugin-twitter`,
    {
      resolve: `gatsby-plugin-feed`,
      options: {
        query: `
          {
            site {
              siteMetadata {
                title
                description
                siteUrl
                site_url: siteUrl
              }
            }
          }
        `,
        feeds: [
          {
            serialize: ({ query: { site, allMarkdownRemark } }) =>
              allMarkdownRemark.edges.map((edge) => {
                return {
                  ...edge.node.frontmatter,
                  description: edge.node.excerpt,
                  date: edge.node.frontmatter.date,
                  url: site.siteMetadata.siteUrl + edge.node.fields.slug,
                  guid: site.siteMetadata.siteUrl + edge.node.fields.slug,
                  custom_elements: [{ "content:encoded": edge.node.html }],
                };
              }),
            query: `query FeedQuery {
  allMarkdownRemark(sort: {frontmatter: {date: DESC}}) {
    edges {
      node {
        excerpt
        html
        fields {
          slug
        }
        frontmatter {
          title
          date
        }
      }
    }
  }
}`,
            output: "/rss.xml",
            title,
          },
        ],
      },
    },
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `images`,
        path: `${__dirname}/src/images`,
      },
    },
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `src`,
        path: `${__dirname}/src/`,
      },
    },
    {
      resolve: `gatsby-transformer-remark`,
      options: {
        plugins: [
          `gatsby-remark-copy-linked-files`,
          `gatsby-remark-autolink-headers`,
          `gatsby-remark-prismjs`,
          {
            resolve: `gatsby-remark-images`,
            options: {
              maxWidth: 856,
            },
          },
        ],
      },
    },
    `gatsby-plugin-image`,
    `gatsby-transformer-sharp`,
    `gatsby-plugin-sharp`,
    {
      resolve: `gatsby-plugin-canonical-urls`,
      options: {
        siteUrl,
      },
    },
    {
      resolve: "gatsby-plugin-hubspot",
      options: {
        trackingCode: "redacted",
        respectDNT: true,
        productionOnly: true,
      },
    },
    {
      resolve: `gatsby-plugin-google-gtag`,
      options: {
        trackingIds: [ "redacted" ],
      },
    },
    {
      resolve: `gatsby-plugin-manifest`,
      options: {
        name: title,
        icon: `src/images/favicon.png`,
        short_name: title,
        start_url: `/`,
        background_color: `#182134`,
        theme_color: `#182134`,
        display: `minimal-ui`,
      },
    },
  ],
};

export default config;

And a simplified version of our gatsby-node.ts that still reproduces the issue:

import { createFilePath } from "gatsby-source-filesystem";
import path from "path";

const blogPostTemplate = path.resolve(`./src/templates/BlogPost.tsx`);
const blogListTemplate = path.resolve("./src/templates/BlogList.tsx");

export const onCreateNode = ({ node, getNode, actions }) => {
  const { createNodeField } = actions;
  if (node.internal.type === `MarkdownRemark`) {
    const slug = createFilePath({ node, getNode, basePath: `pages` });
    createNodeField({
      node,
      name: `slug`,
      value: slug,
    });
  }
};

export const createPages = async ({ graphql, actions, reporter }) => {
  const { createPage } = actions;
  const result = await graphql(`
    query CreatePagesQuery {
      allMarkdownRemark(sort: { frontmatter: { date: DESC } }) {
        edges {
          node {
            id
            excerpt(pruneLength: 250)
            timeToRead
            fields {
              slug
            }
            frontmatter {
              title
              author
              authorHref
              date
              tags
            }
          }
        }
      }
    }
  `);

  if (result.errors) {
    reporter.panicOnBuild(`There was an error loading posts`, result.errors);
    return;
  }

  if (!result.data) {
    reporter.panicOnBuild(`No data found loading posts`);
    return;
  }

  // Create blog post pages
  const posts = result.data.allMarkdownRemark.edges;
  posts.forEach(({ node }) => {
    createPage({
      path: node.fields.slug,
      component: blogPostTemplate,
      context: {
        // Data passed to context is available
        // in page queries as GraphQL variables.
        slug: node.fields.slug,
      },
    });
  });

  // Create paginated blog list pages
  const postsPerPage = 20;
  const numPages = Math.ceil(posts.length / postsPerPage);
  Array.from({ length: numPages }).forEach((_, i) => {
    const firstPage = i === 0;
    const currentPage = i + 1;
    createPage({
      path: firstPage ? "/" : `/page/${currentPage}`,
      component: blogListTemplate,
      context: {
        limit: postsPerPage,
        skip: i * postsPerPage,
        numPages,
        currentPage,
        allBlogs: posts,
      },
    });
  });
};

export const createSchemaCustomization = ({ actions }) => {
  const { createTypes } = actions;
  createTypes(`
    type SitePage implements Node {
      context: SitePageContext
    }
    type SitePageContextFields {
      slug: String
    }
    type SitePageContextFrontmatter {
      title: String
      author: String
      authorHref: String
      date: Date
      tags: String
    }
    type SitePageContextBlogNode {
      id: String
      excerpt: String
      timeToRead: Int
      fields: SitePageContextFields
      frontmatter: SitePageContextFrontmatter
    }
    type SitePageContextBlogs {
      node: SitePageContextBlogNode
    }
    type SitePageContext {
      slug: String
      limit: Int
      skip: Int
      numPages: Int
      currentPage: Int
      allBlogs: [SitePageContextBlogs]
    }
  `);
};

Reproduction Link

N/A

Steps to Reproduce

  1. Put an allMarkdownRemark query with the excerpt and/or timeToRead fields in createPages
  2. Run gatsby build or gatsby develop
  3. The build hangs at createPages
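
To narrow step 1 down to a smaller reproduction, the `limit` argument can be bisected between a count known to pass (300) and one known to hang (all posts). The sketch below is a plain binary search. `smallestFailingCount` and `Probe` are hypothetical names, and it assumes that if a given limit hangs, every larger limit hangs too. Because the hang is intermittent, that assumption may not hold and each probe may need repeating.

```typescript
// Hypothetical probe: resolves to true when a query over the first `count`
// posts fails to finish (for example, because it timed out).
type Probe = (count: number) => Promise<boolean>;

// Binary search for the smallest count in (knownGood, knownBad] for which
// the probe reports a failure. Assumes failures are monotonic in count.
async function smallestFailingCount(
  knownGood: number,
  knownBad: number,
  probe: Probe,
): Promise<number> {
  let lo = knownGood; // invariant: a query over `lo` posts passes
  let hi = knownBad; // invariant: a query over `hi` posts fails
  while (hi - lo > 1) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (await probe(mid)) {
      hi = mid;
    } else {
      lo = mid;
    }
  }
  return hi;
}
```

If the search converges on a count N, the Nth post in the query's sort order is the first one whose inclusion triggers the hang, which makes it a candidate for a public minimal reproduction. If results are inconsistent between runs, the cause is more likely total volume or concurrency than any single file.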

Expected Result

The build either succeeds or fails with an error

Actual Result

The build hangs indefinitely

Environment

System:
    OS: macOS 14.1
    CPU: (12) arm64 Apple M3 Pro
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.11.1 - /private/var/folders/2f/47svy2p51k12pmjv4phs06k00000gn/T/xfs-804f9c02/node
    Yarn: 4.0.2 - /private/var/folders/2f/47svy2p51k12pmjv4phs06k00000gn/T/xfs-804f9c02/yarn
    npm: 10.2.4 - ~/.nvm/versions/node/v20.11.1/bin/npm
  Languages:
    Python: 3.9.2 - /Users/taylor/.pyenv/shims/python
  Browsers:
    Chrome: 121.0.6167.184
    Safari: 17.1

Config Flags

None

@tbantle22 tbantle22 added the type: bug An issue or pull request relating to a bug in Gatsby label Feb 16, 2024
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Feb 16, 2024