
Install crashes because getPrettyLogs() tries to allocate a huge string object #4949

Closed
octogonz opened this issue Jun 28, 2022 · 10 comments

@octogonz
Member

octogonz commented Jun 28, 2022

pnpm version: 6.32.23 (however the same code is still in the latest 7.4.0)

Code to reproduce the issue:

Encountered by invoking these commands:

export NODE_OPTIONS="--max-old-space-size=16000"

pnpm install --recursive --no-scripts

...in a very large monorepo with these settings in .npmrc:

shared-workspace-lockfile=false
workspace-concurrency=16

Crash output:

packages/apps/webapp-server              | Progress: resolved 1132, reused 0, downloaded 7, added 0
undefined
/data00/home/[repo-name]/packages/apps/[project-name]:
 ERR_STREAM_PREMATURE_CLOSE  Premature close
/data00/home/[user-name]/.nvm/versions/node/v14.19.2/lib/node_modules/pnpm/dist/pnpm.cjs:186453
        const jsonLogs = JSON.stringify(prettyLogs, null, 2);
                              ^

RangeError: Invalid string length
    at JSON.stringify (<anonymous>)
    at process.<anonymous> (/data00/home/[user-name]/.nvm/versions/node/v14.19.2/lib/node_modules/pnpm/dist/pnpm.cjs:186453:31)
    at process.emit (events.js:400:28)

Analysis

RangeError: Invalid string length is V8's way of reporting that a string would exceed the engine's maximum allowed string length; in effect, the process has run out of room to build the string.
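The limit can be demonstrated directly. This sketch (illustrative only, not pnpm code) triggers the same error by requesting an impossibly long string; V8 rejects the request before allocating anything:

```typescript
// Illustrative only: asking for a string longer than V8's maximum string
// length raises the same RangeError seen in the crash output above.
function tryHugeString (): string {
  try {
    // 2**31 characters exceeds the engine limit on every Node.js build,
    // so repeat() throws before any allocation happens.
    const s = 'x'.repeat(2 ** 31)
    return 'no error: length ' + s.length
  } catch (err) {
    return (err as Error).message
  }
}
```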

getPrettyLogs() appears to be creating a large in-memory data structure that is an array of all the log entries in .pnpm-debug.log. If I understand the code correctly, JSON.stringify() will then allocate a single massive JavaScript string, which can potentially be very large:

file-reporter/src/index.ts

    const prettyLogs = getPrettyLogs()
    const jsonLogs = JSON.stringify(prettyLogs, null, 2)
    // Don't create a node_modules directory if it does not exist
    const dest = fs.existsSync(path.dirname(LOG_FILENAME))
      ? LOG_FILENAME
      : path.basename(LOG_FILENAME)
    fs.writeFileSync(dest, jsonLogs, 'utf-8')
  })

  function getPrettyLogs () {
    const prettyLogs = {}
    logs.forEach((logObj, i) => {
      // eslint-disable-next-line
      const key = `${i} ${logObj['level']} ${logObj['name']}`
      const msgobj = getMessageObj(logObj)
      prettyLogs[key] = prettify(msgobj)
    })
    return prettyLogs
  }

A possible solution would be to write the log incrementally rather than stringifying the entire data structure at once.
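A minimal sketch of the incremental approach, assuming one entry per append (appendLogEntry and its signature are hypothetical, not pnpm's actual API):

```typescript
// Hypothetical sketch: append each log entry to the file as it is produced,
// so no single JSON.stringify() call ever sees more than one entry.
import * as fs from 'fs'

function appendLogEntry (dest: string, index: number, logObj: Record<string, unknown>): void {
  // Peak memory is bounded by the size of one entry, not the whole log.
  const chunk = JSON.stringify({ key: String(index), ...logObj }, null, 2) + '\n'
  fs.appendFileSync(dest, chunk, 'utf-8')
}
```

appendFileSync creates the file on first use, so the writer needs no separate "open" step; a write stream would amortize the syscall cost if entry volume is high.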

Additional information:

  • node -v prints: v14.19.2
  • Windows, macOS, or Linux?: Linux
@octogonz
Member Author

I was able to work around these crashes by manually editing pnpm/dist/pnpm.cjs and commenting out the call to JSON.stringify(prettyLogs, null, 2).

@zkochan what is the right way to fix this?

By design, is it important for the file .pnpm-debug.log to be serialized as a single JSON object? Example of the current file format:

node_modules/.pnpm-debug.log

{
  "0 info pnpm": {
    "message": "Using hooks from: /Users/nobody/code/repo/common/temp-split/global-pnpmfile.cjs",
    "prefix": "/Users/nobody/code/repo/common/temp-split"
  },
  "1 info pnpm": {
    "message": "readPackage hook is declared. Manifests of dependencies might get overridden",
    "prefix": "/Users/nobody/code/repo/common/temp-split"
  },
  "2 debug pnpm:scope": {
    "selected": 329,
    "total": 329,
    "workspacePrefix": "/Users/nobody/code/repo/common/temp-split"
  },

The above file format is more difficult to write/read incrementally. Compare with something like this:

possible different design

{
  "key": "0 info pnpm",
  "message": "Using hooks from: /Users/nobody/code/repo/common/temp-split/global-pnpmfile.cjs",
  "prefix": "/Users/nobody/code/repo/common/temp-split"
}
{
  "key": "1 info pnpm",
  "message": "readPackage hook is declared. Manifests of dependencies might get overridden",
  "prefix": "/Users/nobody/code/repo/common/temp-split"
}
{
  "key": "2 debug pnpm:scope",
  "selected": 329,
  "total": 329,
  "workspacePrefix": "/Users/nobody/code/repo/common/temp-split"
}

(In this possible design, the file reader can delimit blocks by looking for /^\{/, and still parse them using JSON.parse(). The writer can simply append JSON.stringify() for each object inside its loop.)
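A sketch of such a reader, under the assumption that every block opens with a { at the start of a line and is itself valid JSON (parseLogBlocks is an illustrative name, not an existing function):

```typescript
// Hypothetical reader for the proposed format: blocks begin at a line
// starting with "{", and each block parses independently with JSON.parse().
function parseLogBlocks (text: string): unknown[] {
  return text
    .split(/^(?=\{)/m)                      // split before each block-opening line
    .filter((block) => block.trim() !== '')
    .map((block) => JSON.parse(block))
}
```

The indented lines inside a pretty-printed block never start with {, so the lookahead split only fires at block boundaries.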

@zkochan
Member

zkochan commented Jun 29, 2022

I don't care that much what that log file looks like. If you have a fix, submit it and I'll accept it.

@octogonz
Member Author

We can fix it. 👍

@octogonz
Member Author

BTW is there an option to disable this logging entirely? That could provide an interim workaround, and for very large installs it might even speed up the install time.

@zkochan
Member

zkochan commented Jun 29, 2022

No, there is no such option.

@octogonz
Member Author

Would you want to add an option for this? (What should it be called?)

We can measure the effect of disabling this logging, and if it makes a significant difference, maybe we could implement the option in the same PR that fixes the memory leak.

@zkochan
Member

zkochan commented Jun 29, 2022

If it hurts performance, we should just remove the logging. No option is needed.

@octogonz
Member Author

Ok, we'll investigate and follow up. 👍

FYI however it is fixed, we'll probably need to backport it to PNPM 6.

@octogonz
Member Author

Generally, why does PNPM keep the entire logs: Object[] array in RAM? Is there any problem with making the objects incremental as well?

(For example, was there some concern that the target folder won't exist if the output file is created too early? Or that the file might not get closed if the process terminates? Or that file writes are slow?)

@zkochan
Member

zkochan commented Jun 29, 2022

I don't remember any reason. I think it can be changed to a file that is incrementally written.
