
fix(file-reporter): output logs incrementally #5011

Closed · wants to merge 4 commits

Conversation

william2958
Contributor

Resolves issue #4949, where pnpm install crashes in very large repositories because the in-memory logs object grows too large for JSON.stringify to handle. Performance was tested on a large monorepo; this change does not measurably slow installation compared to not logging at all.

The output log format has been changed from:

{
  "0 info pnpm": "foo",
  "1 warn pnpm": {
    "foo": 1,
    "bar": 2
  }
}

to:

{
  "message": "foo",
  "key": "0 info pnpm"
}
{
  "foo": 1,
  "bar": 2,
  "key": "1 warn pnpm"
}

Each object is now written to the file incrementally, instead of accumulating all logs in memory and writing them out at once.

Note: because fs.appendFileSync never truncates an existing file, the log file must be cleared before any logging starts.
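A minimal sketch of the incremental approach described above (illustrative only; the path and helper names here are hypothetical, not the package's actual ones):

```typescript
import fs from 'fs'
import path from 'path'
import os from 'os'

// Hypothetical log path for this sketch; the real reporter writes a
// hidden file under node_modules.
const LOG_FILENAME = path.join(os.tmpdir(), 'pnpm-debug.log')

// Clear any log left over from a previous run, since appendFileSync
// only ever adds to an existing file.
fs.writeFileSync(LOG_FILENAME, '')

function appendLog (logObj: Record<string, unknown>, key: string): void {
  // Serialize and append each entry on its own, so the complete set of
  // logs never has to be held in memory or stringified at once.
  fs.appendFileSync(LOG_FILENAME, JSON.stringify({ ...logObj, key }, null, 2) + '\n')
}
```

Because each entry is a self-contained JSON object followed by a newline, the file can be read back entry by entry without ever parsing it as one document.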

@welcome

welcome bot commented Jul 11, 2022

💖 Thanks for opening this pull request! 💖
Please be patient and we will get back to you as soon as we can.

@zkochan
Member

zkochan commented Jul 11, 2022

Run pnpm changeset in the root of the repository and describe your changes. The resulting files should be committed, as they will be used during the release.


  streamParser['on']('data', function (logObj: Object) {
-   if (isUsefulLog(logObj)) {
-     logs.push(logObj)
+   if (isUsefulLog(logObj) && global['writeDebugLogFile'] !== false) {
Member

I think if global['writeDebugLogFile'] is false, we can return early from the outer function at line 7

const msgobj = getMessageObj(logObj)
const prettyLogs = prettify(msgobj)
const jsonLogs = JSON.stringify(prettyLogs, null, 2) + '\n'
const dest = fs.existsSync(path.dirname(LOG_FILENAME)) ? LOG_FILENAME : path.basename(LOG_FILENAME)
Member

I don't think it is optimal to check the existence of the file hundreds of times.
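One way to address this (a sketch, not what the PR ships) is to resolve the destination a single time at module load instead of calling fs.existsSync once per log entry:

```typescript
import fs from 'fs'
import path from 'path'

// LOG_FILENAME mirrors the constant in the PR; the value here is illustrative.
const LOG_FILENAME = 'node_modules/.pnpm-debug.log'

// Resolve the destination once, rather than checking existence for every
// entry: fall back to the base name in the current directory when the
// node_modules directory does not exist yet.
const dest: string = fs.existsSync(path.dirname(LOG_FILENAME))
  ? LOG_FILENAME
  : path.basename(LOG_FILENAME)
```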

Member

also, why are there two possible locations for the logs?

const prettyLogs = prettify(msgobj)
const jsonLogs = JSON.stringify(prettyLogs, null, 2) + '\n'
const dest = fs.existsSync(path.dirname(LOG_FILENAME)) ? LOG_FILENAME : path.basename(LOG_FILENAME)
fs.appendFileSync(dest, jsonLogs)
Member

is this the best way to do this? Or would it be faster to open a write stream? That would keep up the connection to the file, I guess.

Member

we can probably look at how winston does it
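The write-stream alternative being suggested could look roughly like this (a sketch assuming one long-lived stream per process; path and function names are hypothetical):

```typescript
import fs from 'fs'
import path from 'path'
import os from 'os'

// Hypothetical log path for illustration.
const LOG_FILENAME = path.join(os.tmpdir(), 'pnpm-debug-stream.log')

// Open the file once with the 'w' flag (truncating any previous run) and
// keep the stream open for the lifetime of the process, instead of
// re-opening the file on every fs.appendFileSync call.
const logStream = fs.createWriteStream(LOG_FILENAME, { flags: 'w' })

function writeLog (entry: object): void {
  logStream.write(JSON.stringify(entry) + '\n')
}
```

The stream must be ended (or the process must exit cleanly) for buffered writes to be flushed, which is one trade-off versus the synchronous append.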

  function getMessageObj (logobj: Object): Object {
-   const msgobj = {}
+   const msgobj: { [key: string]: string } = {}
    for (const key in logobj) {
      if (['time', 'hostname', 'pid', 'level', 'name'].includes(key)) continue
Member

Suggested change:
-   if (['time', 'hostname', 'pid', 'level', 'name'].includes(key)) continue
+   if (['time', 'hostname', 'pid'].includes(key)) continue

for (const key in logobj) {
if (['time', 'hostname', 'pid', 'level', 'name'].includes(key)) continue
msgobj[key] = logobj[key]
}
const logLevel: string = logobj['level']
const logName: string = logobj['name']
msgobj.key = `${logNum} ${logLevel} ${logName}`
Member

this field can be removed.

@zkochan
Member

zkochan commented Jul 11, 2022

TBH, maybe we don't even need these logs. They are not very useful. I think we can remove them. Or we can just write the stacktrace of the error to the log file
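A stack-trace-only reporter, as floated here, might be as small as the following (a sketch; reportError and the path are hypothetical):

```typescript
import fs from 'fs'
import path from 'path'
import os from 'os'

// Hypothetical path and entry point for this sketch.
const LOG_FILENAME = path.join(os.tmpdir(), 'pnpm-debug-error.log')

// Instead of streaming every log entry, write the log file only when
// installation actually fails, containing just the error's stack trace.
function reportError (err: Error): void {
  fs.writeFileSync(LOG_FILENAME, err.stack ?? String(err))
}
```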

@william2958
Contributor Author

> TBH, maybe we don't even need these logs. They are not very useful. I think we can remove them. Or we can just write the stacktrace of the error to the log file

Okay, in that case I will open another PR that removes the logging entirely. For this change, should I just remove the file-reporter package entirely?

@zkochan
Member

zkochan commented Jul 11, 2022

I guess, yes

@octogonz
Member

> Or we can just write the stacktrace of the error to the log file

FWIW I've used PNPM for many years and had no idea that this log file even existed. 😄 The output is written under node_modules with a hidden filename. If we decide to improve this feature (rather than simply removing it), perhaps the log file should be documented and given a more conspicuous filename.

But if that additional work would delay the fix of the RangeError crash, maybe it should be a separate work item. (?)

@william2958
Contributor Author

> I guess, yes

PR for removing file-reporter package: #5012

@zkochan zkochan closed this Jul 11, 2022

3 participants