Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: 3x performance drop since v21.2.0 #11944

Open
2 tasks
feesler opened this issue Feb 19, 2024 · 9 comments
Open
2 tasks

[Bug]: 3x performance drop since v21.2.0 #11944

feesler opened this issue Feb 19, 2024 · 9 comments
Assignees

Comments

@feesler
Copy link

feesler commented Feb 19, 2024

Minimal, reproducible example

git clone https://github.com/feesler/puppeteer-perf.git
cd puppeteer-perf
npm install
npm test
npm install puppeteer@21.2.0
npm test

Error string

no error

Bug behavior

  • Flaky
  • PDF

Background

Similar to #9993 and #8650.

Steps to reproduce the problem:

  1. Install puppeteer v21.1.1 dependency
  2. Run some script and measure performance
  3. Update puppeteer to v21.2.0 and repeat step 2

Script performing simple page query/evaluate calls.

Expectation

Expected same performance of query/evaluate calls as version 21.1.1 and earlier.

Reality

Starting from version 21.2.0 performance of script with query/evaluate calls is dropped about 3 times and remains on the same level.
No difference between ESM and CJS.

Log output:
Node: v21.6.2
Puppeteer: 21.1.1
Browser: Chrome/116.0.5845.96
Duration: 218.91580000000022

Node: v21.6.2
Puppeteer: 21.2.0
Browser: Chrome/116.0.5845.96
Duration: 783.2916

Node: v21.6.2
Puppeteer: 22.1.0
Browser: Chrome/121.0.6167.85
Duration: 775.9939000000004

Puppeteer configuration file (if used)

No response

Puppeteer version

21.2.0 - 22.1.0

Node version

21.6.2

Package manager

npm

Package manager version

9.8.1

Operating system

Windows

This comment was marked as outdated.

@OrKoN OrKoN self-assigned this Feb 19, 2024
@OrKoN
Copy link
Collaborator

OrKoN commented Feb 19, 2024

@jrandolf PTAL

@OrKoN
Copy link
Collaborator

OrKoN commented Feb 19, 2024

@OrKoN OrKoN assigned jrandolf and unassigned OrKoN Feb 19, 2024
@OrKoN OrKoN added the P1 label Feb 19, 2024
@jrandolf
Copy link
Contributor

jrandolf commented Feb 19, 2024

After investigation, we found that the performance drop is due to sandboxed queries in Puppeteer. Older versions of Puppeteer did not sandbox queries, so there was no performance loss from transferring sandboxed objects into the main execution context (the one Puppeteer library users use).

Internally, Puppeteer doesn't have an internal method of migrating arbitrary objects (in particular, arrays of nodes) from a sandbox. The CDP method DOM.describeNode only allows transfers of Nodes, so to enable adoption, we adopt each node individually which costs a significant amount of time for queries with a lot of nodes.

Perhaps the solution would be to allow users to disable sandboxing. Deferring to @OrKoN .

@jrandolf jrandolf assigned OrKoN and unassigned jrandolf Feb 19, 2024
@OrKoN OrKoN added P2 and removed P1 labels Feb 19, 2024
@feesler
Copy link
Author

feesler commented Feb 21, 2024

Well, it will be sad if there is no good solution to this! Just wonder how to workaround this from user perspective.

Sample script is, of course, intentionally not optimized.
Moving calculations for all nodes inside single page.evaluate call saves the situation, but only once.
Over long test run it still adds up to an impressive amount of time.

@NC-piercej
Copy link

I can confirm that headless: true on v22 is significantly slower than headless: 'shell' for our PDF generation workloads.

Interestingly, if we are just generating a single PDF in a single tab, the performance is similar. It's only when multiple tabs are open, running heavy CPU-bound javascript, and generating PDFs at once that performance degrades significantly.

If I didn't know better, it's almost like script execution in one tab (dependent on requestAnimationFrame) is blocked until another tab completes. It's weird and hard too pin down.

@jrandolf
Copy link
Contributor

jrandolf commented Feb 28, 2024

Well, it will be sad if there is no good solution to this! Just wonder how to workaround this from user perspective.

Sample script is, of course, intentionally not optimized. Moving calculations for all nodes inside single page.evaluate call saves the situation, but only once. Over long test run it still adds up to an impressive amount of time.

I think if we can get more examples of cases where the script cannot be optimized and the test runs have a high delta in time, this would strengthen the use-case. From our experience, it's never querying that increase the testing time. For example, the test may be testing too many cases at once or it may be using some flaky, time-consuming heuristic.

@OrKoN
Copy link
Collaborator

OrKoN commented Feb 28, 2024

@NC-piercej this is unrelated issue. The new headless might be slower than the old headless mode but it is the same browser implementation as the regular headful Chrome. See https://developer.chrome.com/docs/chromium/new-headless

@OrKoN
Copy link
Collaborator

OrKoN commented Jun 6, 2024

So one way to fix this regression would be to introduce a new method that does not isolate the query code. This way we could still keep the isolation by default like in other methods and allow use cases where many elements need to be returned. See #12539 Note that there is still a 30% increase in querying remaining after turning off the isolation so there is probably another regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants