[Bug]: Freezing when querying frames on certain websites #12292

Closed
2 tasks
Vasile-Peste opened this issue Apr 18, 2024 · 12 comments

Comments


Vasile-Peste commented Apr 18, 2024

Minimal, reproducible example

const puppeteer = require("puppeteer");

async function main () {
    console.log("Operative");

    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto('http://fonologomed.it/');

    const allIframeLinks = [];
    const frames = page?.frames() ?? [];

    for (const frame of frames) {
        try {
            if (!frame?.url() || frame.detached) {
                continue;
            }

            const links = await frame.$$eval('a', (anchors) => anchors.map((anchor) => ({
                href: anchor.href,
                text: anchor.text,
            })));

            allIframeLinks.push(...links);
        }
        catch (e) {
            console.log(e);
        }
    }

    console.log(allIframeLinks);
    console.log("Success");
}

main();

Error string

no error

Bug behavior

  • Flaky
  • PDF

Background

We want to retrieve the links from the iframes within a website. This works for the majority of the websites we visit, but on certain websites the operation freezes completely (the Promise stays pending forever).
Running the reproducible example on http://fonologomed.it/ freezes.
As a counterexample, running the same script on https://www.aranzulla.it does NOT freeze.

Expectation

The promise should resolve, or at least reject with an error.

Reality

The operation gets stuck because the Promise never settles.
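
Until the underlying cause is addressed, one way to get "at least an error" on the caller's side is to race each frame query against a timer. The following is only a minimal sketch: `withTimeout` is a hypothetical helper written for illustration, not a Puppeteer API, and the 5000 ms value in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper (not part of Puppeteer): rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Possible usage inside the loop from the example above (assumed values):
//   const links = await withTimeout(
//     frame.$$eval('a', (anchors) => anchors.map((a) => a.href)),
//     5000,
//     'frame query'
//   );
```

Note that this only unblocks the loop. The original `$$eval` promise is not cancelled and remains pending in the background.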

Puppeteer configuration file (if used)

No response

Puppeteer version

22.6.5

Node version

20.3.0

Package manager

npm

Package manager version

9.6.7

Operating system

macOS


github-actions bot commented Apr 18, 2024

This issue was not reproducible. Please check that your example runs locally and the following:

  • Ensure the script does not rely on dependencies outside of puppeteer and puppeteer-core.
  • Ensure the error string is just the error message.
    • Bad:

      Error: something went wrong
        at Object.<anonymous> (/Users/username/repository/script.js:2:1)
        at Module._compile (node:internal/modules/cjs/loader:1159:14)
        at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
        at Module.load (node:internal/modules/cjs/loader:1037:32)
        at Module._load (node:internal/modules/cjs/loader:878:12)
        at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
        at node:internal/main/run_main_module:23:47
    • Good: Error: something went wrong.

  • Ensure your configuration file (if applicable) is valid.
  • If the issue is flaky (does not reproduce all the time), make sure 'Flaky' is checked.
  • If the issue is not expected to error, make sure to write 'no error'.

Once the above checks are satisfied, please edit your issue with the changes and we will
try to reproduce the bug again.


Analyzer run

@Vasile-Peste
Author

I guess it cannot be reproduced by the workflow because it gets stuck?

@AntonioFavero

The promise returned by frame.$$eval is never resolved.

@OrKoN
Collaborator

OrKoN commented Apr 18, 2024

I think it is another example of #10696. Basically, the browser's built-in PDF viewer creates empty iframes that never get an execution context created for them the way regular iframes do. So the eval calls wait for the context, as expected from the Chrome DevTools Protocol, but never get one for those frames. Maybe it is because the viewer is treated as part of the browser UI, although the technical implementation is an iframe.

@OrKoN OrKoN added the upstream label Apr 18, 2024
@OrKoN
Collaborator

OrKoN commented Apr 18, 2024

A workaround could look like this:

      let parent = frame.parentFrame();
      let skip = false;
      while (parent) {
        if (parent.url().endsWith('.pdf')) {
          skip = true;
          break;
        }
        parent = parent.parentFrame();
      }
      if (!frame?.url() || frame.detached || skip) {
        continue;
      }

or, more generally, since it is reasonable not to expect any context on about:blank pages:

      if (!frame?.url() || frame.detached || frame.url() === 'about:blank') {
        continue;
      }
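
The parent-frame walk from the first workaround can also be pulled into a small predicate so the loop body stays short. This is a sketch only: `hasPdfAncestor` is a hypothetical name, and it relies on the same assumption as the snippet above, namely that the PDF document's frame URL ends in `.pdf` (a URL with a query string or a PDF served from another path would not be caught).

```javascript
// Hypothetical helper: returns true if any ancestor of `frame`
// has a URL ending in ".pdf" (case-insensitive).
function hasPdfAncestor(frame) {
  for (let parent = frame.parentFrame(); parent; parent = parent.parentFrame()) {
    if (parent.url().toLowerCase().endsWith('.pdf')) {
      return true;
    }
  }
  return false;
}

// Possible usage inside the loop:
//   if (!frame?.url() || frame.detached || hasPdfAncestor(frame)) {
//     continue;
//   }
```

Unlike the about:blank check, this keeps ordinary about:blank frames (for example iframes without a src) eligible for querying.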

@Vasile-Peste
Author

Hi @OrKoN,
thank you for the answer.

We suspected a PDF issue and in fact tried filtering by the PDF extension, as in your first workaround. However, we didn't traverse the parent frames, which is probably why our solution didn't work. We will try your first workaround.

The second workaround is problematic for us, because we do want to read frames whose URL is "about:blank". In our tests, calling frame.url() on an iframe without a src set returns "about:blank". We have several cases of such frames: they are generated automatically by plugins and widgets, and since they contain content we need, we want to read them.

@OrKoN
Collaborator

OrKoN commented Apr 18, 2024

You can also disable the OOPIF-based PDF viewer for now via

args: [
  '--disable-features=PdfOopif'
]
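
For context, a sketch of where that `args` array would go, assuming it is passed as part of the options object given to `puppeteer.launch()`. The flag name is taken from the comment above; whether it actually avoids the hang is not established by this snippet.

```javascript
// Launch options carrying the Chromium feature flag from the comment above.
const launchOptions = {
  args: ['--disable-features=PdfOopif'],
};

// Possible usage (requires puppeteer to be installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(launchOptions);
```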

@Vasile-Peste
Author

> You can also disable PDF viewer via OOPIFs for now via
>
> args: [
>       '--disable-features=PdfOopif'
>     ]

Thank you,
is there any documentation about this option?

@OrKoN
Collaborator

OrKoN commented Apr 19, 2024

There is an issue for that: https://issues.chromium.org/issues/40268279

@OrKoN
Collaborator

OrKoN commented Apr 19, 2024

I left a comment there, as it looked like the feature was not meant to be enabled yet.

@OrKoN OrKoN self-assigned this Apr 25, 2024
@OrKoN
Collaborator

OrKoN commented Apr 25, 2024

After further investigation, it looks like the feature might not be the reason for the hang (it now seems to reproduce with the feature disabled). It appears that the iframes from the PDF viewer extension get reported into the page sessions via CDP (although they should probably be confined to the extension target).

@przhkv

przhkv commented Apr 29, 2024

I am experiencing the freezing problem with http URLs but not with https ones. Could that be the main issue here?
