[BUG] Traffic from iframes, nested iframes and service workers not intercepted via CDPSession / Fetch.getResponseBody #21816

Closed
1 task done
brianjenkins94 opened this issue Mar 20, 2023 · 6 comments

Comments

@brianjenkins94

brianjenkins94 commented Mar 20, 2023

System info

  • Playwright Version: playwright-chromium@1.31.2
  • Operating System: macOS 13.1
  • Browser: Chrome 110.0.5481.77 (Official Build) (arm64)
  • Other info:

Source code

  • I provided exact source code that allows reproducing the issue locally.

script.ts:

import { Browser, BrowserContext, chromium, Page } from "playwright-chromium";
import * as path from "path";
import * as url from "url";

async function intercept(page: Page) {
	const context = page.context();

	const client = await context.newCDPSession(page);

	await client.send("Fetch.enable", {
		"patterns": [
			{
				"requestStage": "Response",
				"urlPattern": "*"
			}
		]
	});

	client.on("Fetch.requestPaused", async ({ requestId, request }) => {
		const { hostname, pathname } = new URL(request.url);

		try {
			const { body } = await client.send("Fetch.getResponseBody", {
				"requestId": requestId
			});

			let baseName = path.basename(pathname);
			const pathName = path.join(pathname.slice(0, -baseName.length));
			const fileName = path.join(hostname, pathName, baseName || "index.html");
			baseName ||= path.basename(fileName);

			console.log("Saving " + request.url + " as " + baseName);
		} catch (error) {
			console.error("Failed to get " + request.url);
		} finally {
			await client.send("Fetch.continueRequest", {
				"requestId": requestId
			});
		}
	});
}

let browser: Browser;

let context: BrowserContext;

export async function init() {
	browser ??= await chromium.launch({
		"args": ["--disable-web-security"],
		"devtools": true,
		"headless": false //!(Boolean(process.env["CI"]) || process.platform === "win32" || Boolean(process.env["DISPLAY"]))
	});

	context ??= await browser.newContext();

	const page = await context.newPage();

	await intercept(page);

	await page.goto(url.pathToFileURL(path.resolve("./index.html")).toString());
}

init();
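The handler above reads `body` from `Fetch.getResponseBody` but never uses it. In the DevTools protocol that result also carries a `base64Encoded` flag, so the body needs decoding before it can be written to disk. A minimal sketch of that step follows; `ResponseBody` and `decodeBody` are hypothetical names for illustration, not part of the script above or of Playwright.

```typescript
// Shape of the Fetch.getResponseBody result in the Chrome DevTools Protocol.
interface ResponseBody {
	body: string;
	base64Encoded: boolean;
}

// Hypothetical helper (an assumption, not from the original script): turns the
// protocol result into raw bytes that could be passed to fs.writeFile.
function decodeBody({ body, base64Encoded }: ResponseBody): Buffer {
	// Binary payloads arrive base64-encoded; text payloads arrive as plain strings.
	return Buffer.from(body, base64Encoded ? "base64" : "utf8");
}

// Usage with hand-written sample values instead of a live CDP session.
console.log(decodeBody({ body: "aGVsbG8=", base64Encoded: true }).toString("utf8"));
console.log(decodeBody({ body: "OK", base64Encoded: false }).toString("utf8"));
```

Inside the `Fetch.requestPaused` handler, the object returned by `client.send("Fetch.getResponseBody", ...)` could be passed to such a helper directly.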

index.html:

<!DOCTYPE html>
<html>
<head>
    <meta name="description" content="" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
</head>
<body>
    <iframe id="nodebox-iframe"></iframe>
    <iframe id="preview-iframe"></iframe>
    <script type="module">
        import { Nodebox } from "https://cdn.jsdelivr.net/npm/@codesandbox/nodebox/build/index.min.mjs";

        const nodeboxIframe = document.getElementById("nodebox-iframe");
        const previewIframe = document.getElementById("preview-iframe");

        const nodebox = new Nodebox({
            "iframe": nodeboxIframe
        });

        await nodebox.connect();

        await nodebox.fs.init({
            "index.js": `
// server.ts
import * as http from "http";

// server.ts
var BASE_URL = "http://localhost:8000";
var router = {
  "/": function(request, response) {
    response.writeHead(200, { "Content-Type": "text/html" });
    response.end("OK");
  }
};
var server = http.createServer(function(request, response) {
  const startTime = performance.now();
  let status = 200;
  try {
    if (request.method === "GET" && router[request.url] !== void 0) {
      router[request.url](request, response);
    } else {
      status = 404;
      response.writeHead(status, { "Content-Type": "text/plain" });
      response.end("File not found");
    }
  } catch (error) {
    status = 500;
    response.writeHead(status, { "Content-Type": "text/plain" });
    response.end("Internal server error");
  }
  response.on("finish", function() {
    console.log(request.method, request.url, status, (performance.now() - startTime).toFixed(3), "ms");
  });
});
server.listen(new URL(BASE_URL).port, function() {
  console.log("> Ready on " + BASE_URL);
});
    `
        });

        const shell = nodebox.shell.create();

        const { id } = await shell.runCommand("node", ["index.js"]);

        const { url } = await nodebox.preview.getByShellId(id);
        previewIframe.setAttribute("src", url);
    </script>
</body>
</html>

package.json:

{
  "dependencies": {
    "playwright-chromium": "latest",
    "ts-node": "latest",
    "typescript": "latest"
  },
  "scripts": {
    "start": "node --experimental-specifier-resolution=node --loader=ts-node/esm script.ts"
  }
}

Expected

Capture all network traffic, including:

  • requests originating from iframes,
  • nested iframes,
  • service workers

Actual

Output:

Saving ./playwright-issue/index.html as index.html
Saving https://cdn.jsdelivr.net/npm/@codesandbox/nodebox/build/index.min.mjs as index.min.mjs
Saving ./playwright-issue/index.html as index.html
Saving https://nodebox-runtime.codesandbox.io/ as index.html

That is far fewer requests than the Network panel in Chrome DevTools shows.


Below I've compiled what I came across while researching this issue and what I tried. This has been asked often enough, with no clear working solution, that I hope to at least document it better.

  • https://stackoverflow.com/questions/70645680/does-chromium-support-intercept-webworker-requests-via-cdp

    After digging in puppeteer source code and tracing raw protocol messages, it seems that calling page.setRequestInterception(true) also intercepts WebWorker requests, but these requests never issue any Network.requestWillBeSent events, which is known as page.request events in puppeteer, then WebWorker requests hang for waiting for a request.continue() which is usually called in the page.request event handler.

Mentions using Target.attachedToTarget.

Someone experiencing the same issue. Notes that they tried sending Target.setAutoAttach, which didn't appear to change anything for me. Also notes that they couldn't get Target.targetCreated to fire, which I also encountered.

Issue could be worked around by using the --disable-features flag. Linked resources also talk about how setAutoAttach shouldn't be relied on.

This is what my current code is based on, although I couldn't get browser.on('targetcreated') to fire. Someone towards the bottom mentions serviceWorkerTarget.setRequestInterception.

Currently, Playwright supports an experimental feature to inspect and route Network traffic made by Service Workers (in Chrome / Chromium). To learn more and enable, please read: https://playwright.dev/docs/service-workers-experimental.

Looks like there are some experimental Service Worker-specific network events that I could enable.

Further confirmation that I should be using the experimental Service Worker-specific network events (this section in particular).

Playwright seems to have worked around this/provided a fix: #1226

Something about iframes and auto-attaching? It sounds like with this PR, frames should auto-attach. The PR talks about frames as being popups, but surely it should apply to anything that creates a new frame.

Another suggestion to use setRequestInterception, which is deprecated and suggests you use Fetch.enable instead. I don't fully understand how I would programmatically attach to all of the frames.

That's everything that seems relevant.

@brianjenkins94
Author

brianjenkins94 commented Mar 21, 2023

Using page.on gets the results I was expecting:

script.ts:

import { Browser, BrowserContext, chromium, Page } from "playwright-chromium";
import * as path from "path";
import * as url from "url";

let browser: Browser;

let context: BrowserContext;

export async function init() {
	browser ??= await chromium.launch({
		"args": ["--disable-web-security"],
		"devtools": true,
		"headless": false //!(Boolean(process.env["CI"]) || process.platform === "win32" || Boolean(process.env["DISPLAY"]))
	});

	context ??= await browser.newContext();

	const page = await context.newPage();

	page.on("response", async function(response) {
		const { hostname, pathname } = new URL(response.url());

		let baseName = path.basename(pathname);
		const pathName = path.join(pathname.slice(0, -baseName.length));
		const fileName = path.join(hostname, pathName, baseName || "index.html");
		baseName ||= path.basename(fileName);

		try {
			const data = await response.text();

			console.log("Saving " + response.url() + " as " + baseName);
		} catch (error) {
			console.error("Failed to get " + response.url());
		}
	});

	await page.goto(url.pathToFileURL(path.resolve("./index.html")).toString());
}

init();

Output:

Saving ./playwright-issue/index.html as index.html
Saving https://cdn.jsdelivr.net/npm/@codesandbox/nodebox/build/index.min.mjs as index.min.mjs
Failed to get https://nodebox-runtime.codesandbox.io/
Saving https://nodebox-runtime.codesandbox.io/runtime.3rzzwog6zor6ea7tx91qp9fsc4hu2gk.js as runtime.3rzzwog6zor6ea7tx91qp9fsc4hu2gk.js
Saving https://static.cloudflareinsights.com/beacon.min.js/vaafb692b2aea4879b33c060e79fe94621666317369993 as vaafb692b2aea4879b33c060e79fe94621666317369993
Saving https://nodebox-runtime.codesandbox.io/cdn-cgi/rum? as rum
Saving https://nodebox-runtime.codesandbox.io/cdn-cgi/challenge-platform/h/g/scripts/alpha/invisible.js?ts=1679356800 as invisible.js
Saving https://nodebox-runtime.codesandbox.io/cdn-cgi/challenge-platform/h/g/scripts/pica.js as pica.js
Saving https://nodebox-runtime.codesandbox.io/worker-mdt54o6o3wdjiy9oohw2u50tjvrnbwu.js as worker-mdt54o6o3wdjiy9oohw2u50tjvrnbwu.js
Failed to get https://nodebox-runtime.codesandbox.io/brotli_wasm_bg.wasm
Saving https://nodebox-runtime.codesandbox.io/cdn-cgi/challenge-platform/h/g/cv/result/7ab26fb22eb9b0db as 7ab26fb22eb9b0db
Saving https://em137ud-8000.nodebox.codesandbox.io/__csb_bridge/index.html as index.html
Saving https://static.cloudflareinsights.com/beacon.min.js/vaafb692b2aea4879b33c060e79fe94621666317369993 as vaafb692b2aea4879b33c060e79fe94621666317369993
Saving https://em137ud-8000.nodebox.codesandbox.io/__csb_bridge/__csb_bridge.pejfvee4jr58rxne7tafhqct72n0i88.js as __csb_bridge.pejfvee4jr58rxne7tafhqct72n0i88.js
Saving https://em137ud-8000.nodebox.codesandbox.io/cdn-cgi/rum? as rum
Failed to get https://em137ud-8000.nodebox.codesandbox.io/
Saving https://em137ud-8000.nodebox.codesandbox.io/__csb_runtime.js?t=1679362299452 as __csb_runtime.js

Although I am still curious to know if this can also be accomplished via a CDPSession.
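The file-name derivation used in both scripts could also be pulled out into a pure function, which makes it easier to check against the URLs in the output above. This is only a sketch of a simplified variant, not the original logic: `urlToFilePath` is a hypothetical name, and it assumes any URL whose path ends in a slash should map to `index.html`.

```typescript
// Hypothetical helper, not part of the original scripts: maps a response URL
// to a relative path of the form "<hostname>/<path>".
function urlToFilePath(rawUrl: string): string {
	const { hostname, pathname } = new URL(rawUrl);
	// Assumption: a trailing slash names a directory, so append a default document.
	const filePath = pathname.endsWith("/") ? pathname + "index.html" : pathname;
	// Only `pathname` is used, so the query string and fragment are dropped.
	return hostname + filePath;
}

console.log(urlToFilePath("https://nodebox-runtime.codesandbox.io/"));
// prints "nodebox-runtime.codesandbox.io/index.html"
console.log(urlToFilePath("https://nodebox-runtime.codesandbox.io/cdn-cgi/rum?"));
// prints "nodebox-runtime.codesandbox.io/cdn-cgi/rum"
```

For `file:` URLs the hostname is empty, so the result is just the absolute path; a real implementation would likely need to handle that case separately.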

@aslushnikov
Collaborator

    Although I am still curious to know if this can also be accomplished via a CDPSession.

@brianjenkins94 First and foremost, we strongly recommend using the Playwright API. If you still want to explore the DevTools protocol, you can trace the protocol calls that Playwright itself makes with the DEBUG=pw:protocol env variable.

Hope this helps!

@brianjenkins94
Author

On further inspection, __csb_sw.3bocy7xe55uwh0mnxat47bexlaaujt0.js doesn't show up in the network tab or in the page.on results, but it does show up in the sources tab.

Setting PW_EXPERIMENTAL_SERVICE_WORKER_NETWORK_EVENTS=1 doesn't appear to have any effect.

@brianjenkins94
Author

brianjenkins94 commented Mar 30, 2023

This allowed me to get the URL of the service worker:

context ??= await browser.newContext();

context.on("serviceworker", download); // <--

const page = await context.newPage();

page.on("response", download);
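The `download` callback itself isn't shown above. Since both Playwright's `Response` and `Worker` expose a `url()` method, one handler can serve both events by depending only on that method. Here is a rough sketch under that assumption; the `HasUrl` interface, the `seen` set, and the de-duplication are illustrative additions, not the code actually used.

```typescript
// Structural type covering the only method the handler needs; Playwright's
// Response and Worker objects both provide url().
interface HasUrl {
	url(): string;
}

// Assumption: remember URLs already handled so repeated responses are skipped.
const seen = new Set<string>();

// Hypothetical stand-in for the `download` callback referenced above.
// Returns true when the URL is new, false when it was already handled.
function download(source: HasUrl): boolean {
	const resourceUrl = source.url();
	if (seen.has(resourceUrl)) {
		return false;
	}
	seen.add(resourceUrl);
	console.log("Saving " + resourceUrl);
	return true;
}

// Usage with plain objects standing in for a Response and a Worker.
download({ url: () => "https://example.com/app.js" });
download({ url: () => "https://example.com/sw.js" });
```

Actually saving the contents would still differ per source type, since a `Worker` has no response body to read.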

@GrayedFox

I landed on this page after trying to intercept and route WebSocket traffic. It's unclear from the docs whether route() supports intercepting WebSocket requests. Note that I mean the actual request, not the messages sent between the WS client and server.

There is a warning about WebWorker traffic not being sniffed but nothing about the wss protocol not being supported:

[image: screenshot of the warning in the docs]

Will also create an issue and link back to here for reference.

@VJO-JavaScript

Use this to monitor network traffic from all nested iframes.

// Note: target.createCDPSession() and the 'targetcreated' event are Puppeteer APIs.
async function monitorNetworkForTarget(target) {
  // Open a dedicated CDP session for this target and turn on network events.
  const client = await target.createCDPSession();
  await client.send('Network.enable');

  client.on('Network.requestWillBeSent', (params) => {
    console.log(`Request sent for targetId: ${target._targetId}, URL: ${params.request.url}`);
  });

  client.on('Network.responseReceived', (params) => {
    console.log(`Response received for targetId: ${target._targetId}, URL: ${params.response.url}`);
  });
}

// Attach a monitor to every target the browser creates.
browser.on('targetcreated', async (target) => {
  console.log(`New target created: ${target.type()}, ID: ${target._targetId}`);
  await monitorNetworkForTarget(target);
});

4 participants