Proxy for page #678

ivibe · 2017-09-04T13:15:55Z

Hi!

Could someone tell me whether it's possible to set a proxy not only for a Chromium instance, but also for an individual page?

So the current solution is:
const browser = await puppeteer.launch({ args: [ '--proxy-server=127.0.0.1:9876' ] });

Desired solution in my case is something like this:
const page = await browser.newPage({ args: [ '--proxy-server=127.0.0.1:9876' ] });

With a proxy per page, it would be possible to run a single Chrome instance but use different proxies depending on the page.

Thanks in advance!


JoelEinbinder · 2017-09-04T21:41:13Z

You can use request interception to forward requests from each page to the correct proxy.

ivibe · 2017-09-05T11:28:36Z

@JoelEinbinder could you show an example of how I can forward requests through a SOCKS proxy using request interception?

aslushnikov · 2017-09-05T22:28:49Z

@ivibe unfortunately, this is not possible for SOCKS proxy, you'll have to launch a separate browser instance for this case.
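
A minimal sketch of that fallback, assuming one browser per distinct proxy is acceptable. It launches a browser the first time a proxy is requested and reuses it afterwards, so each proxy pays the launch cost only once. `BrowserPool` and its method names are hypothetical; the `launcher` argument is assumed to expose `launch({ args })` the way `puppeteer` does.

```javascript
// Hypothetical helper: keeps one browser per proxy address.
// `launcher` is anything with a puppeteer-like launch({ args }) method.
class BrowserPool {
  constructor(launcher) {
    this.launcher = launcher;
    this.browsers = new Map(); // proxy address -> Promise<Browser>
  }

  // Returns (a promise for) the browser bound to `proxy`, launching it on first use.
  browserFor(proxy) {
    if (!this.browsers.has(proxy)) {
      const launched = this.launcher.launch({ args: [`--proxy-server=${proxy}`] });
      this.browsers.set(proxy, launched);
    }
    return this.browsers.get(proxy);
  }

  // Opens a page whose traffic goes through `proxy`.
  async newPage(proxy) {
    const browser = await this.browserFor(proxy);
    return browser.newPage();
  }

  // Closes every browser launched so far.
  async closeAll() {
    const browsers = await Promise.all(this.browsers.values());
    this.browsers.clear();
    await Promise.all(browsers.map((browser) => browser.close()));
  }
}
```

Usage would look like `const pool = new BrowserPool(require('puppeteer'));` followed by `await pool.newPage('socks5://127.0.0.1:9050')`. Pages that share a proxy share a browser, so the number of Chromium processes grows with the number of proxies rather than the number of pages.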

Out of curiosity, why would you need this?

ivibe · 2017-09-06T07:58:40Z

It's a pity.

My use case is web scraping. Web servers can block IPs, or the proxy server can become inactive, so I need to change proxies relatively often.

Of course, I would like to avoid the performance hit of launching many instances of Chromium. Is there any chance that such functionality (i.e. changing the proxy dynamically) will be implemented in future Chromium releases?

ks07 · 2017-09-06T09:24:47Z

+1

This feature would be useful for me too, as I'm currently forced to launch multiple chromium instances if I need to access multiple URLs via different proxies. To add to what @ivibe suggested for use-cases, this could also be useful if you need to access resources behind firewalls with no common proxy that can pass through both. Alternatively, this would be useful if you wanted to test or screenshot your web application from multiple sources - e.g. if page content changes based on the visitor's IP's geolocation.

If there is a way to work around this as suggested by @JoelEinbinder, perhaps the SOCKS requirement could be met by putting a proxy in the middle that exposes an HTTP proxy interface to the SOCKS connection. (e.g. https://superuser.com/questions/423563/convert-http-requests-to-socks5)

Khady · 2017-09-06T11:51:56Z

unfortunately, this is not possible for SOCKS proxy, you'll have to launch a separate browser instance for this case.

Which proxy types are supported in this case?

blue-cp · 2017-09-09T19:51:12Z

+1
We have the exact same use case. This would be a very useful feature.
If we could set an HTTP proxy per page, that would be great.

fhmd4k · 2017-10-14T21:10:19Z

I think you can intercept every request and send it through an HTTP(S) proxy!

fhmd4k · 2017-10-14T21:12:34Z

A SOCKS proxy applies to the whole browser (all tabs), so the only option is to run a separate browser instance (with a different userDataDir) for each proxy.

Khady · 2017-10-15T01:38:37Z

One more reason to want this feature is the absence of proxy PAC file support in headless mode: https://bugs.chromium.org/p/chromium/issues/detail?id=765245

ivibe · 2017-10-15T06:21:17Z

@fhmd4k even if we consider only a regular HTTP(S) proxy, it would be nice to see an example of using one by capturing requests.

barbolo · 2017-11-10T10:45:52Z

Hi, I'm working on a workaround for this issue, and I'm already able to make it work with HTTP websites. For HTTPS websites I'm still facing some issues.

It may sound a bit hacky and complex... hmm... that's because it really is! But hey, it works.

The idea is to create a local Downstream Proxy that parses the address of the Upstream Proxy from the headers of the page's requests.

(image credits: https://www.fedux.org/articles/2015/04/11/setup-a-proxy-with-ruby.html)

You can use something like this per page:

await page.setExtraHTTPHeaders({ proxy_addr: '200.11.11.11', proxy_port: '999' });
// 200.11.11.11:999 is the address of the final proxy you want to use (the Upstream Proxy).
// Header values must be strings, so the port is passed as '999' rather than 999.

You should start chrome using --proxy-server=downstream-proxy-address.

Then, your custom Downstream Proxy should extract those proxy headers and forward the request to the proper Upstream Proxy.

For HTTPS requests, the issue I'm facing is intercepting the CONNECT requests sent when the secure tunnel is being created. In this case the proxy headers are not sent by Chrome, and I'm trying to figure out another way of transmitting the proxy information to the Downstream Proxy without needing to hack Chrome (or Chromium) itself.

The Downstream Proxy should be a very lightweight process running on your operating system. For reference, the proxy I've built consumes about 20MB of system memory. I won't share the proxy code for now because it currently exposes some security risks for my application.
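
Since the proxy code itself isn't shared, here is a rough sketch of what such a Downstream Proxy could look like for plain HTTP only, using Node's built-in `http` module. It is not the implementation described above. The function names are hypothetical, the header names `proxy_addr` and `proxy_port` are taken from the snippet earlier in this comment, and CONNECT (HTTPS) tunnels are deliberately not handled.

```javascript
const http = require('http');

// Extracts the upstream proxy from the custom request headers.
// Returns null when the headers are missing or the port is invalid.
function upstreamFromHeaders(headers) {
  const host = headers['proxy_addr'];
  const port = Number(headers['proxy_port']);
  if (!host || !Number.isInteger(port) || port <= 0 || port > 65535) {
    return null;
  }
  return { host, port };
}

// Creates (but does not start) a downstream proxy server for plain HTTP.
function createDownstreamProxy() {
  return http.createServer((clientReq, clientRes) => {
    const upstream = upstreamFromHeaders(clientReq.headers);
    if (!upstream) {
      clientRes.writeHead(400);
      clientRes.end('Missing or invalid proxy_addr/proxy_port headers');
      return;
    }

    // Do not leak the routing headers to the upstream proxy or the target site.
    const headers = { ...clientReq.headers };
    delete headers['proxy_addr'];
    delete headers['proxy_port'];

    const upstreamReq = http.request(
      {
        host: upstream.host,
        port: upstream.port,
        method: clientReq.method,
        path: clientReq.url, // a browser sends the absolute URL to an HTTP proxy
        headers,
      },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );

    upstreamReq.on('error', () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });

    clientReq.pipe(upstreamReq);
  });
}
```

To try it, call `createDownstreamProxy().listen(8000)` (the port is arbitrary) and start Chrome with `--proxy-server=127.0.0.1:8000`.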

tzellman · 2017-11-21T00:25:07Z

I could be wrong, but I believe SOCKS5 is already supported: http://www.chromium.org/developers/design-documents/network-stack/socks-proxy

--proxy-server="socks5://myproxy:8080"
--host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE myproxy"
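
For reference, a small sketch of passing these two flags through Puppeteer. The helper names `socks5Args` and `launchWithSocks5` are hypothetical. Because the flags go in as array elements rather than through a shell, the surrounding double quotes are not needed.

```javascript
// Builds the two Chromium flags quoted above for a SOCKS5 proxy.
function socks5Args(host, port) {
  return [
    `--proxy-server=socks5://${host}:${port}`,
    // Make local DNS resolution fail for every host except the proxy itself,
    // so hostnames are not resolved outside the proxy.
    `--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE ${host}`,
  ];
}

// Launches a browser whose traffic all goes through the given SOCKS5 proxy.
async function launchWithSocks5(host, port) {
  const puppeteer = require('puppeteer'); // loaded lazily; must be installed
  return puppeteer.launch({ args: socks5Args(host, port) });
}
```

This still configures one proxy for the whole browser, not one per page.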

barbolo · 2017-11-21T11:12:15Z

@tzellman that sets a single proxy for chrome and not for each page (tab) of chrome.

gwaramadze · 2017-12-13T12:23:23Z

@barbolo I believe this workaround applies to most headless browsers. We have set up a similar stack with PhantomJS:
client => haproxy => phantomjs => server

Same story, works great for HTTP resources but fails to route HTTPS as there is no access to additional headers, querystring, nothing... We are even considering SSL termination but that's just soooo much hacking to achieve such a simple thing :/

Did you have any luck with working around HTTPS requests?

barbolo · 2017-12-13T17:29:13Z

@gwaramadze Yes, I've found some ways of making this scheme work with HTTPS and I'll share how I'm currently doing it.

Like I've said in the previous comment, the custom headers with the proxy information were ignored by Chrome when communicating with the downstream proxy server. However the user-agent header was being transmitted.

The first approach I tried was to encapsulate the proxy information in a JSON string sent as the user-agent header. For example, I would change the Chrome user-agent for each tab to look like this:

var userAgent = JSON.stringify({
  "user-agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
  "proxy-addr" : "111.111.111.111",
  "proxy-port" : "9999",
});
page.setUserAgent(userAgent);

That way I can intercept the user-agent in the Downstream Proxy and parse the Proxy attributes from it.

The problem is that the user-agent is also encrypted in the connection and sent directly to the final HTTP server. It's impossible to intercept it and fix it before sending it to the HTTP server. So the final HTTP server would receive a bizarre user agent string that would include your proxy connection information. If that is not a problem for you, that will work. But for me it could be a problem.

So what I ended up doing was to create a list with thousands of user agent strings and for each new tab:

1. Choose a user agent string from the list and set it as the page's user agent.
2. Send a request to the downstream proxy specifying that requests with this user agent string should use a given upstream proxy.
3. Send a new request from this tab.
4. In the downstream proxy, find which proxy should be used based on the user agent string.

That's how I'm doing it now. Steps 2 and 4 require programming the downstream proxy accordingly.
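
The bookkeeping behind these steps can be sketched as a small in-memory registry. This is only an illustration of the idea, not the actual proxy code: the class name `UserAgentProxyRegistry` and its methods are hypothetical, and a real downstream proxy would expose `assign` over some control endpoint.

```javascript
// Hypothetical registry mapping each reserved user agent string to an upstream proxy.
class UserAgentProxyRegistry {
  constructor(userAgents) {
    this.free = [...userAgents]; // user agents not bound to any tab yet
    this.assigned = new Map(); // user agent -> { host, port }
  }

  // Steps 1 and 2: reserve an unused user agent and bind it to an upstream proxy.
  assign(proxy) {
    const userAgent = this.free.pop();
    if (userAgent === undefined) {
      throw new Error('No unused user agent left');
    }
    this.assigned.set(userAgent, proxy);
    return userAgent;
  }

  // Step 4: the downstream proxy looks up the upstream proxy for a request's user agent.
  lookup(userAgent) {
    return this.assigned.get(userAgent) ?? null;
  }

  // Makes the user agent available again once its tab is closed.
  release(userAgent) {
    if (this.assigned.delete(userAgent)) {
      this.free.push(userAgent);
    }
  }
}
```

Each user agent string is bound to at most one tab at a time, which is why the list needs to be large. The value returned by `assign` is what gets passed to `page.setUserAgent`.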

Another approach that should work is to change the source of Chromium's network stack to allow other headers to be transmitted. But that would mean more maintenance work in the long term.

gwaramadze · 2017-12-13T18:44:01Z

@barbolo Thanks, this is quite an interesting hack. I wouldn't want to meddle with user agents too much, as they might be checked by anti-scraping algorithms.

barbolo · 2017-12-13T20:20:54Z

@gwaramadze yes. That's why I'm using the other approach. For instance, you have thousands of real chrome user agents available for recent versions of the chrome browser.

Ogofo · 2018-01-11T14:02:24Z

Is this feature in active development? I have the same issue, and I guess the use case is widespread.

chaims · 2018-01-22T08:55:19Z

+1
I have the same use case! Waiting for a solution!

qingpengchen2011 · 2018-01-26T02:03:44Z

+1

dvssmgk · 2018-01-29T16:28:33Z

+1 I have a similar use case too. Waiting for a solution that allows setting a proxy per page.

barbolo · 2018-01-29T16:37:30Z

I don't think Puppeteer has anything to do with this issue. The problem is with Chrome, which doesn't provide any API to configure a proxy per tab.

You can either use a workaround like I've suggested above or you can build Chromium with a modified Network Stack, which I don't see as a good option.

flyxl · 2018-01-31T14:57:27Z

I'm using request interception to forward requests:

    // Assumed imports (not shown in the original comment):
    //   const request = require('request-promise-native');
    //   Sock5HttpAgent / Sock5HttpsAgent: SOCKS5 agent classes for HTTP and HTTPS,
    //   e.g. from the socks5-http-client and socks5-https-client packages.
    // Both functions below are methods of the same class.

    async newPage(browser) {
        const page = await browser.newPage();

        await page.setRequestInterception(true);
        page.on('request', async interceptedRequest => {
            const resType = interceptedRequest.resourceType();
            // Only documents and XHRs are fetched through the proxy.
            if (!['document', 'xhr'].includes(resType)) {
                interceptedRequest.continue();
                return;
            }
            try {
                const response = await this.fetch({
                    uri: interceptedRequest.url(),
                    method: interceptedRequest.method(),
                    headers: interceptedRequest.headers(),
                    body: interceptedRequest.postData(),
                    usingProxy: true,
                });
                await interceptedRequest.respond({
                    status: response.statusCode,
                    contentType: response.headers['content-type'],
                    headers: response.headers,
                    body: response.body,
                });
            } catch (err) {
                // Without this, a failed fetch would leave the request pending forever.
                await interceptedRequest.abort();
            }
        });
        return page;
    }

    fetch(options) {
        // let baseUrl = options.baseUrl || request.globals.baseUrl;
        const isHttps = options.uri.startsWith('https:');

        if (options.usingProxy || process.env.NODE_ENV === 'production') {
            options.agentClass = isHttps ? Sock5HttpsAgent : Sock5HttpAgent;
            options.agentOptions = {
                socksHost: 'localhost', // Defaults to 'localhost'.
                socksPort: 9050, // Defaults to 1080.
            };
        }

        options.resolveWithFullResponse = true;
        // Resolve on non-2xx statuses too, so error pages still reach the browser.
        options.simple = false;

        return request(options);
    }

Please note that in my case I forward only document and xhr requests, I ignore the baseUrl request option, and I use request-promise-native instead of request. You can replace the proxy settings in the fetch function.

joelgriffith · 2018-02-05T02:59:21Z

You can use a project like browserless and configure per-request proxies via query params. This, coupled with the page.authenticate method, allows for pretty flexible usage.

browserless is here
page.authenticate is here

banxian · 2018-03-31T10:07:06Z

@flyxl I used your code in my project to forward all requests to a proxy, but it caused some 502 errors from the server. Adding the proxy config directly in the launch options works fine.
I guess the problem is caused by the requests being reordered, which conflicts with the server's logic.

mathiasbynens · 2020-06-03T14:36:15Z

Chromium tracking issue: https://bugs.chromium.org/p/chromium/issues/detail?id=1090797

mikespnu · 2020-06-06T00:26:48Z

Anybody having issues with Puppeteer-page-proxy?
I'm getting the following error:

dist/source/create.js:155
                    yield item;
                    ^^^^^

SyntaxError: Unexpected strict mode reserved word
    at createScript (vm.js:80:10)
    at Object.runInThisContext (vm.js:139:10)
    at Module._compile (module.js:617:28)

gajus · 2020-06-06T00:48:40Z

Anybody having issues with Puppeteer-page-proxy?
I'm getting the following error:

dist/source/create.js:155
                    yield item;
                    ^^^^^

SyntaxError: Unexpected strict mode reserved word
    at createScript (vm.js:80:10)
    at Object.runInThisContext (vm.js:139:10)
    at Module._compile (module.js:617:28)

What Node.js version?

mikespnu · 2020-06-06T01:31:37Z

v8.12.0


mikespnu · 2020-06-07T11:06:07Z

Node needed to be updated. It's working fine

runningabcd · 2020-09-23T09:25:13Z

Impressive! When will there be a Python version?
Wow, no Python support.

Nisthar · 2020-11-25T18:24:40Z

EDIT:
It's possible with puppeteer-page-proxy.
It supports setting a proxy for an entire page, or if you like, it can set a different proxy for each request.
Repository:
https://github.com/Cuadrix/puppeteer-page-proxy

Is this library still working for you?

…ontext (#7516)

Example:

    (async () => {
      const browser = await puppeteer.launch();
      const context = await browser.createIncognitoBrowserContext('myproxy.com:3128');
      const page = await context.newPage()
      await page.authenticate({username: 'foo', password: 'bar' });
      await page.goto('https://google.com');
      await browser.close();
    })();

Issue: #678

radiolondra · 2022-05-25T08:50:32Z

@Nisthar
AFAIK puppeteer-page-proxy lib has some issues.
Personally, I tried to use it with my proxy but ran into problems, for example when going to https://whatismyipaddress.com/ (and similar sites) simply to check the proxy IP address in the proxied page. It also fails when Google serves a reCAPTCHA while scraping (and not only with Google).
By contrast, everything works fine using the standard Puppeteer launch arg '--proxy-server'.
The library does not seem to be actively maintained; even the issues go unanswered.

Kikobeats · 2022-08-13T16:58:20Z

You can use a proxy per browser context, which in the end is going to be pretty similar:
https://pptr.dev/next/api/puppeteer.browsercontextoptions
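
A minimal sketch of the per-context approach, assuming the `proxyServer` field of `BrowserContextOptions` from the page linked above. The helper names are hypothetical. The method that creates the context was called `createIncognitoBrowserContext` in the releases discussed in this thread and, as far as I know, was renamed `createBrowserContext` in later ones, so the sketch uses whichever exists.

```javascript
// Picks the context-creating method available on this Puppeteer version.
function contextFactoryName(browser) {
  return typeof browser.createBrowserContext === 'function'
    ? 'createBrowserContext'
    : 'createIncognitoBrowserContext';
}

// Opens a page in a fresh browser context that uses its own proxy.
// `credentials` is optional: { username, password } for proxy authentication.
async function newPageWithProxy(browser, proxyServer, credentials) {
  const context = await browser[contextFactoryName(browser)]({ proxyServer });
  const page = await context.newPage();
  if (credentials) {
    await page.authenticate(credentials);
  }
  return { context, page };
}
```

With a single `browser` from `puppeteer.launch()`, calling `newPageWithProxy(browser, 'myproxy.com:3128')` once per proxy gives each page its own proxy while sharing one Chromium process. Closing the returned `context` also closes its pages.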

aslushnikov added feature upstream labels Jan 11, 2018

tomgallagher mentioned this issue Feb 2, 2018

Using a proxy per page with Puppeteer #1948

Closed

ebidel mentioned this issue Feb 2, 2018

Dynamic proxy chage in page #1861

Closed

jozsi mentioned this issue Jul 29, 2020

Package to integrate with proxy per page berstend/puppeteer-extra#269

Closed

ysmood mentioned this issue Apr 27, 2021

Redesign the hijack by using layer5 proxy go-rod/rod#395

Open

joone pushed a commit to joone/puppeteer that referenced this issue Aug 23, 2021

feat: add proxy and bypass list parameters to createIncognitoBrowserC…

7905ae9

…ontext (puppeteer#678)

joone pushed a commit to joone/puppeteer that referenced this issue Aug 23, 2021

feat: add proxy and bypass list parameters to createIncognitoBrowserC…

1418f23

…ontext (puppeteer#678)

joone pushed a commit to joone/puppeteer that referenced this issue Sep 17, 2021

feat: replace normal arguments with an object in createIncognitoBrows…

99de59e

…erContext Issue: puppeteer#678

This was referenced May 30, 2022

chore(main): release 1.0.0 #8407

Closed

chore(main): release 1.0.0 #8411

Closed

This was referenced May 30, 2022

chore(main): release 1.0.0 #8414

Closed

chore(main): release 1.0.0 #8418

Closed

chore(main): release puppeteer 14.2.0 #8420

Closed

This was referenced May 30, 2022

chore(main): release 1.0.0 #8425

Closed

chore(main): release 1.0.0 #8427

Closed

This was referenced Jun 13, 2022

chore(main): release 15.0.0 #8513

Closed

chore(main): release 15.0.0 #8514

Closed

chore(main): release 15.0.0 #8516

Closed

benzntech mentioned this issue Jan 4, 2023

a proxy setting for plugin gencay/vscode-chatgpt#32

Closed

sequencerr mentioned this issue Feb 2, 2024

[Bug]: Incognito browser context doesn't respect proxyServer option #8820

Open
