Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: broken socks proxy for binary downloads #6274

Closed

Conversation

spacelatte
Copy link

@spacelatte spacelatte commented Jul 24, 2020

Hi there,

This patch allows BrowserFetcher module to use socks proxies defined in ALL_PROXY environment variable or proxy=socks5://... line in ~/.npmrc (or related configuration)

Fixes: #4401

Normally with unsupported proxy configuration install fails silently (yields empty .local-chromium directory). It still is btw :)
To solve this I can add default: case to throw an error as a fallback...

Ps. Btw I'm not TS dev, sometimes writing some JS in free time... Feedback appreciated :)

Using `ALL_PROXY=socks...` and `proxy=socks...` in `npmrc` causes installer to fail silently. This patch allows usage of socks proxies.

Fixes: puppeteer#4401
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

docs/api.md Show resolved Hide resolved
@@ -49,6 +49,7 @@
"devtools-protocol": "0.0.781568",
"extract-zip": "^2.0.0",
"https-proxy-agent": "^4.0.0",
"socks-proxy-agent": "^5.0.0",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually am not sure what to do here Found a few modules.

  1. https://www.npmjs.com/package/socks-proxy-agent Similar to current Https module
  2. https://www.npmjs.com/package/proxy-agent Seems versatile can be used instead of 2 separate modules.
  3. https://www.npmjs.com/package/simple-proxy-agent Low # of downloads
  4. https://www.npmjs.com/package/socks-proxy-agent-ts Even lower # of downloads

I've used 1st but 2nd seems better suited (PAC support et.al.) So directly pass logic to that module entirely.
Note: 1, 2, and 4 are from the same developer/publisher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that proxy-agent already implements all the different protocols and resolving ENV variables it sounds like the best option to me.

@@ -561,7 +562,7 @@ function httpRequest(

type Options = Partial<URL.UrlWithStringQuery> & {
method?: string;
agent?: ProxyAgent;
agent?: HttpsProxyAgent | SocksProxyAgent | any;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs explicitly said no any :) But I couldn't found a better way to handle those different types at once. Some pointers really appreciated :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case | any is not really needed.

...parsedProxyURL,
secureProxy: false,
};
switch (true) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO not to best way handling those (may be anti-pattern too)

Idk if TS had such thing:

let options = when {
  proxyURL.match(...) -> new SocksProxyAgent(...)
  url.startsWith(...) -> new HttpsProxyAgent(...)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch/case could be used for that but I think going with a classic if/elseif is the better way here to keep the code simple.

secureProxy: false,
};
switch (true) {
case proxyURL.match(/^socks(4|5|5h)?:/) !== null:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I firstly added multiple OR (||) statemens here but seemed ugly. So I've replaced it with regexp thinking not much performance hit since its only done once at the initial install phase.

Edit: Also tried not to do over-engineering (eg: /^socks(4|5[h]?)?: since its also longer and messy.)

@spacelatte
Copy link
Author

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@spacelatte spacelatte changed the title Add SOCKS proxy support for browser downloads (fixes #4401) fix: broken socks proxy downloads of binaries Jul 24, 2020
@spacelatte spacelatte marked this pull request as ready for review July 24, 2020 21:32
@spacelatte spacelatte changed the title fix: broken socks proxy downloads of binaries fix: broken socks proxy for binary downloads Jul 24, 2020
@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes labels Oct 4, 2020
@spacelatte
Copy link
Author

@googlebot I fixed it.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no labels Oct 4, 2020
@spacelatte
Copy link
Author

Hi @mathiasbynens & @jackfranklin ,

Sorry to bother you. Is this looking good? Can you give some hints/pointers if it's not :)

Thanks!

@jackfranklin
Copy link
Collaborator

Hey @pvtmert thanks for the PR and I'm sorry for the lack of response.

This PR looks OK to me but I'm a bit worried now about the amount of different configurations you can apply for proxies, given we have HTTP_PROXY, HTTPS_PROXY and now potentially ALL_PROXY. I'm not sure if those are the clearest names and I wonder if we could think about that.

I'm also not familiar with SOCKS Proxy at all but what you've done seems reasonable. My only thought is how common it is (it may be extremely - as I say I'm unaware of it!) and if it's worth supporting in Puppeteer but given this is a relatively small change maybe that's fine.

@mathiasbynens wdyt?

Copy link
Contributor

@jschfflr jschfflr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pvtmert Thanks for your contribution! Could you explain in which situation someone would be using a SOCKS proxy over a HTTPS proxy for example?
And regarding the naming (as @jackfranklin pointed out): Maybe calling the env variable SOCKS_PROXY would be more helpful to users.
Let me know what you think! Happy to get this landed :)

@spacelatte
Copy link
Author

spacelatte commented Sep 12, 2021

Hi, it's been a while & I probably need to rebase from main :)

Could you explain in which situation someone would be using a SOCKS proxy over a HTTPS proxy for example?

Yeah, for instance when you do the dynamic forwarding through SSH tunnel. Eg: via ssh -D 1080 it opens localhost:1080 as SOCKS5 proxy.

User does not need to setup a dedicated proxy server such as apache2 or polipo...

Edit: For the use-case part, the company I worked for at the time had intranet. Packages etc mirrored inside using internal registry. Since both the registry and the CI servers didn't had access, we've been able to gateway them through another "jumphost". Which is basically a SSH server plus using dynamic-forwarding.

And regarding the naming (as @jackfranklin pointed out): Maybe calling the env variable SOCKS_PROXY would be more helpful to users.

Yeah, can be. ALL_PROXY uses the URI scheme to see what kind of proxy it needs to use. For instance: socks5://localhost:1080 or http://localhost:3128...

Plus, ALL_PROXY is the standard ENV variable exported by GNOME, KDE & various other desktop environments...

Copy link
Contributor

@jschfflr jschfflr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like using the proxy-agent package that you recommended might be the best option to go forward because it already implements everything we could need here, right?

docs/api.md Show resolved Hide resolved
experimental/puppeteer-firefox/install.js Show resolved Hide resolved
@@ -49,6 +49,7 @@
"devtools-protocol": "0.0.781568",
"extract-zip": "^2.0.0",
"https-proxy-agent": "^4.0.0",
"socks-proxy-agent": "^5.0.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that proxy-agent already implements all the different protocols and resolving ENV variables it sounds like the best option to me.

@@ -561,7 +562,7 @@ function httpRequest(

type Options = Partial<URL.UrlWithStringQuery> & {
method?: string;
agent?: ProxyAgent;
agent?: HttpsProxyAgent | SocksProxyAgent | any;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case | any is not really needed.

...parsedProxyURL,
secureProxy: false,
};
switch (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch/case could be used for that but I think going with a classic if/elseif is the better way here to keep the code simple.

@jschfflr
Copy link
Contributor

Hi, it's been a while & I probably need to rebase from main :)

Could you explain in which situation someone would be using a SOCKS proxy over a HTTPS proxy for example?

Yeah, for instance when you do the dynamic forwarding through SSH tunnel. Eg: via ssh -D 1080 it opens localhost:1080 as SOCKS5 proxy.

User does not need to setup a dedicated proxy server such as apache2 or polipo...

Edit: For the use-case part, the company I worked for at the time had intranet. Packages etc mirrored inside using internal registry. Since both the registry and the CI servers didn't had access, we've been able to gateway them through another "jumphost". Which is basically a SSH server plus using dynamic-forwarding.

And regarding the naming (as @jackfranklin pointed out): Maybe calling the env variable SOCKS_PROXY would be more helpful to users.

Yeah, can be. ALL_PROXY uses the URI scheme to see what kind of proxy it needs to use. For instance: socks5://localhost:1080 or http://localhost:3128...

Plus, ALL_PROXY is the standard ENV variable exported by GNOME, KDE & various other desktop environments...

Thanks for your explanations! I didn't know much about proxies before so this was definitely helpful :)

@jschfflr
Copy link
Contributor

Just for reference: #6577

@jschfflr
Copy link
Contributor

@pvtmert Gentle ping to see if there are any updates on this or if you want me to take a look at this :)

Copy link
Collaborator

@OrKoN OrKoN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I think we need to get it updated and resolve the previous discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants