Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pip --proxy parameter is not been passed to the request. #9691

Open
1 task done
junqfisica opened this issue Mar 5, 2021 · 21 comments
Open
1 task done

Pip --proxy parameter is not been passed to the request. #9691

junqfisica opened this issue Mar 5, 2021 · 21 comments
Labels
project: vendored dependency Related to a vendored dependency state: blocked Can not be done until something else is done type: bug A confirmed bug or unintended behavior

Comments

@junqfisica
Copy link
Contributor

pip version

21.0.1

Python version

3.8

OS

Windows 10 / Linux

Additional information

The problem is easier to see if you are behind a proxy.

Description

Running pip install package --proxy http://my_proxy DO NOT pass the proxies to requests.Session.request(...).

As far as I could check this problem is there since version 20.2.2 at least. It was unnoticed until now because, before urllib update, it didn't check ssl_certificate so it didn't matter if your scheme was either http or https. Therefore, this bug is partially a root cause for the SSLError in recent versions of pip. Since users don't have the ability to pass the proxy to pip directly.

Expected behavior

I would expect that running pip install with --proxy, the proxy set here would be passed to the request, overriding any system configuration.

How to Reproduce

  1. Start a fresh venv with pip and setuptools only
  2. Then set a working proxy using system var. i.e: set https_proxy: http://your_proxy or export https_proxy: http://your_proxy for Linux.
  3. Now run pip passing a fake proxy, i.e: pip install numpy --proxy http://127.0.0.2:80 or pip.main(['install', '--proxy=http://127.0.0.2:80', 'numpy']) for esier debug.
  4. You should expect an error when installing, but if you have your proxy set correctly in the system it will NOT fail.

PS: You can also do the inverse process. Set a fake proxy on the system and run pip --proxy using a working one. Now you should get an error.

Output

Code of Conduct

  • I agree to follow the PSF Code of Conduct

I have already created a post about it here, including a working solution.

@junqfisica junqfisica added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Mar 5, 2021
@CrazyBoyFeng
Copy link

CrazyBoyFeng commented Mar 5, 2021

There is a situation that needs to be discussed:
--proxy is not completely inoperative. It works while system proxy is not set. Now our temporary solution to this bug is like this.

So it seems to be wrong in some if branch of determining to use the system proxy or parameter?
That is why I was checking merge_setting(), although I found nothing there.

@uranusjr
Copy link
Member

uranusjr commented Mar 5, 2021

There’s some discussion here as well, and we independenly reached the same conclusion. You’re on the right direction, but the relevant code is actually in the underlying requests package. Pip can try to overwrite this behaviour, but the solution would be pretty hack-y (basically overwrites the value after requests overwirtes it), a conclusion needs to be reached in requests before pip can decide how best to handle this.

@uranusjr uranusjr added project: vendored dependency Related to a vendored dependency state: needs discussion This needs some more discussion and removed S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Mar 5, 2021
@CrazyBoyFeng
Copy link

CrazyBoyFeng commented Mar 5, 2021

So this is a bug of requests, and handled by psf/requests#5677 psf/requests#5681 psf/requests#5735

To @junqfisica :
--proxy is passed to requests in session.send(proxies). session is build at:

if options.proxy:
session.proxies = {
"http": options.proxy,
"https": options.proxy,

@junqfisica
Copy link
Contributor Author

@CrazyBoyFeng You are right about the proxy been passed correctly when the system has no proxy set. However, the solution proposed here is quite useless for many users. If you need to pass a proxy is very likely you need it at the system level too, and as I mentioned before, many developers can't change that in working places environments.

To @CrazyBoyFeng :
This is working as expected:

if options.proxy:
session.proxies = {
"http": options.proxy,
"https": options.proxy,

Your reply made me think more about the problem and here is my latest update about it:

Okay, when the PipSession is created the proxy passed to it is correctly set from pip --proxy, which I will refer to as pip_proxy from now on. So, when PipSession request method is called:

def request(self, method, url, *args, **kwargs):
# type: (str, str, *Any, **Any) -> Response
# Allow setting a default timeout on a session
kwargs.setdefault("timeout", self.timeout)
# Dispatch the actual request
return super().request(method, url, *args, **kwargs)

adding a breakpoint there you can notice that self.proxies=pip_proxy, as expected.
Then the parent call is made:
proxies = proxies or {}
settings = self.merge_environment_settings(
prep.url, proxies, stream, verify, cert
)
# Send the request.
send_kwargs = {
'timeout': timeout,
'allow_redirects': allow_redirects,
}
send_kwargs.update(settings)
resp = self.send(prep, **send_kwargs)
return resp

If you add a breakpoint at line 532, you can still notice that self.proxies = pip_proxy and local proxies are {}. So, self.merge_environment_settings is passed with proxies={}. This now brings us here:
def merge_environment_settings(self, url, proxies, stream, verify, cert):
"""
Check the environment and merge it with some settings.
:rtype: dict
"""
# Gather clues from the surrounding environment.
if self.trust_env:
# Set environment's proxies.
no_proxy = proxies.get('no_proxy') if proxies is not None else None
env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
for (k, v) in env_proxies.items():
proxies.setdefault(k, v)
# Look for requests environment configuration and be compatible
# with cURL.
if verify is True or verify is None:
verify = (os.environ.get('REQUESTS_CA_BUNDLE') or
os.environ.get('CURL_CA_BUNDLE'))
# Merge all the kwargs.
proxies = merge_setting(proxies, self.proxies)
stream = merge_setting(stream, self.stream)

on line 722 there is the call to merge_setting, and local parameter proxies is now your system_proxy, and merge_setting, gives the preference to proxies instead of self.proxies, causing the trouble.

Below I'm providing is a small script isolating the problem as a showcase:

from collections import OrderedDict, Mapping


def to_key_val_list(value):
    """Take an object and test to see if it can be represented as a
    dictionary. If it can be, return a list of tuples, e.g.,

    ::

        >>> to_key_val_list([('key', 'val')])
        [('key', 'val')]
        >>> to_key_val_list({'key': 'val'})
        [('key', 'val')]
        >>> to_key_val_list('string')
        Traceback (most recent call last):
        ...
        ValueError: cannot encode objects that are not 2-tuples

    :rtype: list
    """
    if value is None:
        return None

    if isinstance(value, (str, bytes, bool, int)):
        raise ValueError('cannot encode objects that are not 2-tuples')

    if isinstance(value, Mapping):
        value = value.items()

    return list(value)


def merge_setting(request_setting, session_setting, dict_class=OrderedDict):
    """Determines appropriate setting for a given request, taking into account
    the explicit setting on that request, and the setting in the session. If a
    setting is a dictionary, they will be merged together using `dict_class`
    """

    if session_setting is None:
        return request_setting

    if request_setting is None:
        return session_setting

    # Bypass if not a dictionary (e.g. verify)
    if not (
            isinstance(session_setting, Mapping) and
            isinstance(request_setting, Mapping)
    ):
        return request_setting

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting.update(to_key_val_list(request_setting))

    # Remove keys that are set to None. Extract keys first to avoid altering
    # the dictionary during iteration.
    none_keys = [k for (k, v) in merged_setting.items() if v is None]
    for key in none_keys:
        del merged_setting[key]

    return merged_setting


pip_proxy = {'http': 'http://127.0.0.2:80', 'https': 'http://127.0.0.2:80'}
system_proxy = {'http': 'http://127.1.1.1:80', 'https': 'https://127.1.1.1:80'}

print(f"Default merge: {merge_setting(request_setting=system_proxy, session_setting=pip_proxy)}")
# or ??
print(f"Swap settings:? {merge_setting(session_setting=system_proxy, request_setting=pip_proxy)}")

system_proxy = {}  # no proxy in the system.
print(f"No system proxy: {merge_setting(request_setting=system_proxy, session_setting=pip_proxy)}")
Output:

Default merge: OrderedDict([('http', 'http://127.1.1.1:80'), ('https', 'https://127.1.1.1:80')])
Swap settings:? OrderedDict([('http', 'http://127.0.0.2:80'), ('https', 'http://127.0.0.2:80')])
No system proxy: OrderedDict([('http', 'http://127.0.0.2:80'), ('https', 'http://127.0.0.2:80')])

So, it seems that request.Session gives priority to request_setting, which in this case is the system settings. This may be related to a bug, as mentioned by @uranusjr here, but I'm not entirely sure about it. Also, I'm not sure if pip has much of a saying here, correct me if I'm wrong.

So, I still think would be a good idea to pip to explicitly pass the proxy to the request , avoiding this confusion.

I would like to reinforce, as a developer behind proxy if you use pip with pip --proxy it's because you can't rely on the system settings solely (the way proxy is fetched from it can change depending on OS, messing with the scheme). In my view, it's quite important that when using pip with proxy, this proxy should have a higher priority.

Ps: In my exemple, I gave different ips to the proxy just to highlight the problem, but usually this is not a real scenario, the critical point there is actually the scheme.

@junqfisica
Copy link
Contributor Author

Okay, I have another solution for this error.
Line:

proxies = proxies or {}

should be replaced by:
proxies = proxies or self.proxies

This also fixes the problem.

@uranusjr
Copy link
Member

uranusjr commented Mar 6, 2021

I think we should wait on psf/requests#5735 first. It would fix this problem for us (and other requests users) if I understand it correctly. We can ask requests to make a new release before we upgrade the bundled requests for the next pip release in April and receive the fix.

@CrazyBoyFeng
Copy link

CrazyBoyFeng commented Mar 6, 2021

My conclusion is: this is a bug of requests (psf/requests#5677), pip should not deal with it.
pip/src/pip/_vendor/requests/sessions.py this file is belong to the requests library.

mateusduboli has already proposed a patch PR one month ago: psf/requests#5735 , but PSF has not yet approved it.

In that patch, mateusduboli adjusted the order of merge setting, finally make parameters higher priority than environment.

If the PSF does not approve it in the end, or the processing is too slow, maybe we can consider override the default parameter of session, just like your solution before.

@junqfisica
Copy link
Contributor Author

@CrazyBoyFeng and @uranusjr I wouldn't call it override, since request actually gives you the ability to pass it as a parameter, but yeah I really think this is the best way. Also, I don't think request.Session will entirely fix the problem here, since it's more related to how they fetch proxies from the system env.

Let me try to make my point clear:

request.Session tries to automatically get proxies from the system if no proxy is provided to it, which is great. Unfortunately, this solution is not bulletproof and many users need the ability to bypass it and set the proxy directly in the request. So, basically, what I'm trying to say is that we can't trust the system parameters for setting the proxy.

Also is quite misleading if pip gives you the option to set a proxy and in the end, rely on request.Session to set it. For me is clear that this specific bug is caused because PipSession is not given its proxy to the parent request.Session .

@uranusjr
Copy link
Member

uranusjr commented Mar 6, 2021

For me is clear that this specific bug is caused because PipSession is not given its proxy to the parent request.Session.

pip sets the proxies here:

# Handle configured proxies
if options.proxy:
session.proxies = {
"http": options.proxy,
"https": options.proxy,
}

where session.proxies is a documented requests.Session attribute that should be preferred over environment varialbes. If that’s true, PipSession would not need to pass the proxy values on every Session.request() call. Turns out that’s not actually the case, which is a requests bug. The bug can be avoided by PipSession explicitly passing the proxies argument, but calling this a bug is like saying it’s your fault you’re using an API that behaves incorrectly, which is not a correct characterisation IMO.

@CrazyBoyFeng
Copy link

CrazyBoyFeng commented Mar 6, 2021

request.Session ... is more related to how they fetch proxies from the system env.

parse system env proxies wrongly is another bug of cpython, to be precise, it is in cpython's urllib library.

PipSession is not given its proxy to the parent request.Session.

as i post above, pip build pipSession with options, and the options contain proxies. and pipSession inherits requests.Session. so pipSession is actually a requests.Session with some more functions.
req_command build pipSession with options, means it build a requests.Session with some more funtions and passes options to the constructor of requests.Session.

so, the ultimate problem is that requests.Session does not handle these parameters correctly.

besides, self.setdefault() is actually overriding. default value is set while pipSession/requests.Session is constructing. but requests.Session use system env value instead of parameter (this is a bug). then if you use self.setdefault(), it will override the default value that has been set yet in the construct.
so, overriding is reasonable only if requests do not fix it.

@pradyunsg pradyunsg added state: blocked Can not be done until something else is done type: bug A confirmed bug or unintended behavior and removed state: needs discussion This needs some more discussion labels Mar 6, 2021
@pradyunsg
Copy link
Member

Looks like the concensus here is that it's a requests bug and needs to be fixed there. I've marked this as "blocked" on the basis of "if it gets fixed in our dependency, we get that fix too".

session.proxies is a documented requests.Session attribute that should be preferred over environment varialbes. If that’s true, PipSession would not need to pass the proxy values on every Session.request() call. Turns out that’s not actually the case, which is a requests bug. The bug can be avoided by PipSession explicitly passing the proxies argument, but calling this a bug is like saying it’s your fault you’re using an API that behaves incorrectly, which is not a correct characterisation IMO.

Do we want to work around the requests bug, or do we want to keep providing broken behavior to our users?

TBH, I'm ambivalent. The outcome I do like is $corp folks whose workflows have been disrupted by this contribute to requests, and help get the bug fixed; but maybe I'm too optimistic. :)

@uranusjr
Copy link
Member

uranusjr commented Mar 6, 2021

We still have at least a month to go before a release is made anyway, so I’d wait for requests first (possiblty pinging some of the maintainers to try get the ball rolling). We can talk about workarounds when it’s clear we’re not going to get an upstream fix.

@junqfisica
Copy link
Contributor Author

For me is clear that this specific bug is caused because PipSession is not given its proxy to the parent request.Session.

pip sets the proxies here:

# Handle configured proxies
if options.proxy:
session.proxies = {
"http": options.proxy,
"https": options.proxy,
}

where session.proxies is a documented requests.Session attribute that should be preferred over environment varialbes. If that’s true, PipSession would not need to pass the proxy values on every Session.request() call. Turns out that’s not actually the case, which is a requests bug. The bug can be avoided by PipSession explicitly passing the proxies argument, but calling this a bug is like saying it’s your fault you’re using an API that behaves incorrectly, which is not a correct characterisation IMO.

I completely agree that the bug in this situation is on the side of resquest.Session. That's why I proposed this fix. Because resquest.Session is overriding the parameter self.proxies set here:

# Handle configured proxies
if options.proxy:
session.proxies = {
"http": options.proxy,
"https": options.proxy,
}

by the env_proxies, assuming session is an object of PipSession. Therefore, the only fix pip can apply on its end is to explicitly pass the proxies to the request method.

This method will always override self.proxies at line 722 if it finds env_proxies, unless you give a proxies parameter to it.

def merge_environment_settings(self, url, proxies, stream, verify, cert):
"""
Check the environment and merge it with some settings.
:rtype: dict
"""
# Gather clues from the surrounding environment.
if self.trust_env:
# Set environment's proxies.
no_proxy = proxies.get('no_proxy') if proxies is not None else None
env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
for (k, v) in env_proxies.items():
proxies.setdefault(k, v)
# Look for requests environment configuration and be compatible
# with cURL.
if verify is True or verify is None:
verify = (os.environ.get('REQUESTS_CA_BUNDLE') or
os.environ.get('CURL_CA_BUNDLE'))
# Merge all the kwargs.
proxies = merge_setting(proxies, self.proxies)

@pfmoore
Copy link
Member

pfmoore commented Mar 6, 2021

I completely agree that the bug in this situation is on the side of resquest.Session. That's why I proposed this fix.

But that is a change to our vendored copy of requests, and that's not how we handle vendoring. You need to propose that fix to the requests project, and we'll pick it up when they release a fixed version (maybe you have already posted a PR to requests, I haven't been following this discussion closely).

@junqfisica
Copy link
Contributor Author

I completely agree that the bug in this situation is on the side of resquest.Session. That's why I proposed this fix.

But that is a change to our vendored copy of requests, and that's not how we handle vendoring. You need to propose that fix to the requests project, and we'll pick it up when they release a fixed version (maybe you have already posted a PR to requests, I haven't been following this discussion closely).

@pfmoore I just wanted to point the problem there, I know pip can't change the vendor directly. The proposed solution for pip is here, in the case you don't want to wait for request.Session
About a PR, no I didn't submit anything to requests. Maybe, I or some of you could open a bug issue there? Based on what @uranusjr said

session.proxies is a documented requests.Session attribute that should be preferred over environment variables.

Which definitely is not happing.

@uranusjr
Copy link
Member

uranusjr commented Mar 8, 2021

I would encourage putting more effort pushing requests to merge and release a fix. Arguing about semantics here is not very helpful for anything.

junqfisica added a commit to junqfisica/requests that referenced this issue Nov 23, 2021
This change will make the hierarchy request **kwargs -> Session args -> environment to be respected. This also solves problems with pip as discussed [here](pypa/pip#9691 (comment))  and  psf#5735
junqfisica added a commit to junqfisica/pip that referenced this issue Nov 24, 2021
After waiting a long time for a fix on the side of the request that could fix pip install issues with proxy as already discussed in pypa#9691. It seems that such changes as mentioned in psf/requests#5735 will take a while to happen, probably only on version 3 of the request. 

Therefore, I'm suggesting a change on the pip side as already proposed in [issuecomment-79051687]pypa#9568 (comment) and [issuecomment-939373868]psf/requests#5735 (comment) 

I think it's important that pip address this issue asap since the newer versions of pip are not working properly behind a proxy.
@meiswjn
Copy link

meiswjn commented Nov 29, 2021

Is this similar to #10691? From what I see in the debugger, the proxy is passed to the request's session, just not applied.

@CrazyBoyFeng
Copy link

CrazyBoyFeng commented Nov 29, 2021

@meiswjn According to previous research #9568, this is a design flaw in the third-party library requests. It has been committed as psf/requests#5735.

However, the maintainers of requests believe that this would break backwards compatibility, so no change is being considered for now.
The maintainers of requests believe that the current use of requests.session.proxy could be similar to pip.session.timeout by setting a proxy parameter for each request.

@junqfisica has been following up on this issue. There is a #10680 already in progress.

@ByteJuggler
Copy link

ByteJuggler commented Feb 4, 2022

Is fixing this at an impasse then, it seems? How awkward for users (I ran into this again today.)

Maybe a way forward would be to perhaps remove the --proxy option from pip (as it doesn't always work). And/or output a warning about possibly broken behaviour.

@pfmoore
Copy link
Member

pfmoore commented Feb 4, 2022

@ByteJuggler There have been 2 solutions proposed in the discussions here (there may be others, I only did a quick skim):

  • Someone flags this to requests and gets the problem fixed at their end.
  • There's a suggested pip fix which might work, which needs someone to produce a PR

All that's lacking is for someone to step up and do the necessary work. Is it something you could help with?

@junqfisica
Copy link
Contributor Author

@pfmoore I have made a PR to pip in #10680, which is still waiting for approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project: vendored dependency Related to a vendored dependency state: blocked Can not be done until something else is done type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests

7 participants