Poetry not respecting order of evaluation for repository sources #2339

Closed
2 tasks done
TheFriendlyCoder opened this issue Apr 23, 2020 · 23 comments · Fixed by #3406
Labels
area/repo Meta-issues for the repository/forge itself kind/bug Something isn't working as expected

Comments

@TheFriendlyCoder

TheFriendlyCoder commented Apr 23, 2020

  • I am on the latest Poetry version.

  • I have searched the issues of this repo and believe that this is not a duplicate.

  • OS version and name: Mac OS 10.15.3

  • Poetry version: 1.0.5

Issue

So, I am trying to configure a pyproject.toml file that supports pulling files from pypi.org whenever possible (ie: because on my site the performance is better) but will support pulling packages that aren't found on pypi.org from a secondary location (ie: a private pypi repo). So to start with I added a section like what follows to a sample toml file for testing:

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'
default = true

[[tool.poetry.source]]
name = "private"
url = "https://private/server/url"
secondary = true

[tool.poetry.dependencies]
python = "^3.6"
sphinx = "*"

Based on the docs, I was under the impression that by putting default=true on the first repo config, and secondary=true on the other one, that my goal would be achieved: if a package exists on pypi.org it'd pull it from there and if not it'd pull it from my secondary repo. However, that does not seem to be the case.

For the sake of this discussion I am using average performance metrics from running poetry lock on this toml file and comparing the length of time it takes to resolve the dependencies of the one Python package mentioned (ie: sphinx which is a standard Python package available on pypi.org).

So, running poetry lock on this file using the configuration I posted above takes about 20-25 seconds to complete. For comparison purposes I simply removed the second source definition from the toml file giving me:

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'
default = true

[tool.poetry.dependencies]
python = "^3.6"
sphinx = "*"

Re-running the same command against this toml file takes between 2 and 3 seconds to complete - a full order of magnitude difference in performance. Now, for my 3rd and final test I removed the first repo source leaving just the second one, giving me the following configuration block:

[[tool.poetry.source]]
name = "private"
url = "https://private/server/url"
secondary = true

[tool.poetry.dependencies]
python = "^3.6"
sphinx = "*"

Re-running the same command again takes about the same 20-25 seconds to complete (NOTE: our private pypi repository is also a mirror of the public pypi.org repo so I can compare the same package versions and resolution times in all cases).

So, based on these results, the only explanation I can see is that when the second repository definition is in place the public pypi.org repo is being ignored or something and the secondary / private repo is still being accessed for some reason.

My expectation here is that poetry would iterate over the various repos defined in the toml file in order of declaration. For each package listed in the toml file it should then try to look that package up in each repository, stopping when it finds the first match. If this were true then I would have expected my first test case above to perform identically to the second one, because the package I'm testing (sphinx) should exist on pypi.org, and all of its transitive dependencies will be available there as well. However this is obviously not the case.

So my question is - is this a bug, or are there some additional configuration options I need to specify to get this to work as I was expecting?
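
The expectation described above (query sources in declaration order, stop at the first match) can be sketched as follows. This is only an illustration of the expected behavior, not Poetry's implementation: the in-memory `repos` list, the `find_first_match` helper, and the version strings are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a repository: (source name, {package: [versions]}).
Repo = Tuple[str, Dict[str, List[str]]]


def find_first_match(package: str, repos: List[Repo]) -> Optional[Tuple[str, List[str]]]:
    """Query repositories in declaration order and stop at the first hit."""
    for name, index in repos:
        versions = index.get(package)
        if versions:
            # Later repositories are never queried for this package.
            return name, versions
    return None


# Illustrative data only; the private server mirrors PyPI and adds one package.
repos: List[Repo] = [
    ("default", {"sphinx": ["3.0.2"]}),
    ("private", {"sphinx": ["3.0.2"], "internal-lib": ["1.0.0"]}),
]

first = find_first_match("sphinx", repos)           # served by "default"
fallback = find_first_match("internal-lib", repos)  # falls through to "private"
```

Under this model the slow private server would only be contacted for packages that the first source does not have, which is why the first and second test cases would be expected to take about the same time.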

@TheFriendlyCoder TheFriendlyCoder added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Apr 23, 2020
@TheFriendlyCoder
Author

For kicks I just tried to run a few similar / comparable tests using pip directly (which poetry makes use of under the hood), making use of the index-url and extra-index-url parameters and I seem to be getting very similar results. Making the main index URL my private repo is slow, as is making my private repo an "extra" index and leaving the pypi.org as the default index URL, but when I use pypi.org exclusively the performance is better.

So maybe this is some odd limitation of pip and it just so happens that poetry is sharing that functionality.

I wonder if this is something new that has changed in a recent version of pip / poetry or something. I am pretty sure I've tried scenarios like this in the past with different results. I don't have detailed metrics from those days-gone-past though so I can't say for sure. :(

Still - any input anyone might have on whether there is anything that can be done to help mitigate this problem would be appreciated.

@TheFriendlyCoder
Author

NOTE: if it helps, I could create a map / list of packages that I know are exclusively only available on my private pypi server and thus could associate those packages with my mirror, and all others to the main pypi.org site. Not sure if that is possible or not though.
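
The package-to-repository map suggested in this note could look roughly like the sketch below. The names `PRIVATE_ONLY`, `DEFAULT_SOURCE`, and `source_for` are hypothetical and only illustrate the idea; a later comment in this thread shows a per-dependency `source` key in pyproject.toml, which appears to serve a similar purpose.

```python
from typing import Dict

# Hypothetical map: packages known to exist only on the private server.
PRIVATE_ONLY: Dict[str, str] = {"internal-lib": "private"}

# Everything not listed above is assumed to come from the public index.
DEFAULT_SOURCE = "pypi"


def source_for(package: str) -> str:
    """Return the name of the single source that should be queried."""
    return PRIVATE_ONLY.get(package.lower(), DEFAULT_SOURCE)


routed_private = source_for("internal-lib")
routed_public = source_for("sphinx")
```

With such a map each package is looked up against exactly one source, so the private server is never asked about public packages.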

@madig

madig commented Oct 8, 2020

I'm seeing the same (using a private repo as a secondary source) and am scratching my head at the following output:

> poetry update -vvv
Using virtualenv: C:\[...]\.venv
Updating dependencies
Resolving dependencies...
   1: fact: XXX is 0.39.5
   1: derived: XXX
   1: fact: XXX depends on pyparsing (^2.2)
[...]
   1: fact: XXX depends on rope (^0.17.0)
   1: selecting XXX (0.39.5)
   1: derived: rope (^0.17.0)
[...]
   1: derived: YYY (^1.32)
PyPI: No packages found for YYY >=1.32,<2.0
dama: https://PRIVATE_PYPI/simple/YYY/
dama: 52 packages found for YYY >=1.32,<2.0
PyPI: No release information found for rope-0.2, skipping
[...]
PyPI: No release information found for rope-0.9, skipping
PyPI: No release information found for rope-0.9.1, skipping
PyPI: 1 packages found for rope >=0.17.0,<0.18.0
dama: https://PRIVATE_PYPI/simple/rope/
   1: Version solving took 1.902 seconds.
   1: Tried 1 solutions.

[Crash because our private PyPI does not forward to the official one using https]

The last lines seem to imply that rope is found on the official PyPI, but then poetry goes off and queries the private repo for it anyway?!

@finswimmer finswimmer added the area/repo Meta-issues for the repository/forge itself label Oct 9, 2020
@alimantu

I'm facing the same issue here. It seems that poetry/pip checks packages against both the default and the secondary sources. That makes sense from a robustness standpoint: you always get the version that fits best from poetry/pip's point of view. Even so, I would suggest at least some option/argument to skip this deep check when a package that satisfies the dependencies was already found in an earlier repo.
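
The behavior described in this comment (consult every source, then pick the best candidate) can be contrasted with a first-match lookup using a small sketch. It is a simplified model, not the actual poetry or pip resolver: `best_across_all`, the `parse` helper, and the sample data are hypothetical, and versions are compared as plain integer tuples.

```python
from typing import Dict, List, Optional, Tuple

Repo = Tuple[str, Dict[str, List[str]]]


def parse(version: str) -> Tuple[int, ...]:
    """Very simplified version parsing: '0.17.0' -> (0, 17, 0)."""
    return tuple(int(part) for part in version.split("."))


def best_across_all(package: str, repos: List[Repo]) -> Optional[Tuple[str, str]]:
    """Query every repository (no early exit) and return the highest version."""
    candidates = []
    for name, index in repos:
        for version in index.get(package, []):
            candidates.append((parse(version), name, version))
    if not candidates:
        return None
    # max() keeps the first of several equal maxima, so ties go to the
    # repository that was declared first.
    _, name, version = max(candidates, key=lambda c: c[0])
    return name, version


repos: List[Repo] = [
    ("default", {"rope": ["0.17.0"]}),
    ("private", {"rope": ["0.17.0", "0.18.0"]}),
]

best = best_across_all("rope", repos)
```

In this model every source costs one request per package even when the first source already has a suitable version, which matches the slowdown reported at the top of the thread.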

@sinoroc

sinoroc commented Nov 24, 2020

Is that still an issue?
Was it maybe fixed by #3251 released in version 1.1.4?
Is that possibly fixed by the newest, unreleased, suggested solution in #3406?

@GooseYArd

> Is that still an issue?
> Was it maybe fixed by #3251 released in version 1.1.4?
> Is that possibly fixed by the newest, unreleased, suggested solution in #3406?

It's still broken, at least for me.

I'm working on a PR but I'm very new to the code and don't understand the architecture yet, but I can give some additional context that might be helpful.

In my case, I've added one private repository with secondary = true. The private pypi repository is on Artifactory and does not have the option enabled which causes the repo to proxy the connection to the canonical pypi repository. The behavior of Artifactory in that case is to return a 403 when poetry makes the request for some public module.

The exception occurs in legacy_repository._get()

    def _get(self, endpoint):  # type: (str) -> Union[Page, None]
        url = self._url + endpoint
        try:
            response = self.session.get(url)
            if response.status_code == 404:
                return
            response.raise_for_status()
        except requests.HTTPError as e:
            raise RepositoryError(e)

        if response.status_code in (401, 403):
            self._log(
                "Authorization error accessing {url}".format(url=response.url),
                level="warn",
            )
            return

Since raise_for_status treats (afaik) any 4xx status code as a failure, we raise a RepositoryError, even though we're subsequently checking for a 401 and 403.
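
The control flow described above can be reproduced without any network access. In the sketch below, `FakeResponse`, `get_like_original`, and `outcome` are hypothetical stand-ins: `FakeResponse.raise_for_status` only mimics the documented behavior of `requests.Response.raise_for_status` (raising for 4xx and 5xx status codes), and the function mirrors the structure of the quoted `_get` rather than being Poetry's code.

```python
class HTTPError(Exception):
    """Stand-in for requests.HTTPError."""


class RepositoryError(Exception):
    """Stand-in for Poetry's RepositoryError."""


class FakeResponse:
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code

    def raise_for_status(self) -> None:
        # Mimics requests: any 4xx or 5xx status raises.
        if 400 <= self.status_code < 600:
            raise HTTPError(f"{self.status_code} error")


def get_like_original(response: FakeResponse):
    try:
        if response.status_code == 404:
            return None
        response.raise_for_status()
    except HTTPError as e:
        raise RepositoryError(e)

    if response.status_code in (401, 403):
        # Unreachable: a 401/403 already raised inside the try block.
        return "warned"
    return "page"


def outcome(status_code: int):
    """Report what happens for a given status code."""
    try:
        return get_like_original(FakeResponse(status_code))
    except RepositoryError:
        return "RepositoryError"
```

Calling `outcome(403)` yields `"RepositoryError"` rather than `"warned"`, which illustrates why the warning branch for 401/403 never runs.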

repositories.pool.find_packages() doesn't catch RepositoryError, which means that if any request to a repo yields a 403, we will not try any additional repositories. This doesn't seem totally illogical, since under normal circumstances I think we could interpret the 403 as an error that can't be retried until we've fixed the auth issue in our configuration, and I expect that pypi servers other than Artifactory probably don't share this quirk of returning a 403 if a package doesn't exist in the repo.

I believe the correct fix here is to have find_packages catch the RepositoryError and continue with any remaining repositories. Alternatively, we could test for 401/403 in the try block in _get before calling raise_for_status, so that the subsequent code that prints the warning for 401/403 (and which is currently useless) handles things. The legacy repository tests would need to be modified in that case, and this behavior seems slightly wrong to me, so if a maintainer can comment on whether the first approach seems sound, I can generate a PR to that effect.
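
The first approach (catch the error per repository and continue) could be sketched as below. `StubRepository` and this free-standing `find_packages` are hypothetical and are not the code in Poetry's pool or in any linked PR; the stub simply raises for unknown packages to imitate the Artifactory 403 described above.

```python
from typing import Dict, List, Tuple


class RepositoryError(Exception):
    """Stand-in for Poetry's RepositoryError."""


class StubRepository:
    def __init__(self, name: str, packages: Dict[str, List[str]], forbidden: bool = False) -> None:
        self.name = name
        self.packages = packages
        # When True, unknown packages raise instead of returning nothing.
        self.forbidden = forbidden

    def find_packages(self, package: str) -> List[str]:
        if self.forbidden and package not in self.packages:
            raise RepositoryError(f"403 Forbidden: {self.name}/{package}")
        return list(self.packages.get(package, []))


def find_packages(repositories: List[StubRepository], package: str) -> Tuple[List[str], List[str]]:
    """Collect versions from every repository, skipping ones that error out."""
    found: List[str] = []
    skipped: List[str] = []
    for repo in repositories:
        try:
            found.extend(repo.find_packages(package))
        except RepositoryError:
            # Remember the failure so it can be reported, then keep going.
            skipped.append(repo.name)
    return found, skipped


repositories = [
    StubRepository("artifactory", {"john": ["1.0"]}, forbidden=True),
    StubRepository("pypi", {"requests": ["2.25.1"]}),
]

result = find_packages(repositories, "requests")
```

Returning the list of skipped repositories is one way to keep the failure visible to the caller, since silently swallowing a genuine authorization problem would make misconfiguration hard to diagnose.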

@Drachenfels

Drachenfels commented Feb 3, 2021

I can provide additional details about the issue.

In my situation, I have a package let's call it John on a private repository, John has dependencies of requests and sqlalchemy. Now if I do poetry add John, the package will be located correctly, next it will try to add dependencies, but my private repo does not have them. If it returns 404 it's fine, but my repository due to security (or whatever, I don't really know why) returns "403 Forbidden for url". Then line response.raise_for_status() happens and poetry dies.

The fix is actually trivial and, looking at the code that follows, should really already be there:

    def _get(self, endpoint):  # type: (str) -> Union[Page, None]
        url = self._url + endpoint
        try:
            response = self.session.get(url)
            if response.status_code == 404:
                return
            if response.status_code not in (401, 403):
                response.raise_for_status()
        except requests.HTTPError as e:
            raise RepositoryError(e)

        if response.status_code in (401, 403):
            self._log(
                "Authorization error accessing {url}".format(url=response.url),
                level="warn",
            )
            return

@GooseYArd

@Drachenfels if you have a moment, could you confirm whether #3608 solves this issue for you? I had the same problem and that change seems to do the trick, although I've associated it with another ticket which I think may be related to the same problem.

@Drachenfels

@GooseYArd I will test it later today, looking at that PR it probably will fix my issue.

@Drachenfels

@GooseYArd I just finished testing your pull request and the answer is yes and no.

If we leave pyproject.toml as-is, that is, we define the additional repository without secondary = true, a 403 will again kill poetry if it tries to install any dependency that lives on the standard PyPI.

But if we add secondary = true to that additional repo, it works. When we do, though, I get a lot of debug output for some reason. It might be my configuration, or perhaps a bug in the PR?

Resolving dependencies... (4.1s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (4.2s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (4.4s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (5.2s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (5.3s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (5.6s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (5.8s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (5.9s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (6.1s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (6.3s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (6.5s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (6.7s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (6.8s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (12.3s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (12.5s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (12.6s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (13.6s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (13.7s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.0s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.1s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.3s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.5s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.7s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.8s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (14.9s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (15.1s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (15.2s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (15.7s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (15.8s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (15.9s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.1s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.3s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.4s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.6s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.7s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (16.8s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (17.0s)<debug>Pool:</debug> error checking secondary repository myo
Resolving dependencies... (17.1s)<debug>Pool:</debug> error checking secondary repository myo

@Drachenfels

I will add that the code change I proposed seems to be a correct solution: 4 lines below it there is a warning message that can otherwise never be reached.

@GooseYArd

I wasn't quite sure how to handle the logging in the case where a secondary fails with a 403- it seems bad not to provide some kind of feedback about it, so I borrowed a debug logging example from one of the other modules, but I think I've used it incorrectly. I meant to look into that and it slipped my mind, lemme have a look now to see if I did something dumb.

@kakarukeys

I am seeing either a regression of this bug, or a missing instruction in the documentation.

I followed the "Install dependencies from a private repository" section of the docs to add a private pypi repo as a source for package installation, setting secondary = true. The lock file generated put the private pypi as source for all packages, public and private.

Adding an extra section tool.poetry.source with the official PyPI mitigated the issue, but this is not documented. The official PyPI should be an implicit source that does not require declaration.

@caniko

caniko commented Jun 28, 2021

> I am seeing either a regression of this bug, or a missing instruction in the documentation.
>
> I followed this Install dependencies from a private repository to add a private pypi repo as a source for package installation, setting secondary = true. The lock file generated put the private pypi as source for all packages, public and private.
>
> Adding an extra section tool.poetry.source with the official PyPI mitigated the issue, but this is not documented. The official PyPI should be an implicit source that does not require declaration.

This didn't work for me.

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'
default = true

@kakarukeys

This didn't work for me.

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'
default = true

remove default = true, it made official pypi the only source, see poetry doc.

If you want your packages to be exclusively looked up... you can set it as the default one by using the default keyword

if the above means what it means, then default is a misnomer

the settings that should work for you:

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'

@maffka123

This also does not work for me (I tried every variant: default = true, default = false, no default key, etc.); it keeps searching for everything in the secondary repository. I also tried all poetry versions from 1.1.3 to 1.2.0a1:

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'

[[tool.poetry.source]]
name = "pycelonis"
url = "https://pypi.celonis.cloud/"
secondary = true

@caniko

caniko commented Jun 29, 2021

https://pypi.celonis.cloud/

@maffka123, the repository is forbidden when I type the URL to my browser. Did you include your credentials and/or certificates?

@maffka123

Hi @caniko,
With pip it works, see instructions here
https://celonis.github.io/pycelonis/

pip install --extra-index-url=https://pypi.celonis.cloud/ pycelonis

The repo is not mine, so unfortunately no passwords..

@caniko

caniko commented Jun 30, 2021

This didn't work for me.

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'
default = true

remove default = true, it made official pypi the only source, see poetry doc.

If you want your packages to be exclusively looked up... you can set it as the default one by using the default keyword

if the above means what it means, then default is a misnomer

the settings that should work for you:

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[[tool.poetry.source]]
name = 'default'
url = 'https://pypi.python.org/simple'

This didn't work either.

log

Updating dependencies
Resolving dependencies...

  RepositoryError

  403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cpu/flake8/

  at ~\.poetry\lib\poetry\repositories\legacy_repository.py:393 in _get

toml file:

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/"
secondary = true

[tool.poetry.dependencies]
python = ">=3.7.1,<3.8"

future-fstrings = {"version" = "^1.2.0", "platform" = "linux"}
wheel = {"version" = "^0.36.2", "platform" = "linux"}
bpy = "^2.91a0"

pandas = "^1.2.3"
pyarrow = "^3.0.0"
fletcher = "^0.7.2"
h5py = "^3.1.0"
odfpy = "^1.4.1"
openpyxl = "^3.0.7"

scipy = "^1.5.2"
numpy = "^1.19.3"

torch = {version = "^7.0", source = "pytorch"}
torchvision = {version = "^1.8.1", source = "pytorch"}
torchio = "^0.18.39"

opencv-python = "^4.4.0"
matplotlib = "^3.3.2"
scikit-image = "^0.18"
trimesh = "^3.9.10"
ffmpeg-python = "^0.2.0"
imageio-ffmpeg = "^0.4.3"
click = "^7.1.2"
tqdm = "^4.60.0"
yaspin = "^1.5.0"
seaborn = "^0.11.1"
streamlit = "^0.82.0"

[tool.poetry.dev-dependencies]
pre-commit = "^2.12.1"
black = {extras = ["d"], version = "*"}
isort = "*"
Cython = "*"
flake8 = "*"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

@kakarukeys

kakarukeys commented Jun 30, 2021

no idea why that didn't work out for you, the following works for me, I just tested a while ago.

[tool.poetry]
...

[tool.poetry.dependencies]
...

[tool.poetry.dev-dependencies]
...

[[tool.poetry.source]]
name = "default"
url = "https://pypi.python.org/simple/"

[[tool.poetry.source]]
name = "foo"
url = "http://pypi-pypiserver:8080/simple/"
secondary = true

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

after a bunch of poetry add

$ grep foo poetry.lock | wc -l
       1
$ grep default poetry.lock | wc -l
      12
$ poetry --version
Poetry version 1.1.7

could it be that the sequence of the declarations matters? I put the official pypi on top of private pypi.
Also, what's worth mentioning is that my private pypi is set to fallback to official pypi when the package can't be found.

anyway my original complaint still stands.

I am seeing either a regression of this bug (with just the private pypi declaration), or a missing instruction in the documentation (the required declarations are not shown in doc).

@caniko

caniko commented Jul 1, 2021

> could it be that the sequence of the declarations matters? I put the official pypi on top of private pypi.
> Also, what's worth mentioning is that my private pypi is set to fallback to official pypi when the package can't be found.

No, I tried both ways.

My problem is more related to #3855, moving there.

@makquel

makquel commented Jun 9, 2023

According to the current version of the docs, the secondary flag is deprecated. Ergo, the priority order should be:

  1. default source,
  2. primary sources,
  3. implicit PyPI (unless disabled by another default source or configured explicitly),
  4. secondary sources (DEPRECATED),
  5. supplemental sources.

So something like the following should work.

[[tool.poetry.source]]
name = "torch-gpu"
url = "https://download.pytorch.org/whl/cu117"
priority = "explicit"

[[tool.poetry.source]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"
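
The five-step priority order listed in this comment can be modeled as a stable sort over the declared sources. This is a sketch under assumptions, not Poetry's code: `PRIORITY_RANK`, `search_order`, the `"implicit-pypi"` label, and the sample source names are hypothetical. Sources with `priority = "explicit"` are left out because the quoted list does not rank them.

```python
from typing import Dict, List, Tuple

# Rank taken from the numbered list in this comment (lower is searched first).
# "implicit-pypi" is a label invented here for the implicit PyPI entry.
PRIORITY_RANK: Dict[str, int] = {
    "default": 0,
    "primary": 1,
    "implicit-pypi": 2,
    "secondary": 3,
    "supplemental": 4,
}


def search_order(sources: List[Tuple[str, str]]) -> List[str]:
    """Order (name, priority) pairs by rank, keeping declaration order on ties."""
    # sorted() is stable, so sources with equal rank keep their relative order.
    ranked = sorted(sources, key=lambda source: PRIORITY_RANK[source[1]])
    return [name for name, _ in ranked]


declared = [
    ("torch-cpu", "supplemental"),
    ("internal", "primary"),
    ("PyPI", "implicit-pypi"),
    ("mirror", "primary"),
]

order = search_order(declared)
```

Here `order` is `["internal", "mirror", "PyPI", "torch-cpu"]`: the two primary sources keep their declaration order and the supplemental source comes last.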

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024