Poetry doesn't try public pypi when private pypi included #3855

Closed
damienrj opened this issue Mar 29, 2021 · 35 comments
Labels
kind/bug Something isn't working as expected

Comments

@damienrj

  • [ X ] I am on the latest Poetry version.
  • [ X ] I have searched the issues of this repo and believe that this is not a duplicate.
  • [ X ] If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • macOS : 11.1
  • Poetry version: 1.1.5

Issue

When using a private pypi repo, the public repo is no longer being checked. I can get everything working again by adding

[[tool.poetry.source]]
name = "pypi-public"
url = "https://pypi.org/simple/"

But this wasn't needed in the past.

@damienrj damienrj added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Mar 29, 2021
@sinoroc

sinoroc commented Apr 2, 2021

Any details? Is your private repository set as "secondary"? Why do you think the public PyPI is not checked anymore, does your project have dependencies that should be downloaded from the public PyPI?

@damienrj
Author

damienrj commented Apr 7, 2021

Seems to be related to #3306

Yeah, there are packages that are not on our private repository, and they fail because Poetry tries to pull from only the private one. Adding the public one as a source is a workaround that unblocked me.

@caniko

caniko commented Jul 1, 2021

Any details? Is your private repository set as "secondary"? Why do you think the public PyPI is not checked anymore, does your project have dependencies that should be downloaded from the public PyPI?

I have this exact issue, you can see my pyproject.toml file here

@damienrj
Author

damienrj commented Jul 1, 2021

The way we have been solving it is by

[[tool.poetry.source]]
name = "private"
url = "https://private/simple"
secondary = true


[[tool.poetry.source]]
name = "pypi-public"
url = "https://pypi.org/simple/"

Otherwise, if you look at the lockfile, the source for all packages is our private repo. This puts more work on our private repo, and there isn't any reason to check it for the public packages since we are not running a full clone of pypi.

@dwyatte

dwyatte commented Jul 2, 2021

@damienrj given this is related to #3306, do you want to check to see if the behavior is still present in https://github.com/python-poetry/poetry/releases/tag/1.2.0a1? Seems #3306 (PR #3406) made the release notes in 1.2.0a1 and may be cherry-picked for an upcoming patch release per #4241

@jleclanche
Contributor

I'm affected by this. Tried in 1.2.0a2 for good measure… At first glance it looks fixed: poetry.lock no longer updates with the incorrect default repository.

However, it's apparent that poetry still checks the non-default repository for every package even if a valid package has been found on the default one. This might be intended, but it also slows operations down significantly. Common scenario: App depends on a handful of packages in a private repository, and a bunch of packages in pypi. A quick poetry update -vvv will reveal what's going on. In my case I'm using a private gitlab pypi repo and:

image

This makes poetry update a 35 second operation on a warm cache for my project. If I remove the custom repository, same project, it completes in under 1 second.

@jleclanche
Contributor

The above happens even if source="pypi" is specified on every non-private dependency.

@persiyanov

persiyanov commented Sep 9, 2021

Experiencing the same issue as @jleclanche has mentioned. I'm on poetry 1.1.8. During poetry update every package seems to be requested from our private PyPI.

I have the following pypi configuration in my pyproject.toml:

[[tool.poetry.source]]
name = "cnstrc_pypi"
url = "https://pypi.cnstrc.com/simple/"
secondary = true

And this is the log piece I'm getting (repeated pattern for every package) for poetry update -vvv:

PyPI: 7 packages found for joblib >=0.14.1
cnstrc_pypi: Response URL https://pypi.org/simple/joblib/ differs from request URL https://pypi.cnstrc.com/simple/joblib/
cnstrc_pypi: 7 packages found for joblib >=0.14.1
PyPI: Getting info for joblib (1.0.1) from PyPI
PyPI: No dependencies found, downloading archives
PyPI: Downloading wheel: joblib-1.0.1-py3-none-any.whl
   1: selecting joblib (1.0.1)

It looks like poetry doesn't respect the secondary = true flag in a custom pypi configuration.

Changing config to

[[tool.poetry.source]]
name = "cnstrc_pypi"
url = "https://pypi.cnstrc.com/simple/"
secondary = true


[[tool.poetry.source]]
name = "pypi-public"
url = "https://pypi.org/simple/"

as was suggested above doesn't help.

@judahrand

I'm also finding this is an issue...

@andras-kth

@jleclanche

The above happens even if source="pypi" is specified on every non-private dependency.

Are you sure? What I observe is that those dependencies where source = "pypi" is specified explicitly are not requested from the alternate repo, but any implicit dependencies (i.e. packages that these explicit dependencies depend on) don't inherit this source designation and those (but only those) are looked up on the alternate index.

BUT, I do think that explicitly specifying the "default" source should NOT be necessary in either case.

I think the culprit is this loop

        packages = []
        for repo in self._repositories:
            packages += repo.find_packages(dependency)

where packages are collected from every registered repo without regard to priorities.
If I'm reading the code correctly self._repositories is already in priority order.
If that's correct, then taking the first match instead might be the desired behavior:

         packages = []
         for repo in self._repositories:
-            packages += repo.find_packages(dependency)
+            packages = repo.find_packages(dependency)
+            if packages:
+                break

Unfortunately, this could produce incorrect results due to what I'd consider a bug
in version matching (#4729) (i.e. in some cases the first "match" doesn't actually match);
so, the issue may, in fact, be somewhat more complicated.
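A minimal sketch of the difference between the two loops, using stand-in objects. `FakeRepo`, `find_all` and `find_first_match` are hypothetical names for illustration and are not Poetry's classes; the only thing carried over from the comment above is the assumption that the repository list is already in priority order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeRepo:
    """Hypothetical stand-in for a package repository (not Poetry's class)."""

    name: str
    index: dict[str, list[str]]
    lookups: list[str] = field(default_factory=list)

    def find_packages(self, dependency: str) -> list[str]:
        # Record every lookup so the number of requests can be inspected.
        self.lookups.append(dependency)
        versions = self.index.get(dependency, [])
        return [f"{dependency}-{v} ({self.name})" for v in versions]


def find_all(repos: list[FakeRepo], dependency: str) -> list[str]:
    """Quoted behaviour: collect candidates from every repository."""
    packages: list[str] = []
    for repo in repos:
        packages += repo.find_packages(dependency)
    return packages


def find_first_match(repos: list[FakeRepo], dependency: str) -> list[str]:
    """Proposed behaviour: stop at the first repository that has a match."""
    for repo in repos:
        packages = repo.find_packages(dependency)
        if packages:
            return packages
    return []
```

With a public repo listed first and a private repo that does not host `joblib`, `find_all` still queries the private repo, while `find_first_match` never contacts it. Both return the same candidates in that case, so the saving is only in requests.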

@jleclanche
Contributor

@mehes-kth yes I'm fairly certain. I'm still working on that same project and the poetry install command is exceedingly slow because of this, despite only having one single private dependency.

@andras-kth

@jleclanche I experience the excruciating slowness, too.

All I'm saying is that any dependency explicitly marked as source = "pypi" is not looked up in the alternate repo.
On the other hand, recursive dependencies of those will be, which is likely more than sufficient to make things slow.
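The observation above can be modelled roughly as follows. This is a sketch of the behaviour described in this comment, not Poetry's resolver: `sources_queried` and its arguments are hypothetical, and the dependency graph is made up for the example.

```python
def sources_queried(direct, graph, all_sources):
    """Map each reachable package to the sources that would be queried for it.

    direct:      {package: explicit source name, or None if unspecified}
    graph:       {package: [names of its dependencies]}
    all_sources: every configured source, in any order
    """
    queried = {}
    stack = list(direct)
    while stack:
        pkg = stack.pop()
        if pkg in queried:
            continue
        # Only direct dependencies can carry an explicit source; transitive
        # dependencies do not inherit it, so they fall back to all sources.
        pinned = direct.get(pkg)
        queried[pkg] = [pinned] if pinned else list(all_sources)
        stack.extend(graph.get(pkg, []))
    return queried


result = sources_queried(
    direct={"requests": "pypi"},
    graph={"requests": ["urllib3", "idna"]},
    all_sources=["pypi", "private"],
)
```

Here `requests` is looked up on `pypi` only, while `urllib3` and `idna` are looked up on both sources, which is where the extra requests come from.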

BTW, I'm wondering why Poetry seems to assume that any non-PyPI index is private.
There are quite a few public repos out there beyond just https://pypi.org/simple/
Perhaps, this is another one of those philosophical issues... 😈

@jleclanche
Contributor

Ah, you might be right that it's caused by the recursive dependencies. It's been a while since I tested this so I don't remember exactly, but I did see lookups in the private repo when setting everything to source pypi.

@timorkal

timorkal commented Dec 9, 2021

Will there be a fix supplied soon or some workaround? I am pretty stuck with trying to lock my deps with private repo and pypi.

@hugoantunes

Also having the same issue here, it only works when I set my private repo as default = true.
Any suggestion on how to make it work?

Thank you!

@hugoantunes

Actually, it worked for me. I just had to add only my private repository with default = true. It falls back to the public PyPI when it doesn't find a package in my private repo.

@jonapich
Contributor

When you combine secondary = true and default = false, the poetry.lock does behave correctly.

e.g.:

[[tool.poetry.source]]
name = "internal"
url = "https://pypi.internal.com/simple/"
secondary = true
default = false

There's no need to redefine the public pypi in this case. The poetry.lock correctly adds the private repo exclusively to the libraries I marked with source = "internal"

@pbsds

pbsds commented Feb 25, 2022

If this is true, then the docs should be updated to mention default=false.

@SimonVerhoek

@hugoantunes @jonapich This did not work for me. Could you share the environment you did this in, and what python/poetry versions and OSes you did this with?

My experiences so far: The above does not work with a Docker build based on python:3.10, poetry 1.1.13. What did work was setting source = "pypi" as @andras-kth suggested.

@jonapich
Contributor

Poetry 1.1.12 and Windows, python 3.9. I just tested it again:

Given this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = "*"


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true
default = false

The lock doesn't contain any repository information. But once I do this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*", source = "internal" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true
default = false

Then the lock contains the repository information for the requests package exclusively.

Once I change to this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
default = false

Then suddenly every single package in the lock contains my repository information (this seems to be a bug!).

This seems to work though:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true

With the above, I don't see any repo information. When I add source = "internal" then I get the same result as the first example: the repo information is added to the requests package and the other packages don't have any repo information.

@SimonVerhoek

SimonVerhoek commented Mar 4, 2022

@jonapich An update/rectification from my side - setting both secondary = true and default = false does indeed work. Authentication kept failing on my side, due to me mistakenly thinking I could pass variables from a .env file as build arguments into a Dockerfile...

That leaves me agreeing with you that using only secondary = true is currently not working due to the bug you're describing.

@abn
Member

abn commented Apr 28, 2022

Poetry by default searches all sources for a package unless the package explicitly specifies a source (poetry add --source pypi or poetry add --source torch). The use of secondary = true only implies preference when choosing the best match - search still goes through all sources.

If a source is set to default = true then PyPI is never searched, i.e. this effectively disables PyPI.

@Maciej-Zwolinski

@SimonVerhoek
does the private repo you are using return 403 or 404 if a package is not found?

@abn
Member

abn commented May 13, 2022

This issue isn't reproducible on poetry@master. Closing.

Note that setting repository to default will disable default PyPI.

https://python-poetry.org/docs/master/repositories/#project-configuration

@abn abn closed this as completed May 13, 2022
@jonapich
Contributor

jonapich commented May 13, 2022

@abn setting default to true, or false, yields the same poetry.lock where all packages will target the custom repository. This feels like a bug (or at the very least, misleading documentation)? It should not apply to all packages if default is false (but the documentation states that if you set one, it's used over pypi, so the documentation is actually OK here; it was just a false assumption of mine, and probably of lots of people).

Basically we try to use a toml like this:

[tool.poetry]
name = "pouet"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"

urllib3 = "*"
idna = "*"


[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"


[[tool.poetry.source]]
name = "fake-private"
url = "https://pypi.org/simple/"
default = false

Notice how the poetry.lock has a package.source for each package, pointing to this registry.

Now set default to true, try again, and you get the same thing. It feels like the default switch doesn't do anything here, and I guess most people only have 1 private repo to worry about, not several.

It feels like the description of default is wrong or misleading: https://python-poetry.org/docs/master/repositories/#default-package-source and really the rule is:

  • If you add [[tool.poetry.source]] it overrides pypi.org
  • If you want to retain pypi first, set secondary=True
  • If you set multiple [[tool.poetry.source]] then default=True will do something (I didn't try this, but I guess that's its purpose?)

@abn
Member

abn commented May 13, 2022

There are two factors at play here. Default and secondary. This leads to the following scenarios.

  1. Setting a custom source to default will disable PyPI.
  2. Adding a source without setting default will make it take precedence over PyPI. As in, PyPI will be searched last.
  3. In order to avoid (2) you set all your sources to secondary=true. This will keep PyPI as the preferred source.

Clarifications for the docs welcome.
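The three scenarios can be sketched as a small ordering function. This is an illustrative model of the rules as stated in this comment, not Poetry's implementation: `Source` and `search_order` are hypothetical names, and the placement of sources in mixed configurations (for example one plain and one secondary source) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical model of a [[tool.poetry.source]] entry."""

    name: str
    default: bool = False
    secondary: bool = False


PYPI = Source("PyPI")


def search_order(custom: list[Source]) -> list[str]:
    """Return source names in the order of preference described above."""
    defaults = [s for s in custom if s.default]
    primaries = [s for s in custom if not s.default and not s.secondary]
    secondaries = [s for s in custom if not s.default and s.secondary]

    if defaults:
        # Scenario 1: a custom default source disables PyPI.
        ordered = defaults + primaries + secondaries
    elif primaries:
        # Scenario 2: a plain custom source takes precedence; PyPI is last.
        ordered = primaries + secondaries + [PYPI]
    else:
        # Scenario 3: every custom source is secondary; PyPI stays preferred.
        ordered = [PYPI] + secondaries
    return [s.name for s in ordered]
```

For example, `search_order([Source("internal")])` gives `["internal", "PyPI"]`, while `search_order([Source("internal", secondary=True)])` gives `["PyPI", "internal"]`. Note that this models preference only; per the earlier comment, all listed sources are still searched.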

jonapich added a commit to jonapich/poetry that referenced this issue May 13, 2022
Clarifies default vs secondary (see discussion in python-poetry#3855)
@jonapich jonapich mentioned this issue May 13, 2022
2 tasks
@andras-kth

andras-kth commented May 14, 2022

@abn

Poetry by default searches all sources for a package unless the package explicitly specifies a source (poetry add --source pypi or poetry add --source torch). The use of secondary = true only implies preference when choosing the best match - search still goes through all sources.

That's exactly what's causing major slow-down when a private repo is not a full mirror,
but only hosts a few relevant packages. I realize that this is not the original issue reported,
but it's a closely related major headache. Would you care to clarify WHY this is done?

@abn
Member

abn commented May 14, 2022

That's exactly what's causing major slow-down when a private repo is not a full mirror, but only hosts a few relevant packages.

It adds extra requests during locking, yes; however, "major slow-down" is probably an overstatement. If I recall correctly, the extra request happens once per package, when searching for the package while an update for that package is allowlisted (e.g. poetry add: 1 request; poetry update | lock: n requests, where n is the number of packages in pyproject).

Would you care to clarify WHY this is done?

This was discussed recently on discord when discussing #5442 with @tgolsson.

One of the more common use cases for enterprises when using a self-hosted PEP 503 repository is to provide target environment specific wheels. This means they set private repositories as secondary = true and use PyPI as the default.

A concrete example - PyPI has a source published (py3-none) but repo.org has prebuilt (py38-windows). If we say that the secondary source is only used when the package is not found (i.e. it is a fallback only), then Poetry will select only the none wheel.
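A rough sketch of that example, assuming a made-up preference table in which a platform-specific build beats a generic one. `Candidate`, `PREFERENCE`, `search_all` and `fallback_only` are hypothetical names, and the tag strings are the informal labels from the comment, not real wheel tags.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    source: str
    tag: str  # informal labels from the comment, not real wheel tags


# Assumed preference: higher score wins.
PREFERENCE = {"py38-windows": 2, "py3-none": 1}


def pick(candidates: list) -> Optional[Candidate]:
    """Choose the most preferred candidate, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: PREFERENCE[c.tag])


def search_all(sources: dict) -> Optional[Candidate]:
    """Current behaviour: pool candidates from every source, then pick."""
    pool = [c for found in sources.values() for c in found]
    return pick(pool)


def fallback_only(sources: dict) -> Optional[Candidate]:
    """Fallback behaviour: use only the first source that has anything."""
    for found in sources.values():
        if found:
            return pick(found)
    return None


sources = {
    "PyPI": [Candidate("PyPI", "py3-none")],
    "repo.org": [Candidate("repo.org", "py38-windows")],
}
```

Under these assumptions `search_all(sources)` selects the prebuilt candidate from repo.org, whereas `fallback_only(sources)` stops at PyPI and returns the generic one.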

As I have stated multiple times in various issues and discord discussions, modifying the current behaviour is not the right approach; rather, we should allow for an explicit feature that caters to the scenario where sources should only be used when the package explicitly requires this source. However, this also has other issues. As discussed on the discord thread, eg: how do we handle transitive dependencies then? There are far too many edge cases here.

If you are really bothered by the extra requests at present, perhaps using poetry add --source=pypi will help in the interim prior to a better solution being available. And I suspect #5442 should also further improve the situation. It won't be enabled by default because that will cause significant performance impact for private repos that proxy/mirror/overlay PyPI.

Hope this helps clarify why things are the way they are.

@jleclanche
Contributor

It adds extra requests during locking, yes; however, "major slow-down" is probably an overstatement

Has the situation significantly improved since the issue was filed? My recollection is it made a difference of one to two orders of magnitude in seconds for my project.

@abn
Member

abn commented May 14, 2022

Cannot speak for your project but based on the above example given in #3855 (comment) here is an unscientific evaluation.

$ poetry source show
No sources configured for this project.
$ poetry lock --no-cache
Updating dependencies
Resolving dependencies... (0.9s)

Writing lock file
$ poetry source add --secondary fake-private https://pypi.org/simple/
Adding source with name fake-private.
$ poetry source show
 name       : fake-private             
 url        : https://pypi.org/simple/ 
 default    : no                       
 secondary  : yes                      

$ poetry lock --no-cache
Updating dependencies
Resolving dependencies... (1.0s)

Writing lock file
$

@jonapich
Contributor

jonapich commented May 16, 2022

Our private registry is configured to redirect to pypi.org on missing packages. I think your test is flawed without a real repository, since both of your repositories a) contain all packages b) provide amazing performance. In the real world, you would be hitting your custom server first, which is probably slower than pypi.org.

I just tested locking a huge project. Locking with our registry first took 4m40, but configuring it as non default, secondary and targeting only the few relevant packages brought that down to 4m14 (that's poetry lock --no-cache). In that test, there are 4 packages to fetch on our server, and 164 from pypi. There are also 2 git sources which slow down the whole process quite a bit. So overall, the difference here exists, but is clearly in the nice-to-have range.

The huge difference though, is that we were able to slash down the size of our private registry server with this one easy trick 😅 when too many clients are doing a poetry install simultaneously (think a bunch of docker builds and automated tests kicking in parallel), the server occasionally spits out a 5xx during poetry install. We were able to fix this problem by using the default false, secondary true trick.

I improved (I think) the documentation in #5605 but I think some effort should be made to support this use case better. The options are simply misleading for anyone who didn't take the time to carefully read that documentation section.

This is the dark side of the rules:

  • adding a source means it's checked first (it's opinionated, not misleading. however you just moved a whole project from pypi to private now, probably unintentionally)
  • combining with source= at a dependency level has no effect (misleading)
  • setting secondary=true has no effect (misleading)
  • setting default=false has no effect (misleading)
  • setting default=true has no effect (misleading)
  • setting default=false and secondary=true finally makes source= work as intended! (profit)

We have to think about the developer's thought process here. When you add source= the first time to a dependency, there's very little chance that you'll know that default=false and secondary=true must be added. If the repo information is added without a good understanding of its documentation, you're not just adding a dependency, you're actually setting the private registry as default for all locking and install needs. Since GitHub hides the poetry.lock diff most of the time (it's too large), a lot of devs won't notice the addition of 100s of package.source to their lock file and will just go with it. It took us a couple 5xx to understand what was going on...
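Since the lock diff is easy to miss, one way to spot this is to count how many locked packages carry a source entry. The following is a minimal sketch, assuming the lock file is TOML with a top-level `package` array whose entries may contain a `source` table with `reference` and `url` keys; the helper names `count_sources` and `load_lock` are hypothetical.

```python
from collections import Counter


def count_sources(lock_data: dict) -> Counter:
    """Count locked packages per source; "(none)" means no source table."""
    counts = Counter()
    for package in lock_data.get("package", []):
        source = package.get("source")
        if not source:
            label = "(none)"
        else:
            # Prefer the source name, fall back to its URL (assumed keys).
            label = source.get("reference") or source.get("url", "(unknown)")
        counts[label] += 1
    return counts


def load_lock(path: str = "poetry.lock") -> dict:
    """Parse a lock file; tomllib is in the standard library from Python 3.11."""
    import tomllib

    with open(path, "rb") as fh:
        return tomllib.load(fh)
```

Running `count_sources(load_lock())` in a project directory and seeing nearly every package attributed to the private registry would suggest the registry has become the effective default.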

I would say that if the user provided source= information, the plan is to use the private registry as little as possible. When the user sets the source=, poetry should use it only when requested. If no packages have a source=, then the repository information is most likely to be used as much as possible.

However, this also has other issues. As discussed on the discord thread, eg: how do we handle transitive dependencies then?

Simply don't. If the user wants a transitive to use the private registry, it can be added to the dependencies with the source= specified 🤷🏻‍♂️ That's how someone could use e.g. a forked version of urllib3 even though only requests was needed. urllib3 would be pushed to the private registry, and urllib3 and source= would be added to the pyproject file. It feels wrong that the custom urllib3 will maybe be used by everyone in the company who didn't think about this and set the private registry as default by mistake.

I think that setting a registry as the first one to be checked should be the "you need to specify an option" way, and using the registry only when source= targets it really should be the default scenario.

@jleclanche
Contributor

Cannot speak for your project but based on the above example given in #3855 (comment) here is an unscientific evaluation.

I remember what the issue was now: I was using a gitlab private pypi repository which 1) didn't have all packages (as @jonapich points out can be an issue) and 2) doesn't support the more recent package metadata protocol improvements, only the "simple" protocol (or something like that; I don't remember the internal details exactly)

@abn
Member

abn commented May 16, 2022

I think your test is flawed without a real repository

I agree. Was not going for anything else. I do appreciate you doing the actual test on a real project where the impact can be seen.

We were able to fix this problem by using the default false, secondary true trick.

default defaults to false for all sources, no need to explicitly set it. I am unclear on why you say that setting secondary=true had no effect. If it is indeed the case, then there is a bug unless default = true was explicitly configured.

Additionally, this situation should be much improved once #5442 is merged, but perhaps not in your case if you are proxying public packages - you might end up with a huge index page.

Fwiw, I am not advocating any particular solution here. However, I am trying to clarify the status quo. I am not saying that the way it is is fine and we shouldn't change anything.

adding a source means it's checked first (it's opinionated, not misleading. however you just moved a whole project from pypi to private now, probably unintentionally)

Personally, I agree that the default behaviour should be similar to adding an extra index in pip install --extra-index-url.

Simply don't. If the user wants a transitive to use the private registry, it can be added to the dependencies with the source= specified 🤷🏻‍♂️

While it might work in the case you have identified, I do not think this is universally applicable. I can recall environments where they did prefer it to be the other way around. Question would be how to handle that, and what the right defaults should be. Further, doing this will also potentially leave unwanted packages in the project metadata. As an example if A depends on B and B depends on C, if we add C to A as you suggest and then later B drops dependency on C, you are left with an unused dependency in your project and/or your lockfile. Sure, you can work around this by adding it to a group instead of the main one. But similar issues apply for the version constraints used as well. What is the right thing to do when B changes C's requirements? etc.
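The A/B/C example can be sketched as a reachability check. This is an illustration only: `reachable` and `orphaned_pins` are hypothetical helpers, and the graphs below are invented to mirror the example, with the project playing the role of A.

```python
def reachable(roots, graph):
    """Return every package reachable from `roots` in a dependency graph."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(graph.get(pkg, []))
    return seen


def orphaned_pins(required, pins, graph):
    """List pins that no genuinely required package depends on any more.

    required: direct dependencies the project actually uses
    pins:     direct entries added only to steer a transitive dependency
    graph:    {package: [names of its dependencies]}
    """
    needed = reachable(required, graph)
    return sorted(p for p in pins if p not in needed)


before = {"B": ["C"]}  # B depends on C
after = {"B": []}      # B has dropped C
```

With `required=["B"]` and `pins=["C"]`, the `before` graph reports no orphans, while the `after` graph reports `["C"]`, the leftover entry described above.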

All that said, I'd suggest that we move off this issue for this discussion. Might be more constructive to discuss the change of "default" behaviour of adding a package source. Alternatively, discuss addition of an option disabling package searches unless explicitly used.

@jonapich
Contributor

We were able to fix this problem by using the default false, secondary true trick.

default defaults to false for all sources, no need to explicitly set it. I am unclear on why you say that setting secondary=true had no effect. If it is indeed the case, then there is a bug unless default = true was explicitly configured.

You're right, only secondary=true is needed. I think that was maybe an old bug, or just a manipulation error when I played around with this months ago.

As an example if A depends on B and B depends on C, if we add C to A as you suggest and then later B drops dependency on C, you are left with an unused dependency in your project and/or your lockfile.

The same problem occurs when you need to specifically pin a version of a transitive dependency because reasons, no need to involve private registries to fall into this trap. I can't vouch for everyone's best practices, but if you need to add such an edge case to your pyproject, you comment it as such so that everyone knows what it's about.

In fact, the same problem occurs if someone adds dependency A for new python code, then someone alters the code later and removes the usage. Unless you actively search the code base for more usages of some random import you just removed, you're going to be left with one unused library. My opinion is that it's a non-issue / user-error, this isn't something poetry should be concerned about.

All that said, I'd suggest that we move off this issue for this discussion. Might be more constructive to discuss the change of "default" behaviour of adding a package source. Alternatively, discuss addition of an option disabling package searches unless explicitly used.

👍🏻


github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024