Skip to content

pypi: do not use formula name as PyPI package name #8732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

SeekingMeaning
Copy link
Contributor

@SeekingMeaning SeekingMeaning commented Sep 15, 2020

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This fixes brew bump-formula-pr for cdk8s, juju, and pulumi as they aren't Python packages and don't use any PyPI resources

Before:

$ brew bump-formula-pr --version 0.28.0 cdk8s
==> Downloading https://registry.npmjs.org/cdk8s-cli/-/cdk8s-cli-0.28.0.tgz
######################################################################## 100.0%
Warning: Cannot verify integrity of 187437a0449fde48e31cff7fea2c427a9a20e0a59e6cfe3bebef5b23af9bae6f--cdk8s-cli-0.28.0.tgz
A checksum was not provided for this resource.
For your reference the SHA-256 is: a31f7fbabd2ded1c5c193927b5a947c90f08e64f62fddd072c8a4372c1ddf23a
==> replace /https:\/\/registry\.npmjs\.org\/cdk8s\-cli\/\-\/cdk8s\-cli\-0\.27\.
==> replace "77010866c1e04f3c4ef09bd9076b3adb32c8585f4580acebf7599d2834be861b" w
Error: The resources for "cdk8s" need special attention. Please update them manually.

After:

$ brew bump-formula-pr --version 0.28.0 cdk8s
==> Downloading https://registry.npmjs.org/cdk8s-cli/-/cdk8s-cli-0.28.0.tgz
Already downloaded: ~/Library/Caches/Homebrew/downloads/187437a0449fde48e31cff7fea2c427a9a20e0a59e6cfe3bebef5b23af9bae6f--cdk8s-cli-0.28.0.tgz
Warning: Cannot verify integrity of 187437a0449fde48e31cff7fea2c427a9a20e0a59e6cfe3bebef5b23af9bae6f--cdk8s-cli-0.28.0.tgz
A checksum was not provided for this resource.
For your reference the SHA-256 is: a31f7fbabd2ded1c5c193927b5a947c90f08e64f62fddd072c8a4372c1ddf23a
==> replace /https:\/\/registry\.npmjs\.org\/cdk8s\-cli\/\-\/cdk8s\-cli\-0\.27\.
==> replace "77010866c1e04f3c4ef09bd9076b3adb32c8585f4580acebf7599d2834be861b" w
M	Formula/cdk8s.rb
Switched to a new branch 'bump-cdk8s-0.28.0'
[bump-cdk8s-0.28.0 d2c86b4a20] cdk8s 0.28.0
 1 file changed, 2 insertions(+), 2 deletions(-)
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 449 bytes | 449.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
remote: 
remote: Create a pull request for 'bump-cdk8s-0.28.0' on GitHub by visiting:
remote:      https://github.com/SeekingMeaning/homebrew-core/pull/new/bump-cdk8s-0.28.0
remote: 
To https://github.com/SeekingMeaning/homebrew-core.git
 * [new branch]            bump-cdk8s-0.28.0 -> bump-cdk8s-0.28.0
Branch 'bump-cdk8s-0.28.0' set up to track remote branch 'bump-cdk8s-0.28.0' from 'https://6efeb4e6c042ee3fe2d92d9c263e7d3a80c0496d@github.com/SeekingMeaning/homebrew-core.git'.

Sorry, something went wrong.

@SeekingMeaning SeekingMeaning changed the title pypi: add NON_PYPI_FORMULAE pypi: add list of non-PyPI formulae Sep 15, 2020
@dtrodrigues
Copy link
Member

I'm wondering if it might make more sense to remove the fallback to the formula name instead of having a new blocklist: https://github.com/Homebrew/brew/blob/master/Library/Homebrew/utils/pypi.rb#L73-L75

@Rylan12 what do you think? Are there any/many cases where the primary url is a non-PyPI url but we want to update the resources based on pipgrip output for the Python module?

@Rylan12
Copy link
Member

Rylan12 commented Sep 16, 2020

I'm not totally opposed to removing it. It means that we won't catch cases where the url is a non-pypi url but still uses pypi resources. A good example of this is ansible (although ansible is on the blocklist for autoupdating so maybe its not a good example). I really have no sense of how many formulae this would apply to. From a quick search, it looks like there are 505 formulae that have a pypi resource but not a pypi url. However, my guess would be that most of these aren't available on pypi anyway so they don't get auto-updated (e.g. zurl) anyway.

Have we discussed adding a message or label to the PR that bump-formula-pr creates to indicate whether the resources have been auto-updated? This might be useful to let maintainers know whether or not update-python-resources was skipped when reviewing PRs. If it doesn't indicate that update-python-resources was run and the formula has resource blocks, the maintainer reviewing should make sure to double check that none of the resources need updates.

If we add something like this, I would feel much better about removing the fallback.

I agree, though, having two different exclude lists for this feels wrong.

TL;DR: I have no sense of how many formulae are available on pypi and don't use the pypi url. If it's pretty small, not a problem (maybe these should be switched to use the pypi url anyway). Separately, it might be good to add a message to PRs that indicates whether resources were checked automatically.

@MikeMcQuaid
Copy link
Member

Have we discussed adding a message or label to the PR that bump-formula-pr creates to indicate whether the resources have been auto-updated? This might be useful to let maintainers know whether or not update-python-resources was skipped when reviewing PRs. If it doesn't indicate that update-python-resources was run and the formula has resource blocks, the maintainer reviewing should make sure to double check that none of the resources need updates.

This seems like a good idea 👍🏻

@Rylan12
Copy link
Member

Rylan12 commented Sep 16, 2020

In that case, I'd say we should remove the fallback instead of adding a second list. We can monitor PRs to see how many don't get auto-updated when they should and add it back if there are many instances.

@SeekingMeaning
Copy link
Contributor Author

SeekingMeaning commented Sep 16, 2020

About 1,445 formulae match a PyPI package name: https://gist.github.com/SeekingMeaning/d440ec4496795049f6f576d9f7760218

Of these, 127 have a stable url referencing pythonhosted.org, and 81 others have resources referencing pythonhosted.org (not including homebrew-virtualenv): https://gist.github.com/SeekingMeaning/29a28226ca351af410b98699279927bd

Adding the two numbers together, 208 (14.4%) of the 1,445 formulae matching a PyPI package name are using PyPI.

Of all formulae in homebrew/core, 139 have a stable url referencing pythonhosted.org, and 120 others have resources referencing pythonhosted.org: https://gist.github.com/SeekingMeaning/171574b4d5f3b33c842fa06b69dfc9de

In summary, 259 (5%) of the 5,221 formulae in homebrew/core use PyPI.

@SeekingMeaning SeekingMeaning changed the title pypi: add list of non-PyPI formulae pypi: do not use formula name as PyPI package name Sep 16, 2020
@Rylan12
Copy link
Member

Rylan12 commented Sep 17, 2020

@SeekingMeaning is this table correct? If not, can you correct it?

Category Number of Formulae Resources Checked Automatically Before This PR? Resources Checked Automatically After This PR?
1. PyPI packages with pythonhosted.org stable urls 127
2. Formulae that share a name with a PyPI package (without pythonhosted.org stable urls but with pythonhosted.org resources) 81
3. Formulae that share a name with a PyPI package (without pythonhosted.org stable urls and with no pythonhosted.org resources) 1237
4. PyPI packages with pythonhosted.org stable urls that have mismatched formula names 12
5. Non-PyPI packages (or formulae with mismatched names) without pythonhosted.org stable urls but with pythonhosted.org resources 39
6. Non-PyPI packages without pythonhosted.org stable urls and with no pythonhosted.org resources 3725
Total 5221

To clarify: ✅ and ❌ mean yes and no rather than good and bad

Here's a summary of how each category is handled:

Pinging @Homebrew/core for thoughts on how all of the categories will be handled

Category 1: Auto updates

Category 2: No longer auto-updates but bump-formula-pr will create PRs with a message indicating that there are resources that need to be checked. These formulae can be moved to Category 1 by changing their stable url to use a pythonhosted.org url.

Category 3: No longer checks for new resources. This will likely be a minor problem because if the formulae start needing resources in the future we will notice because the formula won't build. At that point, the stable url could be updated to a pythonhosted.org url and these would fall into Category 1.

Category 4: I don't think this category can exist... 🤔 Nothing should happen, though.

Category 5: These cannot be auto-updated. bump-formula-pr will create PRs with a message indicating that there are resources that need to be checked.

Category 6: Nothing should happen.

@SeekingMeaning
Copy link
Contributor Author

SeekingMeaning commented Sep 17, 2020

  1. PyPI packages with pythonhosted.org stable urls

Accurate description ✅

Special case: this category includes keepkey-agent as the url https://pypi.org/project/keepkey-agent/ redirects to https://pypi.org/project/keepkey_agent/

  1. PyPI packages without pythonhosted.org stable urls but with pythonhosted.org resources

There's probably a possibility that some of these aren't on PyPI — there might be cases where the formula happens to have the same name as an unrelated PyPI package and the actual PyPI package has a different name or does not exist

Special case: this category includes bandcamp-dl as the url https://pypi.org/project/bandcamp-dl/ redirects to https://pypi.org/project/bandcamp_dl/

  1. PyPI packages without pythonhosted.org stable urls and with no pythonhosted.org resources

Most of these are probably not PyPI packages

  1. Non-PyPI packages with pythonhosted.org stable urls

These are PyPI packages — the package name is different from the formula name (ex: the formula aws-elasticbeanstalk installs the package awsebcli)

  1. Non-PyPI packages without pythonhosted.org stable urls but with pythonhosted.org resources

This is a combination of categories 2 and 4 — some packages might not be on PyPI, and those that are (if any) have a different name from the formula name

  1. Non-PyPI packages without pythonhosted.org stable urls and with no pythonhosted.org resources

Some of these use brew's homebrew-virtualenv resource, which was excluded from the above categories (It is not relevant for creating bump PRs since the version is hardcoded)


List of formulae for categories 1, 2, 4, and 5:
(Compiled as one gist with four files)

@Rylan12
Copy link
Member

Rylan12 commented Sep 17, 2020

Thanks for the clarification. I've updated the table to (I think) better match your descriptions.

The bottom line is, formulae should use a pythonhosted.org stable url in order to have resources checked automatically. This will not affect any formulae with different names than their corresponding PyPI package at it determines the PyPI information on only the stable url. There is a possibility that there are formulae in categories 2 and 3 that should be moved to category 1, but this can be handled on a case-by-case basis.

As before, there is a blocklist that handles the exceptions for formulae in categories 1 or 4 that need special attention (9 formulae in total, I believe).

@SeekingMeaning
Copy link
Contributor Author

SeekingMeaning commented Sep 17, 2020

Thanks for the clarification. I've updated the table to (I think) better match your descriptions.

Looks good!

The bottom line is, formulae should use a pythonhosted.org stable url in order to have resources checked automatically. This will not affect any formulae with different names than their corresponding PyPI package at it determines the PyPI information on only the stable url. There is a possibility that there are formulae in categories 2 and 3 that should be moved to category 1, but this can be handled on a case-by-case basis.

I forget which formula, but there is at least one package that does not provide source code on PyPI — I'm not sure what the best way to handle this would be (One approach would be to add another field or two to the Formula class to indicate that it is a PyPI/Node/etc. package and to provide the name of the package but this might be too hacky/confusing)

As before, there is a blocklist that handles the exceptions for formulae in categories 1 or 4 that need special attention (9 formulae in total, I believe).

We could remove ansible, ansible@2.8, and xonsh since they're in category 2, so that leaves us with 6 formulae in the blocklist Nevermind, I wasn't thinking 😅

@Rylan12
Copy link
Member

Rylan12 commented Sep 17, 2020

I forget which formula, but there is at least one package that does not provide source code on PyPI — I'm not sure what the best way to handle this would be (One approach would be to add another field or two to the Formula class to indicate that it is a PyPI/Node/etc. package and to provide the name of the package but this might be too hacky/confusing)

Can we just add it to the blocklist?

We could remove ansible, ansible@2.8, and xonsh since they're in category 2, so that leaves us with 6 formulae in the blocklist

The only reason I can see not to allow them is to force the contributor to create a PR manually (being on the blocklist makes bump-formula-pr fail). However, once the PR message indicating resources weren't checked is added, this may be unnecessary. I'd say this is fine unless we know that a formula will need resource bumps on most updates. In that case, it might save some CI time to fail on the user side before they make the PR.

@samford
Copy link
Member

samford commented Sep 17, 2020

We could remove ansible, ansible@2.8, and xonsh since they're in category 2...

I'd say this is fine unless we know that a formula will need resource bumps on most updates.

ansible has so many resources that it's pretty much guaranteed that a version bump will involve resource block updates. Updating the resource blocks in the ansible formula is a somewhat complicated process, so I think we should keep ansible and ansible@2.8 on the blocklist.

From my perspective, it's appropriate to discourage folks from creating low-effort version bump PRs for ansible. Few people know how to properly update the ansible resource blocks (i.e., without updating them to a version they shouldn't be at) and not everyone who opens an ansible PR is going to be willing to deal with updating the resource blocks when asked. If a different contributor or maintainer simply ends up handling the resource block updates, then I don't see much value in that type of PR other than making it clear that there's a new version available (which brew livecheck does fine).

@SeekingMeaning
Copy link
Contributor Author

Can we just add it to the blocklist?

Yes, that would probably be simplest

The only reason I can see not to allow them is to force the contributor to create a PR manually (being on the blocklist makes bump-formula-pr fail).

ansible has so many resources that it's pretty much guaranteed that a version bump will involve resource block updates. Updating the resource blocks in the ansible formula is a somewhat complicated process, so I think we should keep ansible and ansible@2.8 on the blocklist.

Nevermind, I wasn't thinking 😅

Copy link
Member

@Rylan12 Rylan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there have been no major opinions either way, I think this can move forward.

Just one minor change that would be nice but won't block.

Also, I think that xonsh can be removed from the blocklist. I think keeping the ansible formulae around is a good idea but I don't think xonsh has resources updated nearly as often. Again, not a blocker.


odie <<~EOS
Could not infer PyPI package name from URL:
#{formula.stable.url}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#{formula.stable.url}
#{Formatter.url(formula.stable.url)}

@SeekingMeaning SeekingMeaning merged commit 4f235a1 into Homebrew:master Sep 18, 2020
@SeekingMeaning SeekingMeaning deleted the pypi/non-pypi-formulae branch September 18, 2020 17:21
@BrewTestBot BrewTestBot added the outdated PR was locked due to age label Dec 11, 2020
@Homebrew Homebrew locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants