
Implement pep 503 Simple Repository API for deployment #25639

Closed
kohtala opened this issue Sep 4, 2019 · 52 comments
Labels
module: binaries Anything related to official binaries that we release to users oncall: releng In support of CI and Release Engineering triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@kohtala

kohtala commented Sep 4, 2019

🚀 Feature

Add files to https://download.pytorch.org/whl/ to implement PEP 503 for pip --extra-index-url and for pipenv Pipfile extra source urls.

Motivation

pipenv install using a package configuration like torch = {file = "https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl"} fails with a hash mismatch. I guess the problem is that it downloads the hashes from the package releases on PyPI, and they do not match the whl of the same name served from download.pytorch.org. Pipenv promises vulnerability checking in addition to other nice new features for managing package configuration.

Using the Simple Repository API, you could just document one command that always installs the latest release.

This is related to issue #4793.

Pitch

Rename the packages for the different CUDA versions uniquely, e.g. torch-cu100. Add a Provides-Dist metadata header to indicate that the package provides torch, so it can satisfy dependencies on torch.
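
For illustration, the core metadata of such a wheel could then look roughly like this (a hypothetical sketch; Provides-Dist comes from the core metadata spec, and the exact name and version here are made up):

Metadata-Version: 2.1
Name: torch-cu100
Version: 1.1.0
Provides-Dist: torch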

Provide the Simple Repository API at https://download.pytorch.org/whl/, listing the different torch-cuNN, torchvision-cuNN, etc. packages.

For pip it should then work to use pip install --extra-index-url https://download.pytorch.org/whl/ torch-cu100 to install the latest version of torch for CUDA 10.0.

For pipenv it should work to use configuration like

[[source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/"
verify_ssl = true

[packages]
torch = {version="*", index="pytorch"}

Alternatives

We could keep the old package names and provide several different Simple Repository APIs for the different CUDA versions.

However, I do not see much need for this, as it is possible to reduce the impact of the change by keeping the package names for the CUDA version currently on PyPI and only renaming the other CUDA version packages.

Additional context

I am not experienced in setting up Python repositories, so this requires some testing to make sure it actually works.

I think renaming the packages and using Provides-Dist would also make it possible to upload all CUDA versions to PyPI.

cc @ezyang @gchanan @zou3519 @bdhirsh @seemethere @malfet @walterddr

@ailzhang ailzhang added module: binaries Anything related to official binaries that we release to users triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Sep 4, 2019
@ezyang
Contributor

ezyang commented Sep 4, 2019

When I wrote up #23656 I was not aware of Provides-Dist. It sounds like a good choice; better than local version identifiers, which are not actually very good for what we are using them for. @soumith, do you have any experience with this flag?

@soumith
Member

soumith commented Sep 4, 2019

Provides-Dist actually sounds great; I didn't know about the existence of this flag either. This overall proposal sounds nice.

@kohtala are you offering to take up this work, or should we take it up?

@kohtala
Author

kohtala commented Sep 7, 2019

Thanks for the compliments :-)
I am not familiar enough with the torch release process to know where to make the changes. It would need build and release changes as well as some documentation changes. Maybe set it up at https://download.pytorch.org/simple/ (à la https://pypi.org/simple/) instead of https://download.pytorch.org/whl/, so it can be tested there first and the documentation updated once it works.
If it is all familiar to you, I'm sure you'd be much more efficient. Besides, even though I love to contribute, I have some difficulty finding the time.

@cpbotha

cpbotha commented Mar 17, 2020

Hi all, until someone has time to fix this at the source, I hacked and slashed together (using Emacs of course) an index-url you can use for pytorch: https://vxlabs.com/pypi/

I am currently using this in my poetry pyproject.toml as an additional index-url and it works like a charm.

If you drill down, you'll see that it ends up pointing to all the whl packages hosted at pytorch; it just supplies the top-level indices according to PEP 503.
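
For anyone unfamiliar with PEP 503, the layout is just two levels of plain HTML pages, roughly like this (a minimal sketch; the real pages list every hosted wheel, here I only show the cu100 wheel mentioned earlier):

<!-- root index: one link per project name -->
<!DOCTYPE html>
<html><body>
<a href="torch/">torch</a>
<a href="torchvision/">torchvision</a>
</body></html>

<!-- torch/ sub-page: one link per downloadable file -->
<!DOCTYPE html>
<html><body>
<a href="https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl">torch-1.1.0-cp36-cp36m-linux_x86_64.whl</a>
</body></html>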

@ezyang
Contributor

ezyang commented Mar 17, 2020

Thanks @cpbotha. BTW, @kohtala, @seemethere and I were looking at this recently, and we noticed that provides-dist has a very scary message on the docs saying most tools in the ecosystem don't use it. Do you have any experience using it for projects?

@kohtala
Author

kohtala commented Mar 18, 2020

Hi.

No, I have not used provides-dist.

The main improvement proposed in this issue was the simple repository index that @cpbotha created as a test. Provides-Dist was an idea on top of it, to take it one step further. If Provides-Dist does not work, there would then need to be different indices for the different CUDA versions. Different CUDA versions could not be in a single index (such as PyPI).

[[source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu100/"
verify_ssl = true

[packages]
torch = {version="==1.1.0", index="pytorch"}

Somewhere https://download.pytorch.org/whl/cu100/torch_stable.html is already generated. It just needs to be split by package, modified for PEP 503, and offered as an index on the download server.

I tried with this Pipfile and it was able to lock and install. That solves at least the pipenv problem.

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[[source]]
name = "pytorch"
url = "https://vxlabs.com/pypi/"
verify_ssl = true

[dev-packages]

[packages]
torch = {version="==1.1.0", index="pytorch"}
torchvision = {version="==0.3.0", index="pytorch"}
fastai = "==1.0.54"

[requires]
python_version = "3.7"

@kohtala
Author

kohtala commented Mar 18, 2020

I tried to see what happens with pip. Unfortunately it seems to treat packages with the same name and version as identical and always downloads the one from PyPI. pypa/pip#5045

@cpbotha

cpbotha commented Mar 18, 2020

Just to add to @kohtala 's comment above:

I was struggling yesterday with the CUDA and GPU versions.

According to PEP 440, "torch-1.4.0+cpu" is exactly the same version as "torch-1.4.0" and "torch-1.4.0+cu92", for example. So if you start by installing +cpu, it's hard to convince pip or pipenv that you want to go back to the cu101 packages, which are the ones without a local version suffix.
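
To illustrate with the packaging library (just a quick check of the PEP 440 behaviour, nothing pytorch-specific):

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# A specifier without a local version label matches every local variant (PEP 440).
spec = SpecifierSet("==1.4.0")
print(Version("1.4.0+cpu") in spec)    # True
print(Version("1.4.0+cu92") in spec)   # True
print(Version("1.4.0+cu101") in spec)  # True

So once any of the variants is installed and satisfies the requirement, the resolver has no reason to prefer a different one.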

Anyways, I don't know about provides-dist support, but @kohtala 's suggestion to have separate indices for the different hardware configurations would work reliably for many people.

Hopefully I'll get some time soon to split https://vxlabs.com/pypi/ out into cpu, cu92 and cu101.

@cpbotha

cpbotha commented Mar 18, 2020

I tried to see what happens with pip. Unfortunately it seems to treat packages with the same name and version as identical and always downloads the one from PyPI. pypa/pip#5045

Besides the local version label (+blah) I mentioned above, I am able to install directly from my index with e.g. pip install --index-url=https://vxlabs.com/pypi/ torch==1.4.0.

@kohtala
Author

kohtala commented Mar 18, 2020

Besides the local version label (+blah) I mentioned above, I am able to install directly from my index with e.g. pip install --index-url=https://vxlabs.com/pypi/ torch==1.4.0.

In that install the only index is your index, so pip won't go to PyPI. If you need something from PyPI in the same install (-r requirements.txt), then, if we are to trust the pypa/pip#5045 issue, it'll install torch from PyPI.

Anyway, the Simple Repository API would be an improvement. Unfortunately not as great an improvement as one would hope.

@cpbotha

cpbotha commented Mar 18, 2020

Besides the local version label (+blah) I mentioned above, I am able to install directly from my index with e.g. pip install --index-url=https://vxlabs.com/pypi/ torch==1.4.0.

In that install the only index is your index, so pip won't go to PyPI. If you need something from PyPI in the same install (-r requirements.txt), then, if we are to trust the pypa/pip#5045 issue, it'll install torch from PyPI.

Anyway, the Simple Repository API would be an improvement. Unfortunately not as great an improvement as one would hope.

poetry does the right thing here. You can define any number of source indices. By default it will go extra-index -> extra-index -> pypi. However, you can change the order with source config settings in the pyproject.toml.

Also, you can specify per-dependency which source should be used if you don't like the global precedence in that specific case.

Whatever the case may be, the most practical option at this stage would be for pytorch to publish official simple indices, one for each hardware config (cu92, cu100, cu101, cpu), with each containing references to all relevant packages (only torch and torchvision differ between HW configs; there are also torchtext and torchaudio).

@soumith and @ezyang -- if you could perhaps post pointers to where an intrepid contributor can start looking to code up the necessary scripts to do this as part of your processes, maybe an intrepid contributor will try. (it might be me)

@ezyang
Contributor

ezyang commented Mar 20, 2020

Thanks for offering. Here's the script that we use to create the index: https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh

It... kind of looks like we might be making simple indices already. So is the request to drop the local version specifier as well from the versions in that case?

@cpbotha

cpbotha commented Mar 20, 2020

Thanks for offering. Here's the script that we use to create the index: https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh

It... kind of looks like we might be making simple indices already. So is the request to drop the local version specifier as well from the versions in that case?

I had a quick look at the script; it looks like it's almost there (but I did not look long enough :).

Ideally, we end up with a set of simple indices, one per hardware configuration (cpu, cu92, cu101, etc.).

In each case, the top-level index only links the package names: torch, torchvision, torchaudio, torchtext, etc. -- each of those links points to another HTML page listing the full set of wheel files for that hardware configuration.
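
For what it's worth, here is a rough sketch of what the generation step could look like (my own illustration, not the actual builder script; it assumes a local directory of wheels per hardware config and writes the two-level PEP 503 layout into a subdirectory next to them):

import re
from pathlib import Path

def normalize(name):
    # PEP 503 project name normalization.
    return re.sub(r"[-_.]+", "-", name).lower()

def build_index(wheel_dir, out_dir):
    # Group wheel files by project name (first dash-separated component of the filename).
    projects = {}
    for whl in sorted(Path(wheel_dir).glob("*.whl")):
        projects.setdefault(normalize(whl.name.split("-")[0]), []).append(whl.name)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Root page: one link per project.
    root_links = "".join(f'<a href="{p}/">{p}</a><br>\n' for p in sorted(projects))
    (out / "index.html").write_text(f"<!DOCTYPE html><html><body>\n{root_links}</body></html>\n")
    # One sub-page per project: one link per wheel file.
    # The relative href assumes the wheels sit two directories above each project page.
    for project, files in projects.items():
        links = "".join(f'<a href="../../{f}">{f}</a><br>\n' for f in files)
        sub = out / project
        sub.mkdir(exist_ok=True)
        (sub / "index.html").write_text(f"<!DOCTYPE html><html><body>\n{links}</body></html>\n")

# e.g. build_index("whl/cu101", "whl/cu101/simple")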

In this case, the +cu92 local version label can stay. The user can control their hardware choice by just selecting the correct index.

If I find the time in the coming days to experiment with the script, I'll let you know! If anyone else picks this up, I'm not going to stop you. :) (just please leave a comment here if you're going to try so we don't do double effort)

@ezyang
Contributor

ezyang commented Mar 20, 2020

I'm not aware of anyone touching the S3 indexes right now. You'll have to hack it up since you don't have access to the bucket, but I'm more than happy to help you deploy the update if you have a proposed change. Note that if we change the URLs in a BC-breaking way, that is going to be a lot more work!

@kousu

kousu commented Aug 15, 2020

@kohtala you can instead use --find-links:

pip install --find-links https://download.pytorch.org/whl/cpu/torch_stable.html torch

You can make this more permanent by adding it to a requirements.txt, e.g. #26340 (comment) or https://github.com/neuropoly/spinalcordtoolbox/blob/b64cad3c846fd6bd7a557688b67b80fe0b2c6dc2/requirements.txt#L26-L30

numpy==1.17.2
pandas==0.25.2
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.3.1+cpu

and then pip install -r requirements.txt will do the right thing.

This doesn't seem to be compatible with making proper packages, though, as far as I can tell, because for a proper package you need to specify all your dependencies in setup.py and not in requirements.txt. So if you depend on pytorch you can't publish your package to PyPI. You'll have to, I guess, make users install from source, or maybe pip install -r https://code.example.com/you/yourpackage/requirements.txt? I'm not really sure. If #26340 happened, this would be a non-issue.

@kohtala
Author

kohtala commented Aug 19, 2020

Thanks @kousu.

Since creating this issue I moved from Pipenv to using pip-tools with a requirements.in file that just says e.g. torch @ https://download.pytorch.org/whl/cu100/torch-1.1.0-cp37-cp37m-linux_x86_64.whl. That works and does not impose on us what the Pipenv developers think is right.
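
For anyone wanting to reproduce that setup, the flow is roughly (a sketch; the wheel URL is the CUDA 10.0 / Python 3.7 one from above, adjust for your platform):

# requirements.in
torch @ https://download.pytorch.org/whl/cu100/torch-1.1.0-cp37-cp37m-linux_x86_64.whl

# then
pip-compile requirements.in    # pins everything into requirements.txt
pip-sync requirements.txt      # installs exactly the pinned set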

But in #26340 there is a nice idea of splitting the CUDA support into a package extra. Each CUDA version could have a separate extra. There would still be a need for Provides-Dist support in pip, so you could have a dependency like "I want any HW acceleration, but don't care which it is" and any CUDA (or AMD, whatever) version could satisfy it.

@caniko

caniko commented Nov 13, 2020

I really need this feature for my Python packages. I am willing to help; what is holding us back?

@ezyang
Contributor

ezyang commented Nov 13, 2020

bumping priority based on user activity

@jkyl

jkyl commented Nov 14, 2020

With a recent version of Poetry (not sure which, sorry, but the latest should do), environment markers work to the extent that I have included the following in my pyproject.toml:

torch = [
    { version = "1.6.0", markers = "sys_platform != 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-win_amd64.whl", markers = "python_version ~= '3.6' and sys_platform == 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-win_amd64.whl", markers = "python_version ~= '3.7' and sys_platform == 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torch-1.6.0-cp38-cp38-win_amd64.whl", markers = "python_version ~= '3.8' and sys_platform == 'win32'" }
]
torchvision = [
    { version = "0.7.0", markers = "sys_platform != 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torchvision-0.7.0-cp36-cp36m-win_amd64.whl", markers = "python_version ~= '3.6' and sys_platform == 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torchvision-0.7.0-cp37-cp37m-win_amd64.whl", markers = "python_version ~= '3.7' and sys_platform == 'win32'" },
    { url = "https://download.pytorch.org/whl/cu102/torchvision-0.7.0-cp38-cp38-win_amd64.whl", markers = "python_version ~= '3.8' and sys_platform == 'win32'" }
]

as a workaround.

@jkyl

jkyl commented Nov 14, 2020

Which is not at all to detract from the priority of this issue! My workaround sucks and pytorch should definitely also do a simple index!

@malfet malfet self-assigned this Nov 16, 2020
@rgommers
Collaborator

rgommers commented Jun 3, 2021

For organizing the wheels, #25639 (comment) (subdirs like cpu/, cu111/, etc.) seems like a good suggestion.

The Poetry issue isn't really actionable; https://eternalphane.github.io/pytorch-pypi is just collecting all wheels in a single index, and there's no way for Poetry (or Pip, or any wheel-based tool) to do anything reasonable based only on file names with +cu111, +rocm4.0.1. Everything after + is just an opaque local identifier (see PEP 440) - it can be used for ordering, but not for selecting the desired hardware.

@malfet malfet assigned malfet and unassigned seemethere Jun 15, 2021
@malfet
Contributor

malfet commented Jun 15, 2021

Likely done by pytorch/builder@71a2b9a

@malfet
Contributor

malfet commented Jun 15, 2021

pip install --extra-index-url https://download.pytorch.org/whl/cpu/ torch should install PyTorch on CPU and
pip install --extra-index-url https://download.pytorch.org/whl/cu111/ torch should install PyTorch with CUDA-11.1 support
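
For a requirements file, the equivalent would be something like (a sketch; pick the index that matches your hardware):

--extra-index-url https://download.pytorch.org/whl/cu111/
torch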

@kohtala
Author

kohtala commented Jun 17, 2021

Seems to work. Thanks!

I found some discussion at pypa/pip#8606 about the rules for selecting packages between indices. When I tried the commands, they seemed to select the one that I wanted, but I still did not find the detailed rules to understand why. Apparently pip chooses the best match by highest version and matching tags, and only chooses between indices when they serve the same file. The file names are not the same here, and the one at pytorch.org seems to be treated by pip 21.1.2 as the better match.

@cgarciae

The issue was closed, but currently only adding https://eternalphane.github.io/pytorch-pypi/ as a source works with poetry without having to use the exact URL for the wheel (which is not very user friendly).

@rgommers
Collaborator

adding https://eternalphane.github.io/pytorch-pypi/ as a source works with poetry without having to use the exact URL for the wheel (which is not very user friendly).

Does that actually work for different CUDA/ROCm versions? If so, it'd be great to see an explanation of how the correct wheel is selected.

@cgarciae

cgarciae commented Oct 19, 2021

Hey @rgommers! No, for poetry users it's just a bit more convenient than the whole URL string, but you still have to select an exact version + hardware tag, e.g. 1.9.1+cpu.

I think my issue is that searching for the correct source is not trivial and there are a lot of abandoned repos. If possible, pytorch could provide an official service doing what https://eternalphane.github.io/pytorch-pypi/ does, to make this easier.

Edit: Or point towards https://eternalphane.github.io/pytorch-pypi/ in the installation docs if it's a trusted source.

@rgommers
Collaborator

you still have to select an exact version + hardware tag e.g. 1.9.1+cpu.

That's what I thought. I think that's worse than separate directories - if you have separate directories you can use normal version constraints like "torch >= 1.9.0, <1.10.0", while if everything is in a single dir you can't. So it'd be better to just improve the docs to make it easier to find the right URLs.
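
For example, with the per-configuration directories a normal range constraint works directly (illustration only, using the index URLs from the comment above):

pip install --extra-index-url https://download.pytorch.org/whl/cu111/ "torch>=1.9.0,<1.10.0"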

@cgarciae

Agreed.

@vikigenius
Contributor

@cgarciae I tried to use https://eternalphane.github.io/pytorch-pypi/ with poetry:

[[tool.poetry.source]]
name = "torch"
url = "https://eternalphane.github.io/pytorch-pypi/"

[[tool.poetry.source]]
name = "torchvision"
url = "https://eternalphane.github.io/pytorch-pypi/"

However, trying to install torchvision and torch together causes a failure like this:

  Because torchvision (0.11.1+cu113) depends on torch (1.10.0)
   and siamenc depends on torch (1.10.0+cu113), torchvision is forbidden.
  So, because siamenc depends on torchvision (0.11.1+cu113), version solving failed.

Is there any way I can install both torch and torchvision together properly?

@rgommers
Collaborator

siamenc is not on PyPI under that name, so not sure what it is. But this is the problem with a dependency like 1.10.0+cu113. The right way to do this is to depend on 1.10.0 (without +cu113) and point to https://download.pytorch.org/whl/cu113 as the index. Poetry should be able to do this, when given the correct index.
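
Something along these lines might work in pyproject.toml (untested - I don't use Poetry myself - and the source options differ between Poetry versions; older releases use secondary = true, newer ones use a priority setting):

[[tool.poetry.source]]
name = "pytorch-cu113"
url = "https://download.pytorch.org/whl/cu113"
secondary = true  # keep PyPI as the primary index for everything else

[tool.poetry.dependencies]
torch = { version = "1.10.0", source = "pytorch-cu113" }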

@vikigenius
Contributor

vikigenius commented Dec 11, 2021

@rgommers siamenc is just the name of my project. Poetry, I think, queries the custom repository for every package by default, and it thus throws errors like this if I add the repo like this:

[[tool.poetry.source]]
name = "torch"
url = "https://download.pytorch.org/whl/cu113"

Here is the error.

  RepositoryError

  403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cu113/python-lsp-server/

  at ~/.local/share/pypoetry/venv/lib/python3.9/site-packages/poetry/repositories/legacy_repository.py:393 in _get
      389│             if response.status_code == 404:
      390│                 return
      391│             response.raise_for_status()
      392│         except requests.HTTPError as e:
    → 393│             raise RepositoryError(e)
      394│
      395│         if response.status_code in (401, 403):
      396│             self._log(
      397│                 "Authorization error accessing {url}".format(url=response.url),

I would appreciate it if you can provide me a working example of how to do it with poetry.

@rgommers
Collaborator

I don't use Poetry, so unfortunately I can't provide an example.

403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cu113/python-lsp-server/

It should be using PyPI for python-lsp-server, not download.pytorch.org. You'll need to make sure the custom repository applies just to torch, not to anything else.

@Jerry2001Qu

Looks like the issue with Poetry not using PyPI should be solvable after this PR: python-poetry/poetry#908

But it is broken here: python-poetry/poetry#3855

I currently can't figure out a workaround.

@Jerry2001Qu

Downgrading to Poetry 1.0.10 might be a workaround (on top of my previous comment) as per: python-poetry/poetry#4704 (comment)

Haven't tested it because it's too much of a pain; switching to pip!

@Arcitec

Arcitec commented Feb 19, 2022

If anyone needs the correct solution for installing PyTorch via Pipenv, I have posted a guide and explanation here:

pypa/pipenv#4961 (comment)

It would be cool if the official pytorch website could list those install commands (the ones I've generated) as an option in the "roll your own selections" guide, i.e. having Pipenv as a choice next to Pip, and then showing the command style I'm using in my guide. Then projects based on Pipenv won't have to manually look up the latest versions in the repo HTML in a browser anymore.

@stephanbertl

Any update on this?

We have an internal Sonatype Nexus repository. It only supports PEP 503. It's impossible to proxy the pytorch repository with the current format.

@ezyang
Contributor

ezyang commented May 15, 2023

Please go ahead and file a new issue for these problems
