Update to Pytorch 2.3.0 #1498

Merged: 18 commits merged into bytedeco:master from pytorch_2_3_0 on May 19, 2024

Conversation

@HGuillemet (Collaborator) commented May 3, 2024

Included in this PR:

  • Update to PyTorch 2.3.0
  • Add AOTInductor (a new way to run models exported from Python)

@HGuillemet marked this pull request as draft May 3, 2024 10:20
@HGuillemet (Collaborator, Author) commented May 3, 2024

The C++ API can change depending on the platform.
The current blocker is the Half constructor and cast operator, which use float16_t on ARM64 and float on x86_64 (which doesn't have float16_t):

#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  inline Half(float16_t value);
  inline operator float16_t() const;
#else
  inline C10_HOST_DEVICE Half(float value);
  inline C10_HOST_DEVICE operator float() const;
#endif

I'm considering adding an explicit JNI bridge that calls either the float or the float16 version depending on the platform, and skipping the platform-dependent C++ API.
@saudet, any better idea?
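Something along these lines, as a minimal sketch (the helper names half_from_float / float_from_half are just placeholders, not what the presets actually generate, and it assumes float16_t is visible through the same headers libtorch itself uses):

#include <c10/util/Half.h>

// Hypothetical platform-independent helpers that the generated JNI code
// could call instead of binding the platform-dependent c10::Half API.
inline c10::Half half_from_float(float value) {
#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  // The ARM64 build of libtorch only exposes the float16_t constructor.
  return c10::Half(static_cast<float16_t>(value));
#else
  // Other platforms expose the float constructor directly.
  return c10::Half(value);
#endif
}

inline float float_from_half(c10::Half h) {
#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  return static_cast<float>(static_cast<float16_t>(h));
#else
  return static_cast<float>(h);
#endif
}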

@saudet (Member) commented May 3, 2024 via email

@HGuillemet (Collaborator, Author) commented May 4, 2024

If the Parser is told to parse only the #if !defined(__aarch64__) branch, I get a linking error on Mac Apple Silicon because there is no C++ constructor taking a 32-bit float and no cast operator returning a 32-bit float (see the check error log).

@HGuillemet (Collaborator, Author) commented May 4, 2024

I can try to patch the libtorch source to disable the #if defined(__aarch64__) branch and force the creation of float32 variants on Mac, hoping there are no other arm64-specific parts of the libtorch code relying on the float16 variants. But that seems more hazardous than the custom JNI trick. What do you think?

EDIT: I'll try to just add the float variants on Mac without removing the float16 variants. That seems safer.
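Roughly, the patch could look like this inside the c10::Half declaration (an illustrative sketch only; the added float overloads are my assumption about the shape of the patch, and they could in principle create overload ambiguities elsewhere in libtorch):

#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  // Existing ARM64 declarations, kept as-is.
  inline Half(float16_t value);
  inline operator float16_t() const;
  // Added float variants implemented on top of the float16_t ones, so the
  // same C++ API exists on every platform.
  inline Half(float value) : Half(static_cast<float16_t>(value)) {}
  inline operator float() const {
    return static_cast<float>(static_cast<float16_t>(*this));
  }
#else
  inline C10_HOST_DEVICE Half(float value);
  inline C10_HOST_DEVICE operator float() const;
#endif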

@saudet (Member) commented May 4, 2024

> If the Parser is told to parse only the #if !defined(__aarch64__) branch, I get a linking error on Mac Apple Silicon because there is no C++ constructor taking a 32-bit float and no cast operator returning a 32-bit float (see the check error log).

I assume we can convert between float and float16_t just like we can between float and double, so it should work if we add casts.
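For instance, a tiny AArch64-only check of that cast round-trip (not part of this PR, just an illustration):

// Only meaningful on AArch64, where <arm_neon.h> provides float16_t.
#include <arm_neon.h>
#include <cstdio>

int main() {
  float f = 1.5f;                            // exactly representable in fp16
  float16_t h = static_cast<float16_t>(f);   // narrowing cast, float -> float16_t
  float back = static_cast<float>(h);        // widening cast back
  std::printf("%f -> %f\n", f, back);        // prints 1.500000 -> 1.500000
  return 0;
}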

@HGuillemet (Collaborator, Author) commented May 5, 2024

The x86_64-gpu check is killed, probably due to running out of memory, while compiling some transformer-related CUDA code.
Any idea how to work around that?
Decrease MAKEJ to 2 or 3?

@saudet (Member) commented May 5, 2024

Increasing swap space doesn't work?

@HGuillemet (Collaborator, Author)

I'll try that.
Do I change deploy-ubuntu/action.yml, or do I add a swap file in the pytorch cppbuild?

@HGuillemet (Collaborator, Author)

I added a check for pytorch in action.yml, but if that works, the best would be to set a dedicated environment variable, like SWAP_SPACE, in the workflow files and use it in action.yml.

@saudet (Member) commented May 5, 2024 via email

@HGuillemet (Collaborator, Author)

My changes to deploy-ubuntu are not run by the worker. No idea why.

@saudet (Member) commented May 5, 2024

To use the actions from your fork, you'll need to change the URL in the workflow

@HGuillemet (Collaborator, Author)

A swap space of 2 GB works; 4 GB and 0 do not.

@HGuillemet marked this pull request as ready for review May 6, 2024 17:03
@saudet (Member) commented May 6, 2024

Why not 4GB? What happens?

@HGuillemet (Collaborator, Author)

I'm not sure, because the log file of the 4 GB attempt seems truncated. But only about 10 GB of disk space remains after the installations and the creation of the swap file, before the build starts, so I wouldn't be surprised if we ran out of disk space.
We really are at the limits of what the worker can offer...

@saudet (Member) commented May 6, 2024 via email

@HGuillemet (Collaborator, Author)

Better than what?
The build passes for now, so we are OK.
But in anticipation of upcoming PyTorch releases, which will probably need more disk space and more memory to compile, finding spare disk space would be useful.
Upgrading to CUDA 12.4 might also help.

@saudet (Member) commented May 7, 2024

2 GB might not be enough for PyTorch 2.4.0, so let's wait and see before merging this, I guess.

@saudet (Member) commented May 7, 2024

This looks useful: https://github.com/marketplace/actions/free-disk-space-ubuntu

@saudet (Member) commented May 7, 2024

We can probably just erase a couple of those without the tool, though:

=> Android library: Saved 14GiB
=> .NET runtime: Saved 2.7GiB
=> Haskell runtime: Saved 0B
=> Large misc. packages: Saved 5.3GiB
=> Tool cache: Saved 5.9GiB
=> Swap storage: Saved 4.0GiB

Total: Saved 31GiB

Does that mean swap is enabled by default now??

@HGuillemet (Collaborator, Author)

> This looks useful: https://github.com/marketplace/actions/free-disk-space-ubuntu

Indeed. We can use this freeing action.

But why wait for 2.4.0 before merging?

> Does that mean swap is enabled by default now??

I changed deploy-ubuntu so that it reads a SWAP_SIZE environment variable, and I set this variable to 4 in the mkl workflow (deploy-ubuntu added 4 GB of swap for mkl before my change) and to 2 for pytorch.

@saudet (Member) commented May 7, 2024

Let's not make it an option; let's just set the swap to a value that works for everything, like 4 GB. That makes using an action here annoying, so let's just ditch android and dotnet instead, as in actions/runner-images#2606 (comment).

@HGuillemet (Collaborator, Author)

OK for the ditching, but keeping an option to use some disk space for extra swap, depending on the workflow's needs, still seems interesting to me.

@saudet (Member) commented May 7, 2024

Why do you want to make it an option? It's not going to be used by anything.

@HGuillemet (Collaborator, Author)

Maybe some other builds would fail if we took 4 GB of disk for the swap, like pytorch currently does.
Why do you want to remove the option? There are already other options like CI_DEPLOY_NEED_*, so why not CI_DEPLOY_NEED_SWAP?

@saudet (Member) commented May 7, 2024

Those options are there because some builds fail when they are true, but some others fail when they are false. That doesn't happen for something like swap space.

@HGuillemet (Collaborator, Author)

Is dotnet used by any build?
Is /usr/local/android used by the android builds?

@HGuillemet (Collaborator, Author)

11 GB saved by ditching android and dotnet.

@saudet requested a review from sbrunk May 15, 2024 14:30
@saudet merged commit f8932a4 into bytedeco:master May 19, 2024
3 of 6 checks passed
@HGuillemet deleted the pytorch_2_3_0 branch May 22, 2024 16:09