Update to Pytorch 2.3.0 #1498

Merged: 18 commits merged into bytedeco:master from pytorch_2_3_0 on May 19, 2024

Conversation

@HGuillemet (Collaborator) commented May 3, 2024

Included in this PR:

  • Update to PyTorch 2.3.0
  • Add AOTInductor (a new way to run models exported from Python)

@HGuillemet marked this pull request as draft May 3, 2024 10:20
@HGuillemet (Collaborator, Author) commented May 3, 2024

The C++ API can change depending on the platform.
The current blocker is the Half constructor and cast operator, which use float16_t on ARM64 and float on x86_64 (which doesn't have float16_t):

#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  inline Half(float16_t value);
  inline operator float16_t() const;
#else
  inline C10_HOST_DEVICE Half(float value);
  inline C10_HOST_DEVICE operator float() const;
#endif

I'm considering adding an explicit JNI bridge that calls either the float or the float16 version depending on the platform, and skipping the platform-dependent C++ API.
@saudet, any better idea?
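Something along these lines, as a minimal sketch (the helper names half_from_float / float_from_half are just placeholders, not what the presets actually generate, and it assumes float16_t is visible through the same headers libtorch itself uses):

#include <c10/util/Half.h>

// Hypothetical platform-independent helpers that the generated JNI code
// could call instead of binding the platform-dependent c10::Half API.
inline c10::Half half_from_float(float value) {
#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  // The ARM64 build of libtorch only exposes the float16_t constructor.
  return c10::Half(static_cast<float16_t>(value));
#else
  // Other platforms expose the float constructor directly.
  return c10::Half(value);
#endif
}

inline float float_from_half(c10::Half h) {
#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  return static_cast<float>(static_cast<float16_t>(h));
#else
  return static_cast<float>(h);
#endif
}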

@saudet (Member) commented May 3, 2024 via email

@HGuillemet (Collaborator, Author) commented May 4, 2024

If the Parser is told to parse only the #if !defined(__aarch64__) branch, I get a linking error on Mac Apple Silicon because there is no C++ constructor taking a 32-bit float and no cast operator returning a 32-bit float (see the check error log).

@HGuillemet (Collaborator, Author) commented May 4, 2024

I can try to patch the libtorch source to disable the #if defined(__aarch64__) branch and force the creation of float32 variants on Mac, hoping there are no other arm64-specific parts of the libtorch code relying on the float16 variants. But that seems more hazardous than the custom JNI trick. What do you think?

EDIT: I'll try to just add the float variants on Mac without removing the float16 variants. That seems safer.
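Roughly, the patch could look like this inside the c10::Half declaration (an illustrative sketch only; the added float overloads are my assumption about the shape of the patch, and they could in principle create overload ambiguities elsewhere in libtorch):

#if defined(__aarch64__) && !defined(C10_MOBILE) && !defined(__CUDACC__)
  // Existing ARM64 declarations, kept as-is.
  inline Half(float16_t value);
  inline operator float16_t() const;
  // Added float variants implemented on top of the float16_t ones, so the
  // same C++ API exists on every platform.
  inline Half(float value) : Half(static_cast<float16_t>(value)) {}
  inline operator float() const {
    return static_cast<float>(static_cast<float16_t>(*this));
  }
#else
  inline C10_HOST_DEVICE Half(float value);
  inline C10_HOST_DEVICE operator float() const;
#endif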

@saudet (Member) commented May 4, 2024

> If the Parser is told to parse only the #if !defined(__aarch64__) branch, I get a linking error on Mac Apple Silicon because there is no C++ constructor taking a 32-bit float and no cast operator returning a 32-bit float (see the check error log).

I assume we can convert between float and float16_t just like we can between float and double, so it should work if we add casts.
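For instance, a tiny AArch64-only check of that cast round-trip (not part of this PR, just an illustration):

// Only meaningful on AArch64, where <arm_neon.h> provides float16_t.
#include <arm_neon.h>
#include <cstdio>

int main() {
  float f = 1.5f;                            // exactly representable in fp16
  float16_t h = static_cast<float16_t>(f);   // narrowing cast, float -> float16_t
  float back = static_cast<float>(h);        // widening cast back
  std::printf("%f -> %f\n", f, back);        // prints 1.500000 -> 1.500000
  return 0;
}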

@HGuillemet (Collaborator, Author) commented May 5, 2024

The x86_64-gpu check is killed, probably due to running out of memory, while compiling some transformer-related CUDA code.
Any idea how to work around that?
Decrease MAKEJ to 2 or 3?

@saudet (Member) commented May 5, 2024

Increasing swap space doesn't work?

@HGuillemet (Collaborator, Author)

I'll try that.
Do I change deploy-ubuntu/action.yml, or do I add a swap file in the pytorch cppbuild?

@HGuillemet (Collaborator, Author)

I added a check for pytorch in action.yml, but if that works, the best would be to set a dedicated environment variable, like SWAP_SPACE, in the workflow files and use it in action.yml.

@saudet (Member) commented May 5, 2024 via email

@HGuillemet (Collaborator, Author)

My changes to deploy-ubuntu are not run by the worker. No idea why.

@saudet (Member) commented May 5, 2024

To use the actions from your fork, you'll need to change the URL in the workflow

@HGuillemet (Collaborator, Author)

A swap space of 2 GB works; 4 GB and 0 do not.

@HGuillemet marked this pull request as ready for review May 6, 2024 17:03
@saudet (Member) commented May 6, 2024

Why not 4GB? What happens?

@HGuillemet (Collaborator, Author)

I'm not sure, because the log file of the 4 GB attempt seems truncated. But only about 10 GB of disk space remains after the installations and the creation of the swap file, before the build starts, so I wouldn't be surprised if we ran out of disk space.
We really are at the limits of what the worker can offer...

@saudet (Member) commented May 6, 2024 via email

@HGuillemet (Collaborator, Author)

Better than what?
The build passes for now, so we are OK.
But in anticipation of upcoming PyTorch releases, which will probably need more disk space and more memory to compile, finding spare disk space would be useful.
Upgrading to CUDA 12.4 might also help.

@saudet (Member) commented May 7, 2024

2 GB might not be enough for PyTorch 2.4.0, so let's wait and see before merging this, I guess.

@saudet (Member) commented May 7, 2024

This looks useful: https://github.com/marketplace/actions/free-disk-space-ubuntu

@saudet (Member) commented May 7, 2024

We can probably just erase a couple of those without the tool, though:

=> Android library: Saved 14GiB
=> .NET runtime: Saved 2.7GiB
=> Haskell runtime: Saved 0B
=> Large misc. packages: Saved 5.3GiB
=> Tool cache: Saved 5.9GiB
=> Swap storage: Saved 4.0GiB

Total: Saved 31GiB

Does that mean swap is enabled by default now??

@HGuillemet (Collaborator, Author)

> This looks useful: https://github.com/marketplace/actions/free-disk-space-ubuntu

Indeed. We can use this freeing action.

But why wait for 2.4.0 before merging?

> Does that mean swap is enabled by default now??

I changed deploy-ubuntu so that it reads a SWAP_SIZE environment variable, and I set this variable to 4 in the mkl workflow (deploy-ubuntu added 4 GB of swap for mkl before my change) and to 2 for pytorch.

@saudet (Member) commented May 7, 2024

Let's not make it an option; let's just set the swap to a value that works for everything, like 4 GB. That makes using an action here annoying, so let's just ditch android and dotnet instead, as in actions/runner-images#2606 (comment).

@HGuillemet (Collaborator, Author)

OK for the ditching, but keeping an option to use some disk space for extra swap, depending on the workflow's needs, still seems interesting to me.

@saudet (Member) commented May 7, 2024

Why do you want to make it an option? It's not going to be used by anything.

@HGuillemet (Collaborator, Author)

Maybe some other builds would fail if we took 4 GB of disk for the swap, like pytorch currently does.
Why do you want to remove the option? There are already other options like CI_DEPLOY_NEED_*, so why not CI_DEPLOY_NEED_SWAP?

@saudet (Member) commented May 7, 2024

Those options are there because some builds fail when they are true, but some others fail when they are false. That doesn't happen for something like swap space.

@HGuillemet (Collaborator, Author)

Is dotnet used by any build?
Is /usr/local/android used by the android builds?

@HGuillemet (Collaborator, Author)

11 GB saved by ditching android and dotnet.

@saudet requested a review from sbrunk May 15, 2024 14:30
@saudet merged commit f8932a4 into bytedeco:master May 19, 2024
3 of 6 checks passed
@HGuillemet deleted the pytorch_2_3_0 branch May 22, 2024 16:09