
Packaging Sapporo #997

Open
4 tasks
LourensVeen opened this issue Oct 20, 2023 · 11 comments

Comments

@LourensVeen
Collaborator

I'm looking at creating Anaconda packages (#525), and it seems that the place to start is with the Sapporo library. There is currently a sapporo_light in the lib/ directory here, and there is Sapporo2 in a separate repository.

(Of the other things in lib/, I understand forsockets, stopcond and amuse_mpi to be a part of the AMUSE framework, and probably best packaged with that, while g6 (if still relevant) and simple_hash could be separate packages as well. But those are separate issues.)

What's what

As I understand it, Sapporo started life as a compatibility library that allowed codes written for the GRAPE5/GRAPE6 hardware to run on any CUDA-compatible GPU. There's now a Sapporo light which seems like a simplified version that only supports the GRAPE6 API, and a Sapporo2 which supports GRAPE5 as well as two new integrators that I'm not sure correspond to any hardware, and which have a different API (maybe the idea was to share some GPU code?). Then there is g6lib, which implements the GRAPE6 API on the CPU. There's a copy of g6lib inside the sapporo_light directory, probably by mistake as it doesn't seem to get compiled or used anywhere.

Here's an overview of what's what:

| Library       | CPU | GPU         | G5 API | G6 API | Yebisu API | 6th API |
|---------------|-----|-------------|--------|--------|------------|---------|
| g6lib         | +   | -           | -      | F+C    | -          | -       |
| sapporo_light | -   | CUDA        | -      | F      | -          | -       |
| sapporo2      | -   | CUDA+OpenCL | C      | F      | C          | C       |

Fortran compilers typically append an underscore to symbol names in their ABI, while C compilers do not, so a C function that is meant to be called from Fortran needs an underscore added to its exported name. For each API above, C means that there is a version without underscores, and F that there is a version with underscored symbols.
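To make the underscore convention concrete, here is a minimal sketch of the trick the libraries above use: one C implementation exported under both the plain C name and the Fortran-mangled name. The function name and signature are invented for illustration; they are not the actual Sapporo/GRAPE6 API.

```c
#include <stdio.h>

/* Plain C symbol, callable from C/C++ as g6_open(0).
 * (Hypothetical name, for illustration only.) */
int g6_open(int clusterid) {
    printf("opening GRAPE6 context %d\n", clusterid);
    return 0;
}

/* Underscored alias for Fortran: gfortran lowers CALL G6_OPEN(ID)
 * to a call to the symbol g6_open_, with ID passed by reference,
 * so the wrapper takes a pointer and forwards to the C version. */
int g6_open_(int *clusterid) {
    return g6_open(*clusterid);
}
```

A library that exports both symbols can be linked from C/C++ and from Fortran without the client code needing its own forwarding shims, which is what the "hacks" in the community codes currently do by hand.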

Users of Sapporo

There seem to be four community codes in AMUSE that use Sapporo:

| Code     | Written in | G6 API | 6th API | Notes |
|----------|------------|--------|---------|-------|
| bhtree   | C++        | F      | -       | Underscores added in the C++ code. |
| ph4      | C++        | F      | -       | Has its own set of forwarding functions to convert from C to F symbol names. Has its own CPU code in case no GPU is available. |
| phigrape | Fortran    | F      | -       | Requires either Sapporo with a GPU, or an actual GRAPE6 board and library. |
| mi6      | C++        | -      | C       | Has its own declarations in 6thorder.h for the functions in Sapporo2's sapporo6thlib.cpp. Also has a sapporo2_dummy.cc with a CPU-based implementation of the 6th API. |

Currently, the first three are built against sapporo_light, while mi6 falls back to the CPU and probably needs the user to supply a sapporo2 installation for it to use the GPU. bhtree seems to be able to work with g6lib, but it's disabled in the Makefile.

So, it seems that bhtree, ph4 and phigrape all use the GRAPE6 API, and can work with either sapporo_light or sapporo2, while mi6 requires sapporo2 for GPU support, but can use its own sapporo2_dummy.cc on the CPU if there is no GPU or no sapporo2.

Plan

Looking at all this, it seems to me that it makes sense to package sapporo2, build everything else against it, and not bother with sapporo_light. Is that right, or am I missing something?

To do that, a few improvements would be good to have:

  • Export both C and Fortran versions of at least the GRAPE6 API, so that the community codes don't need their own hacks.
  • Add proper C headers in sapporo2, where they belong, so that we can remove them from the community codes. (I'd leave a Fortran module definition for the future.)
  • Rearrange the source layout to be more in line with other software. In particular, put the headers for the public API into include/ and the rest into src/; to me, lib/ is a directory where you install binaries.
  • Modify the build system to build a shared library as well as a static one. Not much point in packaging otherwise.

This should all be backwards compatible, but I should probably test building AMUSE against the new version to be sure.

Questions

  • Does it make sense to just go with sapporo2 and forget about sapporo_light?
  • Is anyone still maintaining the sapporo2 repository?
  • Are there any other known users of sapporo2 whose work I might mess up by changing anything?
@LourensVeen LourensVeen self-assigned this Oct 20, 2023
@LourensVeen
Collaborator Author

For completeness, I forgot to mention Kirin, which is similar to Sapporo light and possibly a predecessor of it?

@rieder
Member

rieder commented Oct 23, 2023

Hi @LourensVeen,

  • Yes, Kirin is a predecessor of Sapporo. Well found!
  • I am in favour of dropping sapporo_light for sapporo2. It adds multi-GPU support in addition to what was mentioned above. I don't think any codes require sapporo_light specifically. See deprecate sapporo_light? #845
  • sapporo2 is maintained by https://github.com/treecode, which is @jbedorf and Evghenii Gaburov. But I think only for important fixes at this point.
  • I'm not sure if there are other sapporo2 users, but I don't think so.

@LourensVeen
Collaborator Author

Okay, let's do sapporo2 then.

I've managed to compile it with CUDA 12 after some tweaks, but not with the OpenCL support in CUDA. OpenCL is rather deprecated at this point, but I'll try with a non-CUDA OpenCL library to see if that helps. It could be that it simply uses obsolete OpenCL features; the CUDA compiler also gives a bunch of deprecation warnings about the CUDA kernels.

I have no experience with GPU programming, although I've always wanted to learn. It would probably be more efficient though to see if I can get one of my colleagues to modernise this and maybe convert it to HIP and Vulkan, or whatever seems appropriate. But that would be a separate project, so I'd like to postpone that and focus on packaging things as they are for now.

Question for @jbedorf: if I make the changes above, would you be able to review and merge a couple pull requests?

@jbedorf
Contributor

jbedorf commented Oct 24, 2023

> Question for @jbedorf: if I make the changes above, would you be able to review and merge a couple pull requests?

Sure! And if you have any questions on the code let me know and I'll see what I remember.

@LourensVeen
Collaborator Author

Okay, I have the above basically done, but now the plot thickens. Of course the point of this is to build a Conda package, and I've tried to do that, and with my changes Sapporo2 compiles successfully in a Conda environment with Conda-installed compilers and CUDA libraries.

However, there is a packaging issue with the conda-forge CUDA packages (conda-forge/nvcc-feedstock#12) which makes it impossible to build Conda packages against the old libcuda.so interface. Newer CUDA programs apparently link against libcudart.so, which implements a newer API, and that is supposed to work. But that didn't exist yet when Sapporo2 was written. (libcuda.so provides the driver API and ships with the NVIDIA driver, while libcudart.so provides the runtime API and ships with the CUDA toolkit.)

So it looks like I'll need to either implement a compatibility layer for the compatibility layer, or learn enough CUDA to port Sapporo2 to the new API. The former doesn't make sense, so I guess the latter it is. Also, I need to check the community codes that use CUDA and see if we can expect that problem to appear in other places too... Time to do some reading and make a plan.

@jbedorf
Contributor

jbedorf commented Nov 8, 2023

Back in the day (and maybe still today?) the CUDA runtime API was meant as an easier-to-use interface than the CUDA driver API. However, the driver interface allowed us to make the host code universal for both CUDA and OpenCL by developing a thin CUDA/OpenCL-specific layer, as those APIs follow similar methods and semantics, whereas the runtime library at that point required the <<< >>> launch configuration syntax.

Switching to the runtime library is totally possible, but it would require the host code section of the sapporo library to use the runtime library, and as such dropping OpenCL support. Given your previous comments about OpenCL that might not be a bad thing per se, but it would be much more work than just changing the makefiles...
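For readers not familiar with the two APIs, the difference between the launch styles looks roughly like this. This is a hedged sketch: the module file name, kernel name, and launch geometry are invented, not Sapporo's, and all error checking is omitted.

```cuda
#include <cuda.h>

// Driver API (libcuda.so): explicit init/context/module management and
// cuLaunchKernel. This maps closely onto OpenCL's program/kernel objects
// and clEnqueueNDRangeKernel, which is why a thin shared layer works.
void launch_with_driver_api(void) {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuModuleLoad(&mod, "kernels.ptx");          // kernel compiled separately
    cuModuleGetFunction(&fn, mod, "grav_kernel");
    void *args[] = { /* kernel arguments by address */ };
    cuLaunchKernel(fn, 128, 1, 1,   // grid dimensions
                       256, 1, 1,   // block dimensions
                       0, 0, args, 0);
}

// Runtime API (libcudart.so): the kernel is compiled into the host binary
// and launched with <<< >>> syntax. There is no OpenCL analogue of this,
// which is why switching to the runtime API means dropping OpenCL.
__global__ void grav_kernel(/* ... */) { /* ... */ }

void launch_with_runtime_api(void) {
    grav_kernel<<<128, 256>>>(/* ... */);
}
```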

@LourensVeen
Collaborator Author

That sounds like quite a bit of work. Also, I've been hearing some noise regarding OpenCL making a bit of a comeback in the last year or so, so it's hard to see what will happen. Maybe something like Kokkos is the way to go.

Anyway, I've done some more digging around, and it seems like there may be a way to just tell Conda that it's okay to have this dangling dependency that needs to be resolved from the system. There are conda-forge packages for Gromacs and Pytorch that use CUDA, so it seems like there should be a way. Although it also seems that conda-forge has its own way of dealing with CUDA that doesn't translate one-to-one elsewhere, so I need to play with this more.

It looks like there's also a way to provide multiple packages with different backends, i.e. a sapporo2-cuda and a sapporo2-opencl, and then the user can specify which they want to use, after which conda install amuse should automatically grab the appropriate one. I don't know how that works yet either, but I'm going to figure it out 😄.

@LourensVeen
Collaborator Author

Bit of an update here. I got the dangling dynamic link taken care of; there turns out to be an option for that, which other packages use too, and it makes sense. I can at least locally build a Sapporo2 Conda package now that depends on CUDA, although there's no testing yet. At any rate, I don't want to publish anything until we have some client code built against it, packaged, and tested.

I've also been looking into the multiple-backends issue, and this is a mess. I haven't found an example of a package that has multiple implementations with the same API/ABI, which we would have here. Debian's dpkg has virtual packages, which is exactly what we would need, but it doesn't seem like Conda has them. (It does have something called virtual packages, but it's not the same thing.)

MPI is a bit similar to what we are doing, in that it has a standard API at least. On conda-forge, there's an mpi metapackage which has multiple copies with different build strings for the different MPI implementations, so you get mpi-1.0-openmpi and mpi-1.0-mpich etc. Packages that need MPI, like mpi4py then build for all different versions of MPI, with each package depending on the corresponding dependency directly, which in turn depends on the corresponding version of mpi.

So now, if the user pins mpi to mpi-1.0-openmpi, then the only MPI implementation that will install is openmpi, because installing e.g. mpich would upgrade (sidegrade?) mpi to mpi-1.0-mpich and that's impossible because of the pin. So when installing mpi4py, you'd automatically get a version of it that uses openmpi, because that's the only combination that's compatible with the pinned version of mpi. If you try to conda install mpi you get the Intel MPI version of that package, but note that it's empty and that Intel MPI isn't actually installed. If you try to conda install mpi4py you get the version with mpich, but I can't find any specification of this being the preferred option, it seems to be random.

So we could use this mechanism to keep a sapporo2-opencl and sapporo2-cuda package from being installed at the same time, with client code depending on a sapporo2 metapackage along the lines of mpi, and the user then installing the client code and either sapporo2-opencl or sapporo2-cuda explicitly. We could possibly have amuse-opencl and amuse-cuda metapackages that would depend on the corresponding version of sapporo2, so that the user can just install one package and get the whole compatible stack.
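A hypothetical sketch of how such a metapackage could look, following the mpi pattern. All names, versions, and build strings here are invented for illustration; this is not an existing recipe.

```yaml
# meta.yaml for a hypothetical "sapporo2" metapackage, built once per
# backend. The build string makes the builds mutually exclusive: pinning
# sapporo2=*=cuda rules out anything that requires sapporo2=*=opencl,
# just as pinning mpi=*=openmpi rules out mpich.
package:
  name: sapporo2
  version: "2.0"

build:
  number: 0
  string: cuda        # the OpenCL build would set "string: opencl"

# A concrete backend package (e.g. sapporo2-cuda) would then declare
# a run dependency on the matching metapackage build:
#
# requirements:
#   run:
#     - sapporo2 2.0 cuda
```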

It seems that the more standard way to do things is to make different package variants, which means we'd have a single package sapporo2 with two variants, cuda and opencl (or likely more, for different CUDA versions). Packages using Sapporo would then build multiple variants as well. You can only have one variant of a package installed, so collisions would be avoided automatically by Conda.

The issue with that is that if you have multiple dependencies like that, you get a combinatorial explosion. Gromacs for example has MPI/No MPI, CUDA/No CUDA, and double precision or not, but it skips certain combinations so that in the end we get packages for five different combinations. The build number is abused here to specify a preference for No MPI, No CUDA and single precision.

Doing something similar would potentially lead to a lot of different packages being built, but if this is how it works then perhaps it's best to just go with the flow. A user may eventually end up installing AMUSE using

conda install 'amuse=*=cuda'

if they have an nVidia GPU, with

conda install amuse

installing the latest CPU-only version, and

conda install 'amuse=*=opencl'

installing as much as can be installed with OpenCL available.

@LourensVeen
Collaborator Author

And then I tried to run my meta.yaml with the conda-forge infrastructure and discovered that they do CUDA differently. There's an issue at conda-forge/cuda-version-feedstock#1 where they hashed out the design, but it doesn't seem to have made it into the maintainer docs yet, so you have to find it.

But well, that design does actually make sense and once you've figured out how to do it, it does seem to work. Although I still need to add tests, and I'm not sure how to build versions for different CUDA versions, and/or whether that is needed. 11.2 seems to be it for now.

Question for resident Mac expert @rieder: as I understand it Macs with nVidia chips and CUDA are getting rare, CUDA on Mac is no longer supported by either Apple or nVidia, and neither is OpenCL. Is that right? Does it make sense then to only build Linux packages of Sapporo2? Or should I try to see if the CPU support that the code seems to hint at really is there and can be revived? Or maybe the answer to that is to use the OpenCL version with pocl? That is supposed to work on Mac actually, as far as I understand, but I'm not sure if there's a point to doing so?

@LourensVeen
Collaborator Author

Looks like the answer to GPU-on-mac is that somebody should add Metal support to Sapporo2 at some point. Not the highest priority, so we'll leave that for the future, and build Sapporo2 only for Linux OpenCL and CUDA.


stale bot commented Jan 30, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issues that have been around for a while without updates label Jan 30, 2024
@rieder rieder added feature request keep-open and removed stale Issues that have been around for a while without updates labels Jan 30, 2024