compiler: Unified Memory Allocator #2023

Open
wants to merge 23 commits into master

Conversation

guaacoelho

Hello everyone,

We at SENAI CIMATEC are working on a Unified Memory Allocator for Devito, built on top of the CuPy library.

The first results with this new allocator are impressive when checkpointing is used, compared to the default GPU allocator.

In our experiment with the Overthrust model (894x884x299), the performance improvement is close to three times* over Devito's default allocator.

With this approach we expect, in the future, to be able to allocate memory beyond the GPU's capacity.

We are opening this as a draft PR to align the allocator with Devito's patterns, fix possible bugs, and open it up for community use :)

A version enabling this through the External Allocator is also in development, and we expect to share it soon.

All feedback is welcome :)

Thank you to everyone at CIMATEC and on the Devito team for making this possible 🙂

*All experiments used an Nvidia V100 with 32 GB of memory.
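
For context, here is a minimal sketch of allocating unified (managed) memory through CuPy — this is the standard CuPy API, not necessarily the exact mechanism this PR implements:

import cupy as cp

# Route CuPy allocations through a managed-memory pool; the CUDA driver can
# then page data between host and device on demand, which is what would allow
# allocating beyond the physical GPU memory.
cp.cuda.set_allocator(cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)

u = cp.zeros((894, 884, 299), dtype=cp.float64)  # backed by unified memory
print(u.data.ptr)  # raw pointer, also dereferenceable from host-side C code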

@guaacoelho
Author

We used version 8.3.0 of CuPy.

@speglich
Contributor

ToDo:

  • Distributed GPU allocation

@@ -82,7 +82,8 @@ def __del__(self):
             # Dask/Distributed context), which may (re)create a Data object
             # without going through `__array_finalize__`
             return
-        self._allocator.free(*self._memfree_args)
+        if self._allocator is not CUPY_ALLOC:
+            self._allocator.free(*self._memfree_args)
Contributor

No, you should override free instead

Contributor

in fact, free is a no-op, so why the need to special-case here?
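
A sketch of what the suggested override could look like (hypothetical body, not the PR's actual code):

class CupyAllocator(MemoryAllocator):

    def free(self, c_pointer):
        # Nothing to free explicitly: the buffer lives as long as the CuPy
        # array returned by the allocation call, and CuPy's memory pool
        # reclaims it once that object is garbage collected.
        pass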

Contributor

@FabioLuporini left a comment

Thanks for the contribution! I left a few comments.

There are also other issues, in particular the lack of tests, updates to the installation, etc. You may want to take a look at this file: https://github.com/devitocodes/devito/blob/master/CONTRIBUTING.md#contributing-to-devito

@@ -15,7 +16,7 @@

 __all__ = ['ALLOC_FLAT', 'ALLOC_NUMA_LOCAL', 'ALLOC_NUMA_ANY',
            'ALLOC_KNL_MCDRAM', 'ALLOC_KNL_DRAM', 'ALLOC_GUARD',
-           'default_allocator']
+           'CUPY_ALLOC', 'default_allocator']
Contributor

ALLOC_CUPY for homogeneity

class CupyAllocator(MemoryAllocator):

"""
Memory allocator based on ``posix`` functions. The allocated memory is
Contributor

copy-paste docstring

    aligned to page boundaries.
    """

    is_Posix = True
Contributor

leftover.....

        mem_obj = cp.zeros(size, dtype=cp.float64)
        return mem_obj.data.ptr, mem_obj

    def free(self, c_pointer):
Contributor

CuPy frees the memory right? unless we explicitly tell it at some point?
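
For reference, CuPy ties deallocation to the array object's lifetime, so keeping mem_obj around is what keeps the buffer valid — a small illustration:

import cupy as cp

mem_obj = cp.zeros(1024, dtype=cp.float64)
ptr = mem_obj.data.ptr  # the pointer handed out by the allocator
del mem_obj             # buffer goes back to CuPy's memory pool; ptr now dangles
cp.get_default_memory_pool().free_all_blocks()  # optionally release pooled memory to the driver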

@@ -435,6 +437,9 @@ def _map_function_on_high_bw_mem(self, site, obj, storage, devicerm, read_only=F
"""
mmap = self.lang._map_to(obj)

if obj._allocator is CUPY_ALLOC:
Contributor

this breaks the abstraction in all sorts of ways, unfortunately

You can't access the allocator here to steer compilation

what really matters at this point is the `_mem_space` of the object: https://github.com/devitocodes/devito/blob/master/devito/types/basic.py#L39

you shouldn't actually end up here, because GPU-allocated functions should have a local mem-space, which in turn naturally prevents them from ever entering this point

@guaacoelho changed the title from "[draft] Unified Memory Allocator" to "compiler: Unified Memory Allocator" on Feb 9, 2023
@speglich
Contributor

@FabioLuporini, we added tests to this allocator, could we proceed with this PR?

@@ -206,6 +207,26 @@ def test_indexing_into_sparse(self):
         sf.data[1:-1, 0] = np.arange(8)
         assert np.all(sf.data[1:-1, 0] == np.arange(8))

+    def test_uma_allocation(self):
Contributor

There's a test_external_allocator somewhere in this file. Could you move that test, and this test, within a new class, say TestAllocators?

Author

I updated the PR with this change

@FabioLuporini
Contributor

> @FabioLuporini, we added tests to this allocator, could we proceed with this PR?

The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.

So, we need conditional imports, a mechanism like this to emit suitable error messages in case one attempts to use the allocator but the allocator isn't available, etc etc
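
Something along these lines, presumably (an illustrative sketch, not the PR's final code):

try:
    import cupy as cp
except ImportError:
    cp = None


def _check_cupy_available():
    # Fail only when the allocator is actually used, not at import time,
    # so CPU-only installs keep working.
    if cp is None:
        raise RuntimeError("The CuPy allocator requires CuPy; install it "
                           "with e.g. `pip install cupy-cuda12x`")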

Contributor

@FabioLuporini left a comment

Based on what I said above, a few tweaks are still necessary here

@mloubout
Contributor

mloubout commented Mar 8, 2023

Should this be added to the GPU CI and actually tested on the nvidia run?

@guaacoelho
Author

> @FabioLuporini, we added tests to this allocator, could we proceed with this PR?
>
> The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.
>
> So, we need conditional imports, a mechanism like this to emit suitable error messages in case one attempts to use the allocator but the allocator isn't available, etc etc

We've updated the allocators.py file to use a conditional import for CuPy. I think this solves the problem.

# try:
# from mpi4py import MPI # noqa
# except ImportError:
# MPI = None
Contributor

Leftover

        cls.lib = cp
        cls._initialize_shared_memory()
        try:
            from mpi4py import MPI
Contributor

Why reimport and not import it from devito.mpi?

            from mpi4py import MPI
            cls.MPI = MPI
            cls._set_device_for_mpi()
        except:
Contributor

except ImportError — otherwise unrelated errors get caught too

    def _alloc_C_libcall(self, size, ctype):
        if not self.available():
            raise ImportError("Couldn't initialize cupy or MPI elements of alocation")
        mem_obj = self.lib.zeros(size, dtype=self.lib.float64)
Contributor

always float64? I thought size was for UInt8
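
If size is indeed a byte count, the fix hinted at here would be along these lines (a sketch under that assumption):

# float64 allocates size * 8 bytes; uint8 allocates exactly size bytes:
mem_obj = self.lib.zeros(size, dtype=self.lib.uint8)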

            ctype_1d = ctype * size
            buf = ctypes.cast(c_pointer, ctypes.POINTER(ctype_1d)).contents
            pointer = np.frombuffer(buf, dtype=dtype)
        else:
Contributor

How can we end up here? The c_pointer is None case is already handled above.

Author

During execution with MPI, domain splitting can generate a situation where the allocated data size is zero, as we have observed with SparseFunctions. When this occurs, CuPy returns a pointer with a value of zero. This conditional statement was added for that case.

Contributor

Could you add a comment noting this, until some better solution is around?
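
For instance, the requested comment might read (wording illustrative):

else:
    # MPI domain splitting can produce zero-sized allocations (observed
    # with SparseFunctions), for which CuPy returns a pointer of value
    # zero; this branch handles that case.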

Contributor

We will push it, George.

@@ -1,8 +1,10 @@
 import pytest
 import numpy as np
+import cupy as cp
Contributor

This needs to be added somehow to the test requirements, and this step should be decorated with a skipif(device)
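
One way to get part of that behaviour with plain pytest (Devito's own skipif markers would be the idiomatic route for the device part):

import pytest

# Skips the module when CuPy is not installed; a device-level skipif is
# still needed for runners where CuPy imports fine but no GPU is present.
cp = pytest.importorskip('cupy')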

@mloubout
Contributor

Can you check if #2171 fixes the install issue?

@speglich
Contributor

@mloubout, definitely, we will test it. BTW, this docker image worked too, thanks to you.

FROM nvcr.io/nvidia/nvhpc:23.5-devel-cuda12.1-ubuntu22.04 AS devel

RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        git \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        python3-wheel \
        python3.10-venv && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 --no-cache-dir install cupy-cuda12x
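
As a quick sanity check inside that image, something like the following exercises CuPy end to end (plain CuPy API):

import cupy as cp

print(cp.cuda.runtime.getDeviceCount())        # should report at least one GPU
print(cp.zeros(8, dtype=cp.float64).data.ptr)  # allocation + pointer round-trip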

@mloubout
Contributor

mloubout commented Aug 2, 2023

Ok, the nvidia setup has been updated, so please:

  • rebase and answer comments
  • add the test to the nvidia test suite

For the second point you may have to modify the pytest-gpu.yml workflow file to add that extra test to it.

    def initialize(cls):

        try:
            import cupy as cp
Contributor

Move the try-import for cupy to the top:

try:
    import cupy as cp
except ImportError:
    cp = None

and just check for None here.

MPI should just be imported at the top from devito.mpi

@@ -1473,6 +1475,55 @@ def test_gather_time_function(self):
         assert ans == np.array(None)


+class TestAllocators(object):
Contributor

move to test_gpu_common as TestCupyAllocator with an nvidia device skip so it's added to GPU CI
