Potential for a memory leak when using Primary Context #305

Open · JBlaschke opened this issue Jul 26, 2021 · 7 comments
JBlaschke commented Jul 26, 2021

I am working with a group that is using PyCUDA in a task-based parallel application. We therefore don't have a "main" thread that can "own" the PyCUDA primary context. Instead, we found a solution where we push/pop the primary context on the context stack:

import pycuda.driver as drv

drv.init()
dev = drv.Device(DEV_ID)             # DEV_ID: integer ordinal of the target GPU
ctx = dev.retain_primary_context()
ctx.push()
# ... do work
ctx.pop()

The problem with this approach is that any variables referring to GPU memory are still in scope when ctx.pop() is called. This appears to result in their not being freed on the device, leaking memory. Explicitly calling del on these variables does not stop the memory leak.
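
For reference, here is a minimal sketch of the pattern that triggers this (the array name, size, and device ordinal are just examples): a GPUArray allocated inside the push/pop window is still referenced when ctx.pop() runs, so its later deallocation happens with no active context.

import numpy as np
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

drv.init()
dev = drv.Device(0)                  # device ordinal assumed to be 0 here
ctx = dev.retain_primary_context()

ctx.push()
a_gpu = gpuarray.to_gpu(np.ones(1024, dtype=np.float32))  # GPU allocation
# ... do work with a_gpu ...
ctx.pop()

# a_gpu is still alive here; when it is garbage-collected, the primary
# context is no longer current, so the device memory is not freed cleanly.
del a_gpu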

Our solution has been to create an execution guard, ensuring that all GPU variables are out of scope before ctx.pop() is called:

class GPUTaskWrapper(object):
    """Push the primary context around a task and pop it when the task returns."""
    def __init__(self, thunk):
        self.thunk = thunk

    def __call__(self, *args, **kwargs):
        dev = drv.Device(DEV_ID)            # drv and DEV_ID as in the snippet above
        ctx = dev.retain_primary_context()
        ctx.push()
        try:
            # The thunk's GPU-referencing locals go out of scope when it
            # returns, i.e. before ctx.pop() runs in the finally clause.
            return self.thunk(*args, **kwargs)
        finally:
            ctx.pop()

def gpu_task_wrapper(thunk):
    return GPUTaskWrapper(thunk)

@gpu_task_wrapper
def work_function():
    # ... do work
    pass

@elliottslaughter gets credit for making this work.

inducer (Owner) commented Jul 27, 2021

What you describe is a real problem, and given the current CUDA API, it's difficult to do better than what PyCUDA currently implements. The constraints are:

  • Garbage collection may request to deallocate GPU memory at any time.
  • GPU memory must be freed in the context in which it was allocated.
  • If that context is not active, it must be pushed onto the context stack.
  • If another thread is currently using that context, we won't be able to activate it. In this case, PyCUDA will print a warning message and simply drop that memory on the floor.

The only "bulletproof" solution that comes to mind is to maintain a per-context queue of "memory-yet-to-be-deallocated". I'd be open to considering patches implementing this if they are thoroughly tested. If you are able to migrate to a less stateful API (OpenCL?), this problem would disappear on its own.

If you can ensure that no GPU memory gets freed outside your GPUTaskWrapper, then what you have should also work. Another option would be to preallocate all GPU memory you need and hold onto it until program completion, if that's an option.
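
A minimal sketch of that preallocation route (device ordinal and buffer size are placeholders): grab the device memory once while the primary context is current and keep the handle alive until the program exits.

import pycuda.driver as drv

drv.init()
dev = drv.Device(0)                      # device ordinal assumed to be 0
ctx = dev.retain_primary_context()
ctx.push()

# One up-front allocation, reused by every task and never freed until exit.
SCRATCH_NBYTES = 256 * 1024 * 1024       # placeholder size
scratch = drv.mem_alloc(SCRATCH_NBYTES)

ctx.pop()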

JBlaschke (Author) commented
Thanks for the feedback, @inducer. I will look at ways to provide a PR, and I share your frustration with CUDA's context management. In the meantime, @elliottslaughter's solution above works for our needs (for now).

I wonder if this would be simpler to fix in situations where there is only ever one context per GPU (i.e. the way the CUDA runtime API does it with primary contexts). In that case, GPU memory could always be freed using the one and only context. Essentially, this simpler model would clean up after users if they only use the primary context, and leave the work of freeing memory up to them if they have fancier uses of multiple contexts.

inducer (Owner) commented Jul 27, 2021

"where there is only ever one context per GPU"

But if only one thread can have that context as its active one, how do you propose dealing with multiple threads?

elliottslaughter commented
Ok, this is a good limitation to know about.

Our system does use multiple threads, but we typically limit it so that at most one thread is running at a time. (There isn't really a point to allowing more, as the GIL will kill any parallel speedup anyway. When we really need parallelism, we use multiple processes.)

If we can promise that exactly one thread will use the context at a time, is it ok to leave the context active in multiple threads? Or is that still bad?

inducer (Owner) commented Jul 27, 2021

This post seems to indicate that, as of CUDA 4 (i.e. after PyCUDA's context management code was written), it became possible to have one context be current for multiple threads. The docs at https://docs.nvidia.com/cuda/cuda-driver-api/index.html still don't clearly reflect that AFAICT (or maybe I didn't look hard enough). I'd be happy to consider patches (if needed) that allow making use of that in PyCUDA.
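
A sketch of the pattern under discussion (untested; it assumes the CUDA 4+ behavior holds and that a Python-side lock enforces the one-thread-at-a-time promise described above): every thread pushes the same primary context, but a lock guarantees at most one thread uses it at any moment.

import threading
import pycuda.driver as drv

drv.init()
dev = drv.Device(0)                      # device ordinal assumed to be 0
primary_ctx = dev.retain_primary_context()
gpu_lock = threading.Lock()              # at most one thread on the GPU at a time

def run_gpu_task(thunk, *args, **kwargs):
    with gpu_lock:
        primary_ctx.push()
        try:
            return thunk(*args, **kwargs)
        finally:
            primary_ctx.pop()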

elliottslaughter commented
For what it's worth, after asking around among my NVIDIA contacts I was able to dig up a few more links:

inducer (Owner) commented Aug 7, 2021

#306 (recently filed) is related.
