[WIP] Add support for CUDA Graphs. #343

Draft: wants to merge 3 commits into main

@gfokkema gfokkema commented Jan 15, 2022

Hi there!

I wanted to experiment with CUDA Graphs a bit to get a feel for the performance differences between blocking, async and graph execution.

See:

However, while most of the required functionality is available (async execution, specifying a stream, etc.), PyCUDA does not have graph support yet.
This PR adds some initial support to launch a kernel pipeline using a CUgraph.

I'd love your comments and feedback. Most likely I am not freeing memory correctly, etc., so let me know!
All in all everything seems to be working enough to be useful already :)

A nice bonus is that the CUDA Graph API offers a function to output dot files; see the picture below and the demo in examples/demo_graph.py.
Note that the demo launches the kernel only once.
Because building a graph has its own overhead, the benefits of the Graph API should only really start to show when the same kernels are launched repeatedly.
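The overhead argument above can be made concrete with a rough back-of-the-envelope model. This is only a sketch: it treats launch overhead and kernel time as strictly serial, and every number in it is a hypothetical placeholder, not a measurement from this PR. The function names are made up for illustration.

```python
import math

# Crude serial cost model. All timings are hypothetical placeholders (microseconds).

def stream_total_us(n_iters, n_kernels, kernel_us, launch_us):
    """Plain stream launches: every kernel pays its own launch overhead each iteration."""
    return n_iters * n_kernels * (kernel_us + launch_us)

def graph_total_us(n_iters, n_kernels, kernel_us, build_us, graph_launch_us):
    """Graph launches: one-time build cost, then a single launch overhead per iteration."""
    return build_us + n_iters * (n_kernels * kernel_us + graph_launch_us)

def break_even_iters(n_kernels, launch_us, build_us, graph_launch_us):
    """Smallest iteration count at which the graph version is no slower, or None."""
    saved_per_iter = n_kernels * launch_us - graph_launch_us
    if saved_per_iter <= 0:
        return None  # the graph never pays off under these assumptions
    return math.ceil(build_us / saved_per_iter)

# Assumed example: 4 kernels of 10 us each, 5 us per plain launch,
# 200 us to build the graph, 8 us per graph launch.
n = break_even_iters(n_kernels=4, launch_us=5, build_us=200, graph_launch_us=8)
single_stream = stream_total_us(1, 4, 10, 5)
single_graph = graph_total_us(1, 4, 10, 200, 8)
```

Under these assumed numbers a single launch is much slower through the graph path, and the graph only catches up after a double-digit number of iterations, which matches the intuition that a one-shot demo will not show a speedup.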

[Image: CUDA Graph]
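For readers unfamiliar with the format: a dot file is plain-text Graphviz input that lists nodes and directed edges. The sketch below writes one by hand for a hypothetical four-step pipeline. The node names and the `to_dot` helper are invented for illustration; this is not the output of the CUDA function and not code from this PR.

```python
def to_dot(name, edges):
    """Render a list of (src, dst) dependency pairs as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical pipeline: copy in, two dependent kernels, copy out.
edges = [
    ("memcpy_h2d", "kernel_a"),
    ("kernel_a", "kernel_b"),
    ("kernel_b", "memcpy_d2h"),
]
dot_text = to_dot("pipeline", edges)
```

If Graphviz is installed, saving `dot_text` to `pipeline.dot` and running `dot -Tpng pipeline.dot -o pipeline.png` should produce a picture of the dependency chain.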

@inducer
Owner

inducer commented Jan 16, 2022

This looks great, thanks for working on this! To be merged, it'd of course need docs and tests. For lack of GPUs, I don't have usable CI for PyCUDA on GitHub, but I do have that on a GitLab instance I run. Mind if I create a user account for you there?

cc @kaushikcfd

@gfokkema
Author

Hi, thanks for the feedback! Yes, this PR was meant primarily to pitch the idea and get some early feedback :)

And access to already usable CI would be great!

@inducer
Owner

inducer commented Jan 17, 2022

Made an account for you, you should have that info in your email. The site is at https://gitlab.tiker.net/inducer/pycuda.

@mgaedtke

I did some experiments and tests with this and it seems to work without any errors so far. What would be the next steps to bring this to a future release?

@inducer
Owner

inducer commented Jun 15, 2022

It's clear that this should happen, ideally soon. As it happens, there are now two (draft) versions of this, one here:

https://gitlab.tiker.net/kaushikcfd/pycuda/-/merge_requests/2/diffs

and the other one in this PR. (They got started independently.) @mitkotak, could you comment on your plans with respect to upstreaming your work?

@mitkotak
Contributor

Thanks for your interest in this PR. Right now my estimate is that this feature will be merged into main in about a month. Most of the wrapper building is done. The purpose of my PR is to broaden the graph creation routes, i.e., to expose the finer-grained graph-building routines of the CUDA Graph API alongside the (begin|end)_capture approach. Right now I am handling regression failures, adding more tests, and working on docs. Thanks!
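The difference between the two graph creation routes mentioned above can be illustrated with a pure-Python toy model. To be clear, `ToyGraph` and `ToyStream` are hypothetical classes made up for this sketch; they are not PyCUDA's API or the API of either draft, and no GPU is involved. Explicit building means the caller states each node's dependencies; capture means work enqueued on a stream is recorded instead of run, with dependencies inferred from enqueue order.

```python
class ToyGraph:
    """Toy graph: nodes are callables plus the indices of the nodes they depend on."""

    def __init__(self):
        self.nodes = []  # list of (fn, deps)

    def add_node(self, fn, deps=()):
        # Explicit route: the caller names the dependencies of each node.
        for d in deps:
            if not 0 <= d < len(self.nodes):
                raise ValueError(f"unknown dependency {d}")
        self.nodes.append((fn, tuple(deps)))
        return len(self.nodes) - 1

    def launch(self, state):
        # A node can only depend on earlier nodes, so insertion order
        # is already a valid topological order.
        for fn, _ in self.nodes:
            fn(state)
        return state


class ToyStream:
    """Toy stream: runs work immediately unless a capture is in progress."""

    def __init__(self):
        self._graph = None
        self._last = None

    def begin_capture(self):
        self._graph = ToyGraph()
        self._last = None

    def enqueue(self, fn, state):
        if self._graph is None:
            fn(state)  # normal mode: execute right away
        else:
            # Capture mode: record only, chained after the previous enqueue.
            deps = () if self._last is None else (self._last,)
            self._last = self._graph.add_node(fn, deps)

    def end_capture(self):
        graph, self._graph = self._graph, None
        return graph


def double(s):
    s["x"] *= 2

def add_one(s):
    s["x"] += 1

# Route 1: explicit construction.
explicit = ToyGraph()
first = explicit.add_node(double)
explicit.add_node(add_one, deps=[first])

# Route 2: capture. Nothing executes while capturing, so probe is untouched.
stream = ToyStream()
probe = {"x": 3}
stream.begin_capture()
stream.enqueue(double, probe)
stream.enqueue(add_one, probe)
captured = stream.end_capture()

explicit_result = explicit.launch({"x": 3})
captured_result = captured.launch({"x": 3})
```

Both routes end up with the same two-node chain. Capture is convenient for wrapping existing stream-based code, while explicit construction gives direct control over dependencies that are not a simple chain.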

@YanBC

YanBC commented Oct 20, 2022

Hi there, any updates on the cuda graph feature?

@mitkotak
Contributor

mitkotak commented Oct 21, 2022

> Hi there, any updates on the cuda graph feature?

Thank you very much for the interest! We are still testing the PR to make sure that we don't break any existing functionality, but if you are curious to learn more, you can try it out using git clone https://gitlab.tiker.net/kaushikcfd/pycuda.git --branch cudagraph and then install it by running pip install -e . from the checkout. You can get comfortable with the syntax through examples/cudagraph_kernel.py and examples/cudagraph_streamcapture.py, and for the docs you can look for CUDAGraphs in doc/driver.rst. Thanks again for the interest, and apologies for the delay!

@mgaedtke

Hi @mitkotak, very much looking forward to this feature! Any idea when the PR could be ready?
