CUDA Dispatcher refactor 2: inherit from dispatcher.Dispatcher
#7815
Conversation
Current test status:
```
Ran 1266 tests in 102.919s
FAILED (failures=1, errors=5, skipped=12, expected failures=7)
```

Test results:
```
Ran 1266 tests in 127.002s
FAILED (failures=66, errors=17, skipped=12, expected failures=7)
```

Test results now:
```
Ran 1266 tests in 105.203s
OK (skipped=12, expected failures=7)
```
gpuci run tests

gpuci run tests
Regarding this, my present thinking is that the way to go is to give `_Kernel` objects a dummy `entry_point`:

- For device functions, this works exactly like on the CPU target, because device functions are `CompileResult` objects that are inserted into the target context (and therefore need removing by the finalizer).
- For kernels, we give them a dummy entry point because they were never inserted and don't need removing (similar to object mode functions).
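A rough sketch of how the finalizer could then treat both cases uniformly. The shape below is an assumption for illustration, not the code in the PR: the name `_make_finalizer` comes from the discussion, and `remove_user_function` is the target context's removal hook.

```python
def _make_finalizer(overloads, targetctx):
    """Build a finalizer that removes compiled functions from the target
    context when the dispatcher is garbage-collected."""
    # Capture only the entry points; capturing the dispatcher itself would
    # create a reference cycle and keep it alive.
    entry_points = [cres.entry_point for cres in overloads.values()]

    def finalizer():
        for entry_point in entry_points:
            try:
                # Device functions were inserted into the target context,
                # so this removes them. Kernels (like object mode
                # functions) carry a dummy entry point that was never
                # inserted, so the lookup fails harmlessly and they are
                # skipped.
                targetctx.remove_user_function(entry_point)
            except KeyError:
                pass

    return finalizer
```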
gpuci run tests

gpuci run tests
@stuartarchibald I've merged main into this PR so the diff is now straightforward (compared to what it was 😆)

gpuci run tests
Thanks @gmarkall, the patch is much easier to look at now!
Thanks for the patch, great to see all the prior refactoring effort finally manifest in this PR!
Given the logical changes herein were discussed extensively OOB whilst implementing this, I'm of the view that they are correct! The only comments I have are that there are a couple of typos to look at, but otherwise this looks good.
I think due to the nature of the change, it'd be good to build this PR locally and run the RAPIDS test suite against it just to provide another layer of confidence on top of GPU CI and the Numba build farm. Thanks again!!
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
gpuci run tests

Status marked as:

gpuci run tests
Thanks for the patch and fixes.
This passed a run with RAPIDS, so I'll remove "Waiting on author" - I think it's only waiting on the BuildFarm run now?

Buildfarm ID:

This failed, as mainline is currently failing on CUDA due to issues hopefully resolved in #7846.

gpuci run tests
@stuartarchibald I've merged from master and re-tested with CI and gpuCI now that #7846 is merged. Could this have another buildfarm run please?

Buildfarm ID:

Passed!
@gmarkall Thanks for testing this against RAPIDS; as this patch is passing the test suite on an external code base, it increases my confidence that we got this right! The Numba buildfarm also passed, so I've marked this as ready to merge!
(Note: based on #7814 - it should be easier to tell the differences once that is merged; in the meantime, this comparison shows the changes unique to this PR.)

The main accomplishment of this PR is to make the CUDA dispatcher inherit from `numba.core.dispatcher.Dispatcher` and reuse as much of its logic as possible, rather than duplicating it (and also to serve as a base for implementing on-disk caching, AOT compilation, better support for the high-level extension API, etc.). This necessitates a set of individual changes that accompany the main change:

- Adding `CUDACompileResult`, which reports its entry point as its `id` (sketched after this list) - this is because certain functions expect compile results to have entry points; however, this isn't true for CUDA, because you can't enter a CUDA kernel through the C-layer dispatcher. Using the `id` gives each CUDA compile result a unique identifier.
- Changing `compile_result` to factor out sanitizing the entries that go into the compile result, so that this logic can be shared between the core and the CUDA target.
- Changing `numba.cuda.decorators` so it is responsible for handling signatures and compilation - the CUDA dispatcher no longer needs to be aware of signatures, and this behaviour mirrors the CPU target. CUDA still only supports one signature, but this can now be more easily changed in future.
- Adding the `extensions` target option, so that the CUDA dispatcher no longer needs to make a strange "defensive copy".
- Changing `_Kernel` objects so that they have the `objectmode` and `entry_point` properties, as if they were compile results - this is because overloads are expected to be compile results by some of the core dispatcher logic.
- Renaming in `ForAll` to make the terminology more representative (no functional change).
- Fixing `resolve_value_type`, where it was wrapping a CPU dispatcher in a CUDA dispatcher instead of wrapping the Python function in a CUDA dispatcher when calling an `@njit` function from a `@cuda.jit` function.

Fixes #7741.
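To make the shape of these changes concrete, here is a condensed sketch of the two central pieces: the id-based compile result and the dispatcher inheritance. The class names come from the description above, but the bodies are simplified assumptions rather than the actual patch.

```python
# Illustrative sketch only; plumbing such as target descriptors, locking,
# and typing contexts is omitted.
from numba.core import dispatcher
from numba.core.compiler import CompileResult


class CUDACompileResult(CompileResult):
    """Compile result whose entry point is its own id().

    CUDA kernels cannot be entered through the C-layer dispatcher, so
    there is no real entry point; id(self) still provides the unique
    identifier that the core dispatcher logic expects.
    """

    @property
    def entry_point(self):
        return id(self)


class CUDADispatcher(dispatcher.Dispatcher):
    """Sketch of the refactored dispatcher: the core Dispatcher logic is
    inherited rather than duplicated, and signature handling now lives
    in numba.cuda.decorators instead of here."""
```

With signature handling moved to the decorator, eager compilation still looks the same to users - e.g. `@cuda.jit("void(float32[:])")` compiles for that single signature - but the dispatcher itself no longer needs to know about signatures.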
Description prior to editing, kept for reference:

Fixes #5902.

To do:

- `_make_finalizer` - possibly one of: having `_Kernel` pretend to be a `CompileResult` a bit more by giving it an `entry_point`.
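That to-do appears to be the option the refactor settled on, per the description above. As a reference, a minimal sketch of the compile-result-like surface on `_Kernel` - an assumed simplification, not the actual implementation:

```python
class _Kernel:
    """Sketch of the compile-result-like properties added to kernels so
    that core dispatcher logic can treat them as overloads."""

    @property
    def objectmode(self):
        # Kernels are never compiled in object mode.
        return False

    @property
    def entry_point(self):
        # There is no C-layer entry point for a kernel; a unique dummy
        # value satisfies core dispatcher code that expects one.
        return id(self)
```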