
CUDA Dispatcher refactor 2: inherit from dispatcher.Dispatcher #7815

Merged: 23 commits merged into numba:main on Feb 18, 2022

Conversation

@gmarkall (Member) commented Feb 4, 2022

(Note: based on #7814 - it should be easier to tell the differences once that is merged - in the meantime this comparison shows the changes unique to this PR)

The main accomplishment of this PR is to make the CUDA dispatcher inherit from numba.core.dispatcher.Dispatcher and reuse as much of its logic as possible rather than duplicating it. It also serves as a base for implementing on-disk caching, AOT compilation, better support for the high-level extension API, etc.

This necessitates a set of individual changes that accompany the main change:

  • Creation of a CUDACompileResult, which reports its id as its entry point (see the first sketch after this list). Certain functions expect compile results to have entry points, but this isn't true for CUDA because a CUDA kernel can't be entered through the C-layer dispatcher; using the id gives each CUDA compile result a unique identifier.
  • A small change to compile_result to factor out sanitizing the entries that go into the compile result, so that this logic can be shared between the core and the CUDA target.
  • Changes to numba.cuda.decorators so it is responsible for handling signatures and compilation (see the second sketch after this list); the CUDA dispatcher no longer needs to be aware of signatures, mirroring the CPU target. CUDA still supports only one signature, but this can now be changed more easily in future.
  • Fixes to the setup of the extensions target option, so that the CUDA dispatcher no longer needs to make a strange "defensive copy".
  • Two additions to _Kernel so that its instances have objectmode and entry_point properties, as if they were compile results; this is needed because some of the core dispatcher logic expects overloads to be compile results.
  • Some refactors to ForAll that make the terminology more representative (but no functional change).
  • Fix a very strange "how did it ever work"-style bug in resolve_value_type: when calling an @njit function from a @cuda.jit function, it wrapped the CPU dispatcher in a CUDA dispatcher instead of wrapping the underlying Python function in one.
  • Add a test of docstrings for CUDA dispatchers (#5209).
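
To make the CUDACompileResult point concrete, here is a minimal, self-contained sketch of the idea; the stand-in CompileResult class is illustrative only, not the actual Numba internals:

```python
# Illustrative stand-in for numba.core.compiler.CompileResult.
class CompileResult:
    entry_point = None  # CPU compile results carry a real C-callable entry point


class CUDACompileResult(CompileResult):
    # A CUDA kernel cannot be entered through the C-layer dispatcher,
    # so there is no real entry point to report. Returning id(self)
    # still gives every compile result a unique identifier, which is
    # all that the shared dispatcher logic relies on.
    @property
    def entry_point(self):
        return id(self)


cres_a, cres_b = CUDACompileResult(), CUDACompileResult()
assert cres_a.entry_point != cres_b.entry_point  # unique per result
```

And for the decorators change, the single-signature usage that the decorator (rather than the dispatcher) now handles looks like this; the kernel itself is a hypothetical example and assumes a CUDA-capable setup (or NUMBA_ENABLE_CUDASIM=1):

```python
from numba import cuda

# An explicit signature is passed to the decorator, which handles it
# and triggers compilation; the dispatcher no longer needs to know
# about signatures.
@cuda.jit("void(float32[:])")
def double(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= 2.0
```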

Fixes #7741.

Description prior to editing, kept for reference:

Fixes #5902.

To do:

  • Resolve the issue with _make_finalizer - possibly one of:
    • be certain that a dummy finalizer is fine,
    • remove the finalization altogether if it's no longer needed (perhaps this is an old Python / Python 2 remnant),
    • implement a custom non-dummy finalizer, or
    • make _Kernel pretend to be a CompileResult a bit more by giving it an entry_point.
  • Add a test that #5902 (cuda.jit does not preserve the original __doc__ and __module__ of the decorated function) is fixed - I have only verified it manually so far. A sketch of such a test follows this list.
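
A sketch of such a test, assuming Numba with CUDA support (or the simulator via NUMBA_ENABLE_CUDASIM=1) is available; the kernel and test names are hypothetical:

```python
import unittest

from numba import cuda


class TestDispatcherDocstring(unittest.TestCase):
    def test_doc_and_module_preserved(self):
        @cuda.jit
        def add_one(x):
            """Add one to each element of x."""
            i = cuda.grid(1)
            if i < x.size:
                x[i] += 1

        # The dispatcher should preserve the wrapped function's metadata.
        self.assertEqual(add_one.__doc__, "Add one to each element of x.")
        self.assertEqual(add_one.__module__, __name__)


if __name__ == "__main__":
    unittest.main()
```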

Current test status:

```
Ran 1266 tests in 102.919s

FAILED (failures=1, errors=5, skipped=12, expected failures=7)
```
Test results:

```
Ran 1266 tests in 127.002s

FAILED (failures=66, errors=17, skipped=12, expected failures=7)
```
Test results now:

```
Ran 1266 tests in 105.203s

OK (skipped=12, expected failures=7)
```
@gmarkall (Member Author) commented Feb 4, 2022

gpuci run tests

@gmarkall (Member Author) commented Feb 4, 2022

gpuci run tests

@gmarkall (Member Author) commented Feb 4, 2022

> • Resolve the issue with _make_finalizer

Regarding this, my present thinking is that the way to go is to give _Kernel objects an entry_point of None, because they were never inserted into the target context. We should still have the finalizer for device functions, because they are inserted into the target context and are actual compile results with an entry point.

- For device functions, this works exactly like on the CPU target
  because device functions are CompileResult objects that are inserted
  into the target context (and therefore need removing by the
  finalizer).
- For kernels, we give them a dummy entry point because they were never
  inserted and don't need removing (similar to object mode
  functions).
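
An illustrative sketch of that approach (hypothetical names, not the actual Numba code; remove_user_function stands in for the target context API):

```python
def make_finalizer(overloads, targetctx):
    # Remove only overloads that were actually inserted into the
    # target context; kernels (entry_point of None) and object mode
    # functions were never inserted, so there is nothing to remove.
    def finalizer():
        for cres in overloads.values():
            if cres.entry_point is None:
                continue
            targetctx.remove_user_function(cres.entry_point)
    return finalizer
```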
@gmarkall (Member Author) commented Feb 4, 2022

gpuci run tests

@gmarkall (Member Author) commented Feb 4, 2022

gpuci run tests

@gmarkall gmarkall mentioned this pull request Feb 4, 2022
@gmarkall gmarkall changed the title [WIP] CUDA Dispatcher refactor 2 CUDA Dispatcher refactor 2: inherit from dispatcher.Dispatcher Feb 4, 2022
@gmarkall gmarkall added the 3 - Ready for Review and CUDA (CUDA related issue/PR) labels Feb 4, 2022
@gmarkall gmarkall added this to the Numba 0.56 RC milestone Feb 4, 2022
@gmarkall gmarkall marked this pull request as ready for review February 4, 2022 23:39
@gmarkall (Member Author) commented Feb 8, 2022

@stuartarchibald I've merged main into this PR so the diff is now straightforward (compared to what it was 😆)

gpuci run tests

@gmarkall gmarkall added the Effort - medium (Medium size effort needed) label Feb 8, 2022
@stuartarchibald (Contributor) commented

> @stuartarchibald I've merged main into this PR so the diff is now straightforward

Thanks @gmarkall, the patch is much easier to look at now!

@stuartarchibald (Contributor) left a comment

Thanks for the patch, great to see all the prior refactoring effort finally manifest in this PR!

Given the logical changes herein were discussed extensively OOB whilst implementing this, I'm of the view that they are correct! The only comments I have are that there are a couple of typos to look at, but otherwise this looks good.

I think due to the nature of the change, it'd be good to build this PR locally and run the RAPIDS test suite against it just to provide another layer of confidence on top of GPU CI and the Numba build farm. Thanks again!!

Resolved review threads on numba/cuda/compiler.py (×2) and numba/cuda/dispatcher.py.
@stuartarchibald stuartarchibald added the 4 - Waiting on author (Waiting for author to respond to review) label and removed the 3 - Ready for Review label Feb 11, 2022
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@gmarkall (Member Author) commented

gpuci run tests

@gmarkall gmarkall added the 4 - Waiting on CI (Review etc done, waiting for CI to finish) label Feb 11, 2022
@gmarkall (Member Author) commented

Status marked as:

  • "Waiting on CI" for BuildFarm, and
  • "Waiting on author" for a run with RAPIDS.

@gmarkall (Member Author) commented

gpuci run tests

@stuartarchibald (Contributor) left a comment

Thanks for the patch and fixes.

@gmarkall (Member Author) commented

This passed a run with RAPIDS, so I'll remove "Waiting on author" - I think it's only waiting on the BuildFarm run now?

@gmarkall gmarkall added the Pending BuildFarm (For PRs that have been reviewed but pending a push through our buildfarm) label and removed the 4 - Waiting on author label Feb 16, 2022
@stuartarchibald (Contributor) commented

Buildfarm ID: numba_smoketest_cuda_yaml_117.

@stuartarchibald (Contributor) commented

> Buildfarm ID: numba_smoketest_cuda_yaml_117.

This failed, as mainline is currently failing on CUDA due to issues that are hopefully resolved in #7846.

@gmarkall (Member Author) commented

gpuci run tests

@gmarkall (Member Author) commented

@stuartarchibald I've merged from master and re-tested with CI and gpuCI now that #7846 is merged. Could this have another buildfarm run please?

@stuartarchibald (Contributor) commented

Buildfarm ID: numba_smoketest_cuda_yaml_119.

@stuartarchibald (Contributor) commented

> Buildfarm ID: numba_smoketest_cuda_yaml_119.

Passed!

@stuartarchibald stuartarchibald added the BuildFarm Passed (For PRs that have been through the buildfarm and passed) and 5 - Ready to merge (Review and testing done, is ready to merge) labels and removed the Pending BuildFarm and 4 - Waiting on CI labels Feb 18, 2022
@stuartarchibald (Contributor) commented

> This passed a run with RAPIDS, so I'll remove "Waiting on author" - I think it's only waiting on the BuildFarm run now?

@gmarkall Thanks for testing this against RAPIDS; as this patch passes the test suite of an external code base, it increases my confidence that we got this right!

The Numba buildfarm also passed, so I've marked this as ready to merge!

@sklam sklam merged commit 27426b5 into numba:main Feb 18, 2022