
Add Vulkan driver API #624

Draft · wants to merge 54 commits into develop

Conversation

@dcmvdbekerom (Member) commented Oct 9, 2023

Description

This pull request is to add Vulkan driver API. This adds support for all vendors (Nvidia/AMD/Intel/etc.) on all platforms (Windows/Linux/Mac).

For current PR:

  • Add vulkan_compute_lib.py for the Vulkan backend
  • Add VkFFT for Vulkan FFTs
  • Rewrite GPU code in Vulkan format
  • Add timing back in (possibly using timers from the Vulkan API)
    • Fix timers (times are currently underreported)
  • Add Linux support
  • Recompile vkFFT for older Linux versions (hard!)
  • Add Vulkan files to manifest
  • Fix travis test issues
  • Fix memory leaks
  • Update docs
  • Fix ExoMol broadening issue
  • Make keywords accessible (deviceID, T_max, p_max, etc.)
    • device_id
    • T_max & p_max

CUDA functionality (technically not urgently needed):

  • Homogenize the Vulkan/CUDA backends to allow easy switching
    • Make iso uint32
    • Make database one big array
    • Move context into app
    • Move timer into app
    • Make class for constants
    • Move FFT into app
    • Remove dynamic resizing in favor of max T / max p (for now)
    • Remake command buffer equivalent
  • Check effect of strides on shader/kernel performance

Lower priority:

  • Convert GLSL to C++/CUDA
  • Replace CuFFT with VkFFT with CUDA backend
  • Implement fft_backend
  • Use VkFFT for compilation of GPU code
  • Get bindings from GLSL (as opposed to hardcoded in gpu.py)
  • Bind descriptor set only once (as opposed to with every pipeline)
  • Dynamic scaling / remove max T and p / rewrite command buffer on the fly
  • Add pocketfft binary for multithreaded CPU FFTs

@dcmvdbekerom marked this pull request as draft on October 9, 2023 16:36
@dcmvdbekerom (Member Author)

Vulkan is working! It is still quite a bit slower, ~100 ms (Vulkan) vs ~4 ms (CUDA) on an Nvidia card, but I'm quite confident this can be solved through a better understanding of Vulkan and how to run pipelines efficiently.

There are still lots of open points, but the basic functionality is there.

You can change the GPU that is used by changing the deviceID keyword on line 108 of radis/gpu/gpu.py (in the future, changing the device will of course be possible through keywords to SpectrumFactory).

At least for a dedicated card:

  • Nvidia RTX 2070: 4 ms

Integrated cards are still much slower:

  • Radeon Vega 8: 30 ms
  • Intel UHD 630: 80 ms

@minouHub (Collaborator)

Here is a test I tried.

from radis import SpectrumFactory, plot_diff

sf = SpectrumFactory(
    2150,
    2450,  # cm-1
    molecule="CO2",
    isotope="1,2,3",
    wstep=0.002,
)

sf.fetch_databank("hitemp")

T = 1500.0  # K
p = 1.0  # bar
x = 0.8
l = 0.2  # cm
w_slit = 0.5  # cm-1

# s_cpu = sf.eq_spectrum(
#     name="CPU",
#     Tgas=T,
#     pressure=p,
#     mole_fraction=x,
#     path_length=l,
# )
# s_cpu.apply_slit(w_slit, unit="cm-1")

s_gpu = sf.eq_spectrum_gpu(
    name="GPU",
    Tgas=T,
    pressure=p,
    mole_fraction=x,
    path_length=l,
    backend="gpu-cuda",
)
s_gpu.apply_slit(w_slit, unit="cm-1")
# plot_diff(s_cpu, s_gpu, var="emissivity", wunit="nm", method="diff")

In ``C:\Users\Nicolas Minesi\.radisdb\hitemp`` I kept only the relevant input files:
CO2-02_02125-02250_HITEMP2010.hdf5
CO2-02_02250-02500_HITEMP2010.hdf5
Selected card (deviceID=0):
[X] 0: Intel(R) UHD Graphics
[ ] 1: NVIDIA RTX A1000 Laptop GPU

Traceback (most recent call last):

  File C:\Anaconda\envs\radis-vulkan2\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\users\nicolas minesi\python\examples\plot_gpu.py:49
    s_gpu = sf.eq_spectrum_gpu(

  File c:\users\nicolas minesi\python\radis\lbl\factory.py:1144 in eq_spectrum_gpu
    gpu_init(

  File c:\users\nicolas minesi\python\radis\gpu\gpu.py:214 in gpu_init
    app.schedule_shader(

  File c:\users\nicolas minesi\python\radis\gpu\vulkan\vulkan_compute_lib.py:84 in schedule_shader
    pipeline, pipelineLayout, computeShaderModule = self.createComputePipeline(

  File c:\users\nicolas minesi\python\radis\gpu\vulkan\vulkan_compute_lib.py:426 in createComputePipeline
    if len(pipelines) == 1:

TypeError: cdata of type 'struct VkPipeline_T *' has no len()

I really like the list of available GPUs in the prompt, btw. I tried with both GPUs and got the same output.
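As a side note on the traceback: the ``TypeError`` suggests a single ``VkPipeline`` cdata handle (which has no ``len()``) reached code that expected a sequence of pipelines. One defensive pattern for this, sketched below as a hypothetical helper (not the actual fix in this PR), is to normalize the value before calling ``len()``:

```python
def as_pipeline_list(obj):
    """Normalize a value that may be a single handle or a sequence of handles.

    cffi-style bindings can return a bare cdata object (no __len__) when a
    single pipeline is created; downstream code calling len() then breaks
    with exactly the TypeError seen above.
    """
    try:
        len(obj)
    except TypeError:
        return [obj]  # single handle: wrap it in a list
    return list(obj)  # already a sequence: keep as list
```

Whether this helper belongs in ``createComputePipeline`` or whether the binding call itself should be changed depends on how the Vulkan wrapper is invoked, so treat it as a sketch only.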

@dcmvdbekerom (Member Author)

It looks like the driver wasn't able to load (one of) the shaders (this is what kernels are called in Vulkan). This can happen because Vulkan uses extensions that may or may not be present on your particular device. In the future we can test for this at runtime, but only once we know what the problem is. Which extensions are present depends on both the hardware capabilities and the Vulkan version.

My suspicion is that EXT_scalar_block_layout is the culprit. To test this, we first need to check your capabilities with the Vulkan Hardware Capability Viewer.

Could you then report back the value of Properties->Core 1.0->apiVersion, and the value of Features->Core 1.2->scalarBlockLayout (the latter should be True or False)? Please check these values for both of your cards (switch with "Select device").
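For anyone reading the raw numbers: Vulkan reports ``apiVersion`` as a packed 32-bit integer (in the classic ``VK_MAKE_VERSION`` layout, ``major << 22 | minor << 12 | patch``; Vulkan 1.1+ additionally reserves the top bits for a variant field, ignored here). A small helper to decode it:

```python
def decode_vk_version(packed):
    """Decode a Vulkan packed version integer (VK_MAKE_VERSION layout):
    bits 31..22 = major, 21..12 = minor, 11..0 = patch."""
    major = (packed >> 22) & 0x3FF
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch

print(decode_vk_version(4202496))  # → (1, 2, 0), i.e. Vulkan 1.2.0
```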

@dcmvdbekerom (Member Author)

It now also works on Linux! Unfortunately not on Travis :(

@erwanp (Member) commented Nov 4, 2023

@dcmvdbekerom aren't the tests failing just because you aren't running on a GPU-enabled Travis instance?

It can be activated here:

https://docs.travis-ci.com/user/reference/overview/#gpu-vm-instance-size

@dcmvdbekerom (Member Author) commented Nov 5, 2023

Linux has a software (=CPU) renderer called llvmpipe, which is installed automatically with the Vulkan driver package mesa-vulkan-drivers. The problem was that llvmpipe is only available for Ubuntu 20.04 and up, whereas Travis CI was running Ubuntu 16.04. After spending a full day trying to compile llvmpipe for 16.04, I found you can simply switch Travis to 20.04 🙈. A silver lining is that the compiled library vkfft_vulkan, needed for the GPU FFT, can now be used down to 16.04 as long as it is paired with a hardware GPU.
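For reference, the distribution switch is a one-line change in ``.travis.yml`` (a fragment only; the actual file sets other keys too):

```yaml
# .travis.yml (fragment)
os: linux
dist: focal   # Ubuntu 20.04, whose Mesa packages include the llvmpipe software renderer
```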

So in short... the tests are now running! They still fail for the moment, but at least they are now exercising the actual code. The first GPU test passes; the second (multiple GPU plots) still has issues, likely related to memory leaks.

After moving the CI server from Ubuntu 16.04 to 20.04, Cantera doesn't work anymore. Hopefully adding it back in will fix the issue...

@codecov-commenter commented Nov 13, 2023

Codecov Report

Merging #624 (8704926) into develop (cefd9a2) will increase coverage by 0.48%.
Report is 5 commits behind head on develop.
The diff coverage is 69.84%.


Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #624      +/-   ##
===========================================
+ Coverage    72.98%   73.46%   +0.48%     
===========================================
  Files          148      150       +2     
  Lines        21066    21464     +398     
===========================================
+ Hits         15374    15769     +395     
- Misses        5692     5695       +3     

Commit messages:

  • this is needed when a single valued dataframe is converted to float
  • It turns out the FFT output buffer needs to be zeroed before use. Also extended the test to N plots; sometimes the error didn't appear, so more plots were needed.

@dcmvdbekerom (Member Author) commented Nov 14, 2023

Tests are now passing! This means that the Vulkan driver is working on Linux and tested on the CI server.
As briefly mentioned above, in the absence of GPU hardware the Linux Vulkan drivers default to the CPU renderer llvmpipe, which Travis uses to run the GPU tests.

There was an issue with erratic output that appeared about 50% of the time. This was caused by not zeroing the buffers before executing the FFTs. Currently the buffers are zeroed during every iteration, which may be excessive; zeroing during initialization might suffice.
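The effect is easy to reproduce on the CPU: if a zero-padded FFT input buffer is reused without clearing it, stale data in the padding corrupts the transform. An illustrative NumPy sketch (not the actual Vulkan code):

```python
import numpy as np

N = 8
buf = np.empty(2 * N)      # reused padded buffer
buf[:] = 123.0             # simulate stale data left over from a previous iteration

sig = np.ones(N)

# Wrong: write the signal but leave the padding region dirty
buf[:N] = sig
dirty = np.fft.rfft(buf)

# Right: zero the whole buffer before (re)filling it
buf[:] = 0.0
buf[:N] = sig
clean = np.fft.rfft(buf)

print(np.allclose(dirty, clean))  # → False: the stale padding changed the spectrum
```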

@minouHub (Collaborator)

I tried to do a GPU computation with ExoMol. It failed because the self-broadening is not imported correctly. @dcmvdbekerom, can you add that to the to-do list? I'll take a look if you want me to.

@minouHub modified the milestones: 0.15, 0.16 (Mar 30, 2024)