Skip to content

Commit

Permalink
Merge pull request #514 from dcmvdbekerom/fix_GPU
Browse files Browse the repository at this point in the history
Fix an indexing error in experimental-GPU implementation. 
Improve how spectrum arrays are rescaled
  • Loading branch information
erwanp committed Aug 28, 2022
2 parents f3b0d4f + deef09c commit 23b2533
Show file tree
Hide file tree
Showing 14 changed files with 424 additions and 161 deletions.
64 changes: 52 additions & 12 deletions docs/lbl/gpu.rst
Expand Up @@ -6,29 +6,42 @@ RADIS-GPU Spectrum Calculation

RADIS provides GPU acceleration to massively speedup spectral computations.
The GPU code is written in CUDA-C++, which is compiled at runtime through the python package cupy.
Because the CUDA code is compiled at runtime, a CUDA compiler such as NVRTC or NVCC must be installed before running the code. The easiest way to make sure a CUDA compiler is installed is by installing the Nvidia GPU computing toolkit, see: https://developer.nvidia.com/cuda-downloads.
Because the CUDA code is compiled at runtime, a CUDA compiler such as NVRTC or NVCC must
be installed before running the code. The easiest way to make sure a CUDA compiler is
installed is by installing the Nvidia GPU computing toolkit, see: https://developer.nvidia.com/cuda-downloads.
Currently only Nvidia GPU's are supported by RADIS, and this is unlikely to change in the foreseeable future.

The GPU code requires some initialization functions that are run on the CPU (

The GPU computations are so fast that in almost all cases the total computation time is limited by the CPU-to-GPU (also called host-to-device) data transfer. As a result, computing consecutive spectra is extremely fast.
The GPU computations are so fast that in almost all cases the total computation time is
limited by the CPU-to-GPU (also called host-to-device) data transfer. As a result, computing
consecutive spectra is extremely fast.

RADIS implements two functions that expose GPU functionality:

:py:func:`~radis.lbl.factory.eq_spectrum_gpu()`
:py:func:`~radis.lbl.factory.eq_spectrum_gpu_interactive()`
- :py:func:`~radis.lbl.factory.SpectrumFactory.eq_spectrum_gpu` computes a single spectrum and returns.

:py:func:`~eq_spectrum_gpu()` computes a single spectrum and returns. :py:func:`~eq_spectrum_gpu_interactive()` computes a single spectrum and allows to interactively adjust parameters such as temperature, pressure, and composition, updating the spectrum in real time. Because the database only has to be transferred once, the updated spectra are calculated extremely fast.
- :py:func:`~radis.lbl.factory.SpectrumFactory.eq_spectrum_gpu_interactive` computes a single
spectrum and allows to interactively adjust parameters such as temperature, pressure, and
composition, updating the spectrum in real time. Because the database only has to be transferred
once, the updated spectra are calculated extremely fast.

By default both functions will be compiled an ran on a GPU if available. The CUDA code is written "architecture-agnosticly" which means it can be compiled for either GPU or CPU. It is therefore possible to use the same GPU functions without an actual GPU by passing the keyword ``emulate=True``, which forces use of the CPU targeted compiled code. This feature is mostly for developers to check for errors in the CUDA code, but it can be used for interactive plotting on the CPU for small spectra.
By default both functions will be compiled an ran on a GPU if available. The CUDA code
is written "architecture-agnosticly" which means it can be compiled for either GPU or CPU.
It is therefore possible to use the same GPU functions without an actual GPU by passing the
keyword ``emulate=True``, which forces use of the CPU targeted compiled code. This feature is
mostly for developers to check for errors in the CUDA code, but it can be used for interactive
plotting on the CPU for small spectra.

GPU computation is currently only supported for equilibrium spectra. It is likely that non-equilibrium spectra will be supported at some point in the future.
GPU computation is currently only supported for equilibrium spectra. It is likely that
non-equilibrium spectra will be supported at some point in the future.


Single Spectrum
---------------

As mentioned above, the function :py:func:`~radis.lbl.factory.eq_spectrum_gpu()` produces a single equilibrium spectrum using GPU acceleration. Below is a usage example:
As mentioned above, the function :py:func:`~radis.lbl.factory.SpectrumFactory.eq_spectrum_gpu`
produces a single equilibrium spectrum using GPU acceleration. Below is a usage example::

sf = SpectrumFactory(
2100,
Expand All @@ -52,10 +65,19 @@ As mentioned above, the function :py:func:`~radis.lbl.factory.eq_spectrum_gpu()`
s.plot("radiance", show=True)


.. minigallery:: radis.lbl.SpectrumFactory.eq_spectrum_gpu
:add-heading:


Interactive Spectrum
--------------------

Computing the first GPU spectrum in a session takes a comparatively long time because the entire line database must be transferred to the GPU. The real power of GPU acceleration becomes evident when computation times are not limited by data-transfer, i.e., when multiple consecutive spectra are synthesized. One obvious use case would be the fitting of a spectrum. Another one is interactive plotting, which can be done by calling :py:func:`~radis.lbl.factory.eq_spectrum_gpu_interactive()`. A usage example is shown below:
Computing the first GPU spectrum in a session takes a comparatively long time because the
entire line database must be transferred to the GPU. The real power of GPU acceleration
becomes evident when computation times are not limited by data-transfer, i.e., when multiple
consecutive spectra are synthesized. One obvious use case would be the fitting of a spectrum.
Another one is interactive plotting, which can be done by calling
:py:func:`~radis.lbl.factory.eq_spectrum_gpu_interactive()`. A usage example is shown below::

from radis.tools.plot_tools import ParamRange
sf = SpectrumFactory(
Expand All @@ -79,11 +101,29 @@ Computing the first GPU spectrum in a session takes a comparatively long time be
plotkwargs={"nfig": "same", "wunit": "nm"},
)

Note that `eq_spectrum_gpu_interactive()` takes the place of all `eq_spectrum_gpu()`, `s.apply_slit()`, and `s.plot()` seen in the earlier example, and for this reason the syntax is a little bit different. For example, we directly pass the `var` keyword to `eq_spectrum_gpu_interactive()` to specify which spectrum should be plotted, and keyword arguments to `s.plot()` are passed through `plotkwargs`.

Quantities that are to be varied must be initialized by a :py:func:`~radis.tools.plot_tools.ParamRange(valmin, valmax, valinit)` object, which takes the minimum value, maximum value, and init values of the scan range. Each `ParamRange()` object will spawn a slider widget in the plot window with which the parameter can be interactively adjusted. The algorithm is extremely fast for a large number of lines (>100M) and will update with very low latency (<200ms typically). The code is not currently optimized for large wavenumber ranges (>500cm-1) however, which may take a bit longer (up to a couple seconds), provided the GPU didn't run out of memory.
.. minigallery:: radis.lbl.SpectrumFactory.eq_spectrum_gpu_interactive
:add-heading:


Note that `eq_spectrum_gpu_interactive()` takes the place of all `eq_spectrum_gpu()`,
`s.apply_slit()`, and `s.plot()` seen in the earlier example, and for this reason the
syntax is a little bit different. For example, we directly pass the `var` keyword to
`eq_spectrum_gpu_interactive()` to specify which spectrum should be plotted, and keyword arguments to `s.plot()`
are passed through `plotkwargs`.

Quantities that are to be varied must be initialized by a
:py:func:`~radis.tools.plot_tools.ParamRange` (valmin, valmax, valinit) object, which
takes the minimum value, maximum value, and init values of the scan range. Each `ParamRange()`
object will spawn a slider widget in the plot window with which the parameter can be
interactively adjusted. The algorithm is extremely fast for a large number of lines (>100M)
and will update with very low latency (<200ms typically). The code is not currently optimized
for large wavenumber ranges (>500cm-1) however, which may take a bit longer (up to a couple seconds),
provided the GPU didn't run out of memory.

At this moment the application of the instrumental function is done on the GPU and is limited to a Gaussian function. This will almost certainly be updated in the future to include other popular instrumental functions, including custom ones.
At this moment the application of the instrumental function is done on the GPU and is limited
to a Gaussian function. This will almost certainly be updated in the future to include other
popular instrumental functions, including custom ones.



4 changes: 4 additions & 0 deletions examples/plot_gpu.py
Expand Up @@ -6,6 +6,10 @@
Example using GPU calculation with :py:meth:`~radis.lbl.SpectrumFactory.eq_spectrum_gpu`
This method requires CUDA compatible hardware to execute.
For more information on how to setup your system to run GPU-accelerated methods
using CUDA and Cython, check :ref:`GPU Spectrum Calculation on RADIS <label_radis_gpu>`
.. note::
in the example below, the GPU code runs on CPU, using the parameter ``emulate=True``.
Expand Down
27 changes: 16 additions & 11 deletions examples/plot_gpu_widgets.py
Expand Up @@ -8,6 +8,10 @@
Example using GPU sliders and GPU calculation with :py:meth:`~radis.lbl.SpectrumFactory.eq_spectrum_gpu`
This method requires CUDA compatible hardware to execute.
For more information on how to setup your system to run GPU-accelerated methods
using CUDA and Cython, check :ref:`GPU Spectrum Calculation on RADIS <label_radis_gpu>`
.. note::
in the example below, the GPU code runs on CPU, using the parameter ``emulate=True``.
Expand All @@ -17,18 +21,18 @@
"""

from radis import SpectrumFactory
from radis.test.utils import getTestFile
from radis.tools.database import load_spec
from radis.tools.plot_tools import ParamRange

# This spectrum is significantly absorbed by atmospheric CO2
# so it will never match the synthetic spectrum.
# TODO: find different spectrum for this example.

my_file = getTestFile("CO2_measured_spectrum_4-5um.spec") # for the example here
s_exp = load_spec(my_file)
# from radis.test.utils import getTestFile
from radis.tools.plot_tools import ParamRange

s_exp.crop(4120, 4790).plot(Iunit="mW/cm2/sr/nm")
## Add an experimental file :
# my_file = getTestFile("CO2_measured_spectrum_4-5um.spec") # for the example here
# s_exp = load_spec(my_file)
# s_exp.crop(4120, 4790).plot(Iunit="mW/cm2/sr/nm")
#
## This spectrum is significantly absorbed by atmospheric CO2
## so it will never match the synthetic spectrum.
## TODO: find different spectrum for this example.


sf = SpectrumFactory(
Expand All @@ -49,6 +53,7 @@
path_length=ParamRange(0, 1, 0.2), # cm
slit_FWHM=ParamRange(0, 1.5, 0.24), # cm-1
emulate=True, # if True, runs CPU code on GPU. Set to False or remove to run on the GPU
plotkwargs={"nfig": "same", "wunit": "nm"},
plotkwargs={"wunit": "nm"}, # "nfig": "same",
)

print(s)
11 changes: 10 additions & 1 deletion radis/lbl/base.py
Expand Up @@ -2010,7 +2010,16 @@ def calc_S0(self):

self.profiler.start("scaled_S0", 2, "... Scaling equilibrium linestrength")

gp = df0["gp"]
try:
gp = df0["gp"]
except (KeyError):

if not "gu" in df0:
if not "ju" in df0:
self._add_ju(df0)
self._calc_degeneracies(df0)
gp = df0["gu"]

A = df0["A"]
wav = df0["wav"]
Ia = self.get_lines_abundance(df0)
Expand Down
70 changes: 29 additions & 41 deletions radis/lbl/factory.py
Expand Up @@ -936,7 +936,7 @@ def eq_spectrum_gpu(
.. note::
This method requires CUDA compatible hardware to execute.
For more information on how to setup your system to run GPU-accelerated methods
using CUDA and Cython, check `GPU Spectrum Calculation on RADIS <https://radis.readthedocs.io/en/latest/lbl/gpu.html>`
using CUDA and Cython, check :ref:`GPU Spectrum Calculation on RADIS <label_radis_gpu>`
Parameters
----------
Expand Down Expand Up @@ -1060,6 +1060,7 @@ def eq_spectrum_gpu(
molarmass_arr = np.empty_like(
iso_list, dtype=np.float32
) # molar mass of each isotope

Q_interp_list = []
for iso in iso_list:
if iso in iso_set:
Expand All @@ -1084,8 +1085,13 @@ def eq_spectrum_gpu(

# load the data
df = self.df0
iso = df["iso"].to_numpy(dtype=np.uint8)
v0 = df["wav"].to_numpy(dtype=np.float32)

if len(iso_set) > 1:
iso = df["iso"].to_numpy(dtype=np.uint8)
elif len(iso_set) == 1:
iso = np.full(len(v0), iso_set[0], dtype=np.uint8)

da = df["Pshft"].to_numpy(dtype=np.float32)
El = df["El"].to_numpy(dtype=np.float32)
na = df["Tdpair"].to_numpy(dtype=np.float32)
Expand Down Expand Up @@ -1341,6 +1347,13 @@ def eq_spectrum_gpu_interactive(
fig = line.figure

def update_plot(val):
# TODO : refactor this function and the update() mechanism. Ensure conditions are correct.
# Update conditions
# ... at equilibirum, temperatures remain equal :
s.conditions["Tvib"] = s.conditions["Tgas"]
s.conditions["Trot"] = s.conditions["Tgas"]
s.conditions["slit_function"] = s.conditions["slit_FWHM"]

abscoeff, transmittance, iter_params = gpu_iterate(
s.conditions["pressure"],
s.conditions["Tgas"],
Expand All @@ -1351,46 +1364,21 @@ def update_plot(val):
gpu=(not s.conditions["emulate_gpu"]),
)

# s._q["abscoeff"] = abscoeff
# s.update(var) # adds required spectral array
# _, new_y = s.get(var) # can also use wunits, Iunits, etc.

# TODO: the following should obviously be a self contained function..

if var == "abscoeff":
new_y = abscoeff
elif var == "transmittance":
new_y = transmittance
elif var in ["emissivity", "radiance"]:
emissivity = 1 - transmittance
if var == "emissivity":
new_y = emissivity
else: # var = 'radiance':
new_y = calc_radiance(
s.get_wavenumber(),
emissivity,
s.conditions["Tgas"],
unit=self.units["radiance"],
)
else:
absorbance = abscoeff * s.conditions["path_length"]
if var == "absorbance":
new_y = absorbance
# This happen inside a Spectrum() method
for k in list(s._q.keys()): # reset all quantities
if k in ["wavespace", "wavelength", "wavenumber"]:
pass
elif k == "abscoeff":
s._q["abscoeff"] = abscoeff
else:
transmittance_noslit = exp(-absorbance)
if var == "transmittance_noslit":
new_y = transmittance_noslit
else:
emissivity_noslit = 1 - transmittance_noslit
if var == "emissivity_noslit":
new_y = emissivity_noslit
else:
new_y = calc_radiance(
s.get_wavenumber(),
emissivity_noslit,
s.conditions["Tgas"],
unit=self.units["radiance_noslit"],
)
del s._q[k]

_, new_y = s.get(
var,
copy=False, # copy = False saves some time & memory, it's a pointer/reference to the real data, which is fine here as data is just plotted
wunit=plotkwargs.get("wunit", "default"),
Iunit=plotkwargs.get("Iunit", "default"),
)

line.set_ydata(new_y)
fig.canvas.draw_idle()
Expand Down
22 changes: 11 additions & 11 deletions radis/lbl/gpu.cpp
Expand Up @@ -138,9 +138,9 @@ __global__ void fillLDM(
// ... scale linestrengths under equilibrium
float Si = iter_d.N * S0[i] * (expf(iter_d.c2T * El[i]) - expf(iter_d.c2T * (El[i] + v0[i]))) / iter_d.Q[iso[i]];

float avi = ki - k0i;
float aGi = li - l0i;
float aLi = mi - m0i;
float avi = ki - (float)k0i;
float aGi = li - (float)l0i;
float aLi = mi - (float)m0i;

float aV00i = (1 - aGi) * (1 - aLi);
float aV01i = (1 - aGi) * aLi;
Expand All @@ -150,14 +150,14 @@ __global__ void fillLDM(
float Sv0i = Si * (1 - avi);
float Sv1i = Si * avi;

agnostic_add(&S_klm[m0i + l0i * N_L + k0i * N_G * N_L], aV00i * Sv0i);
agnostic_add(&S_klm[m0i + l0i * N_L + k1i * N_G * N_L], aV00i * Sv1i);
agnostic_add(&S_klm[m0i + l1i * N_L + k0i * N_G * N_L], aV01i * Sv0i);
agnostic_add(&S_klm[m0i + l1i * N_L + k1i * N_G * N_L], aV01i * Sv1i);
agnostic_add(&S_klm[m1i + l0i * N_L + k0i * N_G * N_L], aV10i * Sv0i);
agnostic_add(&S_klm[m1i + l0i * N_L + k1i * N_G * N_L], aV10i * Sv1i);
agnostic_add(&S_klm[m1i + l1i * N_L + k0i * N_G * N_L], aV11i * Sv0i);
agnostic_add(&S_klm[m1i + l1i * N_L + k1i * N_G * N_L], aV11i * Sv1i);
agnostic_add(&S_klm[k0i * N_G * N_L + l0i * N_L + m0i], Sv0i * aV00i);
agnostic_add(&S_klm[k0i * N_G * N_L + l0i * N_L + m1i], Sv0i * aV01i);
agnostic_add(&S_klm[k0i * N_G * N_L + l1i * N_L + m0i], Sv0i * aV10i);
agnostic_add(&S_klm[k0i * N_G * N_L + l1i * N_L + m1i], Sv0i * aV11i);
agnostic_add(&S_klm[k1i * N_G * N_L + l0i * N_L + m0i], Sv1i * aV00i);
agnostic_add(&S_klm[k1i * N_G * N_L + l0i * N_L + m1i], Sv1i * aV01i);
agnostic_add(&S_klm[k1i * N_G * N_L + l1i * N_L + m0i], Sv1i * aV10i);
agnostic_add(&S_klm[k1i * N_G * N_L + l1i * N_L + m1i], Sv1i * aV11i);
}
}
}
Expand Down
2 changes: 0 additions & 2 deletions radis/misc/arrays.py
Expand Up @@ -516,8 +516,6 @@ def numpy_add_at(LDM, k, l, m, I):
# we may consider escaping this.
else:
add_at = rcx.add_at
"""Wrapper to :py:func`radis.misc.arrays.numpy_add_at` or to the Radis
Cython version (here, the Cython version was loaded)"""


# @numba.njit
Expand Down
6 changes: 3 additions & 3 deletions radis/misc/profiler.py
Expand Up @@ -18,7 +18,7 @@
"""

from collections import OrderedDict
from time import time
from time import perf_counter

from radis.misc.printer import printg

Expand Down Expand Up @@ -62,7 +62,7 @@ def add_entry(self, dictionary, key, verbose, count):
def start(self, key, verbose_level, optional=""):
if __debug__:
self.initial[key] = {
"start_time": time(),
"start_time": perf_counter(),
"verbose_level": verbose_level,
}

Expand Down Expand Up @@ -109,7 +109,7 @@ def add_time(self, dictionary, key, verbose, count, time_calculated):
def stop(self, key, details):
if __debug__:
items = self.initial.pop(key)
time_calculated = time() - items["start_time"]
time_calculated = perf_counter() - items["start_time"]

if items["verbose_level"] == 1:
# Profiler ends; Deserializing to Dictionary format
Expand Down
6 changes: 6 additions & 0 deletions radis/spectrum/equations.py
Expand Up @@ -60,3 +60,9 @@ def abscoeff2xsection(abscoeff_cm1, Tgas_K, pressure_Pa, mole_fraction):
"""

return abscoeff_cm1 * (k_b * Tgas_K / mole_fraction / pressure_Pa) * 1e6 # cm2


if __name__ == "__main__":
from radis.test.spectrum.test_equations import test_equations

test_equations()

0 comments on commit 23b2533

Please sign in to comment.