Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching of Non-Equilibrium parameters(Evib, Erot) #511

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from

Conversation

sagarchotalia
Copy link
Collaborator

@sagarchotalia sagarchotalia commented Aug 17, 2022

This pull request is to address the caching of non-equilibrium parameters, so that they can be retrieved and accessed later.

Currently, I've only implemented generation of a cache file each time the function is called. If the direction I'm proceeding in is correct, I shall add further conditions such as when cache="regen", or when a cache file already exists.

Examples tested:

Spectra calculated with CO and CO2 using the following code

s = SpectrumFactory(
    wavelength_min=3000,
    wavelength_max=4000,
    cutoff=1e-27, # All lines included, may increase calculation time
    pressure=1,
    molecule = "CO2",
    isotope="1,2,3",
    truncation=5,
    neighbour_lines=5,
    path_length=0.1,
    mole_fraction=1e-3,
    medium="vacuum",
    optimization=None, # No optimization strategy used
    chunksize=None, # Initialising chunksize as None for comparison
    wstep="auto",
    verbose=2,
)
s.fetch_databank("hitemp", load_energies=True, load_columns="noneq")
# For the HITEMP error causing some columns to have NaN values:
s.df0.drop(s.df0.index[s.df0['v1u']==-1], inplace=True)
s1 = s.non_eq_spectrum(Tvib=2000, Trot=3000)

Also added a test function test_caching_noneq_params() to make sure the caching implementation is working properly, in radis/test/lbl/test_base.py.

Aims to fix #176

@codecov-commenter
Copy link

codecov-commenter commented Aug 17, 2022

Codecov Report

Merging #511 (e44e614) into develop (c3d9d13) will increase coverage by 0.01%.
The diff coverage is 80.00%.

@@             Coverage Diff             @@
##           develop     #511      +/-   ##
===========================================
+ Coverage    73.36%   73.38%   +0.01%     
===========================================
  Files          138      138              
  Lines        19028    19052      +24     
===========================================
+ Hits         13960    13981      +21     
- Misses        5068     5071       +3     

radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
@sagarchotalia sagarchotalia added this to In progress in GSoC 2022: Performance Tweaks in RADIS via automation Aug 20, 2022
@erwanp
Copy link
Member

erwanp commented Aug 22, 2022

(btw, to update to the latest branch, you shouldn't Merge but Rebase - when possible - (see it as a "Merge without adding a commit number"; i.e. use it to integrate changes that were not yours) . )

radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
GSoC 2022: Performance Tweaks in RADIS automation moved this from In progress to Review in progress Aug 27, 2022
GSoC 2022: Performance Tweaks in RADIS automation moved this from Review in progress to Done Aug 28, 2022
@sagarchotalia sagarchotalia reopened this Aug 28, 2022
GSoC 2022: Performance Tweaks in RADIS automation moved this from Done to In progress Aug 28, 2022
radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
radis/lbl/base.py Outdated Show resolved Hide resolved
"wavenumber_min": self.input.wavenum_min,
"wavenumber_max": self.input.wavenum_max,
"wavenum_min": self.input.wavenum_min,
"wavenum_max": self.input.wavenum_max,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your tests, make sure it also works with wavelength inputs. Normally all inputs are converted to wavenumber so it should work, but better test!

assert s.df0["wav"].min() == existing_file_metadata["wavenumber_min"]
if "wavenumber_max" in existing_file_metadata:
assert s.df0["wav"].max() == existing_file_metadata["wavenumber_max"]
if "wavenum_min" in existing_file_metadata:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test should happen always. If key is not in metadata you should regenerate the cache

@@ -110,7 +110,7 @@ def test_perf_profile_from_factory(*args, **kwargs):
molecule="CO",
verbose=3,
)
sf.load_databank("HITRAN-CO-TEST")
sf.load_databank("HITRAN-CO-TEST", load_energies=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello. Why do you need this absolutely?

I remember that we implemented automatic loading of energies when calling noneq_spectrum if it hadnt been loaded already. Doesnt it work?

def test_caching_noneq_params(verbose=True, plot=True, *args, **kwargs):
"""
Test that the cache file generated during the computation of
non-equilibrium spectra is correct, and has the accurate metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention this pull request in docstring. Very useful for maintenance, in months or years from now

And mention this test in the first message of the pull request

truncation=5,
neighbour_lines=5,
path_length=0.1,
mole_fraction=1e-3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For path length, mole fractions, medium, optimization, chunksize, and verbose, you could use the defaukt values and remove them from here. It will make it easier to underatand what you are truly testing

cache_filename = (
splitext(s.params.dbpath.split(",", 1)[0])[0]
+ "_extra_columns_EvibErot.h5.extra"
)
engine = "pytables"
if exists(cache_filename):
last_modif = getmtime(cache_filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't last_modif be measured after computing the new spectrum, i.e.> l846 ?

s2 = sf.non_eq_spectrum(Tvib=2000, Trot=3000)
# Checking proper regeneration of the cache file
assert exists(cache_filename)
assert getmtime(cache_filename) > last_modif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the conditions are the same. Shouldn't the modification time be the same (i.e. file is not regenerated)

if "isotope" in existing_file_metadata:
assert isotope == existing_file_metadata["isotope"]
else:
os.remove(cache_filename)
if "number_of_lines" in existing_file_metadata:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. Why are you testing this here ??

These validity checks should be in the code itself, not in the test.

Here you should only be testing (and asserting) that if you change isotope, the cache file is properly regenerated (ie new_modif_time > last_modif_time)

Same for wavenumbers, etc.

Comment on lines 903 to 927
assert sf.input.wavenum_max == existing_file_metadata["wavenum_max"]
else:
os.remove(cache_filename)
if "spectroscopic_constant_file" in existing_file_metadata:
assert (
spectroscopic_constant_file
== existing_file_metadata["spectroscopic_constant_file"]
)
else:
os.remove(cache_filename)
if "last_modification" in existing_file_metadata:
assert (
time.ctime(getmtime(spectroscopic_constant_file))
== existing_file_metadata["last_modification"]
)
else:
os.remove(cache_filename)
if "neighbour_lines" in existing_file_metadata:
assert s.params.neighbour_lines == existing_file_metadata["neighbour_lines"]
assert sf.params.neighbour_lines == existing_file_metadata["neighbour_lines"]
else:
os.remove(cache_filename)
if "cutoff" in existing_file_metadata:
assert s.params.cutoff == existing_file_metadata["cutoff"]
assert sf.params.cutoff == existing_file_metadata["cutoff"]
else:
os.remove(cache_filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as comment above. You shouldn't be conditionning the presence of "last_modifcation" etc in the metadata.

It must be there. You should even be asserting that it is there.

Comment on lines +849 to +863
wavelength_min=1500,
wavelength_max=2000,
cutoff=1e-27,
pressure=1,
molecule="CO",
isotope="1,2",
truncation=5,
neighbour_lines=5,
path_length=0.1,
mole_fraction=1e-3,
medium="vacuum",
optimization=None, # No optimization strategy used
chunksize=None, # Initialising chunksize as None for comparison
wstep="auto",
verbose=2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to simplify reading, can you add all of this in a dictionary and use the same dictionary many times.

conditions = {
        "wavelength_min":1500,
        "wavelength_max":2000,
#etc, fix for dictionary syntax : 
        cutoff=1e-27,
        pressure=1,
        molecule="CO",
        isotope="1,2",
        truncation=5,
        neighbour_lines=5,
        path_length=0.1,
        mole_fraction=1e-3,
        medium="vacuum",
        optimization=None,  # No optimization strategy used
        chunksize=None,  # Initialising chunksize as None for comparison
        wstep="auto",
        verbose=2,
}

s1 = SpectrumFactory(**conditions)
s2 = SpectrumFactory(**conditions)
# or change one : 
new_conditions = conditions.copy()
new_conditions["wavelength_min=1800"]
s3 = SpectrumFactory(**new_conditions)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, remove all parametesr that are not relevant. pressure, Path_length, mole_fraction, medium, optimization, chunksize, wstep, verbose are not relevant here. Keep the defaults

GSoC 2022: Performance Tweaks in RADIS automation moved this from In progress to Review in progress Sep 12, 2022
@sagarchotalia sagarchotalia moved this from Review in progress to In progress in GSoC 2022: Performance Tweaks in RADIS Sep 20, 2022
@erwanp
Copy link
Member

erwanp commented Oct 3, 2022

Hello @sagarchotalia, I'll release 0.14 this week and I'd like to include your changes. Is all the above ready ?

@sagarchotalia
Copy link
Collaborator Author

Hello @sagarchotalia, I'll release 0.14 this week and I'd like to include your changes. Is all the above ready ?

Hello! I'm really sorry, but the changes aren't ready yet as I have my exams going on. I'll try to finish them ASAP!

@erwanp
Copy link
Member

erwanp commented Oct 5, 2022

No problem. We'll just delay to post 0.14 .
Good luck with your exams!

@erwanp erwanp added this to the 0.15 milestone Oct 5, 2022
@anandxkumar
Copy link
Collaborator

@sagarchotalia any update?

@erwanp
Copy link
Member

erwanp commented Jul 30, 2023

Hello @sagarchotalia this is hanging from last year, any update ?

@minouHub minouHub modified the milestones: 0.15, 0.16 Mar 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Automatically cache noneq params (Evib, Erot, etc.) to line database cache file
6 participants