Store HITRAN isotopes separately #513

Open
erwanp opened this issue Aug 19, 2022 · 0 comments
Labels
refactor requires changes in code architecture todo in the short-term roadmap
erwanp commented Aug 19, 2022

💭 Description

Some species like CO2, CO, H2O have many isotopes but most of our calculations use only the first ones (1,2,3).

The current implementation downloads all isotopes and stores them in the same local database file, even if only the first isotopes are required.
This leads to long download times; for instance, in the ReadTheDocs example https://radis.readthedocs.io/en/latest/auto_examples/plot_line_survey.html it takes 47 s to download and parse all 9 CO2 isotopes, although only the first one is needed.

Implementation

We could switch to an implementation where all isotopes are stored separately.

  • file names would be CO_isoX.hdf5 instead of CO.hdf5.
  • few changes are required in the registered database (~/radis.json): this is easy to implement since our database management system allows wildcards, e.g. CO_iso*.hdf5. The database would keep a single name, HITRAN_CO in this example.
  • we already handle databases composed of many files, such as HITEMP CO2 and HITEMP H2O. Should be easy to adapt to HITRAN.
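The naming scheme and wildcard lookup described above can be sketched as follows. This is only an illustration using the standard library: the helper names (`isotope_file_name`, `wildcard_pattern`, `local_isotope_files`) are hypothetical and are not part of RADIS, and the files are empty placeholders created in a temporary folder.

```python
import tempfile
from pathlib import Path


def isotope_file_name(molecule: str, isotope: int) -> str:
    """Hypothetical per-isotope file name, e.g. CO_iso1.hdf5 (as proposed above)."""
    return f"{molecule}_iso{isotope}.hdf5"


def wildcard_pattern(molecule: str) -> str:
    """Wildcard that a database entry could register, e.g. CO_iso*.hdf5."""
    return f"{molecule}_iso*.hdf5"


def local_isotope_files(folder: Path, molecule: str) -> list[str]:
    """Names of the per-isotope files already present in `folder`."""
    return sorted(p.name for p in folder.glob(wildcard_pattern(molecule)))


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Pretend the first 3 CO isotopes were downloaded, plus one CO2 file
    for iso in (1, 2, 3):
        (folder / isotope_file_name("CO", iso)).touch()
    (folder / isotope_file_name("CO2", 1)).touch()

    found = local_isotope_files(folder, "CO")

print(found)
```

Note that the `CO_iso*.hdf5` pattern does not pick up `CO2_iso1.hdf5`, because the underscore right after the molecule name is part of the pattern.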

One thing to fix:

  • when computing with isotopes='all', we currently simply load the full database. If isotopes are stored in different files, how do we know whether a missing isotope has just not been downloaded yet (because it was never required), or whether it does not exist? In the Download_hitran() script, we currently fetch the HITRAN database until it fails. We shouldn't query the server for each calculation. Therefore the list of all available HITRAN isotopes should be hardcoded in RADIS, and a test should be set up to compare the hardcoded list to the latest HITRAN list (by fetching the website).
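A minimal sketch of the hardcoded-list idea, under stated assumptions: the table `KNOWN_ISOTOPES` and all function names are hypothetical, the isotope counts are simply the ones quoted in this issue (9 for CO2, 6 for CO) and would need checking against HITRAN, and the actual server fetch is left out (its result is passed in as `server_isotopes`).

```python
# Hypothetical hardcoded table; counts are the ones quoted in this issue
# and are NOT verified against the current HITRAN website.
KNOWN_ISOTOPES = {
    "CO2": tuple(range(1, 10)),  # 9 isotopes
    "CO": tuple(range(1, 7)),    # 6 isotopes
}


def resolve_isotopes(molecule, isotopes):
    """Expand 'all' to the hardcoded list; reject isotopes that do not exist."""
    known = KNOWN_ISOTOPES[molecule]
    if isotopes == "all":
        return list(known)
    unknown = sorted(set(isotopes) - set(known))
    if unknown:
        raise ValueError(f"{molecule} has no isotope(s) {unknown}")
    return sorted(isotopes)


def isotopes_to_download(molecule, isotopes, downloaded):
    """Isotopes that exist but are not yet in the local database."""
    requested = resolve_isotopes(molecule, isotopes)
    return [iso for iso in requested if iso not in downloaded]


def matches_server(molecule, server_isotopes):
    """Core of the proposed test: hardcoded list vs. list fetched from HITRAN."""
    return sorted(server_isotopes) == sorted(KNOWN_ISOTOPES[molecule])


# Only isotopes 1-3 are on disk; isotopes='all' now knows that 4-6 are missing
missing = isotopes_to_download("CO", "all", downloaded={1, 2, 3})
print(missing)
```

With such a table, no server request is needed at computation time: a missing file for a known isotope means "not downloaded yet", and an isotope outside the table means "does not exist".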

Performance / impact

In terms of database loading performance:

  • I expect it will be the same for Vaex (which handles many files the same way as a single one). It might become a bit slower, though, if re-sorting all the isotope databases by wavenumber takes time (to be checked by comparing the loading time of, say, the 9 CO2 isotopes stored in 1 file versus 9 files).
  • slightly slower for Pytables if all the isotopes are needed (because combining the different databases will take 2x the memory, and take time), but maybe faster if only a few out of many isotopes are needed.
    Anyway, Vaex is the future so it's ok if we suffer a minor performance drop with Pytables.
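To illustrate the re-sorting concern, here is a pure-Python sketch, not how Vaex or Pytables handle it. It assumes each per-isotope file is already sorted by wavenumber, in which case combining them is a k-way merge rather than a full sort. The line lists are made-up toy values and `merge_by_wavenumber` is a hypothetical helper.

```python
import heapq
from operator import itemgetter

# Toy per-isotope line lists as (wavenumber, isotope) tuples, each already
# sorted by wavenumber (made-up values, for illustration only)
iso1 = [(2000.0, 1), (2100.5, 1), (2200.0, 1)]
iso2 = [(2050.0, 2), (2150.0, 2)]
iso3 = [(1990.0, 3), (2300.0, 3)]


def merge_by_wavenumber(*line_lists):
    """Merge already-sorted line lists into one list sorted by wavenumber."""
    return list(heapq.merge(*line_lists, key=itemgetter(0)))


merged = merge_by_wavenumber(iso1, iso2, iso3)
wavenumbers = [w for w, _ in merged]
print(wavenumbers)
```

Whether the real backends pay a comparable cost would still have to be measured as suggested above (1 file versus 9 files for CO2).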

On a fresh RADIS installation, running the CO test spectrum would be ~40% faster, since the first run currently downloads all 6 CO isotopes instead of the 3 required:

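from radis import fetch_hitran

# Download (or load from the local cache) the full HITRAN CO line list
df = fetch_hitran("CO")
len(df)  # 5381 lines for all CO isotopes
# Keep only the lines of the first 3 isotopes
len(df.query("iso==1 | iso==2 | iso==3"))  # 3306, so only ~60% of the lines are in the first 3 isotopes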
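The ~40% estimate follows from the two line counts above, assuming (this is an assumption, not a measurement) that download and parsing time scales roughly linearly with the number of lines:

```python
total_lines = 5381   # all CO isotopes, from the snippet above
first_three = 3306   # isotopes 1, 2 and 3 only

fraction_kept = first_three / total_lines
fraction_saved = 1 - fraction_kept

print(f"{fraction_kept:.1%} of the lines kept, {fraction_saved:.1%} saved")
# 61.4% of the lines kept, 38.6% saved
```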

It may also make it easier to handle cached rovibrational energies of different isotopes separately, see #176 @sagarchotalia (typically, spectroscopic constants are only implemented for the first few isotopes).

@erwanp erwanp added refactor requires changes in code architecture todo in the short-term roadmap labels Aug 19, 2022
@anandxkumar anandxkumar added this to the 0.15 milestone Oct 28, 2022
@erwanp erwanp modified the milestones: 0.15, 0.16 Aug 1, 2023