Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re-worked cutoff_error parameter for Automatic linestrength cutoff #646

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from

Conversation

code29563
Copy link

Description

Addresses issue #268 building on PR #451.

I adjusted some of the docstrings and printed messages to be a bit more informative by clarifying what the function does with the input arguments, i.e. that the user-input cutoff is being adjusted to conform to the user-input cutoff_error.

I chose to stick to pandas methods for simplicity rather than using numpy. But is compatibility with vaex dataframes also sought in this issue?

Copy link
Collaborator

@minouHub minouHub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me but I would like @erwanp feedback. We are also waiting for Travis to be back for automatic tests

@erwanp
Copy link
Member

erwanp commented Apr 21, 2024

Rebased on latest develop version (with fixed tests), and triggering the test suite to test this PR

@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2024

Codecov Report

Attention: Patch coverage is 68.69480% with 638 lines in your changes are missing coverage. Please review.

Project coverage is 72.95%. Comparing base (c5b130d) to head (7eb1ddd).
Report is 806 commits behind head on develop.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #646      +/-   ##
===========================================
- Coverage    73.16%   72.95%   -0.22%     
===========================================
  Files          137      148      +11     
  Lines        18864    21230    +2366     
===========================================
+ Hits         13802    15488    +1686     
- Misses        5062     5742     +680     

radis/lbl/base.py Outdated Show resolved Hide resolved
@code29563
Copy link
Author

I've tried fixing the code so it currently supports Pandas DataFrames. What would be the best approach for supporting Vaex, as it doesn't seem to have a cumsum() method? Convert to a numpy array then use numpy.cumsum()?

@erwanp
Copy link
Member

erwanp commented Apr 21, 2024

If we convert to Numpy we will loose all the performance benefit of Vaex. We use Vaex when arrays cannot be loaded in RAM.

There is no straightforward solution. You could try to benchmark a few things on the side; for instance creating a function in radis.misc.arrays to find a cumsum treshold on a >1e10 Vaex array. Make sure that the array doesn't fit in memory, to make sure you are dealing with the worst case scenario. Then; it is probably possible to use vaex to sort and do a bining; and use Pandas cumsum() the binned array (which will clearly reduce the memory usage).
This shall be enough to reach less than the user-specified error, without having to load the full array

@code29563
Copy link
Author

I chose 100000 as the limit on length of a df to be processed in memory, which I tested on a dataset with > 3e9 rows, but might be worth testing further.

@minouHub
Copy link
Collaborator

Seems good. Could you provide an example that raises a warning?

@code29563
Copy link
Author

Seems good. Could you provide an example that raises a warning?

For an example that both adjusts the cutoff and raises the LinestrengthCutoffWarning (based on example from docs for eq_spectrum):

from radis import SpectrumFactory
sf = SpectrumFactory(
    wavenum_min=2900,
    wavenum_max=3200,
    molecule="OH",
    wstep='auto',
    cutoff=1e-23
)
sf.misc.warning_linestrength_cutoff = 1e-24
sf.params.cutoff_error = 2
sf.fetch_databank("hitemp")

s1 = sf.eq_spectrum(Tgas=300, path_length=1, pressure=0.1)

or set sf.params.cutoff_error to >=3.44 for an example without adjusting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants