🚀 Feature: write a Numba parser for HITEMP Files #510
Comments
Actually, we already have an unexpected perf improvement with NumPy 1.23, according to numpy/numpy#13319 (comment). Unfortunately RADIS requires NumPy < 1.22.3 for the moment: #490
Hi @erwanp, I'm happy to give this one a go. Per your last post in this thread, before I attempt to implement the pure-Python solution, should I wait for confirmation that the Vaex dependency is sorted out, so we can test the native NumPy solution before reworking the function?
Parsing with custom C code (including SIMD intrinsics) should be extremely fast. Also: we could process one dataset while downloading the next (maybe we already do this?)
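The download/parse overlap suggested above could be sketched with a single-worker thread pool; `download_file` and `parse_file` below are hypothetical stand-ins for RADIS's actual fetch and parse functions, not its real API:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_and_parse(urls, download_file, parse_file):
    """Pipeline downloads and parsing: while dataset N is being
    parsed in the main thread, dataset N+1 downloads in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(download_file, urls[0])
        for nxt in urls[1:] + [None]:
            path = future.result()                        # wait for current download
            if nxt is not None:
                future = pool.submit(download_file, nxt)  # start the next one
            results.append(parse_file(path))              # parse while it downloads
    return results
```

Since downloads are I/O-bound, the GIL is released during the network wait, so a plain thread is enough here; no multiprocessing needed.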
To what extent is this still an issue, and have there been any updates that change the right way to approach it? For HITRAN, has
The vast majority of the time is spent on … For HITEMP, after testing …
I have worked on a C++/SIMD implementation which was very fast; I will post more details over the weekend.
@dcmvdbekerom any news on that? Is the issue still open?
Here's a SIMD C++ code that reads super fast; downloading became the bottleneck after that:
Here are some download scripts; I will try to tidy up a little in a bit.
🔖 Feature description
Parsing HITRAN/HITEMP files is currently very slow.
#505 improved it a bit for HITRAN files, using the HAPI implementation, which is faster than RADIS's initial implementation.
However, HITEMP is unchanged, and remains a large bottleneck.
We use `np.fromfile`, which is very slow. A detailed study was done a few months ago: https://stackoverflow.com/questions/71411907/dramatic-drop-in-numpy-fromfile-performance-when-switching-from-python-2-to-pyth/71505529#71505529
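For reference, reading the whole file in one `np.fromfile` call with a structured dtype looks roughly like this. This is only a sketch, assuming the standard 160-character HITRAN/HITEMP `.par` records (plus a newline terminator); the field names and the choice of which columns to decode are illustrative:

```python
import numpy as np

# Fixed-width layout of a HITRAN-format .par record: 160 characters
# plus a newline (161 bytes per line). Only the first few fields are
# split out here; the rest is kept as raw bytes.
record_dtype = np.dtype([
    ("mol", "S2"),     # molecule number,     columns 1-2
    ("iso", "S1"),     # isotopologue number, column 3
    ("wav", "S12"),    # vacuum wavenumber,   columns 4-15
    ("sw", "S10"),     # line intensity,      columns 16-25
    ("rest", "S135"),  # remaining fields, unparsed in this sketch
    ("eol", "S1"),     # newline
])

def read_wavenumbers(path):
    """Read every record in a single np.fromfile call, then convert
    just the wavenumber column from bytes to float64."""
    raw = np.fromfile(path, dtype=record_dtype)
    return raw["wav"].astype(np.float64)
```

A single large read like this is the fast use of `np.fromfile`; the pathology described in the Stack Overflow study appears when it is called repeatedly on small chunks.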
It turns out that a pure-Python implementation of the parser, accelerated by Numba, could be 1,000× faster than the `np.fromfile` approach.

Note: the `np.fromfile` inefficiency is greatest when reading small chunks of data; with the large HITEMP database we won't get a 1,000× speed-up, but it should still be faster!

Implementation
Re-write the `_read_hitran_file()` function in pure Python, then JIT-compile it with Numba.
👉 Why you want this feature!!
We currently spend ~3 hrs parsing the HITEMP CO2 & H2O files (!)
It could really help to speed this up.