Add DataSet implementation for groups of raw files #1224
base: master
/azp run libertem.libertem-data
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@ Coverage Diff @@
## master #1224 +/- ##
==========================================
+ Coverage 72.56% 72.69% +0.12%
==========================================
Files 285 286 +1
Lines 15270 15360 +90
Branches 2521 2537 +16
==========================================
+ Hits 11081 11166 +85
- Misses 3783 3786 +3
- Partials 406 408 +2
Data pipeline tests passed on Linux 36=>39 before the push to add test skipping on Mac OSX.
/azp run libertem.libertem-data
Azure Pipelines successfully started running 1 pipeline(s).
@matbryan52 should we try to get this into LiberTEM 0.11?
Definitely. The functionality for reading the files is already there in the normal RawDataSet; we just have to agree on the API design for taking multiple files as an argument.
Adds an extension of `RawDataSet` which can handle groups of files and files with frame headers/footers. Responds to #1204.

Currently the implementation is not merged into `RawDataSet` or the usual LiberTEM `ctx.load` or web GUI endpoints. In a future release it would be possible to design a single class which handles both single-file and multi-file reading (in fact the implementation in this PR already handles single files), but it would entail a change to the `RawDataSet` API/interface (specifically the `path` argument, which might change to plural). For now this class should be considered undocumented / testing-only until the interface is stabilized.

One limitation of the system is that when using `MMapBackend` it is impossible to have a frame header or footer whose size is not a multiple of `dtype.itemsize`. In this case the dataset raises a `DataSetException` and tells the user to change the backend.

This implementation also contains one or two optimizations specific to handling groups of files which might be useful elsewhere. Specifically:

- `check_valid` has been modified to only check 256 files at a time; this prevents an OSError due to too many open files
- `executor.map` on chunks of files). This prevents a big slowdown encountered when the class calls `os.stat` on thousands of files, particularly over a network filesystem.

Closes #206 (Dieter)
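For reviewers, the `MMapBackend` limitation described above can be illustrated with a minimal sketch. This is not the code in this PR: the `DataSetException` class below is a local stand-in for LiberTEM's exception, and `check_mmap_compatible` and its parameter names are hypothetical. The assumed reason for the rule is that a memory map is viewed as an array of a single dtype, so any per-frame padding has to be skipped in whole items.

```python
import struct


class DataSetException(Exception):
    """Local stand-in for LiberTEM's exception of the same name."""


def check_mmap_compatible(frame_header, frame_footer, itemsize):
    """Raise if a per-frame header or footer cannot be skipped in whole items.

    frame_header, frame_footer: sizes in bytes (hypothetical parameter names).
    itemsize: size in bytes of one element of the raw dtype.
    """
    for name, nbytes in (("header", frame_header), ("footer", frame_footer)):
        if nbytes % itemsize != 0:
            raise DataSetException(
                f"frame {name} of {nbytes} bytes is not a multiple of the "
                f"item size ({itemsize} bytes); use a different I/O backend"
            )


# Item size of a little-endian unsigned 16-bit value, computed with the stdlib.
UINT16_ITEMSIZE = struct.calcsize("<H")

# A 16-byte header and 8-byte footer are fine for 2-byte items.
check_mmap_compatible(16, 8, UINT16_ITEMSIZE)
```

A 3-byte header with the same item size would raise `DataSetException` in this sketch, which mirrors the behaviour described in the PR text.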
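The two file-handling optimizations listed above could look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's implementation: `chunked`, `check_openable`, `stat_sizes` and `total_size` are hypothetical helper names, and a standard-library `ThreadPoolExecutor` stands in for the LiberTEM executor whose `map` method the description mentions. Only the batch size of 256 comes from the PR text.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
from itertools import islice


def chunked(paths, size):
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def check_openable(paths, batch_size=256):
    """Open files in batches so that at most `batch_size` are open at once."""
    for batch in chunked(paths, batch_size):
        with ExitStack() as stack:
            for path in batch:
                stack.enter_context(open(path, "rb"))
        # All handles of this batch are closed here, before the next batch.


def stat_sizes(chunk):
    """Return the size in bytes of every file in one chunk."""
    return [os.stat(path).st_size for path in chunk]


def total_size(paths, chunk_size=256, max_workers=4):
    """Sum file sizes, issuing the os.stat calls chunk-wise via executor.map."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = pool.map(stat_sizes, chunked(paths, chunk_size))
        return sum(sum(sizes) for sizes in per_chunk)
```

Mapping over chunks rather than single files keeps the number of submitted tasks small, which is one plausible reading of why chunking helps on a network filesystem.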
/azp run libertem.libertem-data
passed