Update slicer docs for indexed-gzip & keep_file_open #1058

Open
ecc521 opened this issue Sep 26, 2021 · 3 comments
Comments

@ecc521

ecc521 commented Sep 26, 2021

I have some rather large gzipped NIfTI files that I need to read slice by slice, without first buffering the whole file in memory.

When loading the image via nibabel.load, slicer and dataobj appear to re-read the file from the beginning on every access, so the total time grows quadratically with the number of slices taken.
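A rough cost model makes the quadratic growth concrete. This is only a sketch: it assumes every slice holds `slice_bytes` of uncompressed data, ignores the header, and assumes a restarted gzip stream has to decompress everything before the requested slice. Both function names are hypothetical and are not part of nibabel.

```python
def bytes_decompressed_restarting(n_slices: int, slice_bytes: int) -> int:
    """Total bytes decompressed if every read restarts at the beginning of the file."""
    # Reaching slice k (0-based) means decompressing slices 0..k all over again.
    return sum((k + 1) * slice_bytes for k in range(n_slices))


def bytes_decompressed_streaming(n_slices: int, slice_bytes: int) -> int:
    """Total bytes decompressed if the stream keeps its position between reads."""
    # Each slice is decompressed exactly once.
    return n_slices * slice_bytes


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, bytes_decompressed_restarting(n, 1), bytes_decompressed_streaming(n, 1))
```

Under these assumptions, 100 slices cost 5050 slice-sized units of decompression when restarting, against 100 when streaming forward.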

Since I only need to move forward through the file, the time complexity should be linear. Indeed, linear time can be obtained by updating the code from an old question:

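from gzip import GzipFile
from nibabel import FileHolder, Nifti1Image

niftipath = "image.nii.gz"  # placeholder: path to the gzipped NIfTI file
# A single long-lived GzipFile serves both the header and the image data,
# so its decompression state survives between reads.
fh = FileHolder(fileobj=GzipFile(niftipath))
img = Nifti1Image.from_file_map({'header': fh, 'image': fh})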

In this case, since GzipFile preserves the current decompression state, proceeding strictly forward in the file works at the expected speed (much faster).
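The effect of a long-lived GzipFile can be seen with the standard library alone. The sketch below is not nibabel code: `read_forward` is a hypothetical helper and the payload is stand-in data compressed in memory. It reads consecutive chunks from one GzipFile, so each byte is decompressed only once.

```python
import gzip
import io


def read_forward(raw_gz: bytes, chunk_size: int):
    """Yield consecutive chunks from one GzipFile that stays open throughout."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw_gz)) as gz:
        while True:
            chunk = gz.read(chunk_size)  # continues from the current position
            if not chunk:                # empty bytes means end of stream
                break
            yield chunk


payload = bytes(range(256)) * 16         # 4096 bytes of stand-in "image" data
chunks = list(read_forward(gzip.compress(payload), 1024))
```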

Is there a way to obtain this same slicing performance using the nibabel.load() API (such as by passing a GzipFile, etc)? This would be greatly preferable, as it abstracts away file formats.

@effigies
Member

If you install the indexed-gzip package, you should get performance improvements for free.
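Since the speed-up depends on the package being present, a quick check that it is importable can save some confusion. A small sketch, assuming the module is importable as `indexed_gzip`; `has_indexed_gzip` is a hypothetical helper, not a nibabel function.

```python
from importlib.util import find_spec


def has_indexed_gzip() -> bool:
    """Return True if the indexed_gzip module can be imported in this environment."""
    return find_spec("indexed_gzip") is not None


if __name__ == "__main__":
    print("indexed_gzip available:", has_indexed_gzip())
```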

@ecc521
Author

ecc521 commented Sep 26, 2021

Thanks! @effigies
Looking at the indexed-gzip docs, I was able to find the relevant flag: keep_file_open=True.
While indexed-gzip alone does help, setting keep_file_open=True is all that is needed for this use case.

Still confused as to why keep_file_open is off by default, but enabling it seems to be a solution.

Unless the defaults or the relevant documentation need to be revisited to make keep_file_open/indexed-gzip more visible (the slicer section mentions neither), I'm good to close this.

@effigies
Member

With indexed-gzip, you should not need to set keep_file_open to get almost identical performance.

The reason it's off by default is that, when working with many files, you can exhaust file handle quotas, and the lifetimes of file handles are difficult to reason about.
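On Unix-like systems the quota in question can be inspected from Python. A small sketch using the standard-library `resource` module, which is not available on Windows; `open_file_limits` is a hypothetical helper.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Every image loaded with keep_file_open=True may hold a handle that counts against the soft limit for as long as that handle stays open.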

Definitely good to update the docs.

@ecc521 ecc521 changed the title Quadratic Time Complexity reading gzipped NIFTIs in slices Update slicer docs for indexed-gzip & keep_file_open Sep 27, 2021