Provide external file (mmaped) representation #108

Open
permeakra opened this issue Dec 8, 2020 · 3 comments

Comments

@permeakra

I have two use cases in mind. The first is (limited) persistence with access to the raw data. The second is working with datasets that exceed memory size by several orders of magnitude. In both cases it might be desirable to allow several massiv arrays in a single file. Interaction with madvise could probably be useful as well.

@lehins
Owner

lehins commented Dec 8, 2020

You can already do it with a bit of wrapper code. There is an mmap package that lets you get hold of a ForeignPtr to an mmaped file with something like mmapFileForeignPtr: https://hackage.haskell.org/package/mmap-0.5.9/docs/System-IO-MMap.html I haven't tried this package myself, but if it worked 7 years ago, I don't see why it shouldn't work now ;)

There is no need for a special representation: once you have a ForeignPtr in hand, you can wrap it in an S massiv array with something like unsafeMArrayFromForeignPtr0.
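The wrapping described above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: it assumes the mmap and massiv packages are installed, that the file holds raw native-endian Doubles, and it uses the immutable sibling unsafeArrayFromForeignPtr0 from Data.Massiv.Array.Unsafe instead of the mutable variant named above. The helper name mmapDoubles and the file name doubles.bin are hypothetical.

```haskell
module Main where

import qualified Data.Massiv.Array as A
import Data.Massiv.Array.Unsafe (unsafeArrayFromForeignPtr0)
import Foreign.ForeignPtr (plusForeignPtr)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Storable (sizeOf)
import System.IO (IOMode (WriteMode), hPutBuf, withBinaryFile)
import System.IO.MMap (Mode (ReadOnly), mmapFileForeignPtr)

-- | Map a whole file of raw Doubles as a read-only flat array.
-- Hypothetical helper: the result aliases the file mapping, so it only
-- behaves like a pure value while nobody modifies the file underneath it.
mmapDoubles :: FilePath -> IO (A.Array A.S A.Ix1 Double)
mmapDoubles path = do
  -- Nothing maps the entire file; the result is (pointer, byte offset, byte size).
  (fp, offset, size) <- mmapFileForeignPtr path ReadOnly Nothing
  let n = size `div` sizeOf (0 :: Double)
  -- Skip the byte offset ourselves, then wrap without copying.
  pure (unsafeArrayFromForeignPtr0 A.Seq (fp `plusForeignPtr` offset) (A.Sz n))

main :: IO ()
main = do
  let path = "doubles.bin" -- hypothetical file name
      xs = [1, 2, 3.5] :: [Double]
  -- Write a tiny raw file so the example does not need existing data.
  withBinaryFile path WriteMode $ \h ->
    withArrayLen xs $ \len ptr -> hPutBuf h ptr (len * sizeOf (0 :: Double))
  arr <- mmapDoubles path
  print (A.size arr)
  print (A.sum arr)
```

A multi-dimensional view could then be obtained by resizing the flat array, and several arrays in one file would amount to mapping different byte ranges instead of passing Nothing.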

There is no point in providing this sort of functionality in massiv directly, because mmap is very OS specific and I want to stay as OS agnostic as possible. However, I might consider a helper package that does just this, massiv-mmap or something.

Let me know how it goes if you do figure it out, or hit me up on gitter if you get stuck: https://gitter.im/haskell-massiv/Lobby

I'll keep this ticket open in case I find time to experiment with it and create such a package in the future.

@permeakra
Author

permeakra commented Dec 8, 2020

Thanks for the reply.

> There is no point in providing this sort of functionality in massiv directly, because mmap is very OS specific and I want to stay as OS agnostic as possible. However, I might consider a helper package that does just this, massiv-mmap or something.

If you expect massiv to be used in numerics code (personally, I consider it my best bet for the project I'm currently planning), you should keep in mind that such code often has to deal with data sets exceeding available RAM by orders of magnitude. If we construct a massiv array representing such data the way you described, its cost model for various access patterns would differ from that of purely in-memory arrays. In addition, specialized prefetch calls are available for mmaped files. Given that, it makes sense to have specialized algorithms for an mmaped representation.
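For illustration, the prefetch hints mentioned here are reachable from Haskell through the FFI. The following is a minimal sketch assuming a POSIX system: posix_madvise is the standard C function, but the wrapper names are hypothetical and the advice values are hard-coded to the numbering used by glibc on Linux, which is an assumption that should be checked against the target platform's headers.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module MAdvise (adviseSequential, adviseWillNeed) where

import Control.Monad (when)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr)

-- int posix_madvise(void *addr, size_t len, int advice);
-- It returns 0 on success and an error number otherwise.
foreign import ccall unsafe "posix_madvise"
  c_posix_madvise :: Ptr a -> CSize -> CInt -> IO CInt

-- Assumed values (glibc on Linux); other platforms may differ.
posixMadvSequential, posixMadvWillNeed :: CInt
posixMadvSequential = 2
posixMadvWillNeed = 3

-- | Pass one advice value for the first len bytes behind the pointer.
-- The address normally has to be page aligned, which holds for the
-- start of a mapping.
advise :: CInt -> ForeignPtr a -> Int -> IO ()
advise advice fp len =
  withForeignPtr fp $ \ptr -> do
    rc <- c_posix_madvise ptr (fromIntegral len) advice
    when (rc /= 0) $
      ioError (userError ("posix_madvise failed with error code " ++ show rc))

-- | Hint that the region will be read front to back.
adviseSequential :: ForeignPtr a -> Int -> IO ()
adviseSequential = advise posixMadvSequential

-- | Hint that the region will be needed soon, so it can be prefetched.
adviseWillNeed :: ForeignPtr a -> Int -> IO ()
adviseWillNeed = advise posixMadvWillNeed
```

An algorithm specialized for a mapped array could call such a hint on the byte range it is about to traverse, using the pointer and size obtained when the file was mapped.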

@lehins
Owner

lehins commented Dec 8, 2020

> If you expect massiv to be used in numerics code, you should keep in mind that such code often has to deal with data sets exceeding available RAM by orders of magnitude.

@permeakra I didn't say this functionality isn't useful. I said that it should not be implemented in the massiv package. It should instead be done in a separate package that integrates with the massiv interface. The difference is subtle, but very important. I am all for making massiv able to handle huge data. It is, however, not yet on my priority list.

> If we construct a massiv array representing such data the way you described, its cost model for various access patterns would differ from that of purely in-memory arrays. In addition, specialized prefetch calls are available for mmaped files. Given that, it makes sense to have specialized algorithms for an mmaped representation.

It makes sense to have a new representation to account for different usage patterns, I certainly agree with that. But one way or another it will have to be a representation that is a wrapper around ForeignPtr, so the approach I described is a required step in implementing this.

This is how I would implement this representation: newtype instance Array MM ix e = MMapArray (Array S ix e)
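Spelled out as a compilable module, the declaration above might look like the sketch below. It assumes the massiv package is installed. Only the newtype instance line comes from the comment: the tag type MM with its single constructor, the helper toStorable, and the demo in main are hypothetical additions. A usable representation would also need instances of massiv's type classes, which are not shown here and differ between massiv versions.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

import qualified Data.Massiv.Array as A

-- | Hypothetical representation tag for memory-mapped arrays.
data MM = MM

-- | The wrapper from the comment: the same memory layout as S, but a
-- distinct type, so mmap-aware algorithms can be selected by type.
newtype instance A.Array MM ix e = MMapArray (A.Array A.S ix e)

-- | Forget the tag and reuse every existing algorithm for S arrays.
toStorable :: A.Array MM ix e -> A.Array A.S ix e
toStorable (MMapArray arr) = arr

main :: IO ()
main = do
  -- An ordinary in-memory array stands in for a mapped one in this demo.
  let inMemory = A.fromList A.Seq [1, 2, 3] :: A.Array A.S A.Ix1 Int
      wrapped = MMapArray inMemory
  print (A.sum (toStorable wrapped))
```

Because the wrapper is a newtype, converting to and from S costs nothing at run time, so the only difference between the two representations is which instances and algorithms the type checker picks.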
