
Mem cache reads (hits) should be fast, but are slowed down by disk cache updates. #40

Open
eschlenz opened this issue Dec 5, 2019 · 4 comments


eschlenz commented Dec 5, 2019

Hi,

First of all, thanks for open sourcing this project. It's really cool, and easy to use. 👍

In using this library I have found something that, I believe, is problematic. My expectation is that a mem cache hit should return very quickly. The data is readily available, and nothing else should hold that up.

I believe, however, that even if data is readily available from memory, the library is still doing disk operations before returning this data. It appears to be fully reading the disk cache entry, and then writing it back out in order to bump it up in the LRU cache. https://github.com/kittinunf/Fuse/blob/master/fuse/src/main/java/com/github/kittinunf/fuse/core/Cache.kt#L66-L75

This reading and writing of the entry drastically slows down what should be a fast operation. In a situation like the one I'm in, it is extremely noticeable when performing hundreds of cache reads in a short period of time.
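Roughly, the read path behaves like this minimal Kotlin sketch (the names SketchCache, getWithSource and Source, and the MutableMap stand-ins, are made up for illustration; this is not the actual Cache.kt code):

enum class Source { MEM, DISK }

class SketchCache<T : Any>(
    private val memCache: MutableMap<String, T> = mutableMapOf(),
    private val diskCache: MutableMap<String, T> = mutableMapOf() // stand-in for the disk-backed LRU cache
) {
    fun getWithSource(key: String): Pair<T, Source>? {
        val memValue = memCache[key] ?: return null // memory hit: the data is already available
        // The slow part: every memory hit still pays for a disk read and a disk write.
        val diskValue = diskCache[key]              // synchronous disk read
        diskCache[key] = diskValue ?: memValue      // synchronous disk write to bump the LRU entry
        return memValue to Source.MEM               // the caller only gets the value after both disk operations
    }
}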

I believe what you are trying to do (bump the disk LRU cache entry up) makes sense. I'm just wondering if there is a way that can be done asynchronously, so as not to hold up returning the data that was available from memory.
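For example, the bump could be handed off to a single background thread so the read path never touches the disk. This is just a hypothetical helper with made-up names, not a proposed Fuse API:

import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// One background thread keeps the disk bumps ordered while reads return immediately.
private val diskExecutor: ExecutorService = Executors.newSingleThreadExecutor()

fun <T : Any> getFromMemAndBumpLater(
    key: String,
    memCache: MutableMap<String, T>,
    diskCache: MutableMap<String, T> // stand-in for the disk-backed LRU cache
): T? {
    val memValue = memCache[key] ?: return null
    diskExecutor.execute {
        // The LRU bump still happens, just off the caller's thread.
        diskCache[key] = diskCache[key] ?: memValue
    }
    return memValue // returned without waiting for any disk I/O
}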

Thoughts? I'd be happy to collaborate with you on this.


kittinunf commented Dec 7, 2019

@eschlenz First of all, thank you so much for such a detailed explanation of the problem!

Yes, I think you are dead right here! The key thing is that we need to make sure the DiskCache entry gets bumped correctly so that the disk LRUCache keeps the right order. In my personal use, this ordering is quite important for the LRUCache to decide which entry to keep and which entry to let go.

However, you are right that in your use case the Mem fetches should not be bogged down by the Disk write. At the time I implemented this, I felt I had no choice but to pay that price to make sure the data is in the correct shape in both memCache and diskCache.

In the beginning, I introduced asynchronicity into the library. Even though it kinda worked, I didn't feel it was right: it was full of callbacks and it cluttered all of the nice API that clients want. So I stepped back and gave up on it.

Off the top of my head, I feel like at this point we might need to provide an extension over one of the asynchronous libraries that are widely popular on the JVM, such as Coroutines, RxJava, or Reactor, to get nice asynchronicity with a pleasant API surface.

e.g.

Fuse-RxJava extension

With this extension, the fetching contract could look like this instead:

fun fetch(): Observable<Result<T, Exception>>

val observable = cache.rxGetWithSource("hello", { "world" })

observable.subscribeBy { (data, source) ->
  // This subscriber gets a notification right away with the data from memory, if it is in memory.
  // A second notification is propagated after the DiskCache entry has been saved, for those who are interested.
}
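A rough sketch of how such an extension could emit twice, with made-up names and MutableMap stand-ins for the two cache levels (so these are not the real Fuse types, and this rxGetWithSource is only illustrative):

import io.reactivex.Observable
import io.reactivex.schedulers.Schedulers

enum class Source { MEM, DISK }

fun <T : Any> rxGetWithSource(
    key: String,
    memCache: MutableMap<String, T>,
    diskCache: MutableMap<String, T>, // stand-in for the disk-backed LRU cache
    fetcher: () -> T
): Observable<Pair<T, Source>> = Observable.create<Pair<T, Source>> { emitter ->
    val memValue = memCache[key]
    if (memValue != null) {
        emitter.onNext(memValue to Source.MEM)  // first notification: straight from memory, no disk wait
        diskCache[key] = memValue               // bump/refresh the disk entry
        emitter.onNext(memValue to Source.DISK) // second notification: the disk entry has been saved
    } else {
        val fetched = fetcher()                 // not cached anywhere: fetch and store in both levels
        memCache[key] = fetched
        diskCache[key] = fetched
        emitter.onNext(fetched to Source.DISK)
    }
    emitter.onComplete()
}.subscribeOn(Schedulers.io())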

Another idea is to relax this constraint: when fetching an entry from memory, don't update the disk entry right away, but keep the operation in a queue somehow (it could just be a queue of keys in the order they were fetched). Then, every now and then (we need to come up with some number, like every x fetches), we update the entries on disk all at once, retrospectively.
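A tiny sketch of that queue idea, again with made-up names and a MutableMap stand-in for the disk cache (the threshold of 100 is just a placeholder for "x"):

import java.util.concurrent.ConcurrentLinkedQueue

class BatchedBumpQueue<T : Any>(
    private val diskCache: MutableMap<String, T>, // stand-in for the disk-backed LRU cache
    private val flushEvery: Int = 100              // "every x number of fetches"; the value is a guess
) {
    private val pending = ConcurrentLinkedQueue<Pair<String, T>>()

    fun recordFetch(key: String, value: T) {
        pending.add(key to value)               // remember the fetch, in order, instead of touching disk now
        if (pending.size >= flushEvery) flush()
    }

    fun flush() {
        // Apply all queued bumps to the disk cache at once, retrospectively.
        while (true) {
            val (key, value) = pending.poll() ?: break
            diskCache[key] = value
        }
    }
}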

I am open to your ideas as well, if you have any.

Btw, thanks for using this library! I am really happy it "kinda" works in your case (with a small caveat). :)


eschlenz commented Dec 9, 2019

Either of the options you proposed sounds good to me. I think I, personally, would lean more towards the second solution you mentioned.

To reiterate what you said, disk writes would be thrown into a queue. At periodic intervals, the disk write operations would be processed. This would most assuredly address the problems I've had, as none of the put or get operations would be blocking. 👍

How often, if ever, would a client of your library need to know that a disk write had completed? I don't believe I would need that information. In my use case, I will just issue the put and trust that the library eventually writes it to disk. If it fails to, it's no big deal, and would just result in a disk cache miss later on. But, if there are common use cases you know of that need to know this information, then your observer pattern would make it possible. The library user could either opt in to listen to the disk write result, or choose not to.

There are some edge cases we would need to consider. If you wait until every N cache puts before processing the queue, there is a chance that the app will be put in the background at N-1, for example, and the last entry might never get written to disk. It might be better to have a disk-writing thread that periodically wakes up to see if there is any work to do. There is still a chance that the app gets killed (maybe by the user) before the thread runs, but that's about all you can do, unless you persist the disk write queue to disk as well.
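Something like this minimal sketch, with made-up names, a MutableMap stand-in for the disk cache, and a guessed interval:

import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

class PeriodicDiskWriter<T : Any>(
    private val diskCache: MutableMap<String, T>, // stand-in for the disk-backed LRU cache
    flushIntervalSeconds: Long = 5                 // the interval is a guess, not a recommendation
) {
    private val pending = ConcurrentLinkedQueue<Pair<String, T>>()
    private val scheduler = Executors.newSingleThreadScheduledExecutor()

    init {
        // Wake up periodically and drain whatever has been queued since the last run.
        scheduler.scheduleWithFixedDelay({ drain() }, flushIntervalSeconds, flushIntervalSeconds, TimeUnit.SECONDS)
    }

    fun recordFetch(key: String, value: T) {
        pending.add(key to value) // reads never block on disk; they only enqueue
    }

    private fun drain() {
        while (true) {
            val (key, value) = pending.poll() ?: break
            diskCache[key] = value
        }
    }

    fun shutdown() {
        drain()              // best-effort final flush; an app kill before this can still lose entries
        scheduler.shutdown()
    }
}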


eschlenz commented Dec 9, 2019

I just noticed something, and wanted to ask you about it. Refer back to these lines of code: https://github.com/kittinunf/Fuse/blob/master/fuse/src/main/java/com/github/kittinunf/fuse/core/Cache.kt#L71-L76

It reads:

"If we don't have a disk cache entry for this key, then we should write the entry we got from memory to disk with the correct timestamp."

Essentially, this makes sure that the disk cache reflects the same thing as what we have in memory. Right?

But, what if the disk cache entry is not null? Should the disk cache be updated in that case too, so that the timestamp is correct in that scenario?
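To pin down the branch I'm asking about, here is a paraphrase with made-up names and types (Entry, syncMemToDiskIfMissing); it is not the actual Cache.kt code:

data class Entry<T>(val value: T, val timestamp: Long)

fun <T : Any> syncMemToDiskIfMissing(
    key: String,
    memEntry: Entry<T>,
    diskCache: MutableMap<String, Entry<T>> // stand-in for the disk-backed LRU cache
) {
    val diskEntry = diskCache[key]
    if (diskEntry == null) {
        // Disk has no entry for this key: write out what memory has, with a fresh timestamp.
        diskCache[key] = memEntry.copy(timestamp = System.currentTimeMillis())
    }
    // The question: should a non-null diskEntry also be re-written so its timestamp is refreshed?
}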

kittinunf commented

But, what if the disk cache entry is not null? Should the disk cache be updated in that case too, so that the timestamp is correct in that scenario?

For this, it should be fine 🤔? If the disk has the entry, then it is supposed to be the same entry as the memory one, right? In my use case, I haven't seen it happen that there is one entry in memory and yet another on disk (or maybe this is a time bomb yet to explode 💣, haha).
