
Extend the memo feature with optional parameters for expiring entries #1424

Merged: 14 commits into master on Jun 12, 2020

Conversation

@DavyLandman (Member)

This PR does two things:

  • Memo entries now expire after 1 hour of not being used.
  • The memo tag can be configured with function-specific expiration strategies.

```
import util::Memo; // new module that you have to import to get these parameters

@memo=expireAfter(minutes=10) // expires entries after they haven't been used for 10 minutes
int calc(int y) = y * 2;

@memo=maximumSize(4000) // once the table has more than 4000 entries, we start clearing it to make room (in a random order)
int calc2(int y) = y * 3;

@memo={maximumSize(4000), expireAfter(minutes=10)} // the strategies can be combined
int calc3(int y) = y * 4;
```

The only other reason memo caches are cleared is running out of memory.

Future improvements:

  • Implement a more LRU-like structure so that maximumSize can also take the usage of entries into account.
  • Currently, retiring entries happens every 5 seconds; especially with maximumSize this can lag quite a bit behind, but I have no good idea how to schedule it more aggressively without adding significant call overhead. (A simplified sketch of this bookkeeping follows below.)
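
For readers who want a mental model of the bookkeeping described above, here is a simplified, hypothetical sketch (made-up names, not the actual interpreter code):

```java
import java.lang.ref.SoftReference;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the memo bookkeeping (made-up names): values are held
// in SoftReferences, so the GC can clear them under memory pressure, and every
// entry carries a last-used timestamp that a periodic sweep compares against
// the expiration window.
class MemoTable<K, V> {
    private static final class Entry<T> {
        final SoftReference<T> value;
        volatile long lastUsed; // refreshed on every lookup

        Entry(T value, long now) {
            this.value = new SoftReference<>(value);
            this.lastUsed = now;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long expireAfterNanos;

    MemoTable(long expireAfterNanos) {
        this.expireAfterNanos = expireAfterNanos;
    }

    V lookup(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        e.lastUsed = System.nanoTime();
        return e.value.get(); // may be null if the GC already cleared it
    }

    void store(K key, V value) {
        entries.put(key, new Entry<>(value, System.nanoTime()));
    }

    // Runs every 5 seconds: entries that were not used within the expiration
    // window, or whose value was already collected, are removed.
    void startSweeper(ScheduledExecutorService scheduler) {
        scheduler.scheduleWithFixedDelay(() -> {
            long now = System.nanoTime();
            for (Iterator<Map.Entry<K, Entry<V>>> it = entries.entrySet().iterator(); it.hasNext(); ) {
                Entry<V> e = it.next().getValue();
                if (e.value.get() == null || now - e.lastUsed > expireAfterNanos) {
                    it.remove();
                }
            }
        }, 5, 5, TimeUnit.SECONDS);
    }
}
```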

Since I know at least @joukestoel and @rodinaarssen have talked about this at some point, will you take a look before I merge into master?

Use:

```
@memo{(
   "expireSeconds": 20
)}

// or

@memo{(
   "expireMinutes": 5
)}

// these can be combined:

@memo{(
   "expireMinutes": 5,
   "maxEntries": 10000
)}
```

By default, entries expire after 1 hour and there is no maximum number of entries.
```
@memo{
    expire(
        seconds = 10,
        minutes = 20, // only one of the time-based options should be set
        hours = 2,
        entries = 4000
    )
}
```
@jurgenvinju (Member) left a comment


Very cool and handy. It should help with memory efficiency.

@joukestoel (Member) left a comment


Looking good to me!
A nice added bonus is that Prelude now gets a standard way to call sleep, so I don't have to include separate code for that in my own Rascal projects anymore 😬

The reason we discussed memoization in the past was the need for an explicit way to clear the memo caches programmatically; I needed that to run some benchmarks of my Rascal code.
I don't see anything in this PR that addresses that, but (as discussed yesterday) I presume the previously added code to clear the caches is still there.

@DavyLandman (Member, Author)

> Nice added bonus is that Prelude now gets a standard way to call sleep so I don't have to include separate code for that in my own Rascal projects anymore 😬

Yeah, I'm not happy about that, but at least it's not a library function ;)

@rodinaarssen (Member) left a comment


Looks good!

@DavyLandman (Member, Author) commented Jun 5, 2020

Thanks for the checks.

I realized how we can use Caffeine for this; it makes the cleanup function a bit simpler, but it also makes things slower (it's a great library, but it does a lot that we don't need). What do you think? Less of our own code in exchange for a bit more call overhead?

@joukestoel (Member)

Hmmm, I would prefer less call overhead in this case, since this is code that gets hit very often. So my vote would be to leave it as is for now. The more runtime speed we can get, the better imho :)

@jurgenvinju (Member) commented Jun 6, 2020 via email

Javadoc excerpt from the change under review:

```
* A monotonic ticker that is fast to access frequently and is updated roughly once every second.
*
* For more reasons why we want a fast time field, see this discussion:
* http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html
```
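
For illustration, a coarse ticker like the one described in this Javadoc can be built from a single volatile field that a daemon thread refreshes about once per second (a hedged sketch, not necessarily the class under review):

```java
// Sketch of a coarse monotonic ticker: reads are a cheap volatile load, and a
// daemon thread refreshes the value roughly once per second.
final class CoarseTicker {
    private static volatile long current = System.nanoTime();

    static {
        Thread updater = new Thread(() -> {
            while (true) {
                current = System.nanoTime();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "coarse-ticker");
        updater.setDaemon(true);
        updater.start();
    }

    private CoarseTicker() {}

    static long read() {
        return current;
    }
}
```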

What an incredibly thorough and nice blog post! TIL :)

@ben-manes

Caffeine has an optional scheduler setting that cleans up automatically based on the next expiration time. This is paced to run at most every second to avoid excessive runs due to being off by nanos / millis. The sorted order is kept by using a timing wheel, which is an amortized O(1) priority queue.
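
(For readers unfamiliar with that setting, the configuration looks roughly like this; a generic example using Caffeine's public builder API, not code from this PR:)

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Scheduler;

class CaffeineSchedulerExample {
    // With a Scheduler configured, Caffeine proactively removes expired entries
    // close to their expiration time instead of waiting for the next cache access.
    static Cache<String, Integer> build() {
        return Caffeine.newBuilder()
                .expireAfterAccess(Duration.ofMinutes(10))
                .maximumSize(4_000)
                .scheduler(Scheduler.systemScheduler())
                .build();
    }
}
```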

For eviction we wrap the value with list pointers to use the O(1) lru trick. Then max size is a poll of the list head and quick reorderings. However making it concurrent is where all the complexity lies.
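
(The single-threaded textbook form of that list-pointer trick is what an access-ordered LinkedHashMap provides; a minimal illustration only, not Caffeine's concurrent implementation:)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal illustration of the O(1) LRU idea: an access-ordered LinkedHashMap
// moves an entry to the tail of its internal list on every get, so the head is
// always the least recently used entry and eviction is a constant-time removal.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, /* accessOrder = */ true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```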

The allocation of the LookupKey per call can be removed by using a thread local. Unfortunately that doesn't play well with GC'ing ClassLoaders, so as a general library we can't make assumptions about our users' usage.

I’m not sure if you can borrow much of the code, but I hope that offers optimization ideas.

@DavyLandman (Member, Author)

Hi @ben-manes,

Always a pleasure to know you'll read along when we mention Caffeine. As you might know, we've replaced most of the caches in our code base with Caffeine ones; even a few instances of String::intern() were replaced by Caffeine 🙇.

Using Caffeine

I truly want to consider Caffeine instead of our own wrapper around ConcurrentMap and a cleanup thread, so let me give some more context in case it wasn't clear from the discussion & Javadoc:

  • We need value equality on keys, while also storing them in SoftReferences. I was working on a version of LookupKey that was simply put into a SoftReference before being stored in Caffeine, overriding that SoftReference's hashCode and equals. (I understand why you designed Caffeine to use reference equality when you ask for soft keys.) This would also mean I still need to take care of invalidating entries in the cache in some background runner (or on every put). A rough sketch of this idea follows the list.
  • Size-based eviction will be used far less commonly than LRU.
  • While the feature in Rascal is called memo, in reality we do not really care about memoization (as in, atomic putIfAbsent).
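
A rough sketch of that first bullet, with hypothetical names (SoftMemoKey is made up for illustration, not a class in this PR):

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;

// Hypothetical sketch: a soft reference that caches the referent's hash code
// and compares by value equality, so it can be used as a map key while the
// argument tuple itself stays softly reachable.
final class SoftMemoKey extends SoftReference<Object[]> {
    private final int hash;

    SoftMemoKey(Object[] args) {
        super(args);
        this.hash = Arrays.deepHashCode(args);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof SoftMemoKey)) {
            return false;
        }
        Object[] mine = get();
        Object[] theirs = ((SoftMemoKey) obj).get();
        // A cleared key can only ever equal itself; such entries get removed
        // by the background cleanup (or on the next put).
        return mine != null && theirs != null && Arrays.deepEquals(mine, theirs);
    }
}
```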

Regarding:

> but it does make it slower (it's a great library but it also does a lot that we don't need). What do you think? Less of our own code for a bit more call overhead?

I was reading the code, and it seems there is quite some bookkeeping in BoundedLocalCache around every put; that was the "overhead" I was discussing. It also calls expirationTicker().read() for every put, and since taking a timestamp can be an expensive operation, would it be better if we supply our own coarse ticker? (I was actually thinking of contributing a coarse variant of nanoTime to Caffeine, if you are interested.)

I was trying to balance this put overhead against just using the ConcurrentMap that you also end up using. But please elaborate if I'm overlooking something.

Optimization Suggestions

  • The ThreadLocal version of LookupKey sounds like a nice plan; do you know whether in practice this outperforms short-lived objects that are relatively cheap for the GC to collect? (And indeed, there is the overhead of clearing references.) A sketch of the idea follows below.
  • Our current eviction implementation is a bit rough; there is an optimization for the time-based one, but every once in a while we do a full iteration over the map. In the future we should definitely try to apply the TinyLFU approach.
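
A sketch of the thread-local lookup-key idea (hypothetical names; as noted elsewhere in this thread, this is only safe when you control the threads and class loading yourself):

```java
import java.util.Arrays;

// Hypothetical sketch of the ThreadLocal optimization: reuse one mutable
// lookup key per thread instead of allocating a fresh object for every cache
// access. (A general-purpose library cannot do this safely because the
// ThreadLocal can pin a ClassLoader; an application that controls its own
// threads and class loading can.)
final class LookupKeys {
    static final class LookupKey {
        Object[] args;
        int hash;

        LookupKey prepare(Object[] args) {
            this.args = args;
            this.hash = Arrays.deepHashCode(args);
            return this;
        }
        // equals/hashCode omitted in this sketch; they must be compatible with
        // the key class that is actually stored in the map.
    }

    private static final ThreadLocal<LookupKey> PER_THREAD =
            ThreadLocal.withInitial(LookupKey::new);

    // Borrow the per-thread key for a lookup; it must never be stored in the map.
    static LookupKey borrow(Object[] args) {
        return PER_THREAD.get().prepare(args);
    }
}
```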

@ben-manes

Thanks @DavyLandman,

You should definitely use the right tool for the job. Sometimes that's Caffeine, but certainly not always. So please take what I say only as an experience report, not an attempt to influence whether you use Caffeine or not.

> We need to have value equality on keys, while also storing them in SoftReferences

Yeah, I forget the details of why in Guava we decided to only support identity equality for weak/soft references. I believe it was because MapMaker had an equivalence option that was rarely used, and only used incorrectly inside Google's code base. When using identity equality, soft keys are the same as weak keys, since having no strong reference makes the entry unretrievable.

I wouldn't be against adding it (ben-manes/caffeine#344), but I am unclear on the tradeoffs, and the builder names have to be obvious & succinct (which seems harder than the code changes!).

> I was reading the code, and it seems there is quite some bookkeeping in BoundedLocalCache around every put; that was the "overhead" I was discussing.

Yes, though these are mostly minor calls like if-conditions that don't apply, so much of it is nanoseconds. Past benchmarks indicated that entry lock contention is the primary overhead, which leads us to have similar throughput to the underlying ConcurrentHashMap. A lock-free hash table of course removes that, but has its own gotchas.

Some of the code complexity is there to avoid calling put when possible, since that always locks; a get is used instead if the entry is present. A putIfAbsent will eagerly return if possible in the latest release; earlier releases had forgotten that, since it is on computeIfAbsent as well and that's the more common usage.

> It also calls expirationTicker().read(); for every put, and since time stamp can be an expensive operation, would it be better if we supply our own coarse ticker (I was actually thinking of contributing a coarse variant of the nanoTime to Caffeine if you are interested)?

Right, so this is why it's nice to have Ticker be pluggable. I don't think offering a coarse version directly is super valuable, as it is fairly trivial code and could lead to unnecessary usage when developers blindly apply optimizations. However, it is reasonable in your code if identified as a bottleneck. The cache will work fine with a custom ticker like yours, and I tried to avoid redundant calls to the ticker to minimize the lookup overhead.
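
(Wiring a custom coarse ticker into Caffeine's builder looks roughly like this; again a generic example, not code from this PR:)

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Ticker;

class CustomTickerExample {
    // Ticker is a functional interface with a single long read() method, so any
    // coarse time source can be plugged in.
    static Cache<String, Integer> build(Ticker coarseTicker) {
        return Caffeine.newBuilder()
                .expireAfterAccess(Duration.ofMinutes(10))
                .ticker(coarseTicker)
                .build();
    }
}
```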

> The ThreadLocal version of LookupKey sounds like a nice plan; do you know whether in practice this outperforms short-lived objects that are relatively cheap for the GC to collect?

At one point I had observed escape analysis kicking in to avoid the allocation, but it is too flaky to rely on. The idea originated in the linked discussion (ben-manes/caffeine#294), where an amp-agent seemed to run into allocation issues due to its high call frequency and needs to be very low overhead. They were able to use my idea of a thread local, but they also point out why I couldn't apply that trick safely, so it has remained backlogged.

A high rate of short-lived allocations will impact the GC, so being garbage-free on read is desirable for a cache. Since the JVM uses bump-pointer allocation, it is fast until threads contend trying to get new chunks and trigger pacing to allow the GC to catch up. Allocations can show up as a bottleneck in very hot paths where escape analysis doesn't kick in, e.g. when crossing a thread boundary via a concurrent queue. In this case I don't consider it too concerning, so just take note of possible optimizations if you do observe it later on.

> (and indeed, there is the overhead of clearing references)

You probably want to call Reference.clear() explicitly when discarding an entry, to avoid the cost of making the GC figure it out.
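
For example, when a retired entry is removed from the map (a hypothetical sketch):

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

final class DiscardExample {
    private final ConcurrentHashMap<Object, SoftReference<Object>> entries = new ConcurrentHashMap<>();

    // When retiring an entry, clear the soft reference explicitly instead of
    // leaving it to the garbage collector to discover that the referent is no
    // longer reachable through the cache.
    void discard(Object key) {
        SoftReference<Object> removed = entries.remove(key);
        if (removed != null) {
            removed.clear();
        }
    }
}
```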

> Our current eviction implementation is a bit rough..

Overall I am quite happy with how things turned out in Caffeine, given the constraints of a general-purpose library. A targeted design like yours can leverage more informed decisions and make nice optimizations that a general solution cannot. Of course, remember to benchmark, since that's almost always forgotten and many times those nice gains unfortunately don't materialize in practice.

@DavyLandman (Member, Author)

Based on our discussion, I've simplified the code quite a bit: there is now a single class for lookups, which is either wrapped in a SoftReference when putting, or used raw when looking up.

@DavyLandman (Member, Author)

> Yes, though these are mostly minor calls like if-conditions that don't apply, so much of it is nanoseconds. Past benchmarks indicated that entry lock contention is the primary overhead, which leads us to have similar throughput to the underlying ConcurrentHashMap. A lock-free hash table of course removes that, but has its own gotchas.

Even the postWrite task architecture stuff?

> A high rate of short-lived allocations will impact the GC, so being garbage-free on read is desirable for a cache. Since the JVM uses bump-pointer allocation, it is fast until threads contend trying to get new chunks and trigger pacing to allow the GC to catch up. Allocations can show up as a bottleneck in very hot paths where escape analysis doesn't kick in, e.g. when crossing a thread boundary via a concurrent queue. In this case I don't consider it too concerning, so just take note of possible optimizations if you do observe it later on.

I'm putting the ThreadLocal in for now; I'll do some profiling & measurements in the future.

> Of course remember to benchmark, since that's almost always forgotten as many times those nice gains unfortunately don't materialize in practice.

When time permits: always :)

@DavyLandman merged commit 6dc2ced into master on Jun 12, 2020