use atomics for metrics #91

Open · wants to merge 2 commits into base: master
Conversation

@aathan commented Nov 24, 2022

AFAIK, metricsMu serves no useful purpose beyond what the likely more efficient atomic integer operations available on most modern CPUs already provide. In the Metrics() function, metricsMu appeared to ensure that all the counters are returned as of a single instant in time, with no counter incremented relative to the others. But since the write lock is only held for the duration of an increment of a single field of the struct, even that guarantee does not hold: two counters are never incremented together under one lock acquisition, which is the only scenario the read lock would protect against. Finally, the atomic loads probably collapse to plain reads on many CPUs, with the atomic load at most forcing CPU cache coherency (i.e., you could probably just write Metrics() { return c.metrics } and be just fine).

@swithek (Contributor) commented Nov 26, 2022

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

> likely more efficient atomic integer operations available on most modern CPUs

@gaydin commented Nov 27, 2022

> I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:
>
> > likely more efficient atomic integer operations available on most modern CPUs

```
name                     old time/op    new time/op    delta
CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)

name                     old alloc/op   new alloc/op   delta
CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)

name                     old allocs/op  new allocs/op  delta
CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)
```

Review comment on:

```go
func (c *Cache[K, V]) Metrics() (m Metrics) {
	// I imagine on most architectures this is equivalent to
	// simply writing: return c.metrics
	m.Insertions = atomic.LoadUint64(&c.metrics.Insertions)
```

Maybe something like this would be more expressive, but it's just a question of taste, of course:

```go
return Metrics{
	Insertions: atomic.LoadUint64(&c.metrics.Insertions),
	Hits:       atomic.LoadUint64(&c.metrics.Hits),
	Misses:     atomic.LoadUint64(&c.metrics.Misses),
	Evictions:  atomic.LoadUint64(&c.metrics.Evictions),
}
```

Contributor:
That would be my suggestion as well.


Review comment on:

```go
return c.metrics
func (c *Cache[K, V]) Metrics() (m Metrics) {
	// I imagine on most architectures this is equivalent to
```

Just curious, what do you mean by this comment? AFAIK it's not the best idea to just do return c.metrics on any platform.

@aathan (Author) commented Jan 20, 2023

I submitted this PR almost 2 months ago and no longer remember the details of the code. What I vaguely remember is that there seemed to be unnecessary atomic operations and locks in various places. Granted, some architectures may require atomic loads where others don't. However, I don't have time to re-read all of the code now. I'm guessing that what I was indicating in that comment is that the atomic loads are of no value if they're already guarded by a read lock; and if they are, then returning the struct by value (plus or minus any Go quirks) is likely equivalent to creating a new struct and copying the values over explicitly.

The main point is that unless you're worried about partial reads of partially written words, there's not much value in atomic reads of the individual values in the struct; and separately, if you're reading the individual values atomically, there's not much point in locking the struct to read it, since those values are updated independently of each other. That is, there is no case where value A and value B inside the struct must be updated atomically together, so nobody should care if you read A, then read B, and an update happened in between.

Anyway, caveat emptor on all of the above, because it's been too long for me to remember the code. I just vaguely recall that there may be some thread-safety overkill going on.

@biosvs commented Jan 19, 2023

I also noticed this potential improvement, glad to see opened PR for this (:

@swithek do you have any concerns regarding this improvement or regarding this particular PR?

@swithek (Contributor) commented Jan 19, 2023

> I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:
>
> > likely more efficient atomic integer operations available on most modern CPUs
>
> ```
> name                     old time/op    new time/op    delta
> CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
> CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)
>
> name                     old alloc/op   new alloc/op   delta
> CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
> CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)
>
> name                     old allocs/op  new allocs/op  delta
> CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
> CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)
> ```

Having seen the benchmark results, I'm not entirely convinced this would be a good change. It doesn't look like an improvement at all. Perhaps the benchmark tests should be changed to properly test this?

4 participants