Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix slowTickIfNecessary with infrequently used EWMA #3929

Open
wants to merge 1 commit into
base: release/4.2.x
Choose a base branch
from

Conversation

aweisberg
Copy link

EWMA.tickIfNecessary does an amount of work that is linear to the amount of time that has passed since the last time the EWMA was ticked. For infrequently used EWMA this can lead to pauses observed in the 700-800 millisecond range after a few hundred days.

It's not really necessary to perform every tick as all that is doing is slowly approaching the smallest representable positive number in a double. Instead pick a number close to zero and then bound the number of ticks to allow that to be reachable from the largest value representable by the EWMA. Actually approaching the smallest representable number is still measurably slow and not particularly useful.

@aweisberg aweisberg requested review from a team as code owners January 30, 2024 22:22
@aweisberg
Copy link
Author

It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?

More context on the issue and how it was found CASSANDRA-19332

@joschi
Copy link
Member

joschi commented Jan 31, 2024

@aweisberg Thanks for your contribution!

It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?

Yes, that's correct. Any version before Dropwizard Metrics 4.2.x is unmaintained.

Which version are you using exactly in Cassandra?

@aweisberg
Copy link
Author

It's a mix, this is a list of active versions with 3.0/3.11 soon to be unsupported. 4.x will probably be supported for 5+ years. Not a big deal as the work around is to read all the Meters periodically.
5.0+ 4.2.19
4.1 3.1.5
4.0 3.1.5
3.11 3.1.5
3.0 3.1.0

@aweisberg aweisberg force-pushed the fix_slow_ewma_ticking branch 2 times, most recently from 3102202 to 7642773 Compare February 6, 2024 18:50
@aweisberg
Copy link
Author

aweisberg commented Feb 6, 2024

@joschi what is the next step? I reworked the fix slightly to reset the EWMAs to the smallest representable positive value instead of performing even the bounded amount of ticks.

It doesn't set it to 0.0 because that was not the existing behavior which is after the EWMA is first used will always at least have rate set to Double.MIN_NORMAL. Probably not important, but who knows what downstream stuff might not want that to change.

@smiklosovic
Copy link

@joschi would you mind to take a look at this please? Cassandra project would be very happy if this was included :)

EWMA.tickIfNecessary does an amount of work that is linear to the amount of time that has passed since the last time the EWMA was ticked. For infrequently used EWMA this can lead to pauses observed in the 700-800 millisecond range after a few hundred days.

It's not really necessary to perform every tick as all that is doing is slowly approaching the smallest representable positive number in a double. Instead pick a number close to zero and if the number of ticks required is greater then that don't do the ticks just set the value to close to zero immediately. Actually approaching the smallest representable number is still measurably slow and not particularly useful.

To avoid changing the observed output of the EWMA (which previous was only 0.0 if never used) set it close to Double.MIN_NORMAL rather then to 0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants