Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Sample and reimplement #9

Open
xaprb opened this issue Jan 22, 2014 · 5 comments
Open

Remove Sample and reimplement #9

xaprb opened this issue Jan 22, 2014 · 5 comments

Comments

@xaprb
Copy link
Contributor

xaprb commented Jan 22, 2014

First a little background. I will share some details of how we use multitick, but guard some of the specifics since not all of that code is open-source.

In many places where we use multitick, especially agents, we want the following functionality:

  • Occasional ticking at larger intervals (e.g. 1 minute) during normal operation. Currently we call this "sampled ticking."
  • Jitter (uneven ticking) so that there isn't a DDOS against our APIs. Ideally, predictable and deterministic to ease debugging and incident analysis.
  • Guarantees about how many ticks will occur in an interval, even if jittered so it will happen unevenly. In other words, if we say we want a tick each minute but it's jittered, we do not want ticks to happen randomly but average once a minute. We mean, once per minute there WILL be one and only one tick, guaranteed.
  • All users of the ticker get exactly the same ticks, so if several routines are doing sampled jittered ticking, they all get it at the same time. This is important because we may be doing things like getting some data samples, and we want all of them to correspond to the same point in time so we can make sense of the overall picture they present.
  • Ability to turn off the sampling and get ticks at the underlying higher frequency. This is in response to something like a fault being detected by Adaptive Fault Detection. When such a condition is present, we want to be able to increase the frequency of our occasional data samples and get a lot of data for a short time. All users of the ticker should again get all the same ticks in this case, so they're coordinated.

In our internal codebase we currently accomplish this with something called RandomlyWithin(). This is based on a technique I learned from a colleague years ago. This is part of a larger bit of code, but for context, I think it's enough to say that we have an internal ts package that provides time functionality for a Ts type, which is Unix timestamps (integers). In that package, we have code like this:

// RandomlyWithin takes a timestamp and a duration, then returns a random time
// within that duration. For example, if you want to do something at an
// instant within the current hour, you can say now.RandomlyWithin(ts.Hour)
// and you'll get back a timestamp within the current hour at which you should
// do whatever it is you want. Although the choice of timestamp is random, it
// is deterministic, and you will receive the same answer all hour long. See
// http://stackoverflow.com/questions/14823792/how-to-choose-a-random-time-
// once-per-hour
// If you want to add some entropy, pass an optional second parameter, which
// will ensure that your process doesn't have the same behaviors as some other.
func (t Ts) RandomlyWithin(dur Ts, entropy ...uint32) Ts {
    md5hasher := md5.New()
    intervalStart := t.Floor(dur)
    toHash := uint32(intervalStart)
    if len(entropy) > 0 {
        toHash += entropy[0]
    }
    md5hasher.Write([]byte{
        uint8(toHash >> 24 & 255),
        uint8(toHash >> 16 & 255),
        uint8(toHash >> 8 & 255),
        uint8(toHash & 255)})
    randomNum := binary.BigEndian.Uint32(md5hasher.Sum(nil)[0:4])
    result := intervalStart + Ts(randomNum)%dur
    return result
}

So this is selecting an offset within the selected interval dur, based on the start period of the interval, and with some additional offset the caller can pass in.

Every time interval will have exactly one timestamp that satisfies the test. For example, if you ask if a timestamp is RandomlyWithin(ts.Minute) you will get a result for exactly one timestamp during that minute. This property is important because it does not rely on the application state. That is, if you have a process doing this once a minute and you restart it, there is no possibility that it will take a selected action twice in the minute.

Note that the function doesn't return true/false whether the given timestamp is the chosen one. It returns the chosen one. This can be useful because the calling application can then do things like countdowns; even if the selected timestamp isn't the chosen one, it knows which one will be chosen (or if that's already passed).

We use this code in our applications roughly like the following:

for now := range ticker {
    nowTs = ts.FromTime(now)
    if nowTs.RandomlyWithin(ts.Minute) == nowTs {
        // ...
    }
}

We were planning to move this functionality into multitick with the Sample() function, and make it appropriate for more general use cases, but after implementing it a little differently and looking at it, it doesn't really do what we want in some ways. So I think we need to reimplement it.

Here are two ideas I've had while thinking about this.

  1. Make an object or a function that implements something similar to ts.RandomlyWithin() as shown above. The advantages of this are that extra ability to do things like figure out when the tick you're waiting for is coming. The pattern for use would then look pretty similar to the above code.
  2. Make multitick.Ticker more flexible. It can either create a ticker internally and use it, or you can give it a ticker and it'll read ticks from that and broadcast them as usual. This way we could construct a Ticker by subscribing to another Ticker. (Chaining Tickers together could be very useful in general). If we do this, then one Ticker could generate events every second, and another one could filter them out and pass through one per interval (for example, one per minute). Turning on and off the filtering could be accomplished with Sample, as it is now. However, Sample could be implemented in terms of item 1, so users of the package could either get a simple interface without bells and knobs, or use the knobs if they want to.

I'm looking for suggestions and feedback.

@JnBrymn
Copy link
Contributor

JnBrymn commented Jan 22, 2014

One problem I ran into with the sample idea is that some processes need
per-second ticks and per-minute ticks, so I wrote code like this
(somewhat pseudo code):

ticker := multitick.Ticker{ShortInterval:time.Second,
LongInterval:time.Minute}
secTick := ticker.SubscribeShort()
minTick := ticker.SubscribeLong()
select {
case <- secTick:
  //do stuff every second
case <- minTick:
  //do important stuff every minute
}

I like the way this code looks and for me it's easier to reason about
than:

ticker := multitick.Ticker{Interval:time.Second}
secTick := ticker.Subscribe()
for {
  tick <- secTick
  //do stuff every second
  if ticker.RandomlyWithin(time.Minute) == tick {
    //do important stuff every minute
  }
}

But to my surprise, I didn't realize that the time.Ticker would drop ticks.
For the first example above this means that the "important" minute stuff
would not be executed because secTick won. Perhaps an easy fix for this is
to make the channel from the subscription buffered so that ticks aren't
dropped. The API could be ticker.SubscribeShort(10).

Along a different topic. I like the idea of tickers subscribing to other
tickers. You could do interesting stuff like this:

secTicker := multitick.Ticker{Interval:time.Second}
minTicker := secTicker.OnceEvery(60)
minTickerWithJitter := secTicker.OnceRandomlyWithinEvery(60)
minTickerWithJitterButWait1MinToStart := minTickerWithJitter.Wait(1)
bufferedMinuteTicker := minTicker.Buffered(10) //will start dropping after buffer is full
guaranteedMinuteTicker := minTicker.Guaranteed() //will pause until it can be consumed

@xaprb
Copy link
Contributor Author

xaprb commented Jan 22, 2014

Something like ticker.SubscribeShort(10) feels too specific to our use case. Some users might want more than two types of durations.

I do like the naming of OnceEvery() because that is clearer than RandomlyWithin(). That's a much better name. It beats Sample() also.

I don't think we can make the channels buffered without causing bugs. If ticks can pile up and then be delivered late, that makes it really hard to reason about what might happen in every possible case. It also breaks expectations that a multitick behaves like a time.Ticker with broadcasting. That expectation, by the way, is one of the reasons that I think the OnceEvery() functionality might be better off not co-mingled with the ticker functionality. It feels wrong somehow, although I can't really figure out why.

xaprb added a commit that referenced this issue Jan 24, 2014
Remove Sample() until after a decision on #9
@xaprb
Copy link
Contributor Author

xaprb commented Mar 4, 2014

Conceptually, a multiticker is a ticker and a multiplexor. It would actually be useful to make a multiplexor component and build a multiticker out of that plus a ticker. Two useful things to add into this mix would be a filter that can provide the OnceEvery() functionality, and a smoother that can round the ticks to the desired precision (which will be necessary for OnceEvery() to work right).

Then you could build a multiticker, and use it to construct a OnceEvery. The OnceEvery would be implemented as a filter and a multiplexor.

I think I'm going to hack on this. It feels like the right decomposition into independent bits of functionality and can be done without any interface changes.

@xaprb
Copy link
Contributor Author

xaprb commented Mar 7, 2014

I lied. It wasn't as clean as I thought. This isn't important so it's on the back burner again.

@JnBrymn
Copy link
Contributor

JnBrymn commented Mar 7, 2014

I call dibs once it's front burner again. MultiTicker beat me once, but
I'll get him next time. :-D

-John

On Fri, Mar 7, 2014 at 3:02 PM, Baron Schwartz notifications@github.comwrote:

I lied. It wasn't as clean as I thought. This isn't important so it's on
the back burner again.

Reply to this email directly or view it on GitHubhttps://github.com//issues/9#issuecomment-37062013
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants