Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make timer callbacks respect reentrancy of grains #2574

Open
sergeybykov opened this issue Jan 7, 2017 · 96 comments · May be fixed by #8955
Open

Make timer callbacks respect reentrancy of grains #2574

sergeybykov opened this issue Jan 7, 2017 · 96 comments · May be fixed by #8955
Assignees

Comments

@sergeybykov
Copy link
Contributor

Today, timer callbacks interleave with grain methods (at await points) for both reentrant and non-reentrant grains. This is a source of confusion because developers, naturally, do not expect interleaving in non-reentrant grains. We need to make timer callbacks respect reentrancy of grain classes and not interleave for non-reentrant grains.

@cata
Copy link
Contributor

cata commented Jan 7, 2017

Heads-up: this is a breaking change to an actively used feature(/bug? :-) )

We are actively relying on timer callback reentrancy. This being said, I agree that the behavior is confusing and should be changed. I think a straightforward migration path would be to call a reentrant grain method (specified using AlwaysInterleave or MayInterleave) from the timer callback.

@sergeybykov
Copy link
Contributor Author

@cata, thanks for the heads-up. I didn't realize people rely on this behavior.

@kogir
Copy link

kogir commented Jan 9, 2017

This timer reentrancy initially surprised me as well, but now our code has also come to rely on it. The non-reentrant methods in our grains often control flags used by the timer callback.

Would be happy to migrate, but this "loophole" is currently the only way to have a partially reentrant grain - a useful feature that would be sad to lose.

@sergeybykov
Copy link
Contributor Author

Maybe we should make the behavior configurable then, to explicitly allow for both reentrant and non-reentrant execution of timer callbacks?

@cata
Copy link
Contributor

cata commented Jan 10, 2017

@sergeybykov - configurable behavior (an extra parameter on a RegisterTimer overload?) would work great for us.
The default behavior could be changed to match the grain's default reentrancy policy, thus making it less surprising.

@gabikliot
Copy link
Contributor

I think that is exactly what we agreed upon and planned to do a long time ago. That is at least what I remember.

@jamescarter-le
Copy link
Contributor

jamescarter-le commented Mar 14, 2017

@sergeybykov @gabikliot Please can I clarify if this behavior is applicable to Reminders too?

I ask because the Reminders are implemented via Timers underneath. I'm thinking something I've implemented ontop of GrainServices may be subject to this interleaving which I did not anticipate would be an issue.

If someone can explain the proposed method of resolving this I am happy to have a go.

@sergeybykov
Copy link
Contributor Author

@jamescarter-le Reminders are different in behavior from timers because their ticks are delivered via true grain calls that follow reentrancy/interleaving rules.

@shlomiw
Copy link
Contributor

shlomiw commented Mar 30, 2017

It's very important for timers callback not to interleave. Optional configuration (to allow re-entrant) is also welcomed.
When do you plan releasing it?

Please consider also an option to register a timer as such it wont allow Orleans deactivating the Grain.
Thanks.

@sergeybykov
Copy link
Contributor Author

We don't have concrete plans for this. But we are open to contributions.

@shlomiw
Copy link
Contributor

shlomiw commented Apr 2, 2017

In the meanwhile, what is the correct way to 'self invoke' from a timer method to avoid interleaving?
using this.AsReference<IFoo>() ?
is it possible to achieve this without exposing the invoked method in the interface?

@gabikliot
Copy link
Contributor

Yes, as reference.
No, it has to be exposed via interface. But you can define a different interface, it can be internal in the grain impl. assembly.

@shlomiw
Copy link
Contributor

shlomiw commented Apr 2, 2017

Ok, got it, thanks

@shlomiw
Copy link
Contributor

shlomiw commented Jun 5, 2017

To avoid the reentrancy of timers - I've used in my timer this.AsReference<IFoo>() to self-invoke a different method and by that it respects the non-reentrancy definition of the grain.
Now, I've seen that sometimes, when there are other methods which takes long time, then this method is executed multiple times sequentially and not in a 'timed' manner.

The timer only invokes the other method which get its turn scheduled, and sometimes, due to other long methods turns, it can be scheduled sequentially, and by that it doesn't respect the expected 'timer' intervals (i.e. run every x seconds).

Now, I can add a defense to avoid it running sequentially by checking the last time it was executed, but I was wondering if you have a better solution.

FYI - this is the method I used to self-invoke a method with the Grain context scheduler:

public static class GrainExtensions
    {
        /// <summary>
        /// SelfInvokeAfter
        /// invoke self method in next 'turn'
        /// this is how it's done to avoid interleaving and re-entrant issues
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="grain"></param>
        /// <param name="action"></param>
        public static void SelfInvokeAfter<T>(this Grain grain, Func<T, Task> action)
            where T : IGrain
        {
            var g = grain.AsReference<T>();
            var factory = new TaskFactory(TaskScheduler.Current);
            factory.StartNew(async () => await action(g));
        }
    }

@jason-bragg
Copy link
Contributor

@shlomiw I may be misunderstanding what you're trying to do here, but if the goal is to ensure that work triggered by the timer conforms to grain call semantics (ie, not reentrant) you need only make await the grain call.

    public static Task SelfInvokeAfter<T>(this Grain grain, Func<T, Task> action)
        where T : IGrain
    {
        var g = grain.AsReference<T>();
        return action(g));
    }

Assuming "action" contains a call to the grain, awaiting SelfInvokeAfter in the timer callback will prevent the next timer call from being scheduled until the first one is complete. By calling factory.StartNew(async () => await action(g)) and not waiting the completion of the operation, the timer callback is free to return immediately and begin counting down to the next trigger, before current work is complete.

@shlomiw
Copy link
Contributor

shlomiw commented Jun 5, 2017

If I can await the self-invoke from the timer then it sure makes sense (I thought I have to schedule it for next turn, to avoid 'deadlock', but it should be possible since the timer is reentrant!).
Thanks! you perfectly understood me. I'll validate it and let you know.

@shlomiw
Copy link
Contributor

shlomiw commented Jun 6, 2017

@jason-bragg - thanks! it works correctly as you suggested :)

@jason-bragg
Copy link
Contributor

@shlomiw, Glad that helped.
That's a common pattern used by those that want to schedule periotic work that respects Orleans call semantics.
It should be noted that since the timer is generating a grain call, assuming the timer period is shorter than the grain collection time, grains using this pattern will never be deactivated due to idleness, because the timer is always generating calls, which is interpreted as activity.
I don't know your service so I won't assume this is an issue for you, but wanted you to be aware of this side effect.

@shlomiw
Copy link
Contributor

shlomiw commented Jun 7, 2017

@jason-bragg - I'm aware of it, thanks for bringing that up!
I still think you should let the timers be configurable about their re-entrancy, I would also say that the default should be same as the Grain. This can be very confusing and cause unexpected results.

@shlomiw
Copy link
Contributor

shlomiw commented Feb 1, 2018

@jason-bragg -

It should be noted that since the timer is generating a grain call, assuming the timer period is shorter than the grain collection time, grains using this pattern will never be deactivated due to idleness, because the timer is always generating calls, which is interpreted as activity.

Now I have a scenario where I want to use a timer for grain local maintenance, which still respects the re-entrancy (i.e. - non-entrant), but I don't want it to keep the grain alive.

Any idea how?

@sebastianburckhardt
Copy link
Contributor

I am sometimes using await loops with delays:

while (! done)
{
     do_maintenance_work();
     await Task.Delay(period);
}

As far as I know, this does the right thing if the grain deactivates (i.e. it will no longer resume at the await) but I am not 100% sure.

@shlomiw
Copy link
Contributor

shlomiw commented Feb 4, 2018

@sebastianburckhardt - tnx for your reply!
I'm not sure I understand. In what you suggested, if my grain is non-reentrant (as in my case) it would never get outside requests.

@shlomiw
Copy link
Contributor

shlomiw commented Feb 4, 2018

btw - instead of using timers, I was thinking about using grain call filter (interceptor) and execute my maintenance after I got a request (i.e. every once in a while).

@skyflyer
Copy link

skyflyer commented Nov 3, 2018

@gabikliot, @sergeybykov, @jason-bragg,

I just tested timer invocation with the suggested workaround and I still see interleaving calls. I'm using Orleans 2.0*.

The grain implementation
using System;
using System.Threading.Tasks;
using OrleansGrainInterfaces;
using Orleans;
using Microsoft.Extensions.Logging;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Orleans.Concurrency;

namespace OrleansGrains
{
    public class ValueGrain : Grain, IValueGrain
    {
        private readonly ILogger _logger;
        private readonly Random _rnd;
        IDisposable _timer;

        public ValueGrain(ILogger<ValueGrain> logger) {
            _logger = logger;
            _rnd = new Random();
        }

        public async Task<string> Get()
        {
            Console.WriteLine($"***Start***");
            await Task.Delay(3000);
            
            Console.WriteLine($"***Finish***");
            return this.GetPrimaryKeyString();
        }

        public async Task OnTimer(object data)
        {
            await this.AsReference<IValueGrain>().Update();
        }

        public async Task Update()
        {
            Console.WriteLine($"***StartTimer***");
            var waitPeriod = _rnd.Next(5000,15000);
            await Task.Delay(waitPeriod);
            Console.WriteLine($"***FinishTimer***");
        }

        public override Task OnActivateAsync()
        {
            _timer = RegisterTimer(this.OnTimer, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10));
            return Task.CompletedTask;
        }

        public override Task OnDeactivateAsync()
        {
            _timer?.Dispose();
            return Task.CompletedTask;
        }
    }
}

Output produced:

***Start***
***StartTimer*** <-- should not start, since the `Get` method was not finished yet
***Finish***
***Start*** <-- new `Get` invocations during a timer invocation
***Finish***
***Start***
***FinishTimer***
***Finish***

I understand that the cals are (supposedly?) interleaved between await, but the documentation explicitly states (emphasis mine):

Each invocation of asyncCallback is delivered to an activation on a separate turn and will never run concurrently with other turns on the same activation. Note however, asyncCallback invocations are not delivered as messages and are thus not subject to message interleaving semantics. This means that invocations of asyncCallback should be considered to behave as if running on a reentrant grain with respect to other messages to that grain.

I guess I'm not understanding this two sentences correctly, as they seem contradictory to me.

So, timer callbacks will always interleave (based on testing), but the workaround, by using this.AsReference<IValueGrain>().Update() should honor the turn-based method invocation, but it does not in practice.

I'd appreciate an explanation, as I'm probably just misunderstanding some concepts here.

@ReubenBond
Copy link
Member

We plan to address this in/before Orleans 8.0

@turowicz
Copy link

turowicz commented Aug 3, 2023

So what is the prescribed solution to have a thread safe invocation?

@Galileon
Copy link

Galileon commented Aug 3, 2023

hi, cześć
This works


async Task OnCheckDirectedInteractionsTimeOut(object state) => await Task.Factory.StartNew(() => this.AsReference<IInteractionsRouterGrain>().ProcessOnCheckDirectedInteractionsTimeOut(), CancellationToken.None, TaskCreationOptions.PreferFairness, _ts);

      public InteractionsRouterGrain(ILogger<InteractionsRouterGrain> logger,
            Extent.Services.Base.Perfomance.ServiceDiagnosticsCounters diagnosticCounters,
            SessionServiceConfiguration sessionServiceConfiguration,
            Extent.Services.MediaService.Proto.MediaService.MediaServiceClient mediaServiceClient,
            IOrleansContextAccessor contextAccessor)
        {
            (_log, _sessionServiceConfiguration, _contextAccessor, _diagnosticCounters) = (logger, sessionServiceConfiguration, contextAccessor, diagnosticCounters);
            _ts = TaskScheduler.Current;
        } 

 public async override Task OnActivateAsync()
        {
            _log.LogDebug("Activating for {InstanceId} / {GrainIdentityString}...", this.GetPrimaryKey(), IdentityString);
            var streamProvider = GetStreamProvider(OrleansConstants.SessionServiceStreamProviderName);
            await streamProvider.GetStream<UserSessionStateChangedGrainEvent>(this.GetPrimaryKey(), OrleansConstants.UserSessionStateChangedStreamNamespace).SubscribeAsync(OnNextAsync);
            await streamProvider.GetStream<InteractionQueuedGrainEvent>(this.GetPrimaryKey(), OrleansConstants.InteractionsQueuedStreamNamespace).SubscribeAsync(OnNextAsync);
            await streamProvider.GetStream<ServiceSettingChangedGrainEvent>(this.GetPrimaryKey(), OrleansConstants.ServiceSettingChangedStreamNamespace).SubscribeAsync(OnNextAsync);
            await streamProvider.GetStream<InteractionMediaCallDisconnectedGrainEvent>(this.GetPrimaryKey(), OrleansConstants.InteractionMediaCallDisconnectedStreamNamespace).SubscribeAsync(OnNextAsync);
            await streamProvider.GetStream<InteractionMediaCallConnectedGrainEvent>(this.GetPrimaryKey(), OrleansConstants.InteractionMediaCallConnectedStreamNamespace).SubscribeAsync(OnNextAsync);
            await LoadServiceSettings();
            RegisterTimer(OnCheckDirectedInteractionsTimeOut, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
            RegisterTimer(OnReditectInteractionTimerMethod, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
            await base.OnActivateAsync();
        }


i didnt found anything better since that time .


@turowicz
Copy link

turowicz commented Aug 4, 2023

This seems to work:

public override async Task OnActivateAsync(CancellationToken cancellationToken)
{
    _ = RegisterTimer(FlushTimerAsync, null, timeout, timeout);

    await base.OnActivateAsync(cancellationToken);
}

public async Task FlushAsync()
{
    // activation safe code
}

private async Task FlushTimerAsync(object arg)
{
    await Task.Run(async () => await this.AsReference<IMyGrain>().FlushAsync());
}

Can anyone confirm this is the right way? cc @ReubenBond @benjaminpetit

@ReubenBond
Copy link
Member

@turowicz You should be able to remove the Task.Run part of this and await the grain call instead.

private async Task FlushTimerAsync(object arg)
{
    await this.AsReference<IMyGrain>().FlushAsync();
}

@turowicz
Copy link

turowicz commented Aug 7, 2023

@ReubenBond in my tests I get System.InvalidOperationException : Activation access violation. A non-activation thread attempted to access activation services. without that Task.Run part.

I took the Task.Run from @Galileon's answer.

@ReubenBond
Copy link
Member

@turowicz, could you link me to a repro? It may be that something is causing the timer callback passed to RegisterTimer to leave the Grain's scheduler, such as ConfigureAwait(false)

@Rohansi
Copy link

Rohansi commented Aug 7, 2023

Would really love to see this be resolved so grain timers fit within the actor's expected concurrency model.

@turowicz
Copy link

turowicz commented Aug 8, 2023

@turowicz
Copy link

turowicz commented Aug 9, 2023

@ReubenBond it fails correctly as the access violation is swallowed inside the IQueueAdapterReceiver, but it shows in test as invalid number of invocations in Moq due to reattempt to process message.
GrainsQueueAdapterReceiver.cs line 43

@yevhen
Copy link
Contributor

yevhen commented Aug 11, 2023

When I want timer to execute in non-interleaved way I just call self via interface method. Works like a charm.

@turowicz
Copy link

turowicz commented Aug 11, 2023

@yevhen thats great but my calls are made via IQueueAdapterReceiver and it causes Access Violations in composition with a timer. Perhaps its only caused by the OrleansInternalClient?

@yevhen
Copy link
Contributor

yevhen commented Aug 11, 2023

It shouldn't make any difference in theory. AFAIU, the internal client just skips the socket and routes messages by directly using the silo's internal infrastructure.

@yevhen
Copy link
Contributor

yevhen commented Aug 11, 2023

@turowicz Can you share the code?

@turowicz
Copy link

@yevhen the code is linked here #2574 (comment)

@turowicz
Copy link

@ReubenBond can you confirm that there is nothing odd with calling the Grains from the IQueueAdapter?

@tomachristian
Copy link

tomachristian commented Apr 13, 2024

@ReubenBond the proposed solution by you and others with the grain queueing a request to itself from inside the callback has a serious flaw unfortunately… Take the following scenario:

  • The timer ticks (the callback delegate begins execution)
  • Immediately after this, the grain is scheduled to deactivate (think of it as calling DeactivateOnIdle() as the first thing inside the timer callback to simulate the runtime deactivating the grain)
  • The callback awaits a call to the grain itself via a reference
  • The call does not get processed by the grain because it was scheduled to deactivate, effectively hanging
  • The OnDeactivate method on the grain cannot get called because the timer callback has not completed
  • This is a dead-lock

Of course the dead-lock will eventually resolve, as the call to the grain itself from inside the timer callback will eventually timeout, depending on how Orleans is configured.

Also, this call cannot be picked up by another activation of the same grain, because this one (the previous one) has not yet deactivated AFAIK (right?).

A good way to solve this is to not await the call to the grain itself from inside the timer callback, or even better, have that method that gets called be annotated with [OneWay]. Even though this might have the problem of timer ticks getting batched together one after another when the period is really small and the execution of one tick takes longer than the period, due to no longer awaiting.

IMO, this should AT LEAST be properly documented/described and provided with alternatives with pros/cons, even before any fix is added (will it be?).

This has to be done, given that Orleans is supported by MS and is a first class-citizen.

PS: If my description does not provide enough explanation, I am happy to provide more information or even reproducible steps in code.

@Rohansi
Copy link

Rohansi commented Apr 15, 2024

@tomachristian I've had no issues with the workaround from this thread when using Task.Run. Have you tried this #2574 (comment)?

If you're already doing that then I'm not sure how you'd getting a deadlock at all when timer callbacks don't block calls to the grain. That's the whole issue being raised in this thread - timer callbacks do not block calls to the grain (which isn't intuitive). If they did then that workaround should also trigger a deadlock because the timer callback triggers an external call (separate call chain via Task.Run) to the grain and awaits it. How could the grain call run if the timer callback is waiting for the grain call to complete?

@tomachristian
Copy link

tomachristian commented Apr 15, 2024

@Rohansi please try this code to see what I mean. I didn't say I have this problem, I am using OneWay to avoid this, but people using the solution proposed here MIGHT have this problem. It is non-deterministic.

public interface IMyGrain : IGrainWithStringKey
{
    Task ExecuteSomething();

    Task Tick();
}

public sealed class MyGrain : Grain, IMyGrain
{
    public Task ExecuteSomething()
    {
        var self = this.AsReference<IMyGrain>();

        RegisterTimer(async _ =>
        {
            // Imagine the runtime doing this exactly at this point in time
            DeactivateOnIdle();

            await self.Tick();
        }, null!, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));

        return Task.CompletedTask;
    }

    public Task Tick()
    {
        Console.WriteLine("STUCK BECAUSE OF DEAD-LOCK, EVENTUALLY REACHED BECAUSE OF TIMEOUTS");

        return Task.CompletedTask;
    }

    public override Task OnDeactivateAsync(DeactivationReason reason, CancellationToken cancellationToken)
    {
        Console.WriteLine("STUCK BECAUSE OF DEAD-LOCK, EVENTUALLY REACHED BECAUSE OF TIMEOUTS");

        return Task.CompletedTask;
    }
}

public class Tests : BackgroundService
{
    private readonly IMyGrain _grain;

    public Tests(IGrainFactory grainFactory)
    {
        _grain = grainFactory.GetGrain<IMyGrain>("1");
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await _grain.ExecuteSomething();
    }
}

You are right that timer callbacks do not block calls to the grain, I am not saying they do. I explained in those bullet points what is going on. Grain gets deactivated (eg. by runtime), cannot process further requests/invocations (neither from outside or from itself), the call to itself never gets fullfilled, the callback never completes, thus the OnDeactivateAsync never gets called.

@Rohansi
Copy link

Rohansi commented Apr 18, 2024

@tomachristian I haven't tried that specifically because it is different from the suggested workaround - there is no Task.Run around the call to the Tick method. While it looks like it would be useless it actually changes how the method is called. Instead of the Orleans runtime seeing the call stack as ExecuteSomething -> Tick (reentrant) it has separate call stacks for ExecuteSomething and Tick.

The call to Tick should have the same semantics as virtual actors in general, if the target grain was deactivated then re-activate it. If that isn't happening then I'd think that's its own bug.

@tomachristian
Copy link

@Rohansi it dead-locks with and without Task.Run, I invite you to try it.

@tomachristian
Copy link

It is already reported here #7865, btw.

@Rohansi
Copy link

Rohansi commented Apr 18, 2024

Good to know. I actually stopped using Orleans because it was so prone to tricky deadlocks. Loved using it but these are some pretty critical issues that have been unresolved for too long.

@tomachristian
Copy link

@Rohansi indeed. Unfortuantely I am noticing this myself as well, more and more

@ReubenBond
Copy link
Member

ReubenBond commented Apr 21, 2024

@tomachristian @Rohansi thank you for reporting. I've opened some PRs (#8950 (3.x: #8949, 7.x: #8951), #8953, and #8954) to fix the deadlock and improve timers in Orleans. The PRs do not yet implement non-reentrant timers, but that is something we hope to address soon. EDIT: I have opened a PR for Non-reentrant grain timers here: #8955 cc @benjaminpetit

@ReubenBond ReubenBond linked a pull request Apr 21, 2024 that will close this issue
@ReubenBond
Copy link
Member

ReubenBond commented Apr 21, 2024

PR for Non-reentrant timers: #8955

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.