Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANR related to AbstractFuture.get() #890

Closed
1 task
bb4242 opened this issue Dec 13, 2023 · 10 comments
Closed
1 task

ANR related to AbstractFuture.get() #890

bb4242 opened this issue Dec 13, 2023 · 10 comments
Assignees
Labels

Comments

@bb4242
Copy link

bb4242 commented Dec 13, 2023

Version

Media3 1.2.0

More version details

No response

Devices that reproduce the issue

I've seen this happen on many different devices, including:

  • Google bluejay

  • Google sunfish

  • Samsung dm1q

  • Samsung r9q

  • Mix of Android 13 and Android 14

Devices that do not reproduce the issue

I'm not sure if I've noticed any specific devices that don't have this problem.

Reproducible in the demo app?

Not tested

Reproduction steps

Hello! I have an app that plays media using the Media3 library. We're currently seeing a number of ANRs in production in the Google Play console. The cause is shown as

org.dharmaseed.android.MiniPlayerFragment.lambda$onResume$0$org-dharmaseed-android-MiniPlayerFragment
    Input dispatching timed out

The stack trace for the main thread is

Thread 1
    "main" tid=1 Waiting
Main thread
      at jdk.internal.misc.Unsafe.park (Native method)
      at java.util.concurrent.locks.LockSupport.park (LockSupport.java:211)
      at com.google.common.util.concurrent.AbstractFuture.get (AbstractFuture.java:557)
      at org.dharmaseed.android.MiniPlayerFragment.lambda$onResume$0$org-dharmaseed-android-MiniPlayerFragment (MiniPlayerFragment.java:53)
      at org.dharmaseed.android.MiniPlayerFragment$$ExternalSyntheticLambda0.run (unavailable:2)
      at android.os.Handler.handleCallback (Handler.java:958)
      at android.os.Handler.dispatchMessage (Handler.java:99)
      at android.os.Looper.loopOnce (Looper.java:205)
      at android.os.Looper.loop (Looper.java:294)
      at android.app.ActivityThread.main (ActivityThread.java:8177)
      at java.lang.reflect.Method.invoke (Native method)
      at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:552)
      at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:971)

The MiniPlayerFragment code in the stack trace attempts to create a MediaController instance in onResume, which connects to our MediaSessionService, like this:

    @Override
    public void onResume() {
        super.onResume();

        // Create the media controller
        Context ctx = getContext();
        SessionToken sessionToken =
                new SessionToken(ctx, new ComponentName(ctx, PlaybackService.class));
        controllerFuture = new MediaController.Builder(ctx, sessionToken).buildAsync();
        controllerFuture.addListener(() -> {
            try {
                mediaController = controllerFuture.get();
                mediaController.addListener(playerListener);
                playerListener.onMediaMetadataChanged(mediaController.getMediaMetadata());
                playerListener.onPlaybackStateChanged(mediaController.getPlaybackState());
                playerListener.onIsPlayingChanged(mediaController.isPlaying());
            } catch (InterruptedException | ExecutionException | CancellationException e) {
                Log.e(LOG_TAG, "Could not create media controller. " + e.toString());
            }
        }, ContextCompat.getMainExecutor(ctx));

    }

It also attempts to release the controller in onPause:

    @Override
    public void onPause() {
        super.onPause();
        MediaController.releaseFuture(controllerFuture);
    }

This ANR is very difficult to reproduce for me when testing locally, but it shows up regularly in production. As far as I can tell, the root of the problem is that controllerFuture.get() is blocking long enough to cause the ANR. I'm not sure why it would block, since as far as I know, that code should only be executed once the future is complete. Do you have any idea why this is happening? It seems like this might be somewhat related to #167, but I'm not seeing any ForegroundServiceDidNotStartInTimeExceptions happening in production.

Expected result

The future should complete successfully and return the MediaController object.

Actual result

The controllerFuture.get() call blocks long enough to cause an ANR.

Media

I don't think this is related to the specific media being played, since it's happening while trying to get a MediaController object.

Bug Report

@icbaker
Copy link
Collaborator

icbaker commented Dec 15, 2023

I'm not sure why it would block, since as far as I know, that code should only be executed once the future is complete. Do you have any idea why this is happening?

I agree this seems surprising - like something in Guava isn't working the way we expect. What happens if you replace the call to controllerFuture.get() with Futures.getDone(controllerFuture) which shouldn't block (just immediately explode if the future isn't done)? Or you could try using Futures.addCallback(...) so that the MediaController is passed directly into the FutureCallback.onSuccess method.

@bb4242
Copy link
Author

bb4242 commented Dec 15, 2023

Hi @icbaker, thanks for taking time to look at this, I appreciate your help!

I've experimented with both of your suggestions (using Futures.getDone(controllerFuture) or Futures.addCallback(...)). Locally, both of these seem to work fine. But the original code also works fine locally; it's only in production that I've encountered these ANRs, making this a tricky problem to diagnose. Perhaps I'll release a new version to production with one of your suggested options and see whether that helps (it generally takes several days for the rollout in the Play Store and new failure rate data to start coming in).

In thinking about this further, something else occurred to me. There is second place in the app where we create a MediaController inside onResume of an Activity (PlayTalkActivity) in the basically same way as the problematic code from MiniPlayerFragment I reported above:

        SessionToken sessionToken =
                new SessionToken(this, new ComponentName(this, PlaybackService.class));
        controllerFuture = new MediaController.Builder(this, sessionToken).buildAsync();
        controllerFuture.addListener(() -> {
                try {
                    mediaController = controllerFuture.get();
                    mediaController.addListener(playerListener);
                    playerListener.onIsPlayingChanged(mediaController.isPlaying());
                    updatePlayerUI();
                } catch (InterruptedException | ExecutionException | CancellationException e) {
                    Log.e(LOG_TAG, "Could not create media controller. " + e.toString());
                }
            }, ContextCompat.getMainExecutor(this));

But interestingly, I haven't seen any similar ANRs happening in PlayTalkActivity. I wonder why this is? I have a hypothesis, but I'm not sure how plausible it is:

MiniPlayerFragment is embedded inside of several different activities (including PlayTalkActivity, in fact) in the app. When the user switches between two activities that both contain MiniPlayerFragments, then, it means that MiniPlayerFragment.onPause() gets called when Android pauses the old activity, and then MiniPlayerFragment.onResume() gets called very shortly thereafter when Android starts the new activity. In contrast, I don't believe PlayTalkActivity will ever get paused and then quickly resumed again in the same way.

onPause() for both MiniPlayerFragment and PlayTalkActivity calls MediaController.releaseFuture(controllerFuture). Do you think there might be a problem with calling MediaController.releaseFuture(controllerFuture) in onPause(), and then quickly calling controllerFuture = new MediaController.Builder(ctx, sessionToken).buildAsync() in onResume() (thus making the old controllerFuture instance eligible for garbage collection)? From these docs, I noticed this:

As with other components, remember to release the MediaController when it is no longer needed, such as in the onStop() method of an Activity or Fragment.
MediaController.releaseFuture(controllerFuture);
Releasing the controller will still deliver all pending commands sent to the session and only unbind from the session service either once these commands have been handled or after a timeout period, whichever occurs first.

That last sentence makes me suspect it might not be safe to garbage collect controllerFuture immediately after calling MediaController.releaseFuture(), but I'm not sure.

Any thoughts or insights you have around this would be much appreciated! Thanks again for your help, and have a nice day.

bb4242 added a commit to dharmaseed/dharmaseed-android that referenced this issue Dec 17, 2023
For background on the issue, and why this change may help, see
androidx/media#890

The basic idea here is to reduce the rate at which we create and release
MediaControllers in MiniPlayerFragment, by moving the creation and
releasing to fragment lifecycle callbacks that are invoked less
frequently than `onResume()` and `onPause()`. It's not completely clear
that this will definitely solve the problems we're seeing in production,
but it seems like a worthwhile experiment.
@icbaker
Copy link
Collaborator

icbaker commented Dec 18, 2023

Ah I hadn't spotted in your original code snippet that controllerFuture is not locally-scoped and, if there are multiple threads involved, could potentially be re-assigned in between the future completing and the listener being invoked, resulting in your .get() call being made on a different (incomplete) instance - which would ofc block.

Given that, it seems much safer to switch to my Futures.addCallback(...) suggestion, because then there's no confusion about which instance you're calling .get() on - the completed value is just passed directly in to the onSuccess(...) method you implement.

@bb4242
Copy link
Author

bb4242 commented Dec 18, 2023

Thanks @icbaker. I had actually originally implemented controllerFuture as being locally scoped inside of MiniPlayerFragment.onResume(). However, I ran into problems with that approach: since controllerFuture could be immediately garbage collected once onResume() exits, sometimes this would happen before the future had completed, and the listener that calls .get() would never run. This problem is also highlighted in the MediaController.Builder docs:

The future must be kept by callers until the future is complete to get the controller instance. Otherwise, the future might be garbage collected and the listener added by addListener would never be called.

So I switched controllerFuture to be an instance-scope variable in this commit. FWIW, it appears that the Media3 demo app also follows this strategy of scoping controllerFuture as an instance variable inside of the player activity:

private lateinit var controllerFuture: ListenableFuture<MediaController>

I'm pretty sure there is only a single thread (the main application thread) that calls any of the MiniPlayerFragment lifecycle callbacks (some quick local tests confirm this). So I don't think the problem comes from multiple threads trying to concurrently access or modify controllerFuture. My suspicion is more that it's not safe to garbage collect controllerFuture immediately after calling MediaController.releaseFuture(controllerFuture), or perhaps that it's not safe to immediately try to create another connection to the media session service while the first connection is still being cleaned up (i.e. delivering all pending commands, as mentioned in the docs) after we call releaseFuture, even if all of this happens on a single thread.

In any case, I'm very open to your suggestion to move to using Futures.addCallback(...) and see if that helps. I'm not familiar enough with Guava to deeply understand the differences between using Futures.addCallback(...) and controllerFuture.addListener(...), but if we do find Futures.addCallback(...) is a better approach, perhaps it would make sense to update documentation that currently suggests using controllerFuture.addListener(...) (such as here and here) to reflect this?

Thanks again for your help here!

bb4242 added a commit to dharmaseed/dharmaseed-android that referenced this issue Dec 19, 2023
For background on the issue, and why this change may help, see
androidx/media#890. I had hoped that upgrading
the Media3 library version back in #92 would completely solve the
problems we had been seeing in production. While it did help in some
ways, by eliminating `ForegroundServiceDidNotStartInTimeException`s (see
androidx/media#167), it also seems to have
morphed a lot of problems into a new form, namely App Not Responding
(ANR) problems that arise in `MiniPlayerFragment`.

The basic idea here is to reduce the rate at which we create and release
`MediaController`s in `MiniPlayerFragment`, by moving the creation and
releasing to fragment lifecycle callbacks that are invoked less
frequently than `onResume()` and `onPause()`. My hypothesis is that
rapid creation and releasing of `MediaController`s results in some kind
of race condition that is very hard to reproduce locally but will occur
in production with enough use of the app.

It's not completely clear that this PR will definitely solve the
problems we're seeing in production, but it seems like a worthwhile
experiment to try. Other than potentially decreasing the rate of ANRs,
it should not have any other noticeable effects on the behavior of the
app.
@icbaker
Copy link
Collaborator

icbaker commented Dec 20, 2023

I'm pretty sure there is only a single thread (the main application thread) that calls any of the MiniPlayerFragment lifecycle callbacks (some quick local tests confirm this).

Yeah that would be my assumption too, but I haven't come up with a single-threaded sequence of events that explains the issue you're seeing - and it's also worth noting that you can't repro locally, so your local testing may not accurately represent the problematic scenario.


FWIW, it appears that the Media3 demo app also follows this strategy of scoping controllerFuture as an instance variable inside of the player activity:

private lateinit var controllerFuture: ListenableFuture<MediaController>

This uses lateinit, which prevents reassignment, so makes the problem I'm describing impossible.

Edit: This isn't true (see comment below).


I'm not familiar enough with Guava to deeply understand the differences between using Futures.addCallback(...) and controllerFuture.addListener(...)

It's not really a Guava-specific thing, it's just about Java scopes and when references are resolved. You could also use addListener with a local reference to achieve the same thing.

Some comparisons (in Java, sorry, hopefully clear enough though):

class Foo {
  private ListenableFuture<Bar> barFutureField;

  private void usingFieldReference() {
    barFutureField = createBarFuture();
    barFuture.addListener(
        () -> {
          try {
            // References barFutureField, and this resolves to a particular object
            // *when this line is executed inside the callback* (i.e. after the future
            // is done), meaning if the field is reassigned then we will access the
            // 'new' (possibly incomplete) object here.
            Bar bar = barFutureField.get();
            // Do something with bar
          } catch (ExecutionException | InterruptedException e) {
            // Handle exception
          }
        }, ContextCompat.getMainExecutor(this));
  }

  private void usingLocalReference() {
    barFutureField = createBarFuture();
    // Resolves to a specific object *before* the callback is invoked
    // (or even registered).
    ListenableFuture<Bar> barFuturelocal = barFutureField;
    // Add the listener to the local, not the field
    barFuturelocal.addListener(
        () -> {
          try {
            // This *must* be the same object the listener was registered on,
            // since it's a local.
            Bar bar = barFuturelocal.get();
            // Do something with bar
          } catch (ExecutionException | InterruptedException e) {
            // Handle exception
          }
        }, ContextCompat.getMainExecutor(this));
  }

  private void usingAddCallback() {
    barFutureField = createBarFuture();
    Futures.addCallback(
        barFutureField,
        new FutureCallback() {
          public void onSuccess(Bar result) {
            // You get the result passed in here, so no need to
            // call barFutureField.get() and no scope for accidentally
            // accessing a new, reassigned, incomplete future.
          }
          public void onFailure(Throwable t) {
            // Handle exception
          }
        },
        ContextCompat.getMainExecutor(this));
  }
}

@bb4242
Copy link
Author

bb4242 commented Dec 22, 2023

I'm pretty sure there is only a single thread (the main application thread) that calls any of the MiniPlayerFragment lifecycle callbacks (some quick local tests confirm this).

Yeah that would be my assumption too, but I haven't come up with a single-threaded sequence of events that explains the issue you're seeing - and it's also worth noting that you can't repro locally, so your local testing may not accurately represent the problematic scenario.

Yes, that's a fair point! 🙂

FWIW, it appears that the Media3 demo app also follows this strategy of scoping controllerFuture as an instance variable inside of the player activity:

private lateinit var controllerFuture: ListenableFuture<MediaController>

This uses lateinit, which prevents reassignment, so makes the problem I'm describing impossible.

Ah, thank you - I missed that lateinit detail.

I'm not familiar enough with Guava to deeply understand the differences between using Futures.addCallback(...) and controllerFuture.addListener(...)

It's not really a Guava-specific thing, it's just about Java scopes and when references are resolved. You could also use addListener with a local reference to achieve the same thing.

Some comparisons (in Java, sorry, hopefully clear enough though):

class Foo {
  private ListenableFuture<Bar> barFutureField;

  private void usingFieldReference() {
    barFutureField = createBarFuture();
    barFuture.addListener(
        () -> {
          try {
            // References barFutureField, and this resolves to a particular object
            // *when this line is executed inside the callback* (i.e. after the future
            // is done), meaning if the field is reassigned then we will access the
            // 'new' (possibly incomplete) object here.
            Bar bar = barFutureField.get();
            // Do something with bar
          } catch (ExecutionException | InterruptedException e) {
            // Handle exception
          }
        }, ContextCompat.getMainExecutor(this));
  }

  private void usingLocalReference() {
    barFutureField = createBarFuture();
    // Resolves to a specific object *before* the callback is invoked
    // (or even registered).
    ListenableFuture<Bar> barFuturelocal = barFutureField;
    // Add the listener to the local, not the field
    barFuturelocal.addListener(
        () -> {
          try {
            // This *must* be the same object the listener was registered on,
            // since it's a local.
            Bar bar = barFuturelocal.get();
            // Do something with bar
          } catch (ExecutionException | InterruptedException e) {
            // Handle exception
          }
        }, ContextCompat.getMainExecutor(this));
  }

  private void usingAddCallback() {
    barFutureField = createBarFuture();
    Futures.addCallback(
        barFutureField,
        new FutureCallback() {
          public void onSuccess(Bar result) {
            // You get the result passed in here, so no need to
            // call barFutureField.get() and no scope for accidentally
            // accessing a new, reassigned, incomplete future.
          }
          public void onFailure(Throwable t) {
            // Handle exception
          }
        },
        ContextCompat.getMainExecutor(this));
  }
}

Thank you, I really appreciate you taking the time to write this out. For some reason, I had been under the mistaken assumption that barFutureField gets bound inside the lambda expression at the time of calling barFutureField.addListener(...), not when the lambda actually runs. Your explanation cleared that up for me, thanks! I see what you mean about the advantages of using Futures.addCallback(...).


As an experiment, we released a small change to production two days ago, which simply moves the media controller initialization code in MiniPlayerFragment from onResume() to a fragment lifecycle callback that runs less frequently (onViewCreated), and similarly moves the code that releases the media controller from onPause to onDestroyView. We have only very preliminary data so far, but what we can see is interesting. There have been no further occurrences of ANRs related to MiniPlayerFragment (although it may be too soon to conclude that the problem is totally fixed; more data is needed). However, we now have very similar ANRs (i.e. ones related to AbstractFuture.get()) happening in the other app component that uses a media controller, PlayTalkActivity. These PlayTalkActivity ANRs were not happening prior to this change, and no code within PlayTalkActivity was changed.

I'm not completely sure what to make of this yet, other than to hypothesize there may be some kind of interaction taking place between PlayTalkActivity and MiniPlayerFragment when they each attempt to connect to the media service. An alternate hypothesis could be that the earlier MiniPlayerFragment ANRs were masking a latent problem in PlayTalkActivity, and this problem is being exposed now that MiniPlayerFragment is fixed. But since both PlayTalkActivity and MiniPlayerFragment use basically the same code to create the connection (i.e. using a field variable for controllerFuture and controllerFuture.addListener(...)), this second hypothesis seems less likely to me; I'd be surprised if PlayTalkActivity has some unique problem that MiniPlayerFragment doesn't also have.

In any case, I think our next step should be to switch to your suggestion of trying to use Futures.addCallback in both MiniPlayerFragment and PlayTalkActivity, and see what happens. 🙂

@icbaker
Copy link
Collaborator

icbaker commented Dec 22, 2023

FWIW, it appears that the Media3 demo app also follows this strategy of scoping controllerFuture as an instance variable inside of the player activity:

private lateinit var controllerFuture: ListenableFuture<MediaController>

This uses lateinit, which prevents reassignment, so makes the problem I'm describing impossible.

My statement here is incorrect (I've edited the comment above). lateinit var doesn't prevent reassignment, it just prevents reassigning to null (i.e. it can be null at the start, but once you see it non-null it will never be null again): https://stackoverflow.com/questions/48443167/kotlin-lateinit-to-val-or-alternatively-a-var-that-can-set-once

So you could be right and this code is brittle in the same way.

One difference I noticed between the session demo app and your example is that you explicitly pass ContextCompat.getMainExecutor(this) whereas we pass in MoreExecutors.directExecutor() to execute the callback on. Concretely this means, when the future is completed (I think!):

  • Your callback will be queued onto the main looper and run when all the tasks in front of it have finished.
  • The session demo app callback will be directly executed by the thread completing the future, with no queueing.

The controller future is completed here:

postOrRun(new Handler(applicationLooper), () -> holder.setController(controller));

This completion happens on the application looper, which is by default is the looper of the thread used to build the MediaController, but can be customized (neither your example nor the session demo app do so). In both the session demo and your app this should also be the main thread (as you observed).

I don't have solid theory here, but I wonder if the fact your listener invocation is added to the back of the main looper's queue instead of being executed immediately can result in it executing 'late', with the field already having been reassigned (during a previous loper iteration).

@bb4242
Copy link
Author

bb4242 commented Dec 27, 2023

Oh, that's an interesting idea about the possible difference in executor behavior causing a problem, thanks! Switching to MoreExecutors.directExecutor() seems like it would be a useful experiment for us to try. I'll try to push out that change to production soon and we can see what effect it has. I do also intend to try your suggestion of switching to Futures.addCallback() as well, although I think maybe I'll do that in a separate release so we can better isolate the effect of these changes and help get to the root cause of the problem.

Thanks again for helping to dig into this!

intermarc added a commit to dharmaseed/dharmaseed-android that referenced this issue Apr 21, 2024
This merge pulls in changes that were made to fix the "App Not
Responding" issues that we have been seeing in production. Apart from
updating to the latest release of the media3 library, the changes
concern two ways in which the new `PlaybackService` works.

**Retrieval of the MediaController**
Our ANR problem has been discussed in
[this](androidx/media#890) open issue. There
was a very helpful response regarding what might be done. In particular,
the suggestions were:

1. Using `Futures.addCallback()` instead of
`controllerFuture.addListener()`
2. Using `MoreExecutors.directExecutor()` instead of
`ContextCompat.getMainExecutor()`

Both of these were implemented in
403c480

**Stopping the `MediaSessionService` when the app is killed in the
background**
Another potential source of ANRs and crashes was described in [this
issue](androidx/media#805), which applies to
our app as well. I've also been able to reproduce a similar and perhaps
directly related issue on a phone running an older version of android
(see the video in #99).

To limit the scope of the changes in this pull request, I've opted for
the simple solution of ending the `MediaSessionService` when the app is
closed in the background (see c980064).
With this change, the media notification no longer "hangs around" when
the app is stopped in the background, thereby preventing this kind of
issue.

The disadvantage of this approach is that the media notification is gone
and the user has to explicitly re-start the app and find the right talk
to continue listening. I've created the branch `playback_resumption` to
implement a more satisfactory solution (following the [official
documentation](https://developer.android.com/media/media3/session/background-playback#resumption)).
@intermarc
Copy link

With some delay, we have finally gotten around to implementing your suggestions, @icbaker. I'm happy to report that this has solved our problem: we are no longer seeing any related ANRs in production!

To summarize, the changes we implemented were the following:

  1. Use Futures.addCallback() to set the mediaController object
  2. Use MoreExecutors.directExecutor() instead of ContextCompat.getMainExecutor()

I didn't have the patience to phase in one change at a time, so we can't be 100% certain which of the two resolved the issue. Looking at the code, however, I agree with @bb4242 that a race condition related to the mediaController future seems unlikely. For what it is worth, I think that the 2nd change was the one that fixed the issue.

Thanks again for your support! Your clear and detailed answers were very helpful.

copybara-service bot pushed a commit that referenced this issue May 31, 2024
I originally tried switching to `Futures.addCallback` (as a follow-up
to Issue: #890), but it seemed like a good chance to go further into
Kotlin-ification.

Before this change, if the connection to the session failed, the app
would hang at the 'waiting' screen with nothing logged (and the music
keeps playing). This behaviour is maintained with the `try/catch` around
the `.await()` call (with additional logging). Without this, the failed
connection causes the `PlayerActivity` to crash and the music in the
background stops. The `try/catch` is used to flag to developers who
might be using this app as an example that connecting to the session
may fail, and they may want to handle that.

This change also switches `this.controller` to be `lateinit` instead of
nullable.

Issue: #890
PiperOrigin-RevId: 638948568
@icbaker
Copy link
Collaborator

icbaker commented May 31, 2024

Thanks for the update.

I spent a bit of time looking at the session demo app's usage of ListenableFuture. I prototyped a change that switched it to Futureas.addCallback, but actually decided to go further and switch it to use Guava's integration with Kotlin suspend functions and coroutines. You can see this in 1329821. I think this achieves the same result, by ensuring there's no way of accidentally reading a 'stale' field value from inside the callback. Hopefully that helps future developers integrating their Kotlin apps with MediaController.Builder.buildAsync().

I'm going to close this issue, as I think the question has been resolved.

@icbaker icbaker closed this as completed May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants