Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix frequent MiniPlayerFragment ANRs seen in production #96

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

bb4242
Copy link
Collaborator

@bb4242 bb4242 commented Dec 17, 2023

For background on the issue, and why this change may help, see androidx/media#890. I had hoped that upgrading the Media3 library version back in #92 would completely solve the problems we had been seeing in production. While it did help in some ways, by eliminating ForegroundServiceDidNotStartInTimeExceptions (see androidx/media#167), it also seems to have morphed a lot of problems into a new form, namely App Not Responding (ANR) problems that arise in MiniPlayerFragment.

The basic idea here is to reduce the rate at which we create and release MediaControllers in MiniPlayerFragment, by moving the creation and releasing to fragment lifecycle callbacks that are invoked less frequently than onResume() and onPause(). My hypothesis is that rapid creation and releasing of MediaControllers results in some kind of race condition that is very hard to reproduce locally but will occur in production with enough use of the app.

It's not completely clear that this PR will definitely solve the problems we're seeing in production, but it seems like a worthwhile experiment to try. Other than potentially decreasing the rate of ANRs, it should not have any other noticeable effects on the behavior of the app.

For background on the issue, and why this change may help, see
androidx/media#890

The basic idea here is to reduce the rate at which we create and release
MediaControllers in MiniPlayerFragment, by moving the creation and
releasing to fragment lifecycle callbacks that are invoked less
frequently than `onResume()` and `onPause()`. It's not completely clear
that this will definitely solve the problems we're seeing in production,
but it seems like a worthwhile experiment.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff looks larger than it really is; I mainly just renamed onResume to onViewCreated and onPause to onDestroyView. But I also moved onViewCreated below onCreateView since this reflects the order in which these lifecycle methods are called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, that's really helpful.

Copy link
Contributor

@intermarc intermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity that #88 did not solve these issues completely, as we had hoped.

I'm not familiar with the life cycle of the media session, but your hypothesis concerning race-conditions is plausible. Since we can't reproduce this issue (I myself have not had any ANRs), it's certainly worth a try!

There is a comment in the original bug report that states that these issues do not actually cause crashes, so perhaps this not such a big deal for the users in practice. It would still be great to find a solution.

@bb4242
Copy link
Collaborator Author

bb4242 commented Dec 19, 2023

Yeah, it's been kind of an adventure tracking down and solving these issues! It's been rather slow going, since like you, I can't seem to reproduce these issues locally. This appears to force us into trying out possible fixes in production, which is a very slow feedback cycle (it usually takes 3 or 4 days for initial crash and ANR data to start coming in after the rollout of a new release, and sometimes many more days after that to get a sense of whether the rates are changing in a meaningful way).

That said, I do think we're making some progress. Prior to the rollout of #92, we were seeing a lot of ForegroundServiceDidNotStartInTimeException crashes. Since #92, these crashes appear to be completely resolved. Now, we seem to get a similar number of these new MiniPlayerFragment ANRs. Arguably ANRs are better than crashes, at least! I do still have some concerns about the ANRs, both because they degrade the user experience and we're currently experiencing a high enough ANR rate that causes Google to de-prioritize discoverability of the app in the Play store (how big of an effect this is, I'm not sure; I can certainly still find the app if I search for it). In any case, I'll go ahead and release this attempted fix now, and we can see what effect, if any, it has! 🙂

@bb4242 bb4242 merged commit eccad4c into master Dec 19, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants