Handling of site connection issues during outage. #2918

Open
Senjan21 opened this issue Feb 7, 2024 · 1 comment
Assignees
Labels
  • a: moderation. Related to community moderation functionality: (moderation, defcon, verification)
  • l: 1 - intermediate
  • p: 2 - normal. Normal Priority
  • python. Pull requests that update Python code
  • status: approved. The issue has received a core developer's approval
  • t: bug. Something isn't working

Comments

@Senjan21
Contributor

Senjan21 commented Feb 7, 2024

Currently, under somewhat specific circumstances, the bot can end up with a few cogs/extensions not loaded if the site drops the connection or otherwise fails to return the information the bot needs. Most notably, the bot will not load the filters cog, superstarify, reminders and python_news.

This happens with no indication of failure to moderators, so the bot might end up in a state where filters are disabled and mods are none the wiser.

The bot should:

  • Alert moderators that such a thing occurred (filters weren't loaded)
  • Potentially try to gracefully handle the issue and try to reconnect / retry later.
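
As a rough illustration of the first point, here is a minimal sketch of collecting load failures so they can be reported instead of passing silently. It is not the bot's actual loading code: `load_all`, `format_alert` and the loader callables are hypothetical names, and how the message reaches moderators (a mod channel, a log, etc.) is left out.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional


async def load_all(
    loaders: dict[str, Callable[[], Awaitable[None]]],
) -> dict[str, Exception]:
    """Run every (hypothetical) loader and return the failures by extension name."""
    failed: dict[str, Exception] = {}
    for name, loader in loaders.items():
        try:
            await loader()
        except Exception as exc:
            # Deliberately broad: any failure to load should be reported,
            # and one broken extension should not stop the others.
            failed[name] = exc
    return failed


def format_alert(failed: dict[str, Exception]) -> Optional[str]:
    """Build a moderator-facing message, or None if everything loaded."""
    if not failed:
        return None
    lines = [f"- {name}: {type(exc).__name__}: {exc}" for name, exc in failed.items()]
    return "The following extensions failed to load:\n" + "\n".join(lines)
```

The point of the sketch is only that a failed load produces something visible. In the real bot the equivalent hook would need to live wherever extensions are loaded.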
@Senjan21 Senjan21 added t: bug Something isn't working a: moderation Related to community moderation functionality: (moderation, defcon, verification) p: 2 - normal Normal Priority l: 1 - intermediate up for grabs Available for anyone to work on python Pull requests that update Python code labels Feb 7, 2024
@ChrisLovering
Member

The cogs won't have been loaded because there was an unhandled error in the async cog_load function.

We should update cogs that rely on the site being available during cog load to handle errors like this and retry a few times, with some backoff logic, to be resilient to temporary site outages.

If the bot still can't load the cogs after a number of attempts, we should then raise an error so that it appears in sentry and can be actioned.
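
A minimal sketch of what that retry logic could look like, using only the standard library. The helper name `retry_with_backoff`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as retryable errors are assumptions for illustration. The real cogs would need to catch whatever exceptions the site API client actually raises.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    func: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[Exception], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await func(), retrying on the given errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: re-raise so the error propagates
                # (and can be picked up by error reporting).
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Small jitter so several cogs do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("attempts must be at least 1")
```

A cog could then wrap its site call in `cog_load`, e.g. `self.data = await retry_with_backoff(self.fetch_site_data)`, where `fetch_site_data` is a hypothetical method standing in for whatever request the cog makes on startup.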

@ChrisLovering ChrisLovering added the status: approved The issue has received a core developer's approval label Feb 7, 2024
@jb3 jb3 self-assigned this Mar 30, 2024
@wookie184 wookie184 removed the up for grabs Available for anyone to work on label Apr 14, 2024
Projects
None yet
Development

No branches or pull requests

4 participants