This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Set up SNS notifications for critical alarms #109

Open
ojongerius opened this issue May 3, 2018 · 7 comments

Comments

@ojongerius
Contributor

ojongerius commented May 3, 2018

Definition of done: critical alerts create a phone call to team members.

This could be done by having critical alarms publish to separate SNS topics that have a Twilio webhook as a subscriber.
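A minimal sketch of that wiring, assuming boto3-style `sns` and `cloudwatch` clients are passed in. The topic name, alarm name, threshold, and the `wire_critical_alarm` helper are illustrative assumptions, not anything settled in this issue. Whatever sits behind the webhook URL also has to handle SNS's subscription confirmation request before notifications are delivered.

```python
def wire_critical_alarm(sns, cloudwatch, function_name, webhook_url):
    """Create a critical-alarm topic, subscribe a webhook, and attach an alarm.

    `sns` and `cloudwatch` are expected to behave like boto3 clients.
    All names and thresholds below are hypothetical placeholders.
    """
    # Dedicated topic so only critical alarms reach the phone-call webhook.
    topic_arn = sns.create_topic(Name="critical-alarms")["TopicArn"]

    # SNS delivers notifications to the HTTPS endpoint only after the
    # endpoint has confirmed the subscription.
    sns.subscribe(TopicArn=topic_arn, Protocol="https", Endpoint=webhook_url)

    # Alarm on any Lambda error within a one-minute window (assumed threshold).
    cloudwatch.put_metric_alarm(
        AlarmName=f"{function_name}-errors-critical",
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],
    )
    return topic_arn
```

With real clients this would be called as `wire_critical_alarm(boto3.client("sns"), boto3.client("cloudwatch"), "my-function", "https://example.com/hook")`, where the function name and URL are placeholders.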

I've seen people create Lambdas that connect to Twilio when alarms fire, but that rather defeats the purpose: we want to know when Lambdas are on 🔥

Warning: this will be less sophisticated than services like PagerDuty, VictorOps, etc. Schedules and escalations are well out of scope for this issue.

/cc @freeCodeCamp/open-api did I miss anything? Any concerns? Is this a blocker for our first release?

@QuincyLarson

@ojongerius I just set up UptimeRobot, which has SMS notifications without the need for Twilio. It polls all our services once a minute, and if any of them is down it emails us and can also send an SMS notification. It's easy to configure, and I've already set it up so that Stuart and I get texts.

Here's our new status page: https://status.freecodecamp.org

What do you think of this service? Do you think it can be a replacement for PagerDuty, etc.? Will there still be a significant benefit to configuring CloudWatch and Twilio?

@ojongerius
Contributor Author

@QuincyLarson I can think of scenarios where your casual polling will succeed but the service is impaired for other types of requests.
Having said that, I've caught many issues with simple scheduled end-to-end tests that would have gone under the radar of specific metric monitors and unit tests.
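To illustrate the difference, here is a rough sketch of a check that validates the response body instead of only confirming that the endpoint answers. The `check_endpoint` and `fetch` helpers and the validator callback are hypothetical names for illustration; only the standard library is used.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return (status, body) for a GET request; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_endpoint(url, is_healthy, fetcher=fetch):
    """Return (ok, reason). `is_healthy` inspects the raw response body."""
    try:
        status, body = fetcher(url)
    except OSError as exc:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    # A plain uptime poll would stop at the status check above; an
    # end-to-end style check also verifies that the content is sane.
    if not is_healthy(body):
        return False, "response body failed validation"
    return True, "ok"
```

A scheduled job could run something like `check_endpoint(url, lambda body: b"expected text" in body)` and alert whenever the first element of the result is `False`.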

I would not see it as a replacement, but a great addition 💯

Re: https://status.freecodecamp.org, it seems to be down for me at the moment?

▶ wget https://status.freecodecamp.org/
--2018-05-11 11:20:04--  https://status.freecodecamp.org/
Resolving status.freecodecamp.org (status.freecodecamp.org)... 69.162.67.140
Connecting to status.freecodecamp.org (status.freecodecamp.org)|69.162.67.140|:443... failed: Operation timed out.
Retrying.

--2018-05-11 11:21:21--  (try: 2)  https://status.freecodecamp.org/
Connecting to status.freecodecamp.org (status.freecodecamp.org)|69.162.67.140|:443...

@QuincyLarson

@ojongerius Yes - I agree that there are plenty of corner cases that justify us having a more robust solution.

Not sure why you weren't able to hit the status page, but it's up now:

FreeCodeCamp➜~» wget https://status.freecodecamp.org/    [17:46:26]
--2018-05-12 17:46:30--  https://status.freecodecamp.org/
Resolving status.freecodecamp.org... 69.162.67.141
Connecting to status.freecodecamp.org|69.162.67.141|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13053 (13K) [text/html]
Saving to: 'index.html'

index.html                              100%[===============================================================================>]  12.75K  --.-KB/s   in 0.04s  

2018-05-12 17:46:31 (320 KB/s) - 'index.html' saved [13053/13053]

@ojongerius
Contributor Author

Just noticed that SNS has supported sending SMS directly since 2016.
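A rough sketch of what that could look like, assuming a boto3-style `sns` client and an existing topic ARN. The `subscribe_phone` helper and the simple number check are illustrative assumptions.

```python
def subscribe_phone(sns, topic_arn, phone_number):
    """Subscribe a phone number to a topic so published alarms arrive as SMS.

    `sns` is expected to behave like a boto3 SNS client. The number should
    be in E.164 format, for example +15555550100 (a placeholder).
    """
    # Cheap sanity check only; this does not fully validate E.164 numbers.
    if not (phone_number.startswith("+") and phone_number[1:].isdigit()):
        raise ValueError("phone number must be in E.164 format, e.g. +15555550100")

    response = sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sms",
        Endpoint=phone_number,
    )
    return response["SubscriptionArn"]
```

Any alarm whose actions include that topic would then produce a text message for each subscribed number, with no Twilio integration involved.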

@ojongerius changed the title from "Connect SNS to Twilio for critical notifications" to "Set up SNS notifications for critical alarms" on May 17, 2018
@QuincyLarson

@ojongerius Awesome - so it doesn't require Twilio integration? We could use it for messaging when we have outages?

@ojongerius
Contributor Author

That's right. Unless AWS is down... So there still is a strong use case for external monitoring that includes alerting.

@QuincyLarson

@ojongerius Yes - but if AWS goes down there isn't a lot we can do anyway. It's gone down what - 4 or 5 times in 10 years?

@raisedadead added this to To do in Open API via automation on Dec 5, 2018