Post-deploy Smoke Test #1331

Open
sterlinghirsh opened this issue Feb 3, 2023 · 0 comments
Labels: 0 maintenance (Request to update or maintain an existing feature), needs_more_spec

@sterlinghirsh (Member)

We want to be able to set up a list of pages that get smoke tested on the live site immediately after each deploy. The simplest version would be something like "load the page and verify it returns an HTTP 200 response" for /Parts, /Tools, /Parts/iPhone, /products/mako-etc. I'm not sure whether we have precedent for tests that run post-merge/deploy on main or tests that run in production, but this could save us a lot of headaches, or at least let us know right away when a page breaks.
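A minimal sketch of the "load the page and check for a 200" version, assuming a Python script that runs as a post-deploy CI step. `BASE_URL`, `PAGES`, `fetch_status`, `smoke_test`, and `main` are hypothetical names, `BASE_URL` is a placeholder, and the page list only covers some of the examples mentioned above. `urllib` follows redirects, so a page that redirects to a working page counts as passing.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Hypothetical values: placeholder base URL for the live site and pages to check.
BASE_URL = "https://www.example.com"
PAGES = ["/Parts", "/Tools", "/Parts/iPhone"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a GET request (redirects are followed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "post-deploy-smoke-test"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; report the status code instead.
        return err.code
    except urllib.error.URLError:
        # DNS or connection failure: there is no HTTP status, so report 0.
        return 0


def smoke_test(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Return {path: status} for every path that did not respond with 200."""
    failures: Dict[str, int] = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status != 200:
            failures[path] = status
    return failures


def main() -> int:
    """Print failures and return a process exit code (non-zero fails the CI job)."""
    failures = smoke_test(BASE_URL, PAGES)
    for path, status in failures.items():
        print(f"FAIL {path}: HTTP {status}")
    return 1 if failures else 0
```

A deploy job could run this with `raise SystemExit(main())` once the deploy finishes. The `fetch` parameter is injectable so the pass/fail logic can be tested without hitting the network.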

The next level would be to check that those pages throw no JS errors, or to make some assertions about the content, but we probably won't want to run a full test suite. Another opportunity would be to run a subset of tests against our prod-preview Vercel deployments to validate new code against production data. Let me know what you think and whether this is something your team could work on.
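For the "assertions about the content" level, one possible sketch, assuming a standalone Python script, fetches each page and requires some known text in the HTML. `EXPECTED_CONTENT`, `fetch_page`, and `check_content` are hypothetical names, and the expected strings are placeholders rather than anything the pages are known to contain. Passing a prod-preview Vercel deployment URL as `base_url` would run the same checks there. Checking for JS errors is a separate problem that needs a real browser (for example, listening for Playwright's `pageerror` event), which this sketch does not attempt.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Hypothetical expectations: each path maps to strings its HTML must contain.
EXPECTED_CONTENT: Dict[str, List[str]] = {
    "/Parts": ["<title"],
    "/Tools": ["<title"],
}


def fetch_page(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status, decoded body) for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # Error pages are reported by status only; their body is ignored.
        return err.code, ""


def check_content(
    base_url: str,
    expected: Dict[str, List[str]],
    fetch: Callable[[str], Tuple[int, str]] = fetch_page,
) -> List[str]:
    """Return a list of problems; an empty list means every page passed."""
    problems: List[str] = []
    for path, needles in expected.items():
        status, body = fetch(base_url.rstrip("/") + path)
        if status != 200:
            problems.append(f"{path}: HTTP {status}")
            continue  # no point checking the content of an error page
        for needle in needles:
            if needle not in body:
                problems.append(f"{path}: missing {needle!r}")
    return problems
```

Plain substring checks are deliberately crude; they catch blank or broken pages cheaply without turning the smoke test into a full test suite.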

I spoke with @mlahargou, and he raised some concerns, mainly that we should improve our tests to catch these problems before deploying. I generally agree, and I'm proposing this post-deploy testing in addition to pre-merge testing, not instead of it. I also agree we should have more alerting on various types of errors, auto-rollback, etc. We already have Datadog and Sentry set up, so I think it's mostly a matter of configuring the right alerts.

But several times now, even after testing in dev, we've still hit errors in prod because of some dev/prod mismatch. Another goal of mine is to increase parity between dev and prod to avoid this class of bugs, but I don't think it's possible to get 100% there.

The downtime we had last week was caused by a lack of auto-deploy on Strapi, which let us merge a Strapi change weeks ago without deploying it. When we deployed the frontend change that depended on that Strapi change, production broke. CI and manual QA passed because our dev previews and govinor always use the latest Strapi.

We've also (rarely) hit bugs resulting from the CloudFront configuration, which affects prod but not our preview branches. This kind of thing is tricky to reproduce in dev and may not generate Sentry reports even in prod, which is why I'm proposing something that explicitly checks prod right away. Once we have Strapi auto-deploy, auto-rollback for the frontend, and better alerting, we may not need production smoke tests as much, but I still think they help cover our bases, since it's often the things we didn't anticipate that cause downtime. Besides, nothing beats going to the live site to make sure it works.

@sterlinghirsh added the 0 maintenance and needs_more_spec labels on Feb 3, 2023