Upgrade from Amazon Linux 2 to 2022 (or 2023?) #1278

BryanQuigley · 2022-10-28T16:16:31Z

Detailed Description

This does not have any current urgency, but wanted to get these notes and context written down.

AWS has announced a new Amazon Linux release structure with Amazon Linux 2022. They also released an ECS-optimized version, which is the variant of AL2 that DB uses. The 2022 versions are all in preview right now.

Context

It's the next version of Amazon Linux (and we use ECS optimized builds), so we are going to eventually want to move to it.
We can drop support for the older cgroup version
General performance improvements, including to memory handling and cgroups
We can try newer memory-saving features, such as zswap or zram (these didn't work on AL2)

Alternatives

If they introduce a newer Fargate that is backed by this more modern OS, there may be a chance we can switch to Fargate instead. That is blocked on being able to set the kernel option vm.max_map_count.

Possible Implementation

Changing to it is trivial and has been lightly tested: c780b2f

We don't necessarily need to wait until it's out of preview, but we do want to do a lot of performance testing/comparison to confirm it's better.

Make a fresh deployment on staging
Get fresh benchmark numbers (https://github.com/PublicMapping/districtbuilder/tree/develop/load-tests) and memory usage
Make the terraform change to the 2022 image in a test branch so it's on staging
Get 2022 image benchmark numbers and memory usage
Decide whether it's worth pursuing a production move. If no, document the decision. If yes, continue.
Make terraform change PR.
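The comparison step in the plan above could be sketched roughly as follows. This is only an illustration: the `RunSummary` fields, the choice of median latency and mean memory as the metrics, and the 5% threshold are all assumptions, not something the load tests in the repo define.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RunSummary:
    """Summary of one load-test run. Field names are hypothetical."""
    median_latency_ms: float
    mean_memory_mb: float


def summarize(latencies_ms, memory_mb):
    """Collapse raw samples from one run into a RunSummary."""
    return RunSummary(median(latencies_ms), mean(memory_mb))


def percent_change(baseline, candidate):
    """Signed change relative to baseline; negative means the candidate is lower."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def worth_pursuing(baseline, candidate, threshold_pct=5.0):
    """Lower is better for both metrics.

    Returns True when at least one metric improves by threshold_pct or more
    and neither metric gets worse by more than threshold_pct.
    The threshold is an assumed placeholder, not a project decision.
    """
    deltas = [
        percent_change(baseline.median_latency_ms, candidate.median_latency_ms),
        percent_change(baseline.mean_memory_mb, candidate.mean_memory_mb),
    ]
    improved = min(deltas) <= -threshold_pct
    regressed = max(deltas) > threshold_pct
    return improved and not regressed
```

Feeding it the AL2 run as `baseline` and the new-image run as `candidate` gives a yes/no starting point for the "decide" step; the real decision would still need a look at the raw numbers.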


KlaasH · 2023-03-09T21:44:02Z

Ok, an update on where this stands:

It's now called Amazon Linux 2023
It's still in preview. They released an RC0 some time in mid-late 2022 and they're now on RC3, released in February.
I don't see any indication that there's a concrete, or even not-concrete, timeline for finalizing it. Presumably they'll keep making RCs and fixing bugs until they feel it's ready for production.
The release notes and the FAQ say "Release Candidate is not recommended for production workloads."

I think it's in our interest to wait until they're fully done. The new OS might provide better performance or it might be about the same, but there's no reason to think it'll make such a significant difference that we want to get on board the moment it's available. Most of the work of this task will be load testing and evaluation, and doing that early, on a preview, is probably not a substitute for doing it on the actual final AMI. So it makes sense to do it once, when the thing is ready.

BryanQuigley · 2023-03-16T21:52:25Z

It just got released: https://aws.amazon.com/about-aws/whats-new/2023/03/amazon-linux-2023/

aaronxsu · 2023-03-27T19:57:31Z

Hey @KlaasH, I saw the above message from Bryan that the newer version is now released. Would you suggest we make the move now, or would you like to review the above doc next sprint before we decide?

KlaasH · 2023-03-28T14:20:31Z

I don't have a very clear sense of how much performance improvement we expect from upgrading and how important it is that we capture it. The zswap thing sounds promising, but also sounds like it could be a trade-off since I assume we'd have to reserve a chunk of memory for that, so the amount of normal memory available would be reduced. I don't know the implications of the cgroup changes.
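To put rough numbers on the zswap trade-off, here is a small arithmetic sketch. It assumes the pool is capped at a percentage of RAM (zswap exposes this as `max_pool_percent`) and that pages compress at some average ratio; the ratio is a pure guess until measured on the real workload, and the function name is made up for illustration.

```python
def zswap_estimate(total_ram_mb, max_pool_percent, compression_ratio):
    """Estimate what a full zswap pool costs and holds.

    total_ram_mb: RAM on the instance.
    max_pool_percent: cap on the compressed pool, as a percent of RAM.
    compression_ratio: uncompressed size / compressed size (assumed, not measured).
    """
    if not 0 < max_pool_percent <= 100:
        raise ValueError("max_pool_percent must be in (0, 100]")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")

    pool_mb = total_ram_mb * max_pool_percent / 100
    held_mb = pool_mb * compression_ratio
    return {
        # RAM occupied by the compressed pool when it is full
        "pool_mb": pool_mb,
        # uncompressed data that pool represents
        "uncompressed_mb_held": held_mb,
        # extra data kept in RAM compared with not compressing at all
        "net_gain_mb": held_mb - pool_mb,
        # RAM left for ordinary, uncompressed pages when the pool is full
        "ram_left_when_pool_full_mb": total_ram_mb - pool_mb,
    }
```

As far as I understand it, the pool grows on demand up to the cap rather than being set aside up front, so the "RAM left" figure is a worst case. The gain only materializes if the ratio is comfortably above 1.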

I think the first big question for me is whether it makes sense to upgrade our EC2 instances, or whether we're hoping to move to Fargate and think that will be possible soon enough that an intermediate upgrade isn't worth the time. The issue description above mentions that we need to be able to increase vm.max_map_count. Here's an issue where a number of people mention needing that parameter, and the latest update, from February, is that they're making progress. That progress is on the broader issue of sysctl support in general; there's no specific word on whether that parameter will be available, though I would think it's on their list to at least try to add, since it sounds like there's a common use case that requires it.
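For reference, checking whether a host (or a future Fargate task) exposes a high enough vm.max_map_count could look something like the sketch below. The procfs path is the standard Linux location for this sysctl; the required value is left as a parameter because this thread doesn't state what we need, and the function names are hypothetical.

```python
from pathlib import Path

# Standard procfs location for the vm.max_map_count sysctl on Linux.
MAX_MAP_COUNT_FILE = Path("/proc/sys/vm/max_map_count")


def meets_requirement(raw_value, required):
    """Parse the sysctl file contents and compare against the required minimum.

    Returns (ok, current) so callers can report the actual value.
    """
    current = int(raw_value.strip())
    return current >= required, current


def check_max_map_count(required, path=MAX_MAP_COUNT_FILE):
    """Read the live value; only meaningful on a Linux host or container."""
    return meets_requirement(Path(path).read_text(), required)
```

Running `check_max_map_count(required)` inside a task would show whether the platform lets the value through, without having to start the whole application first.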

So yeah, we definitely could make the move now, but whether we should depends on how much we expect to gain from it, how important those gains are to the functionality or stability of the app, and whether we're likely to keep it for a while or change again soon.

aaronxsu · 2023-03-28T17:13:20Z

Thanks @KlaasH. I was skeptical about the zswap bit as well, and I agree with your reasoning. I think we might get an understanding of the performance implications through load tests, once we have capacity to do load testing again? Regarding a new version of Fargate with sysctl support (which it seems would unblock us from ~~upgrading~~ switching, given the need to configure vm.max_map_count), did you have any luck finding this on their roadmap? A light search did not help me much...

Hey @BryanQuigley could you expand on why we may drop the code block for reading control group memory max when determining the docker memory limit after this upgrade please? I think I am missing some context here.

BryanQuigley · 2023-03-28T21:28:40Z

I don't understand how they plan to implement the sysctl kernel changes - but it sounds like it's a way off.

> for reading control group memory max

The cgroup commit that can be changed is: 700d413

All our Linux/Mac machines are running cgroups v2, while our production/staging sites are using v1. There are also numerous memory improvements in cgroups v2 (https://docs.kernel.org/admin-guide/cgroup-v2.html#issues-with-v1-and-rationales-for-v2) that may match some of the issues with the agent being killed. In other words, I don't know if it's worth troubleshooting why it's happening on the old OS when a new one is available.
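To illustrate the kind of dual-path code involved, here is a minimal sketch of reading a container's memory limit under both cgroup versions. This is not the code from 700d413; the function names and the "unlimited" threshold are assumptions. The file paths are the usual cgroup v2 and v1 locations.

```python
from pathlib import Path
from typing import Optional

# Usual locations of the memory limit inside a container.
CGROUP_V2_FILE = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_FILE = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports "no limit" as a huge number rather than a keyword.
# Treating anything at or above 2**60 bytes as unlimited is an assumption.
V1_UNLIMITED_THRESHOLD = 1 << 60


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when no limit is set."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    return None if limit >= V1_UNLIMITED_THRESHOLD else limit


def container_memory_limit(
    v2_file: Path = CGROUP_V2_FILE, v1_file: Path = CGROUP_V1_FILE
) -> Optional[int]:
    """Prefer the v2 file, fall back to v1, return None if neither exists."""
    for candidate in (v2_file, v1_file):
        if candidate.is_file():
            return parse_limit(candidate.read_text())
    return None
```

Once every environment is on cgroups v2, the v1 constant, the threshold, and the fallback loop could all go away, leaving a single file read.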

KlaasH · 2023-08-14T13:37:44Z

Update (sort of) re Fargate:
The issue I was watching on the AWS "containers roadmap" repo about adding sysctl parameter control to Fargate (aws/containers-roadmap#460) was resolved, but without adding the one we need (max_map_count).

There's another issue, aws/containers-roadmap#1452, that's specific to that one parameter. There hasn't been much activity on it, but until three days ago the last comment was "#460 should handle this issue." Now that that hasn't happened, maybe it will get some attention. Then again maybe not—max_map_count was mentioned in the discussion for #460, so it's possible they concluded it's not feasible to expose that one. But hopefully they just decided to prioritize other ones but will be continuing to work on it. We shall see.

aaronxsu · 2023-08-14T16:01:07Z

Thanks @KlaasH for continuing to keep an eye on it. Please keep us posted!

KlaasH mentioned this issue Mar 6, 2023

More r6 instance types available. #1294

Closed

KlaasH self-assigned this Mar 9, 2023

aaronxsu added the operations label Mar 27, 2023

KlaasH removed their assignment Mar 28, 2023
