Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: windows runner - enable fast launch option #466

Closed
pharindoko opened this issue Nov 15, 2023 · 21 comments · Fixed by #521
Closed

feat: windows runner - enable fast launch option #466

pharindoko opened this issue Nov 15, 2023 · 21 comments · Fixed by #521

Comments

@pharindoko
Copy link
Contributor

Currently it takes up to 4 minutes to get a windows runner up and running and registered in GH.
AWS announced this last year...
https://aws.amazon.com/blogs/modernizing-with-aws/launch-windows-faster-on-ec2/

This can be configured in distribution settings in image builder
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-ami-config-fast-launch.html#win-fast-launch-key-terms

Important is that if no fast launch snapshots are available the standard launch will be done...

When you experience a higher number of launches than anticipated, you might use up all the pre-provisioned snapshots that you have available. This doesn't cause any launches to fail. However, it can result in some instances going through the standard launch process, until snapshots can be replenished.

Could we implement this stuff @kichik ?
I could create a first draft PR ...
What`s your opinion on this ?

@pharindoko pharindoko changed the title Windows runner: Fast launch option feat: windows runner - enable fast launch option Nov 15, 2023
@kichik
Copy link
Member

kichik commented Nov 15, 2023

Yeah, of course. Do you have a sense of how much time it would save? Is it going to just be a flag we need to turn on or are there parameters to tune?

@pharindoko
Copy link
Contributor Author

pharindoko commented Nov 15, 2023

Currently it takes 4-5 Minutes to start the Windows Runner.
Aws claims that the instance should be up and running after 1-2 Minute(s)...

Yes the scaling has to be set to a fixed value... kinda strange but it should be ok. Shared the link to the properties in the previous post.

The user needs to edit multiple properties - a flag is not enough I would say ...

@kichik
Copy link
Member

kichik commented Nov 16, 2023

The user needs to edit multiple properties - a flag is not enough I would say ...

So what do you think it's going to look like? Flag to enable + that "how many snapshots" number? Could it matter if this runs on the image builder or runner VPC?

If it saves time, I will be happy to merge a PR for it.

@pharindoko
Copy link
Contributor Author

pharindoko commented Nov 16, 2023

Guess we need all the properties to make it manageable...
I would start with default settings cdk provides ...
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_imagebuilder.CfnDistributionConfiguration.FastLaunchConfigurationProperty.html

if you want to do the PR - go for it :)

@pharindoko
Copy link
Contributor Author

pharindoko commented Jan 26, 2024

btw I try to implement this but this doesn`t work as expected :-/ I`ll check it again the next few days. Let`s see.

@RichiCoder1
Copy link

btw I try to implement this but this doesnt work as expected :-/ Ill check it again the next few days. Let`s see.

What issues did you end up running into? I remember when I last tried I ran into some weird launch config/iam permission issues.

@pharindoko
Copy link
Contributor Author

pharindoko commented Feb 7, 2024

Ok, I changed the builder.ts (src/image-builders-aws-image-builder/builder.ts) file

Just in a simple manner so I can start to test the fastlaunch for windows.

    const dist = new imagebuilder.CfnDistributionConfiguration(this, 'AMI Distribution', {
      name: uniqueImageBuilderName(this),
      // description: this.description,
      distributions: [
        {
          region: Stack.of(this).region,
          amiDistributionConfiguration: {
            Name: `${cdk.Names.uniqueResourceName(this, {
              maxLength: 100,
              separator: '-',
              allowedSpecialCharacters: '_-',
            })}-{{ imagebuilder:buildDate }}`,
            AmiTags: {
              'Name': this.node.id,
              'GitHubRunners:Stack': stackName,
              'GitHubRunners:Builder': builderName,
            },
          },
          launchTemplateConfigurations: [
            {
              launchTemplateId: launchTemplate.launchTemplateId,
            },
          ],
// my addition to enable fastLaunch
          fastLaunchConfigurations: [{
            accountId: Stack.of(this).account,
            enabled: this.os === Os.WINDOWS ? true : false,
            maxParallelLaunches: 6,
            snapshotConfiguration: {
              targetResourceCount: 5,
            },
          }],
// my addition to enable fastLaunch
        },
      ],
    });

and received following error:

'EC2 Client Error: 'Can't enable EC2 Fast Launch. Run instances dry run failed for enabling EC2 Fast Launch. No default
 VPC for this user. GroupName is only supported for EC2-Classic and default VPC. To enable EC2 Fast Launch, you must  
 use a launch template that specifies a subnet ID or network interface ID.''''

Means the launchTemplate attached has no subnet / network setting which is true...
but the L2 Construct ec2.LaunchTemplate doesn`t support setting the subnet (aws/aws-cdk#14494)

L2 Construct

    const launchTemplate = new ec2.LaunchTemplate(this, 'Launch template', {
      requireImdsv2: true,

    });

And using the L1 Construct CfnLaunchTemplate you can add networkinterfaces and the subnet but it requests an ImageId, otherwise an error is thrown that it can`t find the ami.
At the moment of the distribution generation I do not have a ImageID available. (chicken-egg-problem)

it`s attached when the image is generated

  protected createImage(infra: imagebuilder.CfnInfrastructureConfiguration, dist: imagebuilder.CfnDistributionConfiguration, log: logs.LogGroup,
    imageRecipeArn?: string, containerRecipeArn?: string): imagebuilder.CfnImage {
    const image = new imagebuilder.CfnImage(this, this.amiOrContainerId('Image', imageRecipeArn, containerRecipeArn), {
      infrastructureConfigurationArn: infra.attrArn,
      distributionConfigurationArn: dist.attrArn,
      imageRecipeArn,
      containerRecipeArn,
      imageTestsConfiguration: {
        imageTestsEnabled: false,
      },
    });
    image.node.addDependency(infra);
    image.node.addDependency(log);

    return image;
  }

any idea how to circumvent this ?

@kichik
Copy link
Member

kichik commented Feb 7, 2024

I had a similar issue when trying to figure out #394. The only solution I could think of is some Lambda to create a new launch template with the right parameters every time EC2 Image Builder updates its launch template (which can be used by multiple runners). I don't love this solution. It will be hard to get right.

@pharindoko
Copy link
Contributor Author

@kichik wow ok I was able to create my first windows server 2022 ec2 with fast launch using some custom nasty hacks in builder.ts and ec2.ts

The result is nice -> runner came up after 1 1/2 minutes instead of 4 1/2 minutes.

Thanks to the nice interfaces you created I guess I can use
IRunnerProvider

export interface IRunnerProvider extends ec2.IConnectable, iam.IGrantable, IConstruct {

and IRunnerImageBuilder

export interface IRunnerImageBuilder {

to create a special windows runner fast launch provider.
I will try and report the progress here.
If feasable I`ll create a draft PR for this kind of special provider.

@RichiCoder1
Copy link

If feasable I`ll create a draft PR for this kind of special provider.

That's awesome! I'll be happy to test and review, the start times are def a lot

@kichik
Copy link
Member

kichik commented Feb 10, 2024

The result is nice -> runner came up after 1 1/2 minutes instead of 4 1/2 minutes.

That's an impressive improvement.

I feel like we will have to end up with launch template per provider anyway for EC2 Fleet support. Might as well bite the bullet and do it for this one too. Maybe I can make it implicit enough with another aspect.

@kichik
Copy link
Member

kichik commented Feb 13, 2024

@pharindoko with proper dependency settings, this might give you a fully configured launch template: a6a7c7c (still in branch)

@pharindoko
Copy link
Contributor Author

@kichik
Thanks for the update.

My biggest issue is that the L2 construct does not support vpcsubnets and I'm in a non-default vpc.

So basically I need to create the L1 construct for 'CfnTLaunchTemplate'
and import this into the L2 construct.

And if I do this I need to provide the ImageId, Security group, SubnetId and the InstanceType in advance ...

So I created the CfnDistributionConfiguration between createImage and createPipeline in builder.ts
and forwarded the CfnDistributionConfiguration
only to the createPipelien function.

Couldn't provide a draft PR as I'm ill :-/

@kichik
Copy link
Member

kichik commented Feb 13, 2024

Oh yeah I was going to use overrides for that.

Couldn't provide a draft PR as I'm ill :-/

Get well soon!

@pharindoko
Copy link
Contributor Author

Oh yeah I was going to use overrides for that.

Couldn't provide a draft PR as I'm ill :-/

Get well soon!

Overrides could be an option. I tried to use overrides but ended up with circular references in the cloudformation stack.

@pharindoko
Copy link
Contributor Author

@kichik @RichiCoder1
Ok made a sample repo that shows how a builder (lib/fast-launch-ec2.builder.ts) and provider (lib/fast-launch-ec2.provider.ts) could look like to activate the windows fast launch option for windows runners in a non-default-vpc (in all vpcs)

https://github.com/pharindoko/cdk-runner-win-test

Things to note:

  1. Create the distribution after the image generation
    https://github.com/pharindoko/cdk-runner-win-test/blob/9ebb7d2b11369fc30231757a92bcbfb1a60d3546/lib/fast-launch-ec2.builder.ts#L273

  2. L1 Construct for launch-template required for non-default-vpc usecases:
    This requires properties: instancetype, subnetid, imageId and the security group (important to have outbound connection)
    https://github.com/pharindoko/cdk-runner-win-test/blob/9ebb7d2b11369fc30231757a92bcbfb1a60d3546/lib/fast-launch-ec2.builder.ts#L255

  3. Remove properties from instance creation call to avoid conflicts with launchtemplate:
    removed properties: instancetype, subnetid, securitygroups
    https://github.com/pharindoko/cdk-runner-win-test/blob/9ebb7d2b11369fc30231757a92bcbfb1a60d3546/lib/fast-launch-ec2.provider.ts#L267

Questions:

  1. @kichik Can we integrate such a special custom provider for windows fast launch to your default providers list ?
  2. Could we inherit this provider from the ec2 provider ? (some functions and variables need to be protected instead of private I guess)
    -> we could keep this special provider outside of your solution and I don`t need to include the index.js - asset files for the reaper and delete-ami function in the solution itself.

@pharindoko
Copy link
Contributor Author

pharindoko commented Feb 26, 2024

here`s another idea - by extending the ec2 provider and image builder I would be able to overwrite / extend this provider /builder specifically for the fast launch case with less code... #514

@pharindoko
Copy link
Contributor Author

@kichik please can you have a look to the solutions I proposed
#466 (comment)

#514

@kichik
Copy link
Member

kichik commented Mar 8, 2024

Sorry for the late response. Been meaning to find a time to dive deeper into this.

It seems very doable. I would try to make it an option of the existing builder. I think my idea of generating launch templates per provider from #518 should make this possible without a lot of code duplication. We can add the fast launch configuration to the builder the same way we add the launch template.

I am worried about this part though:

Create the distribution after the image generation

As coded right now, I don't know if the provider will function properly after initial deployment. It might have to wait for the schedule to be a new AMI until it works. We may have to find a workaround for this. Maybe the launch template can begin with a dummy Windows AMI.

Can we integrate such a special custom provider for windows fast launch to your default providers list ?

I would prefer for this to be an option of the provider. It would then in turn request a special launch template from the builder that takes care of it. I can also see a possibility of this ending up being a provider option. Either way, I would much rather update the current provider/builder for this rather than creating a whole new one.

Could we inherit this provider from the ec2 provider ? (some functions and variables need to be protected instead of private I guess)

I would prefer to keep the private fields private. If I turn them to protected, that means I have to maintain backwards compatibility in future updates. They were not designed with that goal in mind.

...we could keep this special provider outside of your solution

I think this can be included in the library itself. No reason to keep it separate. It's useful enough for everybody and shouldn't require a ton of changes.

For now, to get you going, are you able to use your repo as-is? If not, would copying a few more files in do it?

@kichik
Copy link
Member

kichik commented Mar 9, 2024

Ok this was a little simpler than I thought it would be. Took a while to get just right, but doesn't require as many changes as I feared. Can you please review #521 and maybe even try it out?

@pharindoko
Copy link
Contributor Author

Ok this was a little simpler than I thought it would be. Took a while to get just right, but doesn't require as many changes as I feared. Can you please review #521 and maybe even try it out?

Thanks @kichik. Looks good - will try it out with a non-default vpc.

@mergify mergify bot closed this as completed in #521 Mar 27, 2024
mergify bot pushed a commit that referenced this issue Mar 27, 2024
Add an option for enabling [fast launch](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-ami-config-fast-launch.html) for Windows AMIs. This should speed up booting up of EC2 Windows runners.

Note that it does come with [extra cost](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-fast-launch-manage-costs.html).

```typescript
const ec2WindowsImageBuilder = Ec2RunnerProvider.imageBuilder(stack, 'Windows EC2 Builder', {
  os: Os.WINDOWS,
  vpc,
  awsImageBuilderOptions: {
    fastLaunchOptions: {
      enabled: true,
    },
  },
});
const runners = new GitHubRunners(stack, 'runners', {
  providers: [
    new Ec2RunnerProvider(stack, 'Fast Windows', {
      labels: ['windows'],
      imageBuilder: ec2WindowsImageBuilder,
    }),
  ],
}
```

Before:
<img width="69" alt="image" src="https://github.com/CloudSnorkel/cdk-github-runners/assets/1156773/f92a363e-9892-430a-9511-f70800909fe0">

After:
<img width="74" alt="image" src="https://github.com/CloudSnorkel/cdk-github-runners/assets/1156773/d7da163c-4f82-4cbb-9b48-bfb1e8da3850">

Resolves #466
Closes #514
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants