Infrastructure for Orka (2024 and beyond) #3686

Open
8 of 12 tasks
UlisesGascon opened this issue Apr 19, 2024 · 8 comments

@UlisesGascon
Member

UlisesGascon commented Apr 19, 2024

I plan to work on this over the weekend so that I can give a good overview at the next build meeting on Tuesday.

Current tasks on MacOS infra

Blocked until ARM nodes are provided

  • Confirm org decision regarding new ARM nodes (discussion ongoing in the mailing list)
  • Add new VMs for MacOS 13 ARM
  • Add new VMs for MacOS 11 ARM
@UlisesGascon
Member Author

Current Orka state

updated on April 19, 2024

| SSH port | Node: macpro-4 | Node: macpro-5 | Node: macpro-6 |
| --- | --- | --- | --- |
| 8822 | release-macos11-x64-1 | empty | test-macos11-x64-1 |
| 8823 | empty | empty | test-macos11-x64-2 |
| 8824 | empty | test-macos1015-x64-2 | test-macos1015-x64-1 |
| 8825 | empty | empty | empty |

@UlisesGascon
Member Author

UlisesGascon commented Apr 19, 2024

Next Orka state

updated on April 22, 2024

Intel Nodes

| SSH port | Node: macpro-4 | Node: macpro-5 | Node: macpro-6 |
| --- | --- | --- | --- |
| 8822 | release-macos11-x64-1 | test-macos13-x64-2 | test-macos11-x64-1 |
| 8823 | test-macos13-x64-1 | release-macos13-x64-1 | test-macos11-x64-2 |
| 8824 | empty | test-macos1015-x64-2 | test-macos1015-x64-1 |
| 8825 | empty | empty | empty |

ARM Nodes

We assume that ARM nodes can handle only 2 VMs each, not the 4+ that the Intel nodes handled in the past, due to license limitations. AFAIK this still needs to be confirmed with support.

| SSH port | Node: arm-1 | Node: arm-2 | Node: arm-3 |
| --- | --- | --- | --- |
| 8822 | test-macos11-arm64-1 | release-macos13-arm64-1 | empty |
| 8823 | release-macos11-arm64-1 | test-macos13-arm64-1 | test-macos13-arm64-2 |

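A quick way to sanity-check the ARM layout against the per-node VM limit is sketched below. This is only an illustration, not tooling that exists in the build repo: the names `ARM_VM_LIMIT`, `ARM_PLAN`, `over_capacity` and `free_slots` are made up for this example, and the limit of 2 is the assumption stated above.

```python
# Assumed limit of VMs per ARM node (the assumption discussed above).
ARM_VM_LIMIT = 2

# Planned ARM layout copied from the ARM table; None marks an empty slot.
ARM_PLAN = {
    "arm-1": ["test-macos11-arm64-1", "release-macos11-arm64-1"],
    "arm-2": ["release-macos13-arm64-1", "test-macos13-arm64-1"],
    "arm-3": [None, "test-macos13-arm64-2"],
}


def vm_count(vms):
    """Count the slots that actually hold a VM."""
    return sum(1 for vm in vms if vm)


def over_capacity(plan, limit):
    """Return {node: vm_count} for every node that exceeds the limit."""
    return {
        node: vm_count(vms)
        for node, vms in plan.items()
        if vm_count(vms) > limit
    }


def free_slots(plan, limit):
    """Return {node: remaining capacity} for every node in the plan."""
    return {node: limit - vm_count(vms) for node, vms in plan.items()}


if __name__ == "__main__":
    print("Over capacity:", over_capacity(ARM_PLAN, ARM_VM_LIMIT))
    print("Free slots:", free_slots(ARM_PLAN, ARM_VM_LIMIT))
```

With the layout as planned, no node goes over the assumed limit and only arm-3 has a slot left, so any extra ARM VM would have to land there.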
How are the Nearform machines "relocated"?

  • release-nearform-macos11.0-arm64-1 -> release-orka-macos11-arm64-1
  • test-nearform-macos11.0-arm64-1 -> test-orka-macos11-arm64-1

@targos
Member

targos commented Apr 22, 2024

release-macos13-x64-2
release-macos13-arm64-2

I don't think it's necessary to have two identical release machines.

@targos
Member

targos commented Apr 22, 2024

test-nearform-macos11.0-arm64-1

Are these typos?

@UlisesGascon
Member Author

UlisesGascon commented Apr 22, 2024

Great feedback @targos! I updated the tables

> I don't think it's necessary to have two identical release machines.

We have space for redundancy, but let's remove them for now.

> Are these typos?

I made a better reference for the "relocated" machines

@targos targos pinned this issue May 2, 2024
@targos
Member

targos commented May 2, 2024

> release-macos13-x64-2
> release-macos13-arm64-2
>
> I don't think it's necessary to have two identical release machines.

Actually, I think we should have one x64 and two arm64 machines, because there are two jobs that run on macos-arm64 during a release (osx11-release-pkg and osx11-arm64-release-tar).

@ryanaslett
Contributor

Some questions/thoughts/suggestions:

  1. Requirements question: do we still need to support macOS 10.15 and/or 11? In BUILDING.md (https://github.com/nodejs/node/blob/main/BUILDING.md#supported-platforms) I see:

> Node.js does not support a platform version if a vendor has expired support for it. In other words, Node.js does not support running on End-of-Life (EoL) platforms. This is true regardless of entries in the table below.

And the table lists macOS >= 11.

That table may be outdated, though, as it seems macOS 11 reached EOL in November 2023?

  2. ARM support in Orka:

> We assume that ARM nodes can handle only 2 VMs each, not the 4+ that the Intel nodes handled in the past, due to license limitations. AFAIK this still needs to be confirmed with support.

https://orkadocs.macstadium.com/docs/apple-arm-based-support confirms this:

> IMPORTANT
>
> You can deploy up to 2 VMs per Apple silicon-based node.

  3. From what I can gather, the macOS infra seems to be brittle, with nodes often running into disk and maintenance issues.

#3592
#3685
(https://github.com/nodejs/build/issues?q=is%3Aissue+macos+is%3Aclosed+disk) etc.

My suggestion to avoid Jenkins worker decay is to lean into an ephemeral node strategy so that each build has a fresh Orka instance to run on.

We can do that with the following Jenkins plugin for Orka:
https://plugins.jenkins.io/macstadium-orka/#plugin-content-ephemeral-agents

We would first need to set up a Packer build process to create our VM images, so that Orka has a baseline image to create VMs from:
https://orkadocs.macstadium.com/docs/packer

The Packer process can leverage our existing Ansible playbooks:
https://developer.hashicorp.com/packer/integrations/hashicorp/ansible/latest/components/provisioner/ansible

This strategy would require that we have an Orka 3.0 cluster. Rather than trying to upgrade the existing cluster, I propose that we ask MacStadium to let us provision a new cluster with the resources we need (enough ARM/Intel backing nodes for our macOS 11/13 testing and release), get it built, provisioned and working, and then decommission/return all the existing MacStadium/Orka machines.

I believe we would end up using roughly the same amount of resources, so it should be palatable for MacStadium to support this transition.

@mhdawson
Member

> This strategy would require that we have an Orka 3.0 cluster. Rather than trying to upgrade the existing cluster, I propose that we ask MacStadium to let us provision a new cluster with the resources we need (enough ARM/Intel backing nodes for our macOS 11/13 testing and release), get it built, provisioned and working, and then decommission/return all the existing MacStadium/Orka machines.

+1 from me if MacStadium will support that.
