Upgrade test-ibm-ubuntu1804-x64-1 to Ubuntu 22.04 #3713

Closed
targos opened this issue May 10, 2024 · 21 comments · Fixed by #3722

@targos
Member

targos commented May 10, 2024

https://ci.nodejs.org/computer/test%2Dibm%2Dubuntu1804%2Dx64%2D1/

It needs Node.js to run linter jobs.
ESLint v9 requires recent versions of Node.js but those cannot run on Ubuntu 18.04.

Blocks: nodejs/node#52780

@targos
Member Author

targos commented May 10, 2024

We could also take the opportunity to introduce Ubuntu 24.04 in CI.

@richardlau
Member

We could also take the opportunity to introduce Ubuntu 24.04 in CI.

Sounds reasonable if it's an option on the providers.

FWIW This machine is one of three jenkins-workspace machines, all of which are meant to be able to run the linter job. The other two are currently hosted on Equinix Metal and need to be migrated somewhere else (#3597).

@targos
Member Author

targos commented May 11, 2024

It's not an option yet on IBM Cloud.

I took the machine offline and will wait for it to finish its current jobs.

Looking at https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#jenkins-workspace, it seems I need a Coverity login. Do I have to create an account and ask for access?

@richardlau
Member

Yes. I've invited your email to the project as an owner.

@targos targos self-assigned this May 12, 2024
@targos
Member Author

targos commented May 12, 2024

I did an "OS reload" on https://cloud.ibm.com/gen1/infrastructure/virtual-server/101642536/details and now I wonder if I made a mistake, because the machine doesn't have a large disk:

root@test-ibm-ubuntu2204-x64-3:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           390M  1.1M  389M   1% /run
/dev/xvda2       24G  1.6G   21G   7% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/xvda1      975M   55M  869M   6% /boot
tmpfs           390M  4.0K  390M   1% /run/user/0

@targos
Member Author

targos commented May 12, 2024

Also:

TASK [jenkins-workspace : Link to repository directory from bintmp home] ***********************************************************************************************
changed: [test-ibm-ubuntu2204-x64-3]

TASK [jenkins-workspace : Initialize Git repository] *******************************************************************************************************************
[WARNING]: Module remote_tmp /home/binary_tmp/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To
avoid this, create the remote_tmp dir with the correct permissions manually
fatal: [test-ibm-ubuntu2204-x64-3]: FAILED! => {"changed": false, "cmd": "/usr/bin/git clone --bare https://github.com/nodejs/node /home/binary_tmp/binary_tmp.git", "msg": "Cloning into bare repository '/home/binary_tmp/binary_tmp.git'...\nfatal: Invalid path '/home/iojs/build': Permission denied", "rc": 128, "stderr": "Cloning into bare repository '/home/binary_tmp/binary_tmp.git'...\nfatal: Invalid path '/home/iojs/build': Permission denied\n", "stderr_lines": ["Cloning into bare repository '/home/binary_tmp/binary_tmp.git'...", "fatal: Invalid path '/home/iojs/build': Permission denied"], "stdout": "", "stdout_lines": []}

@targos
Member Author

targos commented May 12, 2024

I get the same error if I run the playbook on one of the Equinix hosts.

@richardlau
Member

I did an "OS reload" on https://cloud.ibm.com/gen1/infrastructure/virtual-server/101642536/details and now I wonder if I made a mistake, because the machine doesn't have a large disk:

root@test-ibm-ubuntu2204-x64-3:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           390M  1.1M  389M   1% /run
/dev/xvda2       24G  1.6G   21G   7% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/xvda1      975M   55M  869M   6% /boot
tmpfs           390M  4.0K  390M   1% /run/user/0

The storage tab in the web UI shows the 1000 GB disk is attached so I'd guess we need to mount it. It's probably /dev/xvdc1:

root@test-ibm-ubuntu2204-x64-3:~# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop1     7:1    0 40.4M  1 loop /snap/snapd/20671
loop2     7:2    0   87M  1 loop /snap/lxd/27037
loop3     7:3    0 38.7M  1 loop /snap/snapd/21465
loop4     7:4    0 63.9M  1 loop /snap/core20/2264
loop5     7:5    0   87M  1 loop /snap/lxd/28373
loop6     7:6    0 63.9M  1 loop /snap/core20/2318
xvda    202:0    0   25G  0 disk
├─xvda1 202:1    0    1G  0 part /boot
└─xvda2 202:2    0   24G  0 part /
xvdb    202:16   0    2G  0 disk
└─xvdb1 202:17   0    2G  0 part [SWAP]
xvdc    202:32   0 1000G  0 disk
└─xvdc1 202:33   0 1000G  0 part
xvdh    202:112  0   64M  0 disk
root@test-ibm-ubuntu2204-x64-3:~#

I'm not sure if it was supposed to be mounted as /home or /home/iojs.
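
To spot an attached but unmounted disk like this without reading the tree by eye, something along these lines might help. It is only a sketch: the function name `unmounted_partitions` is made up for illustration, and it assumes `lsblk` is run with `-P -o NAME,TYPE,MOUNTPOINT` so that each line is a set of `KEY="value"` pairs.

```shell
# Read `lsblk -P -o NAME,TYPE,MOUNTPOINT` output on stdin and print the
# names of partitions whose MOUNTPOINT is empty.
# (Hypothetical helper; swap partitions report "[SWAP]" and are skipped.)
unmounted_partitions() {
  awk '/(^| )TYPE="part"/ && /(^| )MOUNTPOINT=""/ {
         match($0, /NAME="[^"]*"/)
         # NAME=" is 6 characters long; drop it and the closing quote.
         print substr($0, RSTART + 6, RLENGTH - 7)
       }'
}
```

On the machine it would be run as `lsblk -P -o NAME,TYPE,MOUNTPOINT | unmounted_partitions`. It only lists candidates; it does not say where a partition should be mounted.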

I wasn't able to find any references to this machine in this repository, but the IBM hosted Docker host may have a similar /etc/fstab entry we could try: #2494 (comment)

The Ansible issue looks like a permissions problem; maybe some refactor to the Ansible scripts since the last time we set up jenkins-workspace machines (#3036) has broken it. IIRC the temp git repository in /home/binary_tmp/binary_tmp.git is owned by binary_tmp, but there's a symlink to /home/iojs, which is owned by iojs.
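
One way to check a theory like this is to print the mode and owner of every directory on the way to the real repository, similar to what `namei -l` shows. The helper below is a hypothetical sketch (the name `path_permissions` is not from the build repo) and relies on GNU `readlink -f` and `stat -c`, which Ubuntu ships.

```shell
# Resolve symlinks in the given path, then print "mode owner:group path"
# for the resolved path and each of its parents up to /.
# (Hypothetical helper for illustration.)
path_permissions() {
  local p
  p=$(readlink -f -- "$1") || return 1
  while :; do
    stat -c '%A %U:%G %n' -- "$p"
    [ "$p" = "/" ] && break
    p=$(dirname -- "$p")
  done
}
```

For example, `path_permissions /home/binary_tmp/binary_tmp.git` would list the resolved path and each parent up to `/`. A directory without the execute bit for the binary_tmp user would be a likely place for the clone to fail.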

@targos
Member Author

targos commented May 13, 2024

@richardlau Feel free to take this over if you have the time :)

@ryanaslett
Contributor

The other two are currently hosted on Equinix Metal and need to be migrated somewhere else (#3597).

I was about to embark on setting up two new jenkins-workspace hosts on mnx.io, so I'm very interested to know if there are changes that need to be made to make that possible (re: "maybe some refactor to the Ansible scripts since the last time we set up jenkins-workspace machines").

@richardlau
Member

I have the disk mounted now by adding

/dev/xvdc1      /home/iojs   ext4    defaults        0       0

to /etc/fstab and rebooting the machine. (I first tried mounting it at /home, but the disk appears to hold the contents of a single user's home directory, iojs.) I'll PR something to MANUAL_STEPS.md.

root@test-ibm-ubuntu2204-x64-3:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           390M  1.1M  389M   1% /run
/dev/xvda2       24G  5.3G   18G  24% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/xvda1      975M  104M  821M  12% /boot
/dev/xvdc1      984G  223G  711G  24% /home/iojs
tmpfs           390M  4.0K  390M   1% /run/user/0
root@test-ibm-ubuntu2204-x64-3:~#

We probably won't need to do this for the Equinix machines, depending on how their disks were set up (on IBM Cloud the default disk tends to be 25G, with additional storage attached as extra disks).
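
If the same mount is needed on another IBM Cloud host, the `/etc/fstab` edit could be made repeatable. A minimal sketch, assuming the entry shown above and a hypothetical helper named `ensure_fstab_entry`; `mount -a` is suggested here as an alternative to the reboot used in this thread, not something that was tried on the machine.

```shell
# Append an fstab line only if an identical line is not already present.
#   $1: path to the fstab file (normally /etc/fstab)
#   $2: the complete entry to add
# (Hypothetical helper; it does not create the mount point directory.)
ensure_fstab_entry() {
  local fstab="$1" entry="$2"
  grep -qxF -- "$entry" "$fstab" 2>/dev/null || printf '%s\n' "$entry" >> "$fstab"
}

# Intended use on the host, as root, assuming /home/iojs already exists:
#   ensure_fstab_entry /etc/fstab '/dev/xvdc1      /home/iojs   ext4    defaults        0       0'
#   mount -a             # mount fstab entries that are not mounted yet
#   findmnt /home/iojs   # confirm the result
```

Matching on the whole line keeps repeated runs from adding duplicate entries, at the cost of not noticing an existing entry for the same device that is spaced differently.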

I'll rerun the Ansible script from #3715 and have a look at the Ansible failure.

@richardlau
Member

Hmm. The Ansible script succeeded, possibly because the binary_tmp git repo directories already exist.

PLAY RECAP ******************************************************************************************************************************************************************************************************
test-ibm-ubuntu2204-x64-3  : ok=58   changed=16   unreachable=0    failed=0    skipped=124  rescued=0    ignored=0

Since the machine is offline in Jenkins for now, I'll try removing the directories and see if Ansible can recreate them. (Maybe tomorrow.)

@targos
Member Author

targos commented May 14, 2024

It seems that my attempt to run the Ansible script on test-equinix-ubuntu2204-x64-2 broke something: https://ci.nodejs.org/job/node-test-binary-windows-native-suites/22706/console

@richardlau
Member

It seems that my attempt to run the Ansible script on test-equinix-ubuntu2204-x64-2 broke something: https://ci.nodejs.org/job/node-test-binary-windows-native-suites/22706/console

It's not obvious to me how that is broken.

@targos
Member Author

targos commented May 14, 2024

I picked the wrong example. It's not obvious to me what is broken, but CI runs are not always starting:
nodejs/node#52980
https://ci.nodejs.org/job/node-test-pull-request/59208/

@richardlau
Member

That is weird. 😞

@richardlau
Member

It's not obvious to me either. There are messages on the machine when you log in saying a restart is required, and no obvious errors in the logs on the machine. The Jenkins service is running, also seemingly without errors (though there is another warning about reloading the unit).

root@test-equinix-ubuntu2204-x64-2:~# systemctl status jenkins
Warning: The unit file, source configuration file or drop-ins of jenkins.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● jenkins.service - Jenkins agent
     Loaded: loaded (/lib/systemd/system/jenkins.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-05-12 06:55:41 UTC; 2 days ago
   Main PID: 3820860 (java)
      Tasks: 70 (limit: 38254)
     Memory: 20.2G
        CPU: 46min 22.656s
     CGroup: /system.slice/jenkins.service
             ├─3820860 /usr/bin/java -Xmx128m -jar /home/iojs/slave.jar -jnlpUrl https://ci.nodejs.org/computer/test-equinix-ubuntu2204-x64-2/jenkins-agent.jnlp -secret >
             ├─3916174 ssh-agent
             ├─3916208 git whatchanged --no-abbrev -M "--format=commit %H%ntree %T%nparent %P%nauthor %aN <%aE> %ai%ncommitter %cN <%cE> %ci%n%n%w(0,4,4)%B" -n 1024 9807ede6fb17afe36a2447df65eb6b63df8d1d37 ^7>
             └─3916241 git whatchanged --no-abbrev -M "--format=commit %H%ntree %T%nparent %P%nauthor %aN <%aE> %ai%ncommitter %cN <%cE> %ci%n%n%w(0,4,4)%B" -n 1024 7707d05bfa4a2d67fe2c2e37dee28e651672c7a3 ^7>

May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: May 12, 2024 6:55:42 AM hudson.remoting.Launcher$CuiListener status
May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: INFO: Server reports protocol JNLP4-connect-proxy not supported, skipping
May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: May 12, 2024 6:55:42 AM hudson.remoting.Launcher$CuiListener status
May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: INFO: Trying protocol: JNLP4-connect
May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: May 12, 2024 6:55:42 AM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run
May 12 06:55:42 test-equinix-ubuntu2204-x64-2 java[3820860]: INFO: Waiting for ProtocolStack to start.
May 12 06:55:43 test-equinix-ubuntu2204-x64-2 java[3820860]: May 12, 2024 6:55:43 AM hudson.remoting.Launcher$CuiListener status
May 12 06:55:43 test-equinix-ubuntu2204-x64-2 java[3820860]: INFO: Remote identity confirmed: 4f:22:06:c2:64:29:f3:3f:6d:7c:d7:01:f2:18:bb:93
May 12 06:55:43 test-equinix-ubuntu2204-x64-2 java[3820860]: May 12, 2024 6:55:43 AM hudson.remoting.Launcher$CuiListener status
May 12 06:55:43 test-equinix-ubuntu2204-x64-2 java[3820860]: INFO: Connected
lines 1-24/24 (END)

Maybe we could try restarting the agent and/or rebooting the machine?
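
Before rebooting, the warning in the output above suggests a lighter first step: reload systemd's unit files and restart only the agent. A rough sketch, with the caveat that this was not tried in this thread; the function name is hypothetical, and it relies on the `NeedDaemonReload` unit property that `systemctl show` reports.

```shell
# Reload systemd's unit files if the given unit changed on disk, then
# restart the unit. Must run as root on the agent host.
# (Hypothetical helper; not part of the build repo.)
reload_and_restart_if_stale() {
  local unit="$1"
  # NeedDaemonReload is "yes" when the unit file on disk is newer than
  # the copy systemd has loaded.
  if [ "$(systemctl show -p NeedDaemonReload --value "$unit")" = "yes" ]; then
    systemctl daemon-reload
  fi
  systemctl restart "$unit"
}

# Intended use:
#   reload_and_restart_if_stale jenkins
#   systemctl status jenkins --no-pager
```

Restarting the agent drops any job it is running, so the machine would normally be taken offline in Jenkins first.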

@targos
Member Author

targos commented May 14, 2024

It's rebooting.

@richardlau
Member

richardlau commented May 14, 2024

I'll open a new issue, but the machine is not coming back up 😞. Looks to be a repeat of #3528 (comment).

Edit: Opened #3721

@richardlau
Member

Getting back to the Ansible issue, I think one of the issues is that

- name: Link to repository directory from bintmp home
  file:
    src: "{{ home }}/{{ server_user }}/build/binary_tmp.git"
    dest: "~binary_tmp/binary_tmp.git"
    state: link
    owner: "binary_tmp"
    group: "binary_tmp"
    mode: 0755

is setting the owner/group for the target of the link and not the link itself. Apparently setting follow to false will make the ownership/group affect the link.
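
The link-versus-target distinction is easy to check from a shell. The sketch below is an illustration, not part of the playbook: `link_and_target_owner` is a made-up helper that uses GNU `stat` to print the owner of a symlink itself and of the file it resolves to. The commented `chown` lines show the analogous behaviour outside Ansible, where plain `chown` follows the link and `chown -h` changes the link itself.

```shell
# Print "<owner>:<group>" for a symlink itself and for the file it
# points to. (Hypothetical helper; requires GNU stat.)
link_and_target_owner() {
  local link="$1"
  # Without -L, stat reports on the link; with -L it follows the link.
  printf 'link:   %s\n' "$(stat -c '%U:%G' -- "$link")"
  printf 'target: %s\n' "$(stat -L -c '%U:%G' -- "$link")"
}

# Shell analogue of the two behaviours (as root):
#   chown binary_tmp:binary_tmp ~binary_tmp/binary_tmp.git      # changes the target
#   chown -h binary_tmp:binary_tmp ~binary_tmp/binary_tmp.git   # changes the link
```

Running `link_and_target_owner ~binary_tmp/binary_tmp.git` before and after the playbook would show which of the two the task actually changed.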

I also think I can automate the download and installation of the coverity tool. Will test and push changes to #3715.

@richardlau
Member

I also think I can automate the download and installation of the coverity tool. Will test and push changes to #3715.

Couldn't push to the branch for that PR, so pushed to new PR #3722.
