stage1: upgrade to systemd-v241 from Flatcar 2219.99.0 #4002
@iaguis PTAL.
Thanks for this! Now we need to fix the CI. As @alban says in #3790, there was a change in systemd that made rkt not work when systemd<205 is used on the host. There were two suggestions: updating the SemaphoreCI platform to Ubuntu 16.04, which has a newer systemd, and skipping registration with machined (more details here) when we detect that systemd on the host has a version <205. Initially, I thought we could do both. However it seems the function ( At this point I think we can ignore legacy systems, so I just updated the Semaphore platform and it now complains about some package not being available. Please fix that and we'll continue going through the issues the CI finds until we fix them all.
Now that I look at the commit that added this In any case, this is something we can do later so don't worry about it for now.
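The version check discussed above could be sketched roughly as follows. This is only an illustration, not rkt's actual code: the helper names `parseSystemdVersion` and `shouldRegisterWithMachined` are hypothetical, and it assumes the version string has the usual `systemd NNN ...` shape printed by `systemctl --version`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minMachinedVersion is the threshold quoted in the discussion (systemd < 205
// on the host is the problematic case).
const minMachinedVersion = 205

// parseSystemdVersion extracts the numeric version from output shaped like
// "systemd 241 (241)". Distro-specific suffixes such as "245.4-4ubuntu3"
// are not handled in this sketch and would return an error.
func parseSystemdVersion(out string) (int, error) {
	fields := strings.Fields(out)
	if len(fields) < 2 || fields[0] != "systemd" {
		return 0, fmt.Errorf("unexpected version output: %q", out)
	}
	return strconv.Atoi(fields[1])
}

// shouldRegisterWithMachined is a hypothetical helper: registration is
// skipped when the host systemd is older than the threshold.
func shouldRegisterWithMachined(version int) bool {
	return version >= minMachinedVersion
}

func main() {
	v, err := parseSystemdVersion("systemd 241 (241)\n+PAM +AUDIT")
	if err != nil {
		fmt.Println("could not detect systemd version:", err)
		return
	}
	fmt.Println(v, shouldRegisterWithMachined(v))
}
```

Keeping the parsing separate from the decision makes it easy to test the threshold without a real systemd on the machine.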
Currently both jobs on semaphoreCI are failing, status is as follows: Job 1:
Job 2:
It seems related to dependency issues (see #3698). However, the aim is to fix it for the CI, so I'm unsure how we can port that solution in this case. Thoughts? @iaguis
They are two different failures, you often need to go further up to see the actual error.
Job 1 (KVM)
I assume this KVM build failure happens now because we're using the "Docker Light" Semaphore platform which misses some dependencies needed to build qemu. I suggest getting SSH access to the SemaphoreCI machine. Once there you can build with
Job 2
Here it seems pretty much all the tests fail with a message like this:
This means systemd-nspawn (through rkt) started the container specifying a property that systemd on the host doesn't understand ( As mentioned in #4002 (comment), So an easy solution for this error is installing machined, which is part of the
@iaguis Thanks for the feedback! I have now solved issues related to the dependencies ( Issues related to TestConfig have also been solved, the problem was that the Job 1 seems to be stuck at TestAppIsolatorMemory consistently despite rebuilding (there was a possibility that we were getting a slower VM on Semaphore), whereas Job 2 just fails the test. Could there be something that makes it time out for the KVM flavor, but not for the CoreOS flavor? Is it safe to assume that I should just keep trying to make the rest of the tests pass? I am mostly referring to the Job 2 failures, which as of now are:
Cool! Hopefully we're getting close. For the KVM flavor, let's worry about it after we fix all the issues in the CoreOS flavor. Yes, we should make those Job 2 tests pass. I'll have a look at them tomorrow to provide some guidance.
It seems systemd added an extra directory level in the cgroupv1 hierarchy (systemd/systemd@720f0a2) whereas before they only had that extra directory in the cgroupv2 hierarchy. This breaks rkt's cgroup settings because we assume that the pod will be in Adding the About With that we still need to fix:
I haven't had time to look at those.
For the record, instead of committing the commented-out volume mount tests, I am now waiting on Flatcar's Edge build, which will have reverted the change that broke the test (mounting recursively instead of non-recursively). Reference: systemd/systemd#13170. Once the build is released we will update rkt to use Flatcar Edge instead of CoreOS Container Linux Stable.
It's great that tests pass now! 🎉
Some small things:
- Could you rework commits so only the ones needed are present? For example, there's one commit that introduces Flatcar and another that updates the Flatcar version, the first one is not needed. Something similar with the several commits that deal with the `payload` addition. The idea is to keep changes divided logically in different commits so future developers have a good history.
- Can you limit commit messages to 80 columns like mentioned in the contributing guidelines?
- Please change the PR description summarizing the PR changes and mention this work was done as a part of GSOC.
Thanks for this!
"make unit-check" throws a "gofmt checking failed" error; this resolves the issue. Refers to rkt#4001
"make unit-check" throws many "KeyValue composite literal uses unkeyed fields" warnings. Even though these checks can be disabled manually, this solution seems to be more future-proof. Refers to rkt#4001
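As a rough illustration of the unkeyed-fields fix, the sketch below contrasts the two literal forms. The `KeyValue` type here is a local stand-in with assumed field names `Key` and `Value`, not the type from the warning; the vet check itself targets struct types imported from other packages.

```go
package main

import "fmt"

// KeyValue is a stand-in for the struct named in the vet warning.
// The field names are assumptions made for this example.
type KeyValue struct {
	Key   string
	Value string
}

// keyedPair builds the literal with explicit field names, which keeps
// compiling correctly even if fields are later added or reordered.
func keyedPair(k, v string) KeyValue {
	return KeyValue{Key: k, Value: v}
}

func main() {
	// Unkeyed form: relies on field order (the form vet complains about
	// when the struct comes from another package).
	unkeyed := KeyValue{"arch", "amd64"}

	// Keyed form: the shape the commit moves to.
	keyed := keyedPair("arch", "amd64")

	fmt.Println(unkeyed == keyed)
}
```

Both literals produce the same value; the keyed form is simply more robust against changes in the struct definition.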
Because we switched to Semaphore's Docker Light platform to get a newer distro, this is required to successfully build rkt on SemaphoreCI.
There are three arguments but the call only takes two, which stops the test from passing.
New versions of systemd allow creating the ptmx device so we're switching to mem.
Since go1.11, `syscall.Stat()` is implemented with `newfstatat()` (golang/go@1073256). However we are testing that blocking `stat()` resulted in `syscall.Stat()` failing, which doesn't work anymore. We were already blocking `newfstatat()` in arm64 architectures so we just block that syscall unconditionally now.
Currently functional tests are run for the host flavor. This disables them when the SemaphoreCI environment is detected, as the systemd version there is too old.
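One way such a check might look is sketched below. It is not the code from this commit: the helper `skipHostFunctionalTests` is hypothetical, and the use of a `SEMAPHORE=true` environment variable as the detection signal is an assumption about the CI environment.

```go
package main

import (
	"fmt"
	"os"
)

// semaphoreEnvVar is an assumption: the name of a variable the CI service
// is expected to export in its build environment.
const semaphoreEnvVar = "SEMAPHORE"

// skipHostFunctionalTests is a hypothetical helper that reports whether the
// host-flavor functional tests should be skipped. The lookup function is
// injected so the decision can be tested without touching the real environment.
func skipHostFunctionalTests(getenv func(string) string) bool {
	return getenv(semaphoreEnvVar) == "true"
}

func main() {
	if skipHostFunctionalTests(os.Getenv) {
		fmt.Println("CI environment detected: skipping host flavor functional tests")
		return
	}
	fmt.Println("running host flavor functional tests")
}
```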
Thanks for the feedback @iaguis! I believe all of the comments have been addressed, let me know.
Currently the build with SemaphoreCI fails because it can't find missing/obsolete packages. This aims to fix libsystemd-journal-dev.
Arguments of wrong type are being passed to Fatalf, which stops the test from passing.
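The class of problem in the Fatalf commits can be illustrated with a plain format call. This is a generic sketch, not the rkt test code: `describeExit` is a hypothetical helper, and `fmt.Sprintf` is used in place of `t.Fatalf` so the example runs outside a test.

```go
package main

import "fmt"

// describeExit formats a failure message with one argument per verb, each of
// the type the verb expects.
//
// A call such as Sprintf("expected exit status %d, got %d", "1", got) would
// pass a string to %d, and a call with more arguments than verbs would leave
// extras unconsumed. Recent Go toolchains run vet's printf check as part of
// "go test", so both mistakes can stop a test package from building.
func describeExit(want, got int) string {
	return fmt.Sprintf("expected exit status %d, got %d", want, got)
}

func main() {
	fmt.Println(describeExit(1, 2))
}
```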
Some dependencies are required to compile QEMU and also to address the "cannot set property CollectMode" in regards to allocate_scope().
Test was able to pass in the past due to a bug on the reflect package, but now it cannot set an embedded pointer to an unexported struct.
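A small reproduction of this kind of failure is sketched below, assuming the value is populated through `encoding/json` (the commit message does not say which decoder the test uses). The types `inner` and `Outer` are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inner is unexported; Outer embeds a pointer to it.
type inner struct {
	Name string
}

// Outer promotes inner's exported fields through an embedded pointer.
type Outer struct {
	*inner
}

// decodeNil unmarshals into an Outer whose embedded pointer is nil. The
// decoder would have to allocate and set the unexported embedded field,
// which newer reflect versions no longer allow, so an error is returned.
func decodeNil(data string) error {
	var o Outer
	return json.Unmarshal([]byte(data), &o)
}

// decodePreallocated allocates the embedded struct up front, so the decoder
// only needs to set the exported field inside it.
func decodePreallocated(data string) (string, error) {
	o := Outer{inner: &inner{}}
	err := json.Unmarshal([]byte(data), &o)
	return o.Name, err
}

func main() {
	fmt.Println("nil embedded pointer:", decodeNil(`{"Name":"pod"}`))
	name, err := decodePreallocated(`{"Name":"pod"}`)
	fmt.Println("preallocated:", name, err)
}
```

Pre-allocating the embedded pointer, or exporting the embedded type, are two common ways around the restriction; which one fits depends on how the test builds its values.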
Due to systemd adding an extra directory in the cgroupv1 hierarchy, rkt's cgroup settings broke as it assumed that the pod would be in another path.
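The path adjustment could look roughly like the following sketch. Everything named here is an assumption: `podCgroupPath` is a hypothetical helper, the example scope path is invented, and the extra level is assumed to be called `payload` based on the name mentioned in the review comments.

```go
package main

import (
	"fmt"
	"path"
)

// payloadDir is the assumed name of the extra directory level.
const payloadDir = "payload"

// podCgroupPath returns the cgroup directory in which the pod's processes
// are expected to live. When extraLevel is true, the additional directory
// is appended to the scope path.
func podCgroupPath(scope string, extraLevel bool) string {
	if extraLevel {
		return path.Join(scope, payloadDir)
	}
	return scope
}

func main() {
	// The scope path below is purely illustrative.
	scope := "machine.slice/machine-example.scope"
	fmt.Println(podCgroupPath(scope, false))
	fmt.Println(podCgroupPath(scope, true))
}
```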
Newer versions of systemd remove underscores from the hostname, so the expected value needs to be changed.
As CoreOS Container Linux is in maintenance mode (coreos/bugs#2559 (comment)) and won't see many new features, and rkt needs some systemd bugfixes (specifically systemd/systemd#13173 and systemd/systemd#12860), let's switch to Flatcar Linux. Flatcar Linux is a drop-in replacement for Container Linux and its Edge channel contains the needed bugfixes. To do that, we need to change the GPG key used to verify the images through the script.
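To show how a test's expected value might be derived under the new behavior, here is a minimal sketch. It only models the underscore removal described in the commit message; systemd's real hostname cleanup does more, and `expectedHostname` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedHostname drops underscores from the requested name, mirroring the
// behavior the commit message attributes to newer systemd versions.
func expectedHostname(requested string) string {
	return strings.Replace(requested, "_", "", -1)
}

func main() {
	fmt.Println(expectedHostname("rkt_test_host"))
}
```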
This PR is part of Google Summer of Code 2019.
Summary of work included within this PR: