Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: nightly dependency treadmill #15910

Merged
merged 1 commit into from
Oct 6, 2022

Conversation

edsantiago
Copy link
Collaborator

As discussed in f2f: this is the cleanest, simplest mechanism
I can think of to auto-test the Big Three dependencies: simply
run go mod edit immediately after git checkout, then run the
entire CI test suite.

If this approach works, we can set up a new CIRRUS_CRON=treadmill
job. I'm expecting it not to work, because Murphy, but want to
see what the unexpected gotchas are.

This differs significantly from the buildah treadmill, in that
buildah is almost impossible to re-vendor without manual intervention.
(In practice, so are these, but let's dream of a world in which
this will run and pass every night). (I want a pony too).

Signed-off-by: Ed Santiago santiago@redhat.com

None

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 22, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2022
@edsantiago edsantiago force-pushed the cron_treadmill branch 4 times, most recently from c4d1b2b to da331be Compare September 22, 2022 19:53
@edsantiago
Copy link
Collaborator Author

This is kind of weird: it compiles, and almost all the int tests have passed! (one flake, the FD one which I still haven't filed an issue for). I'm actually kind of worried that maybe I'm NOP'ing, and this is not testing an actual revendor. Is there a way to look at test logs and see which containers/foo versions got picked up in the built binary?

Anyhow, gotta go. Thoughts welcome on this approach.

@rhatdan
Copy link
Member

rhatdan commented Sep 23, 2022

I would figure the validate tests would fail on this with unvendored content?

@edsantiago
Copy link
Collaborator Author

That's what the git add vendor is for.

@rhatdan
Copy link
Member

rhatdan commented Sep 23, 2022

We need to commit a breaking change to make sure this fails. :^)

@edsantiago
Copy link
Collaborator Author

Re-pushed on 40e8bcb, passed again. (With two flakes, both of them nasty ones, but I'll follow up on Monday).

@edsantiago
Copy link
Collaborator Author

@containers/podman-maintainers hi! I re-pushed during scrum, and, lo, it's failing:

[+0008s]   >$ ./bin/podman pull -q quay.io/libpod/alpine_labels:latest
[+0008s]   >Error: writing blob: adding layer with blob "sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3": processing tar file(time="2022-09-26T10:55:31-05:00" level=error msg="stat /home/some9968dude/.config/containers/storage.conf: no such file or directory"

also

[+0045s] #  ERROR: for nginx  Cannot create container for service nginx: no such image: docker.io/library/alpine: image not known

Vendor diffs here.

I restarted each one, failed the same way. Does not seem to be a flake.

@edsantiago
Copy link
Collaborator Author

Another failure, in int tests now:

         $ podman [options] pull quay.io/libpod/cirros:latest
         time="2022-09-26T10:56:57-05:00" level=error msg="stat containers/storage.conf: no such file or directory"

@edsantiago
Copy link
Collaborator Author

culprit (for at least one of the failures, haven't checked the others) seems to be containers/storage#1357

@rhatdan
Copy link
Member

rhatdan commented Sep 26, 2022

Do the tests have XDG_CONFIG_HOME set and no storage.conf defined?

@edsantiago
Copy link
Collaborator Author

I'm guessing you don't need an answer, because you submitted containers/storage#1363 11 minutes after this question, but: using view-source on one result shows $XDG_RUNTIME_DIR set but no $XDG_CONFIG_ANYTHING.

@rhatdan
Copy link
Member

rhatdan commented Sep 26, 2022

Weird that XDG_CONFIG_HOME is now showing it is set. Will see if updated PR to storage fixes the problem.
I guess your nightly treadmill is already paying off.

@edsantiago
Copy link
Collaborator Author

And it's not even nightly yet!

P.S. does this help?

podman/libpod/runtime.go

Lines 123 to 163 in aaeabb0

// SetXdgDirs ensures the XDG_RUNTIME_DIR env and XDG_CONFIG_HOME variables are set.
// containers/image uses XDG_RUNTIME_DIR to locate the auth file, XDG_CONFIG_HOME is
// use for the containers.conf configuration file.
func SetXdgDirs() error {
if !rootless.IsRootless() {
return nil
}
// Set up XDG_RUNTIME_DIR
runtimeDir := os.Getenv("XDG_RUNTIME_DIR")
if runtimeDir == "" {
var err error
runtimeDir, err = util.GetRuntimeDir()
if err != nil {
return err
}
}
if err := os.Setenv("XDG_RUNTIME_DIR", runtimeDir); err != nil {
return fmt.Errorf("cannot set XDG_RUNTIME_DIR: %w", err)
}
if rootless.IsRootless() && os.Getenv("DBUS_SESSION_BUS_ADDRESS") == "" {
sessionAddr := filepath.Join(runtimeDir, "bus")
if _, err := os.Stat(sessionAddr); err == nil {
os.Setenv("DBUS_SESSION_BUS_ADDRESS", fmt.Sprintf("unix:path=%s", sessionAddr))
}
}
// Set up XDG_CONFIG_HOME
if cfgHomeDir := os.Getenv("XDG_CONFIG_HOME"); cfgHomeDir == "" {
cfgHomeDir, err := util.GetRootlessConfigHomeDir()
if err != nil {
return err
}
if err := os.Setenv("XDG_CONFIG_HOME", cfgHomeDir); err != nil {
return fmt.Errorf("cannot set XDG_CONFIG_HOME: %w", err)
}
}
return nil
}

@rhatdan
Copy link
Member

rhatdan commented Sep 26, 2022

So it is self inflicted.

@edsantiago
Copy link
Collaborator Author

ISTR a very very good reason for adding that. tig blame dates it to #3466, August 2019.

@nalind
Copy link
Member

nalind commented Sep 26, 2022

Do the tests have XDG_CONFIG_HOME set and no storage.conf defined?

time="2022-09-26T10:56:57-05:00" level=error msg="stat containers/storage.conf: no such file or directory"
suggests that the path name passed to stat() was just "containers/storage.conf", as though XDG_CONFIG_HOME (and the path that was combined with "containers" and "storage.conf") was an empty value.

@edsantiago
Copy link
Collaborator Author

This is ready for review, and to merge as soon as CI passes (which it will, because the $CIRRUS_CRON conditional will not trigger). I have NOT added a Cirrus cron job, because I don't seem to have enough Cirrus privs.

# Big Three dependencies, and run full CI test suite. Notification
# email will go out to monitor-list upon failure.
if [[ "$CIRRUS_CRON" = "treadmill" ]]; then
for pkg in common image/v5 storage; do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is buildah excluded? This will fail if common makes a breaking change that also affects buildah.
IMO we need to vendor everything at once, excluding buildah will cause even more troubles.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildah is impossible to vendor automatically. Seriously, absolutely impossible. That's the whole drama behind #13808.

However... does this address your concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, this thing here will just start to fail we break something in c/common. Until common is declared stable this job will fail regularly.

Is there any reason to not just vendor everything in your treadmill PR vs this automated job?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not just vendor everything in your treadmill PR vs this automated job?

My sanity. Buildah treadmill is already a pretty big time suck (some days). I'm reluctant to spend more time on this sort of integration work.

@rhatdan
Copy link
Member

rhatdan commented Sep 29, 2022

LGTM

@edsantiago
Copy link
Collaborator Author

There's one problem with this, that I haven't been able to figure out: immediately after a vendor-everything-into-podman PR, this nightly cron job is going to fail, because I (deliberately) did not add --allow-empty to the git commit. I'm now reevaluating that decision:

  • As it is, the nightly cron job will fail, and will force me to look at it and see "oh, no problem, it's just because nothing has changed".
  • With --allow-empty, the check will pass, and will run the rest of CI even though it's a waste of cycles.

I was trying to save the planet. Now I'm thinking it's more important to save human time. Thoughts?

@vrothberg
Copy link
Member

We're doomed, one way or another. So I think you should have a good time and not be bothered by false positives.

@rhatdan
Copy link
Member

rhatdan commented Sep 29, 2022

Agreed. Humans should be primary.

@edsantiago
Copy link
Collaborator Author

So done. Rebased & repushed with --allow-empty.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 30, 2022
As discussed in f2f: this is the cleanest, simplest mechanism
I can think of to auto-test the Big Three dependencies: simply
run go mod edit immediately after git checkout, then run the
entire CI test suite.

This differs significantly from the buildah treadmill, in that
buildah is almost impossible to re-vendor without manual intervention.
(In practice, so are these, but let's dream of a world in which
this will run and pass every night). (I want a pony too).

Signed-off-by: Ed Santiago <santiago@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 30, 2022
@edsantiago
Copy link
Collaborator Author

This has been working well in buildah. Apart from, sigh, flakes.

@baude
Copy link
Member

baude commented Oct 6, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 6, 2022
@openshift-merge-robot openshift-merge-robot merged commit 94cfe7b into containers:main Oct 6, 2022
@edsantiago edsantiago deleted the cron_treadmill branch October 6, 2022 11:36
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants