MCAD CPU Preemption Test is failing intermittently in e2e #684

Open
VanillaSpoon opened this issue Nov 8, 2023 · 1 comment

Comments

@VanillaSpoon
Member

Describe the Bug

The MCAD CPU Preemption Test is failing intermittently in CI and local e2e tests.
It appears to fail here:

err = waitAWAnyPodsExists(context, aw2)
// With improved accounting, no pods will be spawned
Expect(err).To(HaveOccurred())

The call returns nil instead of an error, so the assertion fails with: Expected an error to have occurred. Got: <nil>: nil

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

MCAD

Expected Behavior:

The expectation is that aw2 would not be able to be scheduled due to insufficient CPU resources, leading to an error that is caught by the assertion.
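As a rough illustration of that expectation, the sketch below checks whether a deployment's total CPU request fits into the CPU still unallocated on the cluster. The function fits and every number in it are hypothetical; the issue does not report the cluster's capacity or the exact requests, and MCAD's real accounting is more involved than this.

```go
package main

import "fmt"

// fits reports whether `replicas` pods, each requesting `perPodMilli`
// millicores, fit into `freeMilli` millicores of unallocated CPU.
func fits(freeMilli, perPodMilli int64, replicas int) bool {
	return perPodMilli*int64(replicas) <= freeMilli
}

func main() {
	// All values are hypothetical and chosen only to show both outcomes.
	const perPodMilli, replicas = 500, 2
	for _, free := range []int64{800, 1200} {
		fmt.Printf("free=%dm fits=%v\n", free, fits(free, perPodMilli, replicas))
	}
}
```

If the free CPU seen by the dispatcher varies from run to run, the same request can be rejected in one run and admitted in the next, which would be consistent with an intermittent failure.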

Logs & Failures:

The logs show that the aw2 AppWrapper appears to reach the Running state, contrary to our expectations. This suggests that the CPU resources available differ between test runs.

[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[cleanupTestObjects] Deleting AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
• Failure [6.211 seconds]
AppWrapper E2E Test
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:33
  MCAD CPU Preemption Test [It]
  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:97

  Expected an error to have occurred.  Got:
      <nil>: nil

  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSS

Summarizing 1 Failure:

[Fail] AppWrapper E2E Test [It] MCAD CPU Preemption Test 
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116

Ran 2 of 30 Specs in 11.217 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 28 Skipped
--- FAIL: TestE2E (11.22s)
FAIL
FAIL	github.com/project-codeflare/multi-cluster-app-dispatcher/test/e2e	11.229s
FAIL
End to end test script return code set to 1
@VanillaSpoon
Member Author

This can be seen here
