Add `go vet` to the Makefile and fix its errors #691

ronensc · 2023-11-15T10:43:24Z

Issue link

What changes have been made

Added go vet to the Makefile
Fixed its errors

Verification steps

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- Testing is not required for this change

openshift-ci · 2023-11-15T10:43:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign asm582 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ronensc · 2023-11-28T13:34:18Z

@anishasthana I think MCAD-CI is flaky as I don't think its failure is related to the changes in this PR. Could you please try rerunning it?

ronensc · 2023-11-28T15:43:40Z

Failed again. Same failing test "MCAD CPU Preemption Test"... 😢
https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/6876210215/job/19098434211#step:8:722

It seems it is failing in other PRs as well.
https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/7018959159/job/19095405569#step:8:726

https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/7003208146/job/19048461449#step:8:725

anishasthana · 2023-11-29T00:29:09Z

Hmm @asm582 are you aware of any recent issues with CI?

dgrove-oss · 2023-11-29T15:04:14Z

I observed when porting the test suite to MCAD v2 that many of the tests only work as expected if the capacity of the cluster is calculated to be about 2 CPUs. If the cluster has more capacity, then the tests that expect certain AppWrappers to not be runnable or to be preempted because higher-priority AppWrappers will have taken all the available capacity fail. The failing test here is one of those. Looking at the logs, I see that the controller thinks the available capacity is about 6 CPUs

 The available capacity to dispatch appwrapper is cpu 5950.00, memory 32085303296.00, GPU 16 and time took to calculate is 8.462585ms

In the v2 test suite, I am redoing all of these tests to be based on a % of total CPU capacity instead of using hardwired resource values to avoid this fragility.

asm582 · 2023-11-29T15:17:52Z

Thank you for the PR. I am curious, if we are doing %-based CPU calculation then are we diverging from the way Scheduler does accounting? and would that be ok?

asm582 · 2023-11-29T15:19:57Z

Hmm @asm582 are you aware of any recent issues with CI?

I usually set 2 CPUs and 8 GB RAM as resource setting in podman or docker desktop and they would run to completion locally. I am not sure if we changed resource requirement in the CI system

dgrove-oss · 2023-11-29T15:31:59Z

To be clear, tests like this are what are fragile in my opinion: https://github.com/project-codeflare/multi-cluster-app-dispatcher/blob/main/test/e2e/queue.go#L97-L125 because they are using hardwired CPU requests (not CPU requests that are calculated dynamically by the test as a fraction of the available capacity on the actual cluster when it is run).

Here it creates an AppWrapper with a 1100 CPU request (2 pods at 550 each) and then assumes that if it creates a 854 CPU request (2 pods at 427 each..the function name is misleading) it won't fit. If docker is limited to 2 CPUs, then the worker nodes have 4000 CPU capacity, Kubernetes + the mcad operator take around 2100 CPU and the math works as expected. The slightest change in available resource (or in the resources used by the system pods) and the test doesn't work as expected.

ronensc · 2023-12-04T16:05:58Z

@dgrove-oss thanks for the detailed explanation of the root cause of the CI failure!
I noticed that these fragile tests were commented out temporarily in mcad v2. Should we adopt a similar approach here? Can this PR be approved even though the CI is currently failing?

ronensc added 2 commits November 15, 2023 12:35

Run "go vet" on build

baf2378

Fix go vet errors

7683cd4

openshift-ci bot requested review from metalcycling and tardieu November 15, 2023 10:43

ronensc mentioned this pull request Dec 7, 2023

Skipping MCAD CPU Preemption Test #696

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `go vet` to the Makefile and fix its errors #691

Add `go vet` to the Makefile and fix its errors #691

ronensc commented Nov 15, 2023

openshift-ci bot commented Nov 15, 2023

ronensc commented Nov 28, 2023 •

edited

ronensc commented Nov 28, 2023

anishasthana commented Nov 29, 2023

dgrove-oss commented Nov 29, 2023

asm582 commented Nov 29, 2023

asm582 commented Nov 29, 2023

dgrove-oss commented Nov 29, 2023

ronensc commented Dec 4, 2023

Add go vet to the Makefile and fix its errors #691

Are you sure you want to change the base?

Add go vet to the Makefile and fix its errors #691

Conversation

ronensc commented Nov 15, 2023

Issue link

What changes have been made

Verification steps

Checks

openshift-ci bot commented Nov 15, 2023

ronensc commented Nov 28, 2023 • edited

ronensc commented Nov 28, 2023

anishasthana commented Nov 29, 2023

dgrove-oss commented Nov 29, 2023

asm582 commented Nov 29, 2023

asm582 commented Nov 29, 2023

dgrove-oss commented Nov 29, 2023

ronensc commented Dec 4, 2023

Add `go vet` to the Makefile and fix its errors #691

Add `go vet` to the Makefile and fix its errors #691

ronensc commented Nov 28, 2023 •

edited