Consolidate workflows #37

Merged: Tyrrrz merged 15 commits into main from optimize-workflows on Sep 25, 2023

Tyrrrz
Contributor

@Tyrrrz Tyrrrz commented Sep 21, 2023

This is my attempt to consolidate workflows from 2 to 1.

Highlights:

  • Only one workflow, high step reuse.
  • Makes use of multiple jobs. This is useful when the workflow fails, as this setup allows us to rerun only individual parts of the pipeline, instead of the whole thing.
  • Makes use of artifacts to share data between jobs, without having to rebuild or even check out the repo again.
  • Runs tests on Windows and Ubuntu, just like before. The other steps only run on Ubuntu to avoid wasting resources.
  • More granular permission control (job level instead of workflow level).
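
To illustrate the job-level permissions point, a minimal sketch might look like the following. The job names, the `packages: write` scope, and the placeholder steps are assumptions made for illustration; they are not taken from the workflow in this PR.

```yaml
# Hypothetical sketch: job names and scopes are illustrative only.
name: main
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    # Read-only token for a job that only needs to check out the code.
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: dotnet test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    # Only the publishing job is granted write access (assumed scope).
    permissions:
      contents: read
      packages: write
    steps:
      - run: echo "publish step goes here"
```

Declaring `permissions` per job means a compromised or misbehaving step in `test` never sees a token that can write anything.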

Please review this with high detail because it's hard to test locally 😅

@Tyrrrz
Contributor Author

Tyrrrz commented Sep 21, 2023

Should've opened it as a draft first. I'll fix the issues tomorrow. But the idea is there

@Tyrrrz Tyrrrz changed the title Consolidate workflows WIP: Consolidate workflows Sep 21, 2023
@Tyrrrz
Contributor Author

Tyrrrz commented Sep 22, 2023

So it turns out sharing artifacts between .NET jobs isn't as simple as I'd hoped. .NET creates a lot of files outside of the build directory and even outside of the ~/.nuget directory. Ultimately, even if we get it right, it's going to be gigabytes of data that will take a while to upload/download between jobs -- at which point we might as well repeat some steps instead.

In the current iteration I have the following setup:

  • format job that clones the repo and runs dotnet format to check for formatting issues
  • test job that clones the repo and runs dotnet test on several platforms
  • pack job that clones the repo, runs dotnet pack to generate nupkg files, and uploads them as artifacts
  • deploy job that pulls the nupkg files from the artifacts and pushes them to the registry

All jobs except deploy run in parallel. The deploy job doesn't, because it needs to make sure all the other jobs succeeded before proceeding. Even though this setup leads to some repeated work (checkout, restoring, and building), the parallelization benefit will likely outweigh it, resulting in faster overall execution times. The other benefit of parallelization is that we can detect multiple issues at once (for example, if formatting fails, we can still see test failures too).
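
For readers following along, the layout described above could be sketched roughly like this. It is not the actual workflow file from this PR: the artifact name, the `packages` output directory, the tag-only deploy condition, the NuGet.org source URL, and the `NUGET_TOKEN` secret are all assumptions, and the sketch relies on the .NET SDK that is preinstalled on GitHub-hosted runners.

```yaml
# Hypothetical sketch of the four-job layout; names and paths are illustrative.
name: main
on: [push, pull_request]

jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Fails the job if any file would be changed by the formatter.
      - run: dotnet format --verify-no-changes

  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: dotnet test --configuration Release

  pack:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: dotnet pack --configuration Release --output packages
      # Share only the small nupkg files, not the whole build output.
      - uses: actions/upload-artifact@v4
        with:
          name: packages
          path: packages/*.nupkg

  deploy:
    # Waits for every other job, so it is the only non-parallel part.
    needs: [format, test, pack]
    # Assumed trigger: publish only for tag pushes.
    if: github.event_name == 'push' && github.ref_type == 'tag'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: packages
          path: packages
      # Source URL and secret name are assumptions, not from this PR.
      - run: dotnet nuget push "packages/*.nupkg" --source https://api.nuget.org/v3/index.json --api-key ${{ secrets.NUGET_TOKEN }}
```

Because `format`, `test`, and `pack` declare no `needs`, they start at the same time, and a failed job can be rerun on its own.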

@Tyrrrz Tyrrrz changed the title WIP: Consolidate workflows Consolidate workflows Sep 22, 2023
Member

@justindbaur justindbaur left a comment

Maybe I'm wrong for liking this, but I like the 3-stage restore/build/[test|pack] setup (restore, build --no-restore, <action> --no-restore --no-build) because I can then track how long each stage takes, attribute slowdowns to a specific stage, and evaluate possible improvements. That's how I know that restore on Windows generally takes much longer than its equivalent step on Ubuntu.

In the same vein, I had previously only tested --framework 462 on Windows because I considered the other TFMs covered by Ubuntu. With Windows generally taking longer but also having slightly less to do, the hope was that we could shorten the feedback loop a little bit.

Interested in your thoughts though.

@Tyrrrz
Contributor Author

Tyrrrz commented Sep 23, 2023

@justindbaur thanks for the review 🙂

Maybe I'm wrong for liking this, but I like the 3-stage restore/build/[test|pack] setup (restore, build --no-restore, <action> --no-restore --no-build) because I can then track how long each stage takes, attribute slowdowns to a specific stage, and evaluate possible improvements. That's how I know that restore on Windows generally takes much longer than its equivalent step on Ubuntu.

Having separate stages is still possible in this setup. The main differentiating factor is that, in this setup, those stages will be repeated across some jobs. If the goal is to measure performance of individual operations, then that can certainly be achieved – there will be more steps per job, but overall the situation won't change.
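
As a rough sketch of what split stages inside a single job could look like (the step names and the `Release` configuration are assumptions, not the final workflow from this PR):

```yaml
# Hypothetical sketch: each stage is its own step, so the run log
# shows a separate duration for restore, build, and test.
name: main
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Restore
        run: dotnet restore

      - name: Build
        # Skips the implicit restore because the previous step did it.
        run: dotnet build --no-restore --configuration Release

      - name: Test
        # Reuses the build output instead of rebuilding.
        run: dotnet test --no-restore --no-build --configuration Release
```

The same three-step pattern would be repeated in the pack job, with `dotnet pack` as the final step.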

Also, note that Windows runners are generally very slow. There's a recurring issue with just the checkout operation taking several minutes: actions/checkout#1186

In the same vein, I had previously only tested --framework 462 on Windows because I considered the other TFMs covered by Ubuntu. With Windows generally taking longer but also having slightly less to do, the hope was that we could shorten the feedback loop a little bit.

As you can see in the original workflow, the actual restore/build/test process wasn't the issue -- checkout was what took the longest amount of time:

[Screenshot: step timings from the original workflow run]

I don't think the overhead of the extra target is significant, but I can update the workflow to only run tests on FrameworkIdentifier == '.NETFramework' when on Windows.

@Tyrrrz
Contributor Author

Tyrrrz commented Sep 23, 2023

So, to summarize:

  • We can do separate restore/build/test stages as part of the same job for better monitoring of where time is spent. This will not affect the general workflow setup.
  • We can add a way to only run netfx tests on Windows, also without major changes. I don't think this will help much because Windows runners are just obnoxiously slow as they are.

If we want to retain the maximum efficiency that the previous workflow setup offered, we will have to merge all of the steps (except deploy, but at that point there's no point keeping that separate) into one job. That way we'll lose the various benefits I mentioned.

Note that currently, the duration of the workflow is essentially the duration of its slowest job (deploy is negligible), which is dotnet test on Windows. So optimizing the build process to avoid redundant restores/builds might not make a noticeable difference in the grand scheme of things.

What are your thoughts @justindbaur?

@justindbaur
Member

@Tyrrrz When I was first setting up the jobs for multi-targeting, I was seeing the restore step take up a lot more time on Windows; this was a particularly bad one. I suspect that the dotnet test step on Windows in this PR spends almost all of its time implicitly doing a restore.

I agree that only running FrameworkIdentifier == '.NETFramework' tests right now will have a negligible effect, and that's why I'm okay leaving it off for now, but that is mostly because we skip all the tests in CI right now. Down the road I'd love to boot up a test container running a special version of the server for tests. Then I would expect running 3 vs. 1 to have a somewhat noticeable impact.

So please do separate out the stages, but you don't need to only run framework tests on Windows right now. We can easily drop that in down the road, but I also bet it's a little MSBuild Condition work on $(ContinuousIntegrationBuild) that you can do pretty quickly.

@Tyrrrz
Contributor Author

Tyrrrz commented Sep 23, 2023

@justindbaur Thanks for the feedback. I'll adapt the PR on Monday to incorporate your suggestions and tag you again :)

Member

@abergs abergs left a comment

I think this looks good!

@Tyrrrz
Contributor Author

Tyrrrz commented Sep 25, 2023

@justindbaur please review again.

Highlights:

  1. Restore/build/[test/pack] stages are split now
  2. Windows only runs .NET Framework targets now
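
One way to get Windows-only .NET Framework test runs at the workflow level is sketched below. This is an assumption-laden illustration, not necessarily how this PR does it (the thread also mentions an MSBuild `Condition` as an option): the `net462` moniker is inferred from the "--framework 462" mentioned earlier, and the step names are made up.

```yaml
# Hypothetical sketch: choose the test command based on the runner OS.
name: main
on: [push, pull_request]

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4

      - name: Test (non-Windows)
        if: runner.os != 'Windows'
        # Runs whichever targets the test project builds on this OS.
        run: dotnet test --configuration Release

      - name: Test (.NET Framework only)
        if: runner.os == 'Windows'
        # net462 is an assumed target framework moniker.
        run: dotnet test --configuration Release --framework net462
```

Keeping the condition in the workflow makes the split visible in the run log; moving it into the project file instead would keep local and CI behavior defined in one place.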

Member

@justindbaur justindbaur left a comment

Looks great! Thanks for the changes.

@Tyrrrz Tyrrrz merged commit b64edff into main Sep 25, 2023
5 checks passed
@Tyrrrz Tyrrrz deleted the optimize-workflows branch September 25, 2023 12:29
Labels
enhancement New feature or request

6 participants