add validation for duplicate template and node names #1054

crflynn · 2024-05-04T02:51:58Z

Pull Request Checklist

Fixes #
Tests added
Documentation/examples added
Good commit messages and/or PR title

Description of PR
Currently, hera lacks validation for duplicate template names, so templates with duplicate names go missing from the rendered yaml. Hera also lacks validation for duplicate node names, which results in rendered yaml that is invalid when submitted to argo-workflows.

This PR adds validation for both situations, preventing the user from rendering incorrect or invalid yaml when Workflows contain multiple templates or nodes with the same name by raising a TemplateNameConflict or NodeNameConflict, respectively.
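For illustration, the kind of check described above can be sketched roughly as follows. This is a minimal stand-alone sketch, not the code in this PR: only the exception names TemplateNameConflict and NodeNameConflict come from the description, while the helper functions, the error message, and the (template name, node names) input shape are hypothetical.

```python
class TemplateNameConflict(Exception):
    """Raised when two templates in one workflow share a name."""


class NodeNameConflict(Exception):
    """Raised when two nodes (tasks or steps) in one template share a name."""


def check_unique_names(names, error_cls, kind):
    """Raise error_cls on the first repeated name (hypothetical helper)."""
    seen = set()
    for name in names:
        if name in seen:
            raise error_cls(f"found multiple {kind}s named {name!r}")
        seen.add(name)


def validate_workflow(templates):
    """Validate a list of (template_name, [node_name, ...]) pairs.

    The input shape is an assumption made for this sketch; it is not how
    hera represents workflows internally.
    """
    # Template names must be unique across the whole workflow.
    check_unique_names((name for name, _ in templates), TemplateNameConflict, "template")
    # Node names only need to be unique within their own template.
    for _, node_names in templates:
        check_unique_names(node_names, NodeNameConflict, "node")
```

In this sketch the same node name may appear in two different templates, but repeating it inside one template, or repeating a template name, raises at build time rather than on the controller.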

Note that the order of operations has been adjusted in _HeraContext.add_sub_node. This change was required to continue to support workflows with recursive references.

Signed-off-by: crflynn <flynn@simplebet.io>

codecov · 2024-05-04T03:03:07Z

Codecov Report

Attention: Patch coverage is 96.87500%, with 1 line in your changes missing coverage. Please review.

Project coverage is 81.9%. Comparing base (d553546) to head (8cd8447).
Report is 53 commits behind head on main.

Files	Patch %	Lines
src/hera/workflows/_context.py	85.7%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #1054     +/-   ##
=======================================
+ Coverage   81.7%   81.9%   +0.2%     
=======================================
  Files         54      57      +3     
  Lines       4208    4319    +111     
  Branches     889     914     +25     
=======================================
+ Hits        3439    3540    +101     
- Misses       574     578      +4     
- Partials     195     201      +6


elliotgunton · 2024-05-07T18:19:45Z

Thanks for the contribution @crflynn! Although I'm not entirely opposed to the change, I am hesitant about starting to recreate all of Argo's validation logic (1500+ lines!).

Trying to understand your use-case/motivation for the change --

Is your use-case mainly coming from the developer experience? Or are you hitting problems with gitops flows where the YAML is already rendered and intended to deploy?
Are you doing things in the Workflow definition that are only possible in Hera, such as conditionally adding templates/tasks to the Workflow definition?
Are you unable to run an argo lint on the cluster (via hera or CLI) before submitting? (During tests?)

I think this change is small and helpful enough that we'd accept it, but it would be good to understand the motivation behind the change and to figure out in advance how far we want to go with matching Argo's on-cluster-validation (cc @samj1912 @flaviuvadan).

crflynn · 2024-05-13T20:38:44Z

We found that the most common issue our users hit was name conflicts like these, which wouldn't manifest until execution time, when they would error on the controller, so we added some validation at build time to reduce iterations. I figured it would be helpful upstream, but I completely understand the reasoning behind not wanting to go down this path with hera.

For context:

We have a downstream library that wraps hera and provides more functionality. The main features are:

some basic validation including name conflicts like this
rendering helm templates in addition to plain yaml, i.e. escaping argo templating for helm compatibility
providing a local runner
providing a CronWorkflowTemplate abstraction that builds a WorkflowTemplate and a CronWorkflow that references it. This allows us to submit these workflows with custom arguments via the UI (can’t do this with CronWorkflow in the UI afaik)

among other features for local development.
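To make the CronWorkflowTemplate idea concrete, here is a rough sketch using plain dicts rather than hera objects. It is not the downstream library's actual code: the function name and arguments are hypothetical, and only the manifest fields (schedule, workflowSpec, workflowTemplateRef) are standard Argo Workflows fields.

```python
def cron_workflow_template(name, schedule, workflow_spec):
    """Return a (WorkflowTemplate, CronWorkflow) pair of manifests as dicts.

    Hypothetical helper: the WorkflowTemplate holds the real spec, and the
    CronWorkflow only references it by name.
    """
    workflow_template = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "WorkflowTemplate",
        "metadata": {"name": name},
        "spec": workflow_spec,
    }
    cron_workflow = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # No templates are defined here; the cron job points at the template.
            "workflowSpec": {"workflowTemplateRef": {"name": name}},
        },
    }
    return workflow_template, cron_workflow
```

Because the WorkflowTemplate exists as its own object, it can be submitted on its own with custom arguments, while the CronWorkflow handles the scheduled runs.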

We build and deploy our workflows using helm charts. This allows us to template things like the schedule, env vars, or entire WorkflowTemplates by environment. It also gives us the benefits of helm like helm rollbacks and shipping the workflows in a declarative way together with other kubernetes resources, e.g. ConfigMaps, Secrets, RBAC, and associated applications.
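As a rough idea of what escaping argo templating for helm can look like, here is a minimal sketch. It is not our actual implementation: the function name and regex are hypothetical, and it assumes the argo expressions contain no backticks. It wraps each argo `{{...}}` tag in a Go template raw string so that helm emits it verbatim instead of trying to evaluate it.

```python
import re

# Matches an Argo tag such as {{inputs.parameters.msg}} on a single line.
ARGO_TAG = re.compile(r"\{\{(.*?)\}\}")


def escape_for_helm(text):
    """Wrap every Argo tag in a Go-template raw string literal.

    Assumption for this sketch: tags contain no backticks, and the input
    has not already been escaped (the function is not idempotent).
    """
    return ARGO_TAG.sub(lambda m: "{{`{{" + m.group(1) + "}}`}}", text)


escaped = escape_for_helm("echo {{inputs.parameters.msg}}")
# escaped == "echo {{`{{inputs.parameters.msg}}`}}"
```

Values that helm itself should fill in, such as a templated schedule, would be left unescaped so that helm still renders them.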

I gave argo lint a try. With helm it would look something like this, where we run helm template first and then lint the output.

helm template . --output-dir output
argo lint output --output simple --kinds=cronworkflows
argo lint output --output simple --kinds=workflowtemplates

We are only using CronWorkflows and WorkflowTemplates.

It seems that to lint our CronWorkflows, or any objects that reference other templates, we would have to expose the API to our CI, which we don't do, so this just results in authorization errors.

Linting the WorkflowTemplates results in json: unknown field "hooks" on every template, despite the hooks functioning just fine during operation. (might be a bug on the cli?)

Perhaps what we are doing is a bit unconventional, I'm not sure, but it has been working fairly well despite the lack of validation in hera. To a degree, the lack of validation also helps us, since sometimes we template things like the cron schedule with helm, whereas strict cron string validation would actually break how we are using it in this particular case.

elliotgunton · 2024-05-14T09:05:29Z

Thanks for the detailed response! Really helpful to understand how Hera is used and built upon 😄 I think I'm happy to approve the intent of the PR, will take a closer look at the code now (i.e. the add_sub_node changes)! 🚀

elliotgunton

Hey @crflynn! Sorry for the delay. The PR looks good overall, with some nitpicks and things slightly unrelated to your changes; the only real requests are to add a test and remove another. As it's been a few weeks, I'd be happy to wrap up the PR for the fixes and resolve the changes to get it merged (I'll do this tomorrow if there's no reply, or once you confirm your preference either way).

elliotgunton · 2024-05-30T08:35:36Z