Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add validation for duplicate template and node names #1054

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

crflynn
Copy link
Contributor

@crflynn crflynn commented May 4, 2024

Pull Request Checklist

Description of PR
Currently, hera lacks validation for duplicate template names, causing templates with duplicate names to be missing when rendering to yaml. Hera also lacks validation for duplicate node names, which results in rendered yaml that is invalid when submitted to argo-workflows.

This PR adds validation for both situations, preventing the user from rendering incorrect or invalid yaml when Workflows contain multiple templates or nodes with the same name by raising a TemplateNameConflict or NodeNameConflict, respectively.

Note that the order of operations has been adjusted in _HeraContext.add_sub_node. This change was required to continue to support workflows with recursive references.

Signed-off-by: crflynn <flynn@simplebet.io>
crflynn and others added 2 commits May 3, 2024 22:52
Signed-off-by: crflynn <flynn@simplebet.io>
Copy link

codecov bot commented May 4, 2024

Codecov Report

Attention: Patch coverage is 96.87500% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.9%. Comparing base (d553546) to head (8cd8447).
Report is 53 commits behind head on main.

Files Patch % Lines
src/hera/workflows/_context.py 85.7% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1054     +/-   ##
=======================================
+ Coverage   81.7%   81.9%   +0.2%     
=======================================
  Files         54      57      +3     
  Lines       4208    4319    +111     
  Branches     889     914     +25     
=======================================
+ Hits        3439    3540    +101     
- Misses       574     578      +4     
- Partials     195     201      +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@samj1912 samj1912 added semver:patch A change requiring a patch version bump type:bug A general bug labels May 4, 2024
@elliotgunton
Copy link
Collaborator

elliotgunton commented May 7, 2024

Thanks for the contribution @crflynn! Although I'm not entirely opposed to the change, I am hesitant about starting to recreate all of Argo's validation logic (1500+ lines!).

Trying to understand your use-case/motivation for the change --

  • Is your use-case mainly coming from the developer experience? Or are you hitting problems with gitops flows where the YAML is already rendered and intended to deploy?
  • Are you doing things in the Workflow definition that are only possible in Hera, such as conditionally adding templates/tasks to the Workflow definition?
  • Are you unable to run an argo lint on the cluster (via hera or CLI) before submitting? (During tests?)

I think this change is small and helpful enough that we'd accept it, but it would be good to understand the motivation behind the change and to figure out in advance how far we want to go with matching Argo's on-cluster-validation (cc @samj1912 @flaviuvadan).

@crflynn
Copy link
Contributor Author

crflynn commented May 13, 2024

We found that the most common issue of users was name conflicts like these, which wouldn’t manifest until execution time when they would error on the controller, so we added some validation at build time to reduce iterations. I figured it would be helpful upstream but completely understand the reasoning behind not wanting to go down this path with hera.


For context:

We have a downstream library that wraps hera and provides more functionality. The main features are:

  • some basic validation including name conflicts like this
  • rendering helm templates in addition to plain yaml, i.e. escaping argo templating for helm compatibility
  • providing a local runner
  • providing a CronWorkflowTemplate abstraction that builds a WorkflowTemplate and a CronWorkflow that references it. This allows us to submit these workflows with custom arguments via the UI (can’t do this with CronWorkflow in the UI afaik)

among other features for local development.

We build and deploy our workflows using helm charts. This allows us to template things like the schedule, env vars, or entire WorkflowTemplates by environment. It also gives us the benefits of helm like helm rollbacks and shipping the workflows in a declarative way together with other kubernetes resources, e.g. ConfigMaps, Secrets, RBAC, and associated applications.

I gave argo lint a try and with helm it would look something like this where we helm template first and then lint the output.

helm template . --output-dir output
argo lint output --output simple --kinds=cronworkflows
argo lint output --output simple --kinds=workflowtemplates

We are only using CronWorkflows and WorkflowTemplates.

It seems to lint our CronWorkflows or objects that reference other templates we would have to expose the API to our CI, which we don’t do, so this just results in authorization errors.

Linting the WorkflowTemplates results in json: unknown field "hooks" on every template, despite the hooks functioning just fine during operation. (might be a bug on the cli?)

Perhaps what we are doing is a bit unconventional, I'm not sure, but it has been working fairly well despite the lack of validation in hera. To a degree, the lack of validation also helps us, since sometimes we template things like the cron schedule with helm, whereas strict cron string validation would actually break how we are using it in this particular case.

@elliotgunton elliotgunton added type:enhancement A general enhancement semver:minor A change requiring a minor version bump and removed semver:patch A change requiring a patch version bump type:bug A general bug labels May 14, 2024
@elliotgunton
Copy link
Collaborator

Thanks for the detailed response! Really helpful to understand how Hera is used and built upon 😄 I think I'm happy to approve the intent of the PR, will take a closer look at the code now (i.e. the add_sub_node changes)! 🚀

Copy link
Collaborator

@elliotgunton elliotgunton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @crflynn! Sorry for the delay. The PR looks good overall, some nitpicks and things slightly unrelated to your changes, the only real requests are to add a test and remove another. As it's been a few weeks I'd be happy to wrap up the PR for the fixes and resolve the changes to get it merged (I'll do this tomorrow if no reply or you confirm your preference either way).

Comment on lines +13 to +14
def test_conflict_on_dag_and_script_with_same_name(self):
"""Dag and script can't have the same name."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just keep this test and remove test_conflict_on_multiple_scripts_with_same_name - as we're extracting the script name from the decorator args they are essentially testing the same thing, as the type of template doesn't matter

Suggested change
def test_conflict_on_dag_and_script_with_same_name(self):
"""Dag and script can't have the same name."""
def test_conflict_on_templates_with_same_name(self):
"""Multiple templates can't have the same name."""

"""Dag and script can't have the same name."""
name = "name-of-dag-and-task"

@script(name=name) # same dag/steps template name given out of context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the comment makes sense here as a name given within the context is the step/task name, while here it is the script template name

Suggested change
@script(name=name) # same dag/steps template name given out of context
@script(name=name)


def test_conflict_on_dag_and_script_with_same_name(self):
"""Dag and script can't have the same name."""
name = "name-of-dag-and-task"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name = "name-of-dag-and-task"
name = "name-of-dag-and-script"

Comment on lines +52 to +78
def test_conflict_on_multiple_scripts_with_same_name(self):
"""Dags cannot have two scripts with the same name."""
name = "name-of-tasks"

@script(name=name) # same template name given out of context
def hello():
print("hello")

@script(name=name) # same template name given out of context
def world():
print("world")

with pytest.raises(TemplateNameConflict):
with WorkflowTemplate(
name="my-workflow",
entrypoint=name,
), DAG(name="dag"):
hello()
world()

with pytest.raises(TemplateNameConflict):
with WorkflowTemplate(
name="my-workflow",
entrypoint=name,
), Steps(name="steps"):
hello()
world()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed as it's testing the same logic as test_conflict_on_dag_and_script_with_same_name (template type doesn't matter)

Suggested change
def test_conflict_on_multiple_scripts_with_same_name(self):
"""Dags cannot have two scripts with the same name."""
name = "name-of-tasks"
@script(name=name) # same template name given out of context
def hello():
print("hello")
@script(name=name) # same template name given out of context
def world():
print("world")
with pytest.raises(TemplateNameConflict):
with WorkflowTemplate(
name="my-workflow",
entrypoint=name,
), DAG(name="dag"):
hello()
world()
with pytest.raises(TemplateNameConflict):
with WorkflowTemplate(
name="my-workflow",
entrypoint=name,
), Steps(name="steps"):
hello()
world()

), Steps(name=name):
example()

def test_no_conflict_on_dag_and_task_with_same_name(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test to confirm tasks with different names are okay? i.e. to test the if t != node.template: logic in src/hera/workflows/_context.py

Something like this:

Suggested change
def test_no_conflict_on_dag_and_task_with_same_name(self):
def test_no_conflict_on_tasks_with_different_names_using_same_template(self):
"""Task nodes can have different names for the same script template."""
name_1 = "task-1"
name_2 = "task-2"
@script()
def example():
print("hello")
with WorkflowTemplate(
name="my-workflow",
entrypoint=name,
), DAG(name=name):
example(name=name_1)
example(name=name_2)
def test_no_conflict_on_dag_and_task_with_same_name(self):

name="my-workflow",
entrypoint=name,
), Steps(name=name):
example(name=name) # named but referencing the "example" template
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
example(name=name) # named but referencing the "example" template
example(name=name) # step name same as steps template

"""Dags cannot have two scripts with the same name."""
name = "name-of-tasks"

@script(name=name) # same template name given out of context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@script(name=name) # same template name given out of context
@script(name=name) # same template name

def hello():
print("hello")

@script(name=name) # same template name given out of context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@script(name=name) # same template name given out of context
@script(name=name) # same template name

Comment on lines +79 to 82
# for particular types, the user invoked a decorated function e.g. `@script`
# inside a proper context. Here, we add the object to the overall workflow context, directly as a template,
# in case it is not found (based on the name). This helps users save on the number of templates that are
# added when using an object that is a `Script`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify/correct the comment

Suggested change
# for particular types, the user invoked a decorated function e.g. `@script`
# inside a proper context. Here, we add the object to the overall workflow context, directly as a template,
# in case it is not found (based on the name). This helps users save on the number of templates that are
# added when using an object that is a `Script`
# When the user invokes a decorated function e.g. `@script inside a sub-context (dag/steps),
# we also add the step/task's template to the overall workflow context, if it is not already added.

Comment on lines +94 to +101
try:
# here, we are trying to add a node to the last piece of context in the hopes that it is a subbable
pieces[-1]._add_sub(node)
except InvalidType:
# if the above fails, it means the user invoked a decorated function e.g. `@script`. Hence,
# the object needs to be added as a template to the piece of context at [-1]. This will be the case for
# DAGs and Steps
pieces[-1]._add_sub(node.template) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old comment here didn't really make sense to me, and checking out the code it turns out it's never hit in our tests. cc @flaviuvadan I think the backend logic changed such that we don't need the try/except (i.e. callers of _add_sub_node always pass in a valid node type as it's called in SubNodeMixin, and I don't see a way to pass an invalid type). Tried removing it and tests all pass.

Suggested change
try:
# here, we are trying to add a node to the last piece of context in the hopes that it is a subbable
pieces[-1]._add_sub(node)
except InvalidType:
# if the above fails, it means the user invoked a decorated function e.g. `@script`. Hence,
# the object needs to be added as a template to the piece of context at [-1]. This will be the case for
# DAGs and Steps
pieces[-1]._add_sub(node.template) # type: ignore
pieces[-1]._add_sub(node)

@crflynn
Copy link
Contributor Author

crflynn commented May 30, 2024

@elliotgunton thanks for the review. The changes all make sense to me. I probably won't have time to make them myself until next week, so feel free to resolve them if you have the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver:minor A change requiring a minor version bump type:enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants