Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator #10153

Merged
merged 34 commits into from Sep 19, 2020

Conversation

yuqian90
Copy link
Contributor

@yuqian90 yuqian90 commented Aug 4, 2020

This PR introduces TaskGroup, which is a simple UI task grouping concept.

What this PR does:

  • TaskGroups can be collapsed/expanded in Graph View when clicked
  • TaskGroups can be nested
  • TaskGroups can be put upstream/downstream of tasks or other TaskGroups with >> and << operators
  • Search box, hovering, focusing in Graph View treats TaskGroup properly. E.g. searching for tasks also highlights TaskGroup that contains matching task_id. When TaskGroup is expanded/collapsed, the affected TaskGroup is put in focus and moved to the centre of the graph.

What this PR does not do:

  • This PR does not change or remove SubDagOperator. Although TaskGroup is intended as an alternative for SubDagOperator, deprecating SubDagOperator will need to be discussed/implemented in the future.
  • This PR only implemented TaskGroup handling in the Graph View. In places such as Tree View, it will look like as-if TaskGroup does not exist and all tasks are in the same flat DAG.

GitHub Issue: #8078
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator

This is the example_task_group that this PR adds:
image

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about how will this be supported in the new rest api so one can easily rebuild the same feature in a new UI.

airflow/utils/task_group.py Outdated Show resolved Hide resolved
airflow/utils/task_group.py Outdated Show resolved Hide resolved
@yuqian90
Copy link
Contributor Author

We should think about how will this be supported in the new rest api so one can easily rebuild the same feature in a new UI.

Hi @houqp Thanks for the suggestions. I think you bring up a good point. Currently TaskGroup is constructed from scratch by the web server based on the task_group_ids fields of tasks. This is probably not a very extensible way to do this. To make this more friendly for rest api, i think we need to serialize the TaskGroup and pass it along with the DAG object. Serializing the TaskGroup also allows adding some additional attributes to TaskGroup, such as ui_color, doc, etc which can be useful when shown on the web UI. What do you think?

@houqp
Copy link
Member

houqp commented Aug 15, 2020

@yuqian90 I think that's a good approach, this extra meta data needs to be added to more than one endpoint i think, but we can start with /dags/{dag_id}/tasks.

@yuqian90
Copy link
Contributor Author

@houqp thanks for your review. Here are the changes since your last review.
@kaxil I made some changes to serialized_objects.py to handle serialization/deserialization of dag.task_group. Could you take a look?

  • Added serialization of TaskGroup. dag.task_group is serialized as a dict.
  • Improved UI interaction:
    • The graph is moved to focus around the most recently expanded/collapsed node.
    • If text is typed into the searchbox, the TaskGroup nodes with matching children tasks are also highlighted
    • TaskGroup now has a tooltip showing a summary count of its children TaskInstance states.
  • Added typing hints to task_group.py.
  • Added tests.

houqp
houqp previously requested changes Aug 17, 2020
airflow/models/baseoperator.py Show resolved Hide resolved
airflow/models/baseoperator.py Outdated Show resolved Hide resolved
airflow/serialization/serialized_objects.py Outdated Show resolved Hide resolved
airflow/serialization/serialized_objects.py Outdated Show resolved Hide resolved
airflow/www/views.py Outdated Show resolved Hide resolved
@kaxil
Copy link
Member

kaxil commented Aug 17, 2020

Thanks for the PR, I will definitely take a look at it since Airflow 1.10.12 is released :)

@houqp
Copy link
Member

houqp commented Aug 17, 2020

Thanks @yuqian90 for the update, I left one comment around a change that I think we should be more careful about, the rest looks good to me 👍 would love to hear about other people's opinions on this.

airflow/models/baseoperator.py Outdated Show resolved Hide resolved
airflow/serialization/serialized_objects.py Outdated Show resolved Hide resolved
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs good documentation coverage too

@yuqian90 yuqian90 force-pushed the task_group branch 2 times, most recently from 857fb63 to 42c9931 Compare August 19, 2020 10:53
@yuqian90
Copy link
Contributor Author

This needs good documentation coverage too

Thanks. I added a section here:
https://github.com/apache/airflow/blob/545c69704ff7ad199ff58b7a01b3c60fc222b7c2/docs/concepts.rst#taskgroup

@mik-laj mik-laj mentioned this pull request Aug 24, 2020
@KevinYang21
Copy link
Member

Do you mind elaborate a bit more on the impact to TreeView please? Would it be rendered just like the subdag operators? Thanks.

@yuqian90
Copy link
Contributor Author

yuqian90 commented Aug 27, 2020

Do you mind elaborate a bit more on the impact to TreeView please? Would it be rendered just like the subdag operators? Thanks.

At the current implementation of this PR, the Tree View is not touched. In other words, the grouping is only relevant in Graph View. The Tree View will look like all tasks, including the ones in nested TaskGroups are on the same DAG (because that's what they are under the hood).

That said, we can always improve on it and create the grouping in Tree View too.

@yuqian90 yuqian90 force-pushed the task_group branch 2 times, most recently from deae87b to 23e22ef Compare August 31, 2020 07:08
@yuqian90
Copy link
Contributor Author

Hi, @houqp and reviewers, here are the latest changes since you last reviewed:

  • No more prefixing of task_id per the discussion here. The user is responsible for maintaining unique task_id across the entire DAG, just like before.
  • Per the discussion in the email "[AIP-34] Rewrite SubDagOperator" with @casassg, if group2 depends on group1, instead of drawing an edge between every task in group1 and every task in group2, the two functions task_group_to_dict() and dag_edges() now produce a dummy node for the TaskGroup and join all the edges onto that node. This greatly cuts down the number of edges and speeds up UI rendering many times. This is a UI-only change. Internally, all tasks in group2 still depend on all tasks in group1. The dummy nodes only appear on the UI. They don't exist internally.

This is what example_task_group.py now looks like. The circles between section_1.1 and section_1.2 are the dummy nodes I'm talking about. They only appear on the graph and not as actual tasks anywhere.

image

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Just tested it out, everything worked fine for me.

Awesome work @yuqian90 🎉 🚀

@kaxil kaxil merged commit 49c193f into apache:master Sep 19, 2020
@dimberman
Copy link
Contributor

@yuqian90 Heck yes!!! I'm so excited for this 🙌

@yuqian90
Copy link
Contributor Author

Is anyone interested in using TaskGroup in the next 1.10.* release? We can contribute our own cherry-picked commit too for v1-10-test since we are going to use it anyway ourselves.

@potiuk
Copy link
Member

potiuk commented Sep 19, 2020

Is anyone interested in using TaskGroup in the next 1.10.* release? We can contribute our own cherry-picked commit too for v1-10-test since we are going to use it anyway ourselves.

I think this would be great, so that people could test this before.

yuqian90 added a commit to yuqian90/airflow that referenced this pull request Sep 21, 2020
yuqian90 added a commit to yuqian90/airflow that referenced this pull request Sep 24, 2020
yuqian90 added a commit to yuqian90/airflow that referenced this pull request Sep 24, 2020
@BhuviTheDataGuy
Copy link

Its a nice feature any idea when it'll be available(GA release)?

@kaxil
Copy link
Member

kaxil commented Oct 14, 2020

Its a nice feature any idea when it'll be available(GA release)?

in Airflow 2.0 -- by the end of this year

yuqian90 added a commit to yuqian90/airflow that referenced this pull request Jan 11, 2021
@yuqian90 yuqian90 deleted the task_group branch May 29, 2021 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers area:webserver Webserver related Issues pinned Protect from Stalebot auto closing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants