
Yaml API: Day Zero tutorial notebook #27284

Merged: 48 commits merged into apache:master on Apr 5, 2024


@bzablocki (Contributor)

I've created the first version of the Getting Started with YAML notebook.
The target audience is users who don't have much experience with coding and/or Beam and who want to get started writing Beam pipelines quickly.



@bzablocki (Contributor, Author)

R: @robertwb

@github-actions (Contributor)

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@codecov bot commented Jun 28, 2023

Codecov Report

Attention: Patch coverage is 28.57143%, with 30 lines in your changes missing coverage. Please review.

Project coverage is 72.15%. Comparing base (3aa78d2) to head (d4d2b4d).
Report is 3 commits behind head on master.

❗ Current head d4d2b4d differs from the pull request's most recent head 3da0b06. Consider uploading reports for commit 3da0b06 to get more accurate results.

File                                           Patch %   Lines
sdks/python/apache_beam/yaml/yaml_provider.py  28.57%    30 missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #27284       +/-   ##
===========================================
+ Coverage   38.53%   72.15%   +33.61%     
===========================================
  Files         698      691        -7     
  Lines      102361   101185     -1176     
===========================================
+ Hits        39447    73009    +33562     
+ Misses      61283    26563    -34720     
+ Partials     1631     1613       -18     
Flag   Coverage Δ
go     53.94% <ø> (-0.40%) ⬇️

Flags with carried forward coverage won't be shown.

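The numbers in the Coverage Diff above are internally consistent: each "Lines" total equals hits + misses + partials, so coverage works out to hits / (hits + misses + partials). The sketch below re-derives the reported percentages from the table. It assumes the report rounds down to two decimal places (the figures are consistent with that) and computes the delta from the unrounded values. It also shows that 30 missing lines at 28.57143% patch coverage imply 42 patch lines, 12 of them covered, assuming no partially covered lines. The function names are illustrative, not part of any Codecov API.

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, misses: int, partials: int) -> Fraction:
    """Exact coverage percentage: hits / (hits + misses + partials) * 100.

    Partially covered lines count against coverage here, matching the
    table, where Lines == Hits + Misses + Partials.
    """
    return Fraction(100 * hits, hits + misses + partials)


def round_down_2(pct: Fraction) -> float:
    """Round a non-negative percentage down to two decimal places."""
    return math.floor(pct * 100) / 100


# Figures copied from the Coverage Diff table above.
base = coverage_pct(hits=39447, misses=61283, partials=1631)  # master
head = coverage_pct(hits=73009, misses=26563, partials=1613)  # #27284

print(round_down_2(base))         # 38.53
print(round_down_2(head))         # 72.15
print(round_down_2(head - base))  # 33.61 (delta taken before rounding)

# Patch coverage: 30 missing lines at 28.57143% implies 42 patch lines
# in total, i.e. 12 covered lines (assuming no partially covered lines).
patch_lines = round(30 / (1 - 0.2857143))
covered = patch_lines - 30
print(patch_lines, covered)                        # 42 12
print(round_down_2(coverage_pct(covered, 30, 0)))  # 28.57
```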

@svetakvsundhar (Contributor) left a comment

Nice tutorial! And a cool idea :)

@amotley left a comment

Overall looks great, thanks for writing this, Bartosz!

@@ -0,0 +1,424 @@
{

Can we add a landing page (similar to https://beam.apache.org/get-started/try-apache-beam/) that links to this Colab page?

@bzablocki (Contributor, Author)

That would be great! Is this something @robertwb can arrange?


I could be wrong, but I imagine it's just a matter of making a new page similar to that one and adding it to the TOC.


The tech writers can probably help answer this question too.

@bzablocki (Contributor, Author)

@svetakvsundhar or I will work on this once this PR is merged.

Contributor

Filed #27450


SGTM, can you use the new YAML tags that Jeff created? https://bugdashboard.corp.google.com/app/tree;dashboardId=551658

examples/notebooks/get-started/try-apache-beam-yaml.ipynb: 6 outdated review threads (resolved)
@svetakvsundhar (Contributor) left a comment

Thanks! Just add in the TODOs :)

@amotley left a comment

Looks great, Bartosz, thanks again!

@bzablocki (Contributor, Author)

Hi @robertwb, could you take a look at this PR? Thanks!

@Polber (Contributor) left a comment

Thanks, Bartosz! This looks really good. I left a couple of comments, but overall it seems to provide a great starting point. I think some extra blocks will be needed for setup (installing dependencies, Beam, etc.), but I see you opened a feature request (FR) for that.

examples/notebooks/get-started/try-apache-beam-yaml.ipynb: 4 outdated review threads (resolved)
sdks/python/apache_beam/yaml/yaml_provider.py: 1 outdated review thread (resolved)
…orial

# Conflicts:
#	sdks/python/apache_beam/yaml/yaml_provider.py
@bzablocki (Contributor, Author)

Hi @Polber, thanks for the review. All the changes have now been applied.
@VeronicaWasson, could you have another look at this notebook?

bzablocki and others added 2 commits December 21, 2023 13:53
Co-authored-by: Jeff Kinard <35542536+Polber@users.noreply.github.com>
@bzablocki (Contributor, Author)

Hi @Polber, do you think we can merge it now?

@Polber (Contributor) left a comment

Hey @bzablocki, we are currently working on getting some docs onto the Beam website. I think it would be a good idea to get those submitted first, so that users have docs to reference after trying out the Day Zero notebook.

While playing around with this notebook again, I noticed a couple more minor things, but no rush.

examples/notebooks/get-started/try-apache-beam-yaml.ipynb: 3 outdated review threads (resolved)
Comment on lines 292 to 306
"pipeline = '''\n",
"pipeline:\n",
"  type: chain\n",
"  transforms:\n",
"    - type: ReadFromCsv\n",
"      config:\n",
"        path: data/people.csv\n",
"    - type: Filter\n",
"      config:\n",
"        language: python\n",
"        keep: \"age >= 18\"\n",
"    - type: LogForTesting\n",
"'''\n",
"save_to_file(pipeline, 'pipelines/pipeline-filter-01.yaml')\n",
"! python -m apache_beam.yaml.main --pipeline_spec_file=pipelines/pipeline-filter-01.yaml"
Contributor

In which case I would move the run command to its own cell.
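For readers new to `type: chain`, the pipeline in the excerpt above applies its transforms in order: it reads a CSV file, keeps rows whose `age` is at least 18, and logs the survivors. The sketch below is a rough plain-Python approximation of that data flow, not Beam and not how Beam executes it. `PEOPLE_CSV` is hypothetical sample data (the real `data/people.csv` is not shown here), and the three helper functions are illustrative stand-ins for `ReadFromCsv`, `Filter` and `LogForTesting`.

```python
import csv
import io

# Hypothetical stand-in for data/people.csv; the notebook's real file may differ.
PEOPLE_CSV = """name,age
Alice,34
Bob,17
Carol,18
"""


def read_from_csv(text):
    """Rough analogue of ReadFromCsv: yield one dict per CSV row.

    Beam produces typed rows; this sketch converts `age` to int by hand.
    """
    for row in csv.DictReader(io.StringIO(text)):
        row["age"] = int(row["age"])
        yield row


def filter_rows(rows, keep):
    """Rough analogue of Filter with `language: python`: evaluate the
    `keep` expression with each row's fields bound as variable names."""
    for row in rows:
        # eval() is for illustration only; never use it on untrusted input.
        if eval(keep, {"__builtins__": {}}, dict(row)):
            yield row


def log_for_testing(rows):
    """Rough analogue of LogForTesting: print each element and return them."""
    logged = list(rows)
    for row in logged:
        print(row)
    return logged


# type: chain means each transform consumes the previous transform's output.
result = log_for_testing(filter_rows(read_from_csv(PEOPLE_CSV), keep="age >= 18"))
```

With the sample data, Bob (17) is dropped and the rows for Alice and Carol are printed.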

examples/notebooks/get-started/try-apache-beam-yaml.ipynb: 1 outdated review thread (resolved)
@bzablocki (Contributor, Author)

Thanks for taking another look at this PR @Polber!


That makes a lot of sense; I agree it's better to release some docs first. Thanks for the update. I'll wait for the documentation and update this PR with the relevant references once it's available.

@Polber (Contributor) commented Feb 28, 2024

@bzablocki The docs have been pushed to the Beam site. If you want to link them, I think we can get this PR merged:
https://beam.apache.org/documentation/sdks/yaml/

@kennknowles (Member)

Ping! Seems like this is ready to merge?

@bzablocki (Contributor, Author)

Thanks for the ping, I'll submit an updated PR today.

@bzablocki (Contributor, Author)

I updated the PR with documentation and a short explanation of why a pipeline that uses a transform from an expansion service (Filter-sql) logs its output in a different format from a simple pipeline that doesn't use an expansion service.

@robertwb (Contributor)

As discussed in person, it feels a bit awkward for every command to be a combination of "save file" and "execute command" in Python. Could we set the interpreter to shell instead?

@bzablocki (Contributor, Author)

@Polber I converted the file saving to Jupyter's built-in %%writefile cell magic. The PR is ready for the final review and merge.
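With `%%writefile`, each example splits into two steps: one cell writes the YAML spec to disk, and a separate cell runs `python -m apache_beam.yaml.main --pipeline_spec_file=...`. The sketch below mirrors those two steps in plain Python for readers outside Jupyter. `write_spec` and `run_command` are hypothetical helpers, not part of Beam; the YAML body is the pipeline from the excerpt above with conventional two-space indentation; and `sys.executable` is used in place of a bare `python` so the command targets the current interpreter.

```python
import sys
from pathlib import Path
from typing import List

# The YAML body a `%%writefile pipelines/pipeline-filter-01.yaml` cell would hold.
PIPELINE_YAML = """\
pipeline:
  type: chain
  transforms:
    - type: ReadFromCsv
      config:
        path: data/people.csv
    - type: Filter
      config:
        language: python
        keep: "age >= 18"
    - type: LogForTesting
"""


def write_spec(path: str, spec: str) -> Path:
    """Step 1 (the %%writefile cell): save the spec text verbatim.

    As a convenience, this sketch also creates missing parent directories.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(spec)
    return target


def run_command(spec_path: Path) -> List[str]:
    """Step 2 (the separate run cell): build the command line that
    `! python -m apache_beam.yaml.main --pipeline_spec_file=...` executes."""
    return [sys.executable, "-m", "apache_beam.yaml.main",
            f"--pipeline_spec_file={spec_path}"]
```

With `apache_beam` installed, the two steps could be chained as `subprocess.run(run_command(write_spec("pipelines/pipeline-filter-01.yaml", PIPELINE_YAML)), check=True)`. Keeping them separate, like the two notebook cells, lets the spec be edited and re-run independently.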

@Polber (Contributor) left a comment

Thanks for working so hard on this!

@robertwb merged commit a475fde into apache:master on Apr 5, 2024 (3 checks passed).
8 participants