Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GHA to copy completed notebooks #132

Conversation

sjspielman
Copy link
Member

⚠️ Stacked on #129
Closes #111

This PR adds a github action that will copy completed notebooks from a given tag of training-modules into this template repository (presumably after someone has copied the template!). All repos involved are public, so no special security things here, which is important since we might expect folks running external workshops to use this action.

I developed this workflow in one of my personal repositories so I could spam myself to my heart's content, hence only one commit here. I confirmed it worked for any of the three workshops, but we should still do a final.final round of testing here as well. I figure we may want to get any conceptual reviews implemented first though, so this action currently only runs manually. Once we're confident that this is how the GHA should actually look, I can add a PR target so it will run here.

I made a variety of choices along the way which we can discuss during review if there are strongly differing opinions. Here's what I did!

  • The workflow takes two inputs, and the first one is conveniently handled with a dropdown menu:
    • Which training module is being taught (this dictates which files to copy over). This defaults to the first choice item, which is a plain intro R workshop.
      • Note that I did not include bulk rnaseq here, but I can! I figured we are unlikely to teach it again, but that said there's nothing preventing external people from running one and it's a quick change to add in.
    • The training-modules repo tag of interest, which defaults to master.
  • The workflow then clones both repos: training-modules and this one. You have to be a little careful with paths here, but it seems to work ok! I opted for this approach to find the files to copy rather than maintaining something analogous to an exercise_list like we do for the exercise copying action.
    • Note that this repo gets cloned into a directory called main, which seemed like a good generic name for the repo we are PRing into (it won't always be training-specific-template, since this action will generally be run in a differently-named repo created from this template).
    • I decided this for two reasons: I'd rather keep this all in 1 repo, and also it seemed like a fun thing to try out.
  • Since there are two repos here, I had to use --global flags for all the git config settings.
    • If we don't want global username and email, I could alternatively cd main and then set local username/email if that is preferred.
  • Then, the workflow will make some arrays of which files need to be copied, and then they get curled into completed-notebooks/
    • Noting here that I originally looked for .nb.html files, but we have one .html (not a notebook) living in intro scRNA-seq, so I went more permissive to just .html.
  • Finally, file the PR!

One key question for reviewers: Would we rather see the bash code pulled out into it's own script to live in scripts/? I was on the fence at the beginning, but now I'm trending towards "yes".

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment here is that if you already checked out both repos, why use curl? My suggested solution may not be quite what you want, but it should be along the right lines, and could simplify things.

I also had a naming suggestion because main is so overloaded.

Another simplifying thought is that maybe just a make an array of modules and loop through it with something like

for module in $modules; do
  cp training-modules/$module/[0-9]*.html website/completed-notebooks/$module/.
done

(I added some specificity back by starting with a number)

.github/workflows/copy-completed-notebooks.yml Outdated Show resolved Hide resolved
Comment on lines 58 to 59
# the url for for this tag of the training-modules repo where files will be downloaded from
base_url=https://raw.githubusercontent.com/AlexsLemonade/training-modules/${{ github.event.inputs.training-modules-tag }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a URL here rather than just copying the files we already have checked out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, indeed.

.github/workflows/copy-completed-notebooks.yml Outdated Show resolved Hide resolved
.github/workflows/copy-completed-notebooks.yml Outdated Show resolved Hide resolved
Comment on lines 88 to 89
# note that at least one notebook is only .html, NOT .nb.html, so we'll just use .html
[[ $path != *.html ]] && echo "'$path' is not an .html file." && exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that you used *html for the search, not *.html; it would probably be better to be consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now using [0-9]*.html throughout for copying, and we no longer have searching, because the point is to use both repos if we have both repos!

@sjspielman
Copy link
Member Author

My main comment here is that if you already checked out both repos, why use curl?

🤦‍♀️ is my answer I think...!

- Call directory website, not main
- Bump PR action to @5, remove safe directory creation
- Stop curling, and start copying, because _of course_
- Add bulk RNAseq
- Fancier PR comment
- Use case statement to simplify the code while still only getting a given workshop's files
@sjspielman
Copy link
Member Author

After my first round of excitingly using a different strategy from exercise notebooks, and then almost impressively forgetting I was taking that strategy half-way through writing 🤪, here is round 2!

Again, this was tested in a separate private repo of mine to save us all from spam, hence the single commit (which to be clear I just taught a workshop about how this was a bad practice, but here we are!).

I directly incorporated suggestions myself, including main -> website (thanks, good call), bumping the action to @5, etc. I also did simplify the code, but I decided in the end to use a case statement for this; I think this is much easier to read than the if statements. So, we still get notebooks for a given workshop only (including bulk).

Let me know what you think now, but again, no rush on re-reviewing this today if that's not good for your schedule!

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with a little suggestion below for future maintenance ease, I think.

Semi-related thought: As we are doing all this rejiggering, is it time to update the default branch to main for these repos?


- name: Configure git
run: |
# Configure git for just the `website` repo local credentials
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no problem with the global settings. It seems safer than changing directories.

target_path=website/completed-notebooks/

# Copy notebooks over depending on which training module is being taught
case "${{ github.event.inputs.training-module }}" in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten how ugly case statements in bash are, but I guess it kind of makes sense here.

But I did like my for loop idea to have only one cp statement.

case "${{ github.event.inputs.training-module }}" in
  "Introduction to R and Tidyverse")
     modules=("intro-to-R-tidyverse")
     ;;
  "Introduction to single-cell RNA-seq")
     modules=("intro-to-R-tidyverse" "scRNA-seq")
     ;;

and so on...

followed by (syntax correct this time)

for module in ${modules[@]}
do 
  cp training-modules/${module}/[0-9]*.html ${target_path}
done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see, I like this!

@sjspielman
Copy link
Member Author

Semi-related thought: As we are doing all this rejiggering, is it time to update the default branch to main for these repos?

I had thought about this at one point, but got nervous about finding all the links in all the places since it's spread across several repos. As long as we don't delete master (which I don't think you're suggesting anyways!), then previously-used links should work just fine I suppose. I might call this a "nice to have but let's circle back towards the end of this epic and see what the mess looks like."

@jashapiro
Copy link
Member

Semi-related thought: As we are doing all this rejiggering, is it time to update the default branch to main for these repos?

I had thought about this at one point, but got nervous about finding all the links in all the places since it's spread across several repos. As long as we don't delete master (which I don't think you're suggesting anyways!), then previously-used links should work just fine I suppose. I might call this a "nice to have but let's circle back towards the end of this epic and see what the mess looks like."

Yeah, I guess I was thinking that this repo might be the one to start with? Nothing should link back to it that I am aware of...

@sjspielman
Copy link
Member Author

@jashapiro was there anything else here? I think this was pretty much set?

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more little changes (which I caught better because of the simplified structure, so that's a win).

.github/workflows/copy-completed-notebooks.yml Outdated Show resolved Hide resolved
.github/workflows/copy-completed-notebooks.yml Outdated Show resolved Hide resolved
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjspielman sjspielman merged commit 78a48a8 into sjspielman/122-workshop-resources Sep 12, 2023
@sjspielman sjspielman deleted the sjspielman/111-copy-completed-notebooks-gha branch September 12, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants