TaskCluster docs for non-maintainers #586

Open
AmitMY opened this issue May 10, 2024 · 7 comments

@AmitMY
Contributor

AmitMY commented May 10, 2024

It's me again. I've been trying to get this repository to work for over two years now.
This project still seems like the best machine translation project to train models for production, and for offline use.

Since you completely ditched snakemake, I'll try to get taskcluster to work again.

The documentation at https://mozilla.github.io/firefox-translations-training/task-cluster.html seems to be aimed at maintainers of this specific repository.

Could you please add some information for people outside of this repo?
I expect step 1 to be "Fork this repository",
and then step 2 to be how to set up this repo with task cluster.

@gregtatum
Member

We've invested heavily in getting the system working on Mozilla infrastructure, which uses Taskcluster. Unfortunately, from what I hear from our Taskcluster support team, it's hard to stand up your own Taskcluster instance.

There are integrations with our infrastructure that allow project maintainers to push a branch to github.com/mozilla/firefox-translations-training and then, from the decision task, trigger training. This all runs on the Mozilla-managed infrastructure.

I'd really like to figure out a way to help you get this running. In our testing infrastructure we actually have a way to run the tasks through a run_task utility. Perhaps you could import that utility into a Python script and then process the full-task-graph.json to build a dependency graph. If you are running on your own managed machine, it could be possible to just run everything locally on that machine.

I'd be happy to hop on a call to discuss this further, or specify what work it would take here.

You would probably want to wait until I finish up work in PR #568, as that will make the docker image and run_task abstraction just work.

If you run task preflight-check, it will generate the artifacts/full-task-graph.json file, which fully specifies the tasks.
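As a rough illustration of the "process the task graph to build a dependency graph" idea, here is a minimal Python sketch. It assumes the JSON is a mapping from task label to a task entry, and that each entry carries a "dependencies" dict of edge name to dependency label; that layout should be checked against the generated file. The helper names `load_dependencies` and `execution_order` are hypothetical, and nothing here calls run_task.

```python
import json
from graphlib import TopologicalSorter
from pathlib import Path


def load_dependencies(graph: dict) -> dict:
    """Map each task label to the set of labels it depends on.

    Assumption: every entry has a "dependencies" dict of {edge_name: label}.
    """
    return {
        label: set(task.get("dependencies", {}).values())
        for label, task in graph.items()
    }


def execution_order(deps: dict) -> list:
    """Return the labels in an order where dependencies always come first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    path = Path("artifacts/full-task-graph.json")
    if path.exists():
        graph = json.loads(path.read_text())
        for label in execution_order(load_dependencies(graph)):
            # A local runner would invoke the task's command here.
            print(label)
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph is not a DAG, which is a cheap sanity check before trying to run anything.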

@gregtatum
Member

> and then step 2 to be how to set up this repo with task cluster.

To be explicit on this request: I don't know that we have these steps or clear recommendations.

@AmitMY
Contributor Author

AmitMY commented May 10, 2024

hmmm if run_task can be local, then in theory it can just run on a single machine. I do see one difficulty with that, which is that each task requires its own environment.


If I understand you right, you propose:

  1. task preflight-check to generate artifacts/full-task-graph.json
  2. Iterate over the DAG, and somehow run_task based on the information in the json.

Is this right?


To complicate it a little more, I would ideally want to try to mimic your old snakemake solution on slurm:

  1. same
  2. submit slurm jobs for all currently "available" tasks. Each task will dump its "status" (success/error) in some file
  3. whenever a job ends, it will trigger a re-check of whether new tasks are available to run
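The re-check in step 3 could look roughly like the sketch below. It assumes a dependency map of label to set of parent labels, and one `<label>.status` file per finished task containing either "success" or "error". The file layout, the function names, and the `run_one_task.sh` wrapper are all hypothetical; only the `sbatch` options `--job-name` and `--wrap` are real Slurm flags.

```python
import shlex
from pathlib import Path


def read_statuses(status_dir: Path) -> dict:
    """Read every '<label>.status' file into {label: 'success' or 'error'}."""
    return {f.stem: f.read_text().strip() for f in status_dir.glob("*.status")}


def ready_tasks(deps: dict, statuses: dict, submitted: set) -> list:
    """Labels not yet submitted or finished whose parents all succeeded."""
    return sorted(
        label
        for label, parents in deps.items()
        if label not in submitted
        and label not in statuses
        and all(statuses.get(parent) == "success" for parent in parents)
    )


def sbatch_command(label: str, status_dir: Path) -> list:
    """Build (but do not run) an sbatch invocation for one task.

    run_one_task.sh is a hypothetical wrapper that executes the task;
    the shell snippet records success or error in the status file.
    """
    status_file = shlex.quote(str(status_dir / f"{label}.status"))
    inner = (
        f"./run_one_task.sh {shlex.quote(label)} "
        f"&& echo success > {status_file} || echo error > {status_file}"
    )
    return ["sbatch", f"--job-name={label}", f"--wrap={inner}"]
```

Each time a job ends, calling `ready_tasks` again with the refreshed statuses yields the next batch to submit; a task whose parent wrote "error" never becomes ready, so failures stop propagating downstream.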

@eu9ene
Collaborator

eu9ene commented May 10, 2024

@bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.
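For anyone prototyping this, a very rough sketch of the generation step is below. It assumes the dependency map has already been extracted from the task graph as label to set of parent labels, models each task as a Snakemake rule that writes a `done/<label>.done` marker, and uses a placeholder shell command because how to translate a task payload into a command is exactly the open question. The `task_` rule prefix, the marker layout, and the function names are made up for illustration.

```python
import re


def rule_name(label: str) -> str:
    """Snakemake rule names must be valid identifiers, so sanitize the label."""
    return "task_" + re.sub(r"\W", "_", label)


def marker(label: str) -> str:
    """Hypothetical marker file that records a finished task."""
    return f"done/{label}.done"


def quoted_list(labels) -> str:
    """Render marker paths as an indented, comma-separated Snakemake list."""
    return ",\n".join(f'        "{marker(label)}"' for label in sorted(labels))


def render_snakefile(deps: dict) -> str:
    """Render one rule per task plus an 'all' rule that requests every marker."""
    blocks = ["rule all:\n    input:\n" + quoted_list(deps)]
    for label in sorted(deps):
        lines = [f"rule {rule_name(label)}:"]
        if deps[label]:
            lines += ["    input:", quoted_list(deps[label])]
        lines += [
            "    output:",
            f'        touch("{marker(label)}")',
            "    shell:",
            # Placeholder: the real command would come from the task payload.
            f'        "echo placeholder command for {label}"',
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Marker files keep the sketch independent of each task's real artifacts; a fuller conversion would presumably map task artifacts to rule inputs and outputs instead.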

@AmitMY and thank you for your continuous interest in our project! The complexity of the pipeline is quite high and it is indeed hard to spin it up. Some people from the University of Edinburgh and the University of Helsinki have successfully used it in the past on their infrastructure, so I think it's possible but requires some hacking. I hope we'll have resources to invest in making it more user-friendly in the future.

@gregtatum
Member

> which is that each task requires its own environment

In #568 I'm creating a virtual environment per requirements file, which could help solve that, and we're working on having a Docker environment where everything works.
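To illustrate the general idea only (this is not taken from #568), one virtual environment per requirements file can be sketched with the standard library alone. The directory naming scheme and the helper names `env_dir_for` and `ensure_env` are assumptions made for this example.

```python
import hashlib
import subprocess
import sys
import venv
from pathlib import Path


def env_dir_for(requirements: Path, root: Path) -> Path:
    """Derive a stable environment directory from the file's name and content."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()[:12]
    return root / f"{requirements.stem}-{digest}"


def ensure_env(requirements: Path, root: Path) -> Path:
    """Create the environment and install the requirements if it is missing."""
    env_dir = env_dir_for(requirements, root)
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)
        bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
        subprocess.run(
            [str(bin_dir / "python"), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return env_dir
```

Hashing the file content means an edited requirements file gets a fresh environment, while tasks that share an unchanged file reuse the same one.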

@bhearsum
Collaborator

> and then step 2 to be how to set up this repo with task cluster.

> To be explicit on this request. I don't know that we have these steps or clear recommendations.

The prerequisite for this would be spinning up a Taskcluster instance. This is not impossible (there are a handful of non-Mozilla installations already), but I get the sense that it's not a practical option for you, @AmitMY.

> @bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.

Yeah, if we want the pipeline to be generally usable, this is the most practical way IMO. (Either to snakemake, or it could even be a conversion to something like Metaflow. The main point is that we'd want something that can reliably dump the current DAG / task payloads into a more generally useful format.)

@sylvestre

Also, depending on what you are trying to do, we could discuss collaboration, access, and support.
Don't hesitate to contact me: s@mozilla.com
