TaskCluster docs for non-maintainers #586

AmitMY · 2024-05-10T07:24:11Z

It's me again. I've been trying to get this repository to work for over two years now.
This project still seems like the best machine translation project to train models for production, and for offline use.

Since you completely ditched snakemake, I'll try to get taskcluster to work again.

The documentation (https://mozilla.github.io/firefox-translations-training/task-cluster.html) seems to be aimed at maintainers of this specific repository.

Could you please add some information for people outside of this repo?
I expect step 1 to be "Fork this repository",
and then step 2 to be how to set up this repo with task cluster.


gregtatum · 2024-05-10T16:32:34Z

We've invested heavily in getting the system working on Mozilla infrastructure, which uses Taskcluster. Unfortunately, from what I hear from our Taskcluster support team, it's hard to stand up your own Taskcluster instance.

There are integrations with our infrastructure that allow project maintainers to push a branch to github.com/mozilla/firefox-translations-training and then trigger training from the decision task. This all runs on Mozilla-managed infrastructure.

I'd really like to figure out a way to help you get this running. In our testing infrastructure we actually have a way to run the tasks through a run_task utility. Perhaps you could import that utility into a Python script and then process the full-task-graph.json to build a dependency graph. If you are running on your own managed machine, it could be possible to just run everything locally on that machine.

I'd be happy to hop on a call to discuss this further, or specify what work it would take here.

firefox-translations-training/tests/fixtures/__init__.py

Line 112 in 067ce65

def run_task(

You would probably want to wait until I finish up work in PR #568, as that will make the docker image and run_task abstraction just work.

If you run task preflight-check it will generate the artifacts/full-task-graph.json file which fully specifies the tasks.
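A rough sketch of the "process the task graph to build a dependency graph" idea, not code from this repository. It assumes the JSON maps each task label to an object whose "dependencies" field maps an edge name to another task label; check that against your generated file. The labels in the example are made up, and `load_graph`, `dependency_map` and `execution_order` are hypothetical helper names.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path


def load_graph(path: str) -> dict:
    """Read a task graph JSON file such as artifacts/full-task-graph.json."""
    return json.loads(Path(path).read_text())


def dependency_map(graph: dict) -> dict[str, set[str]]:
    """Map each task label to the set of labels it depends on.

    Assumption: each entry looks like
    {"dependencies": {"<edge name>": "<label of the upstream task>", ...}}.
    """
    return {
        label: set(task.get("dependencies", {}).values())
        for label, task in graph.items()
    }


def execution_order(graph: dict) -> list[str]:
    """Return labels ordered so every task comes after its dependencies."""
    return list(TopologicalSorter(dependency_map(graph)).static_order())


# Tiny made-up graph standing in for the real file (hypothetical labels).
example_graph = {
    "train-model": {"dependencies": {"corpus": "clean-corpus"}},
    "clean-corpus": {"dependencies": {"dataset": "download-dataset"}},
    "download-dataset": {"dependencies": {}},
}
order = execution_order(example_graph)
print(order)
```

With the real file you would call `execution_order(load_graph("artifacts/full-task-graph.json"))` and hand each label, in that order, to whatever actually runs the task.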

gregtatum · 2024-05-10T16:44:22Z

> and then step 2 to be how to set up this repo with task cluster.

To be explicit about this request: I don't know that we have these steps or clear recommendations.

AmitMY · 2024-05-10T17:06:11Z

Hmm, if run_task can run locally, then in theory everything can just run on a single machine. I do see one difficulty with that, which is that each task requires its own environment.

If I understand you right, you propose:

1. Run task preflight-check to generate artifacts/full-task-graph.json.
2. Iterate over the DAG, and somehow call run_task based on the information in the JSON.

Is this right?

To complicate it a little more, I would ideally want to try to mimic your old snakemake solution on slurm:

1. Same as step 1 above.
2. Submit slurm jobs for all currently "available" tasks. Each task will dump its "status" (success/error) in some file.
3. Whenever a job ends, it will trigger a re-check of whether new tasks are available to run.
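The re-check step above could be sketched roughly as follows. Everything here is an assumption for illustration: the one-file-per-task layout (`<label>.status` containing `success` or `error`), the helper names, and the `submit` callback, which stands in for whatever actually submits a slurm job. The demo at the bottom only records labels instead of submitting anything.

```python
import tempfile
from pathlib import Path
from typing import Callable


def read_status(status_dir: Path) -> dict[str, str]:
    """Read '<label>.status' files, each holding 'success' or 'error'."""
    return {p.stem: p.read_text().strip() for p in status_dir.glob("*.status")}


def ready_tasks(deps: dict[str, set[str]], status: dict[str, str],
                submitted: set[str]) -> list[str]:
    """Tasks not yet submitted or finished whose dependencies all succeeded."""
    return sorted(
        label
        for label, needs in deps.items()
        if label not in status
        and label not in submitted
        and all(status.get(dep) == "success" for dep in needs)
    )


def recheck(deps: dict[str, set[str]], status_dir: Path, submitted: set[str],
            submit: Callable[[str], None]) -> list[str]:
    """Submit every task that has become available; return their labels."""
    newly_ready = ready_tasks(deps, read_status(status_dir), submitted)
    for label in newly_ready:
        submit(label)  # hypothetical hook: e.g. a wrapper around job submission
        submitted.add(label)
    return newly_ready


# Demo with a made-up three-task chain; 'submit' just records the label.
with tempfile.TemporaryDirectory() as tmp:
    status_dir = Path(tmp)
    deps = {"a": set(), "b": {"a"}, "c": {"b"}}
    submitted: set[str] = set()
    log: list[str] = []
    first = recheck(deps, status_dir, submitted, log.append)
    (status_dir / "a.status").write_text("success\n")
    second = recheck(deps, status_dir, submitted, log.append)
    (status_dir / "b.status").write_text("error\n")
    third = recheck(deps, status_dir, submitted, log.append)
```

In this sketch a failed task simply blocks everything downstream of it; retries or cleanup would need to be added on top.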

eu9ene · 2024-05-10T18:06:27Z

@bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.
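As a starting point for such a prototype, here is a minimal sketch that turns a dependency map into Snakefile text. It is not an existing converter. The `status/*.done` marker files, the `task_` rule-name prefix, and the placeholder shell command are all assumptions made for illustration; a real conversion would need to translate each task's payload into an actual command.

```python
import json
import re


def rule_name(label: str) -> str:
    """Turn a task label into a valid Snakemake rule name (an identifier)."""
    return "task_" + re.sub(r"\W", "_", label)


def done_file(label: str) -> str:
    """Marker file recording that a task finished (hypothetical layout)."""
    return f"status/{rule_name(label)}.done"


def snakefile_from_deps(deps: dict[str, set[str]],
                        command: str = "echo would run {label}") -> str:
    """Render one rule per task, plus a leading 'all' rule as default target."""
    all_inputs = "".join(
        f"        {json.dumps(done_file(label))},\n" for label in sorted(deps)
    )
    blocks = ["rule all:\n    input:\n" + all_inputs]
    for label in sorted(deps):
        block = f"rule {rule_name(label)}:\n"
        if deps[label]:
            block += "    input:\n" + "".join(
                f"        {json.dumps(done_file(dep))},\n"
                for dep in sorted(deps[label])
            )
        # touch() creates the marker file once the shell command succeeds.
        block += f"    output:\n        touch({json.dumps(done_file(label))})\n"
        block += f"    shell:\n        {json.dumps(command.format(label=label))}\n"
        blocks.append(block)
    return "\n".join(blocks)


# Made-up two-task example.
snakefile_text = snakefile_from_deps({"clean-corpus": set(),
                                      "train-model": {"clean-corpus"}})
print(snakefile_text)
```

The `deps` argument is a mapping from each task label to the set of labels it depends on, which would have to be extracted from the task graph JSON first.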

@AmitMY and thank you for your continuous interest in our project! The complexity of the pipeline is quite high and it is indeed hard to spin it up. Some people from the University of Edinburgh and the University of Helsinki have successfully used it in the past on their infrastructure, so I think it's possible but requires some hacking. I hope we'll have resources to invest in making it more user-friendly in the future.

gregtatum · 2024-05-10T18:30:06Z

> which is that each task requires its own environment

In #568 I'm creating a virtual environment per requirements file, which could help solve that, and we're working on having a Docker environment where everything works.
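To illustrate the general idea of one virtual environment per requirements file (this is not the implementation in #568, which is not shown in this thread), a sketch using only the standard library might look like this. The naming scheme based on a hash of the file contents and the helper names `env_name` and `ensure_env` are assumptions.

```python
import hashlib
import os
import subprocess
import venv
from pathlib import Path


def env_name(requirements_text: str) -> str:
    """Derive a stable directory name from the requirements file contents."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()[:12]
    return f"venv-{digest}"


def ensure_env(requirements: Path, root: Path) -> Path:
    """Create the environment for a requirements file if it is missing.

    Returns the path to the environment's Python interpreter, so a task
    can be launched with the matching dependencies.
    """
    env_dir = root / env_name(requirements.read_text())
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    python = bin_dir / ("python.exe" if os.name == "nt" else "python")
    if not python.exists():
        venv.create(env_dir, with_pip=True)
        # Install into the new environment by using its own interpreter.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Two tasks that share identical requirements would then reuse the same environment, while any change to a requirements file produces a fresh one.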

bhearsum · 2024-05-13T10:55:21Z

> and then step 2 to be how to set up this repo with task cluster.

> To be explicit on this request. I don't know that we have these steps or clear recommendations.

The prerequisite for this would be spinning up a Taskcluster instance. This is not impossible (there's a handful of non-Mozilla installations already), but I get the sense that it's not a practical option for you, @AmitMY.

> @bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.

Yeah, if we want to make the pipeline generally usable, this is the most practical way IMO. (Either to snakemake, or it could even be a conversion to something like Metaflow. The main point is that we'd want something that can reliably dump the current DAG / task payloads into a more generally useful format.)

sylvestre · 2024-05-13T12:07:08Z

Also, depending on what you are trying to do, we could discuss collaboration, access and support.
Don't hesitate to contact me: s@mozilla.com
