TaskCluster docs for non-maintainers #586
We've invested heavily in getting the system working for Mozilla infrastructure, which uses Taskcluster. Unfortunately, from what I hear from our Taskcluster support team, it's hard to stand up your own Taskcluster instance. There are integrations with our infrastructure that allow project maintainers to push a branch to […]. I'd really like to figure out a way to help you get this running. In our testing infrastructure we actually have a way to run the tasks through a run_task utility. Perhaps you could import that utility into a Python script and then process the full_task_graph.json to build a dependency graph. If you are running on your own managed machine, it could be possible to just run everything locally on that machine. I'd be happy to hop on a call to discuss this further, or to specify what work it would take here.
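As a rough illustration of the "process the full_task_graph.json to build a dependency graph" idea, here is a minimal sketch. It assumes the file is a JSON object keyed by task label, where each task has a "dependencies" mapping from an edge name to the label it depends on. That layout, the sample task labels, and the helper names `build_dependency_graph` and `execution_order` are assumptions for illustration, not something confirmed in this thread. It does not call run_task; it only works out an order in which tasks could be run locally.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, trimmed stand-in for the contents of full_task_graph.json.
# The real file would be loaded with json.load(open(path)) instead.
SAMPLE_GRAPH = json.loads("""
{
  "clean-corpus": {"dependencies": {"dataset": "download-dataset"}},
  "download-dataset": {"dependencies": {}},
  "train-model": {"dependencies": {"corpus": "clean-corpus"}}
}
""")


def build_dependency_graph(task_graph):
    """Map each task label to the set of labels it depends on."""
    return {
        label: set(task.get("dependencies", {}).values())
        for label, task in task_graph.items()
    }


def execution_order(task_graph):
    """Return task labels so that every task comes after its dependencies."""
    sorter = TopologicalSorter(build_dependency_graph(task_graph))
    return list(sorter.static_order())


order = execution_order(SAMPLE_GRAPH)
print(order)
```

A local runner could then walk `order` and hand each task's payload to whatever executes it on the machine. How the payload is turned into a command is left open here, since that depends on the run_task utility.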
You would probably want to wait until I finish up work in PR #568, as that will make the Docker image and run_task abstraction just work. If you run […]
To be explicit about this request: I don't know that we have these steps or clear recommendations.
Hmm, if I understand you right, you propose:
Is this right? To complicate it a little more, I would ideally want to try to mimic your old Snakemake solution on Slurm:
@bhearsum had an idea of converting our Taskcluster graph to Snakemake. I imagine generating a Snakefile from task-graph.json. @AmitMY, you can try prototyping that, but we won't have any resources in the near future to work on it, as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation, but that's about it.
@AmitMY, thank you for your continued interest in our project! The complexity of the pipeline is quite high, and it is indeed hard to spin it up. Some people from the University of Edinburgh and the University of Helsinki have successfully used it in the past on their infrastructure, so I think it's possible but requires some hacking. I hope we'll have resources to invest in making it more user-friendly in the future.
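For anyone who wants to prototype the Snakefile idea, a minimal sketch of the conversion could look like the following. Everything specific in it is an assumption made for illustration: the sample task labels, the flat "command" field (a real task payload is likely structured differently), the done/*.done marker files used as rule outputs, and the helper names. Only the basic Snakemake rule layout (`rule`, `input`, `output`, `shell`) is relied on.

```python
import json
import re

# Hypothetical, simplified tasks. A real task-graph.json entry would carry a
# full payload rather than a single "command" string.
SAMPLE_GRAPH = {
    "download-dataset": {"dependencies": {}, "command": "echo download"},
    "clean-corpus": {
        "dependencies": {"dataset": "download-dataset"},
        "command": "echo clean",
    },
}


def rule_name(label):
    """Turn a task label into a valid Snakemake rule name (an identifier)."""
    name = re.sub(r"\W", "_", label)
    return f"task_{name}" if name[0].isdigit() else name


def marker(label):
    """Marker file that records that a task has finished."""
    return f"done/{rule_name(label)}.done"


def to_snakefile(task_graph):
    """Render one Snakemake rule per task, chained through marker files."""
    labels = sorted(task_graph)
    targets = ", ".join(json.dumps(marker(label)) for label in labels)
    blocks = [f"rule all:\n    input: {targets}\n"]
    for label in labels:
        task = task_graph[label]
        deps = sorted(task.get("dependencies", {}).values())
        lines = [f"rule {rule_name(label)}:"]
        if deps:
            inputs = ", ".join(json.dumps(marker(dep)) for dep in deps)
            lines.append(f"    input: {inputs}")
        lines.append(f"    output: {json.dumps(marker(label))}")
        # Double any braces in the command so Snakemake does not treat them
        # as wildcards, then touch the marker file on success.
        command = task["command"].replace("{", "{{").replace("}", "}}")
        shell_line = command + " && touch {output}"
        lines.append(f"    shell: {json.dumps(shell_line)}")
        blocks.append("\n".join(lines) + "\n")
    return "\n".join(blocks)


snakefile = to_snakefile(SAMPLE_GRAPH)
print(snakefile)
```

Marker files keep the sketch independent of each task's real artifacts. A more faithful conversion would map the artifacts a task publishes to rule outputs, which needs knowledge of the payload format that this sketch does not assume.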
In #568 I'm creating a virtual environment per requirements file, which could help solve that, and we're working on having a Docker environment where everything works.
The prerequisite for this would be spinning up a Taskcluster instance. This is not impossible (there's a handful of non-Mozilla installations already), but I get the sense that it's not a practical option for you, @AmitMY.
Yeah, if we want to make the pipeline generally usable, this is the most practical way IMO. (It could be a conversion to Snakemake, or even to something like Metaflow. The main point is that we'd want something that can reliably dump the current DAG / task payloads into a more generally useful format.)
Also, depending on what you are trying to do, we could discuss collaboration, access and support.
It's me again. I've been trying to get this repository to work for over two years now.
This project still seems like the best machine translation project to train models for production, and for offline use.
Since you completely ditched snakemake, I'll try to get taskcluster to work again.
The documentation: https://mozilla.github.io/firefox-translations-training/task-cluster.html seems to be aimed at maintainers of this specific repository.
Could you please add some information for people outside of this repo?
I expect step 1 to be "Fork this repository",
and then step 2 to be how to set up this repo with task cluster.