Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate yaml schema and configs #597

Open
eu9ene opened this issue May 14, 2024 · 2 comments
Open

Consolidate yaml schema and configs #597

eu9ene opened this issue May 14, 2024 · 2 comments
Labels
refactoring taskcluster Issues related to the Taskcluster implementation of the training pipeline

Comments

@eu9ene
Copy link
Collaborator

eu9ene commented May 14, 2024

Currently, when adding/changing a new setting in a Taskcluster experiment config we have to update it in multiple places:

  • train action schema
  • CI default parameters schema
  • CI default parameters values
  • Reference production config in taskcluster/configs
  • CI yaml config in taskcluster/configs that we currently don't use

We should consolidate to :

  • production reference YAML config (same as now)
  • CI YAML config in taskcluster/configs instead of the one in parameters json
  • one YAML schema that's used for validation in train action and elsewhere
@eu9ene eu9ene added refactoring taskcluster Issues related to the Taskcluster implementation of the training pipeline labels May 14, 2024
@eu9ene
Copy link
Collaborator Author

eu9ene commented May 14, 2024

Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. It seems those are required for some tests and also need to be updated. Not updating them breaks task taskgraph-diff. Anyway, the tests should also use the reference production/CI configs from taskcluster/configs.

@eu9ene eu9ene mentioned this issue May 15, 2024
@bhearsum
Copy link
Collaborator

Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. It seems those are required for some tests and also need to be updated. Not updating them breaks task taskgraph-diff. Anyway, the tests should also use the reference production/CI configs from taskcluster/configs.

Yes - these are used to generate graphs with and without some changes applied, and generate a useful to see how a code change affects graphs.

I agree that this could probably be reworked to pull in at least some things from a separate place. One of the advantages of having these concrete files, though, is that it allows us to have multiple versions. At the moment, we just have two with more and fewer datasets, but we could have variants with and without opuscleaner/opustrainer, with and without publication, with various training continuation configurations, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring taskcluster Issues related to the Taskcluster implementation of the training pipeline
Projects
None yet
Development

No branches or pull requests

2 participants