Consolidate yaml schema and configs #597

eu9ene · 2024-05-14T22:55:25Z

Currently, when adding or changing a setting in a Taskcluster experiment config, we have to update it in multiple places:

train action schema
CI default parameters schema
CI default parameters values
Reference production config in taskcluster/configs
CI yaml config in taskcluster/configs that we currently don't use

We should consolidate to:

production reference YAML config (same as now)
CI YAML config in taskcluster/configs instead of the one in parameters json
one YAML schema that's used for validation in train action and elsewhere
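
To illustrate the "one schema" idea, here is a minimal sketch in which a single schema object validates both the production and the CI config. Everything in it is an assumption for illustration: the `SCHEMA` contents, the `validate` helper, and the config keys are hypothetical and are not the project's actual settings or validation code. The configs are shown as plain dicts, as they would look after the YAML files are parsed.

```python
from typing import Any

# Hypothetical single schema: key -> expected type, or a nested dict for a sub-section.
# These keys are illustrative only, not the real experiment settings.
SCHEMA: dict[str, Any] = {
    "experiment": {"name": str, "src": str, "trg": str},
    "datasets": {"train": list},
}


def validate(config: dict[str, Any], schema: dict[str, Any], path: str = "") -> list[str]:
    """Return a list of readable errors; an empty list means the config is valid."""
    errors: list[str] = []
    for key, expected in schema.items():
        where = f"{path}{key}"
        if key not in config:
            errors.append(f"missing key: {where}")
        elif isinstance(expected, dict):
            if isinstance(config[key], dict):
                errors.extend(validate(config[key], expected, f"{where}."))
            else:
                errors.append(f"{where}: expected a mapping")
        elif not isinstance(config[key], expected):
            errors.append(f"{where}: expected {expected.__name__}")
    # Report unknown keys too, so a setting cannot be added without updating the schema.
    for key in config.keys() - schema.keys():
        errors.append(f"unknown key: {path}{key}")
    return errors


# Stand-ins for the parsed production and CI configs (hypothetical values).
production = {
    "experiment": {"name": "prod", "src": "lt", "trg": "en"},
    "datasets": {"train": ["corpus-a", "corpus-b"]},
}
ci = {
    "experiment": {"name": "ci", "src": "lt", "trg": "en"},
    "datasets": {"train": ["corpus-a"]},
}

for label, cfg in (("production", production), ("ci", ci)):
    problems = validate(cfg, SCHEMA)
    print(label, "OK" if not problems else problems)
```

With this shape, a new setting is added in one schema plus the YAML files that use it, and the train action and any other consumer can call the same validation function.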

eu9ene · 2024-05-14T23:08:31Z

Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. It seems those are required for some tests and also need to be updated. Not updating them breaks task taskgraph-diff. Anyway, the tests should also use the reference production/CI configs from taskcluster/configs.

bhearsum · 2024-05-16T00:14:18Z

> Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. It seems those are required for some tests and also need to be updated. Not updating them breaks task taskgraph-diff. Anyway, the tests should also use the reference production/CI configs from taskcluster/configs.

Yes - these are used to generate graphs with and without some changes applied, and to produce a diff that shows how a code change affects the graphs.

I agree that this could probably be reworked to pull in at least some things from a separate place. One of the advantages of having these concrete files, though, is that it allows us to have multiple versions. At the moment, we just have two with more and fewer datasets, but we could have variants with and without opuscleaner/opustrainer, with and without publication, with various training continuation configurations, etc.
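
One way to keep multiple variants while still pulling shared settings from a single place is to define each variant as a small set of overrides on a common base. The sketch below is only an illustration of that approach: `deep_merge`, `BASE`, `VARIANTS`, and all keys and dataset names are hypothetical, and it does not reflect how the test params are actually built.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the override replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical shared base, e.g. loaded from a reference config.
BASE = {
    "experiment": {"src": "lt", "trg": "en", "use-opuscleaner": False},
    "datasets": {"train": ["corpus-a"]},
}

# Each variant lists only what differs from the base (names are illustrative).
VARIANTS = {
    "small-lt-en": {},
    "large-lt-en": {"datasets": {"train": ["corpus-a", "corpus-b", "corpus-c"]}},
    "opuscleaner-lt-en": {"experiment": {"use-opuscleaner": True}},
}

params = {name: deep_merge(BASE, overrides) for name, overrides in VARIANTS.items()}

for name, cfg in params.items():
    print(name, cfg)
```

The trade-off is that the variants are no longer standalone files that can be read in isolation, so this only helps if the shared part is large compared to the per-variant differences.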

eu9ene added the refactoring and taskcluster labels May 14, 2024

eu9ene mentioned this issue May 15, 2024

Custom cleaning #547

Merged
