
Add file overrides to the training continuation, and refactor the implementation #543

Open · gregtatum wants to merge 3 commits into main

Conversation

gregtatum (Member) commented Apr 29, 2024:
Edit: I added file overrides for this, to support vocabs and other file-name mismatches. For instance, in en-fi the final model was not a best-chrf but a best-perplexity model. This allows working around problems in the config itself.


I was looking into how training continuation was working, and was confused by some of the misdirection in the code with iterators and dict comprehensions. I refactored the code a bit to understand how things worked. I added more validation with a type-friendly dataclass and enum, and wrote a few more docs on what was going on.

I didn't end up finishing a test for this, as I didn't want to spend more time on it, but I manually checked `artifacts/full-task-graph.json` after running `task preflight-check`.

The code produces mounts equivalent to the original code's. I also filed #542, as I realized that ensemble training wasn't actually working.
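As a rough sketch of the validation-plus-overrides approach described above (all names here are hypothetical illustrations, not the project's actual API), a typed dataclass with an enum might look like:

```python
from dataclasses import dataclass, field
from enum import Enum


class ContinuationMode(Enum):
    """Hypothetical modes for continuing training from a pretrained model."""

    CONTINUE = "continue"
    INIT = "init"


@dataclass
class PretrainedModel:
    """Illustrative config for a pretrained model, with optional file overrides.

    The overrides map an expected artifact name to the actual name on disk,
    e.g. when the final model was a best-perplexity rather than a best-chrf.
    """

    urls: list
    mode: ContinuationMode = ContinuationMode.CONTINUE
    file_overrides: dict = field(default_factory=dict)

    def __post_init__(self):
        # Validate eagerly, mirroring the check quoted later in this review.
        if len(self.urls) != 1:
            raise Exception(
                "Multiple URLs are currently not supported for pretrained models."
            )

    def resolve(self, filename: str) -> str:
        """Apply a file override if one exists, else keep the original name."""
        return self.file_overrides.get(filename, filename)


model = PretrainedModel(
    urls=["https://example.com/models/en-fi"],
    file_overrides={"model.npz.best-chrf.npz": "model.npz.best-perplexity.npz"},
)
print(model.resolve("model.npz.best-chrf.npz"))  # the overridden name
print(model.resolve("vocab.spm"))  # unchanged, no override present
```

The dataclass makes the shape of the config explicit, and `__post_init__` fails fast on an unsupported config instead of producing confusing mounts downstream.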

@gregtatum gregtatum requested a review from a team as a code owner April 29, 2024 19:55

    if len(pretrained_model.urls) != 1:
        raise Exception(
            "Multiple URLs are currently not supported for pretrained models. See Issue #542"
        )

gregtatum (Member, Author) commented:

I filed a follow-up rather than fix it here.

@gregtatum gregtatum changed the title Refactor the training continuation code Add file overrides to the training continuation, and refactor the implementation Apr 30, 2024
@gregtatum gregtatum requested a review from eu9ene April 30, 2024 16:51
eu9ene (Collaborator) left a comment:

Would it be possible to write a small unit test for the artifact selection logic, since we've added more functionality to it?

As I wrote in another issue, the OPUS-MT anomaly was a one-time thing and we don't plan to support it in the immediate future, but the vocabs are usually in a separate folder for older models, so the ability to explicitly specify the path adds some convenience. In the past I just copied things around to get the expected structure.

Also, I just remembered that OPUS-MT models required some special preprocessing, but I guess our student models should be fine. We should double-check this in the PR.

gregtatum (Member, Author):

Yeah, I'll add a test. Originally I hadn't since it was just a refactor, but this now also adds functionality.

gabrielBusta (Member) left a comment:

Looks good to me. FWIW - Ben is refactoring this mounts stuff on #546

bhearsum (Collaborator)

bhearsum commented May 1, 2024

> Looks good to me. FWIW - Ben is refactoring this mounts stuff on #546

We'll see! I'm not sure if I will be refactoring that in advance, or following up later. In any case, this can land and I will deal with any rebasing needed in my patch(es).
