
Add file overrides to the training continuation, and refactor the implementation #543

Open · gregtatum wants to merge 3 commits into main

Conversation

gregtatum (Member) commented Apr 29, 2024:
Edit: I added file overrides for this, to support vocabs and other file-name mismatches. For instance, in en-fi the final model was not a best-chrf but a best-perplexity model. This allows working around problems in the config itself.


I was looking into how training continuation was working, and was confused by some of the misdirection in the code with iterators and dict comprehensions. I refactored the code a bit to understand how things worked. I added more validation with a type-friendly dataclass and enum, and wrote a few more docs on what was going on.

I didn't end up finishing a test for this, as I didn't want to spend more time on it, but I manually checked `artifacts/full-task-graph.json` after running `task preflight-check`.

The code produces mounts equivalent to the original code's. I also filed #542, as I realized that ensemble training wasn't actually working.
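As a rough sketch of the validation-plus-overrides approach described above (all names here are hypothetical illustrations, not the project's actual API), a typed dataclass with an enum might look like:

```python
from dataclasses import dataclass, field
from enum import Enum


class ContinuationMode(Enum):
    """Hypothetical modes for continuing training from a pretrained model."""

    CONTINUE = "continue"
    INIT = "init"


@dataclass
class PretrainedModel:
    """Illustrative config for a pretrained model, with optional file overrides.

    The overrides map an expected artifact name to the actual name on disk,
    e.g. when the final model was a best-perplexity rather than a best-chrf.
    """

    urls: list
    mode: ContinuationMode = ContinuationMode.CONTINUE
    file_overrides: dict = field(default_factory=dict)

    def __post_init__(self):
        # Validate eagerly, mirroring the check quoted later in this review.
        if len(self.urls) != 1:
            raise Exception(
                "Multiple URLs are currently not supported for pretrained models."
            )

    def resolve(self, filename: str) -> str:
        """Apply a file override if one exists, else keep the original name."""
        return self.file_overrides.get(filename, filename)


model = PretrainedModel(
    urls=["https://example.com/models/en-fi"],
    file_overrides={"model.npz.best-chrf.npz": "model.npz.best-perplexity.npz"},
)
print(model.resolve("model.npz.best-chrf.npz"))  # the overridden name
print(model.resolve("vocab.spm"))  # unchanged, no override present
```

The dataclass makes the shape of the config explicit, and `__post_init__` fails fast on an unsupported config instead of producing confusing mounts downstream.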

@gregtatum gregtatum requested a review from a team as a code owner April 29, 2024 19:55

    if len(pretrained_model.urls) != 1:
        raise Exception(
            "Multiple URLs are currently not supported for pretrained models. See Issue #542"
        )

gregtatum (Member, Author) commented:

I filed a follow-up rather than fix it here.

@gregtatum gregtatum changed the title Refactor the training continuation code Add file overrides to the training continuation, and refactor the implementation Apr 30, 2024
@gregtatum gregtatum requested a review from eu9ene April 30, 2024 16:51
eu9ene (Collaborator) left a comment:

Would it be possible to write a small unit test for the artifact selection logic, since we've added more functionality to it?

As I wrote in another issue, the OPUS-MT anomaly was a one-time thing and we don't plan to support it in the immediate future, but the vocabs are usually in a separate folder for older models, so the ability to explicitly specify the path adds some convenience. In the past I just copied things around to get the expected structure.

Also, I just remembered that OPUS-MT models required some special preprocessing, but I guess our student models should be fine. We should double-check this in the PR.

gregtatum (Member, Author):

Yeah, I'll add a test. Originally I hadn't since it was just a refactor, but this now also adds functionality.

gabrielBusta (Member) left a comment:

Looks good to me. FWIW - Ben is refactoring this mounts stuff on #546

bhearsum (Collaborator)

bhearsum commented May 1, 2024

> Looks good to me. FWIW - Ben is refactoring this mounts stuff on #546

We'll see! I'm not sure if I will be refactoring that in advance, or following up later. In any case, this can land and I will deal with any rebasing needed in my patch(es).
