Add file overrides to the training continuation, and refactor the implementation #543
base: main
Conversation
    if len(pretrained_model.urls) != 1:
        raise Exception(
            "Multiple URLs are currently not supported for pretrained models. See Issue #542"
        )
I filed a follow-up rather than fix it here.
Force-pushed from 7580c85 to 05c7cc5
Would it be possible to write a small unit test for the artifact selection logic, now that we've added more functionality to it?
As I wrote in another issue, the OPUS-MT anomaly was a one-time thing and we don't plan to support it in the immediate future, but the vocabs are usually in a separate folder for older models, so the ability to explicitly specify the path adds some convenience. In the past I just copied things around to get the expected structure.
Also, I just remembered that OPUS-MT models required some special preprocessing, but I guess our student should be fine. We should double-check this PR.
Yeah, I'll add a test. Originally I hadn't, since it was just a refactor, but this now also adds functionality.
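A minimal sketch of what such a test could cover, using a stand-in helper rather than the PR's actual code (resolve_artifact and the file names below are hypothetical):

    # Illustrative stand-in for the override lookup, not the PR's real implementation.
    def resolve_artifact(default_name: str, overrides: dict[str, str]) -> str:
        """Return the configured override for a file name, else the default."""
        return overrides.get(default_name, default_name)

    def test_override_wins_over_default():
        # Mirrors the en-fi case: the final model is best-perplexity, not best-chrf.
        overrides = {"model.npz.best-chrf.npz": "model.npz.best-perplexity.npz"}
        assert resolve_artifact("model.npz.best-chrf.npz", overrides) == (
            "model.npz.best-perplexity.npz"
        )

    def test_missing_override_falls_back_to_default():
        assert resolve_artifact("vocab.spm", {}) == "vocab.spm"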
Looks good to me. FWIW, Ben is refactoring this mounts stuff in #546.
We'll see! I'm not sure if I will be refactoring that in advance, or following up later. In any case, this can land and I will deal with any rebasing needed in my patch(es).
Force-pushed from 05c7cc5 to 69831bd
Edit: I added file overrides for this, to support vocabs and other file name mismatches. For instance, in en-fi the final model was not a best-chrf model but a best-perplexity one. This allows for working around problems in the config itself.

I was looking into how training continuation was working, and was confused by some of the misdirection in the code with iterators and dict comprehensions. I refactored the code a bit to understand how things were working. I added some more validation with type-friendly dataclasses and enums. I also wrote a few more docs on what was going on.
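To make the validation shape concrete, here is a minimal sketch of the dataclass-and-enum approach; the class names, enum values, and override-key format are assumptions for illustration, not the PR's actual definitions:

    from dataclasses import dataclass, field
    from enum import Enum

    class PretrainedModelMode(Enum):
        # Hypothetical values; the real enum may differ.
        USE = "use"
        CONTINUE = "continue"
        INIT = "init"

    @dataclass(frozen=True)
    class PretrainedModel:
        urls: list[str]
        mode: PretrainedModelMode
        # Maps the file name the pipeline expects to the name actually
        # present in the artifacts, e.g. working around en-fi's
        # best-perplexity final model.
        file_overrides: dict[str, str] = field(default_factory=dict)

        def __post_init__(self) -> None:
            # Same guard as the diff quoted above: only one URL for now.
            if len(self.urls) != 1:
                raise ValueError(
                    "Multiple URLs are currently not supported for pretrained models."
                )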
I didn't end up finishing a test for this, as I didn't want to spend more time on it, but I manually checked artifacts/full-task-graph.json after running task preflight-check. The code produces mounts equivalent to those from the original code. I also filed #542, as I realized that ensemble training wasn't actually working.
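For anyone repeating that spot check, a rough sketch of diffing mounts between two task-graph dumps; the before-file name and the payload layout are assumptions about the JSON structure:

    import json

    def mounts_by_task(path: str) -> dict:
        """Collect each task's mounts from a full-task-graph.json dump."""
        with open(path) as f:
            graph = json.load(f)
        # Assumes the layout {label: {"task": {"payload": {"mounts": [...]}}}}.
        return {
            label: node.get("task", {}).get("payload", {}).get("mounts", [])
            for label, node in graph.items()
        }

    # "before" would be a copy of the file saved from a main-branch run.
    before = mounts_by_task("full-task-graph.before.json")
    after = mounts_by_task("artifacts/full-task-graph.json")
    for label in sorted(before):
        if before[label] != after.get(label):
            print(f"mounts differ for {label}")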