Over in #580 we've been working on getting spot instances re-enabled. That part works correctly now, and training continues from the last checkpoint, but I noticed in our latest run that W&B reporting is turned off when resuming.
In run #1 we had:
[task 2024-05-13T23:07:22.900Z] wandb: Currently logged in as: moz-translations-wandb-bot (moz-translations). Use `wandb login --relogin` to force relogin
[task 2024-05-13T23:07:23.803Z] wandb: wandb version 0.17.0 is available! To upgrade, please run:
[task 2024-05-13T23:07:23.803Z] wandb: $ pip install wandb --upgrade
[task 2024-05-13T23:07:23.803Z] wandb: Tracking run with wandb version 0.16.1
[task 2024-05-13T23:07:23.803Z] wandb: Run data is saved locally in /home/ubuntu/tasks/task_171564135626786/checkouts/vcs/pipeline/train/wandb/run-20240513_230722-eeqsrpg6
[task 2024-05-13T23:07:23.803Z] wandb: Run `wandb offline` to turn off syncing.
[task 2024-05-13T23:07:23.805Z] wandb: Syncing run backwards
[task 2024-05-13T23:07:23.805Z] wandb: ⭐️ View project at https://wandb.ai/moz-translations/en-ru
[task 2024-05-13T23:07:23.805Z] wandb: 🚀 View run at https://wandb.ai/moz-translations/en-ru/runs/eeqsrpg6
But in run #2 we got:
[task 2024-05-14T00:51:23.329Z] [tracking WARNING] This run already exists on W&B: [<Run moz-translations/en-ru/eeqsrpg6 (running)>]. No data will be published.
Note that when resuming we will usually end up redoing some work that happened after the last checkpoint but before the termination. I'm not sure of the best way to handle this in W&B; I imagine others will have a better idea.
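One possible way to deal with the redone work, as a rough sketch only: remember the last step that was already published and drop anything at or before it once the resumed run starts reporting again. The `Metric` type, the `skip_already_reported` helper, and the step numbers below are all hypothetical and not taken from our tracking code. Separately, `wandb.init` accepts `id` and `resume` arguments for continuing an existing run, but I haven't checked how that would fit into our publisher.

```python
from typing import Iterable, Iterator, NamedTuple


class Metric(NamedTuple):
    """Hypothetical record for one value parsed from the training log."""
    step: int    # training update number
    name: str    # metric name, e.g. "loss"
    value: float


def skip_already_reported(
    metrics: Iterable[Metric], last_reported_step: int
) -> Iterator[Metric]:
    """Yield only metrics newer than the last step that was already published."""
    for metric in metrics:
        if metric.step > last_reported_step:
            yield metric


# Made-up scenario: run #1 published through step 1500 before it was
# terminated, and run #2 resumes from a checkpoint taken at step 1000,
# so steps 1100 and 1500 are redone work.
resumed = [
    Metric(1100, "loss", 2.9),
    Metric(1500, "loss", 2.7),
    Metric(1600, "loss", 2.6),
]
fresh = list(skip_already_reported(resumed, last_reported_step=1500))
```

This keeps the values from the first pass and ignores the repeated ones. The opposite choice (overwriting with the values from the resumed run) may be just as reasonable, so treat it as a starting point for discussion.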