[taskcluster:error] Error uploading "public/build/tmp/aln.fwd" artifact. ext.certificate.expiry < now #582

Open
eu9ene opened this issue May 9, 2024 · 3 comments
Labels
taskcluster Issues related to the Taskcluster implementation of the training pipeline

Comments

@eu9ene
Collaborator

eu9ene commented May 9, 2024

New Taskcluster error: https://firefox-ci-tc.services.mozilla.com/tasks/aJKhHjeGSzmLPU7p9V8sLg/runs/10/logs/public/logs/live.log

@eu9ene eu9ene added the taskcluster Issues related to the Taskcluster implementation of the training pipeline label May 9, 2024
@bhearsum
Collaborator

It looks like the artifact took so long to upload that the credentials used for the upload expired partway through. We managed to upload 2.8GB of it, and that size tracks with the overall corpus size, so a large file is presumably expected.

This is probably a freak occurrence for two reasons:

  • It looks like we were uploading at ~2MB/sec, which is quite slow AFAIK. I assume this is just GCP gremlins.
  • This happened on docker-worker, which AFAIK doesn't refresh its credentials.

We probably can't do anything about the first thing. We just moved these tasks (and other docker-worker tasks) to generic-worker in #561, which does refresh its credentials. So even if we hit slow uploads again, it shouldn't cause a failure.
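
For a rough sense of the numbers, here is a minimal sketch of the arithmetic behind the failure: how long an upload takes at a given rate, and whether it would still be running when the credentials expire. The function names are hypothetical, the sketch assumes decimal units (1 GB = 10^9 bytes) and a constant upload rate, and the credential lifetime is a parameter because the actual value is not stated in this thread.

```python
from datetime import datetime, timedelta, timezone


def upload_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Seconds needed to transfer size_bytes at a constant rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return size_bytes / bytes_per_second


def outlives_credentials(
    size_bytes: float,
    bytes_per_second: float,
    started: datetime,
    expiry: datetime,
) -> bool:
    """True if the upload would still be in progress when credentials expire."""
    duration = timedelta(seconds=upload_seconds(size_bytes, bytes_per_second))
    return started + duration > expiry


if __name__ == "__main__":
    # Figures from the log: 2.8GB uploaded at roughly 2MB/sec.
    elapsed = upload_seconds(2.8e9, 2e6)
    print(f"2.8GB at 2MB/sec takes about {elapsed / 60:.1f} minutes")

    # Hypothetical values: a 10GB artifact and credentials valid for 1 hour.
    start = datetime(2024, 5, 9, tzinfo=timezone.utc)
    print(outlives_credentials(10e9, 2e6, start, start + timedelta(hours=1)))
```

At ~2MB/sec, the 2.8GB that did get uploaded corresponds to about 1400 seconds, so any artifact several times that size runs into whatever fixed expiry the credentials have unless the worker refreshes them mid-upload.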

@eu9ene
Collaborator Author

eu9ene commented May 29, 2024

Another instance of this in our production run: https://firefox-ci-tc.services.mozilla.com/tasks/GhxiBn7ARVej3AUHwb9h1w

@bhearsum
Collaborator

Unfortunately, I don't think there's anything tractable here other than switching to generic-worker :(
