Fix memory leak with CTC training script on Chinese languages #30358
This PR fixes a memory leak that occurs during the evaluation phase of the CTC training script when training a CTC model on a language with a very large vocabulary. I ran into this issue while training on Chinese, but it may affect other languages as well. The problem is that, even though evaluation is batched, the logits from every batch are kept on the GPU, so the logits for the entire evaluation set accumulate in GPU memory. For most languages this is manageable, but it quickly runs out of CUDA memory when the vocabulary is as large as it is for Chinese.
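At a high level, the idea is to shrink what gets accumulated during evaluation so that it no longer scales with the vocabulary size. Below is a minimal sketch of that approach using the Trainer's `preprocess_logits_for_metrics` hook; the names `model`, `training_args`, `train_dataset`, `eval_dataset`, and `compute_metrics` stand in for the objects already defined in the training script, and the exact change is in this PR's diff rather than this snippet:

```python
import torch
from transformers import Trainer

def preprocess_logits_for_metrics(logits, labels):
    # CTC models emit per-frame logits of shape (batch, time, vocab_size).
    # Reducing them to predicted token IDs here keeps only a (batch, time)
    # tensor on the GPU, which is all that compute_metrics needs for decoding.
    if isinstance(logits, tuple):
        logits = logits[0]
    return torch.argmax(logits, dim=-1)

# model, training_args, datasets and compute_metrics are assumed to be set up
# earlier in the script, as in the existing CTC example.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
```

With this hook in place, only small integer tensors of predicted IDs are gathered across evaluation batches instead of the full vocabulary-sized logits.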
Some community discussion of this problem here: https://discuss.huggingface.co/t/cuda-out-of-memory-when-using-trainer-with-compute-metrics/2941
The bug may be reproduced using this command:
cc @sanchit-gandhi