Fix use_cache for xla fsdp #30353
Conversation
I don't know how to fix the issue with the use_cache parameter in the modeling code. Is that widely used?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @alanwaketan, thanks for handling this!
I think this is OK - it's consistent with logic in transformers modeling code, @muellerzr to confirm it's fine for trainer.
I don't know how to fix the issue with the use_cache parameter in the modeling code. Is that widely used?
@alanwaketan Could you specify which modeling code you're referring to?
Re the failing tests - there was a fix pushed to main. Rebasing should resolve them.
Thanks, @amyeroberts. In most of the modeling code, I saw that use_cache is passed as a parameter, and then there is a check in the modeling code that disables it when gradient checkpointing is enabled. Since we currently still cannot use the upstream gradient checkpointing directly, we cannot rely on the gradient_checkpointing flag and reuse this logic. And I also don't want to add a new flag and modify all the modeling code. lol
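For context, here is a paraphrased, standalone sketch of the check being referred to (the helper name `resolve_use_cache` is hypothetical; the actual modeling files inline this logic in `forward()`):

```python
import logging

logger = logging.getLogger(__name__)


def resolve_use_cache(use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    """Paraphrase of the check many decoder models perform in forward():
    the KV cache is disabled whenever gradient checkpointing is active,
    because checkpointed layers are re-run in the backward pass and
    cannot return past key/value states.
    """
    if gradient_checkpointing and training and use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. "
            "Setting `use_cache=False`."
        )
        return False
    return use_cache
```

Because the XLA FSDP path wraps layers with its own checkpointing function, the model's `gradient_checkpointing` attribute stays `False` and this check never fires, which is why the cache has to be disabled elsewhere.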
Agreed, looks fine to me. Thanks!
Thanks, @muellerzr. Can someone help me merge this?
I can merge. Thanks again for fixing this!
Thanks, @amyeroberts
What does this PR do?
use_cache cannot be used with gradient checkpointing. In PyTorch/XLA, we have to rely on our own gradient checkpointing function instead of the upstream one. Somehow, transformers regressed and could no longer recognize our gradient checkpointing. This PR fixes it.
Fixes #30155
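As a rough illustration of the kind of change involved (a hypothetical sketch, not the actual diff; `maybe_disable_cache` and its arguments are illustrative, and `xla_fsdp_grad_ckpt` is the fsdp_config key used to request XLA gradient checkpointing):

```python
def maybe_disable_cache(model_config, fsdp_config: dict, is_fsdp_xla_enabled: bool) -> None:
    """Hypothetical helper: when gradient checkpointing is applied through
    PyTorch/XLA's own FSDP wrapper, the modeling code's own
    `gradient_checkpointing` check never fires, so the trainer has to turn
    the KV cache off itself before training starts.
    """
    if is_fsdp_xla_enabled and fsdp_config.get("xla_fsdp_grad_ckpt", False):
        model_config.use_cache = False
```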
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@muellerzr @amyeroberts