[SegGPT] Fix loss calculation #30421

EduardoPach · 2024-04-23T11:56:58Z

What does this PR do?

This PR fixes #30419 and ensures that the loss is being correctly calculated.

While working on this PR I noticed that SegGptLoss was not only broken but was also being calculated incorrectly (shame on the SegGpt contributor 😞). The proposed solution includes:

Passing labels to SegGptModel so that the forward pass is performed correctly when training in the In-Context Painting style
Changing the SegGptLoss forward method and its docstrings accordingly, so that the output matches the one obtained with the original implementation
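
For readers unfamiliar with this kind of loss, here is a minimal, framework-free sketch of a masked regression loss. It assumes (this is not stated in the PR) a smooth L1 loss that is averaged only over the masked positions rather than over all positions. The function names, the `beta` default, and the toy values are hypothetical and are not taken from modeling_seggpt.py.

```python
def smooth_l1(pred, target, beta=1.0):
    """Element-wise smooth L1: quadratic below beta, linear above it."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def masked_smooth_l1_loss(preds, targets, mask, beta=1.0):
    """Average the smooth L1 loss over positions where mask == 1 only."""
    if not (len(preds) == len(targets) == len(mask)):
        raise ValueError("preds, targets and mask must have the same length")
    n_masked = sum(mask)
    if n_masked == 0:
        raise ValueError("mask selects no positions")
    total = sum(
        smooth_l1(p, t, beta) * m for p, t, m in zip(preds, targets, mask)
    )
    # Normalise by the number of masked positions, not by len(preds).
    return total / n_masked


# Toy values: the last position is not masked, so its large error is ignored.
preds = [0.5, 2.0, 3.0, 0.0]
targets = [0.0, 0.0, 3.0, 10.0]
mask = [1, 1, 1, 0]
loss = masked_smooth_l1_loss(preds, targets, mask)
```

Dividing by the number of masked positions instead of the total number of positions is the design choice this sketch is meant to highlight: an unmasked position contributes nothing to the numerator and should not dilute the average either.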

Note

While running test_modeling_seggpt with is_training = True, I found that gradient_checkpointing is also not working. The cause is the type_token_semantic parameter, which is not used in the forward pass: the choice of type token is controlled by the embedding_type argument of the model's forward, and by default we use type_token_instance, just like the original implementation. Hence, we could probably move embedding_type to the config to allow gradient_checkpointing, or remove it entirely, since the use case for type_token_semantic is not clear in the original implementation.
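
The effect described in this note can be reproduced on a toy module. The sketch below is not the SegGPT code: `TinyTypeTokenModel` and `params_without_grad` are hypothetical names, and the module only mimics the pattern of holding two type-token parameters and selecting one of them through an `embedding_type` argument. After a backward pass, the parameter that was not selected has no gradient, which is the kind of condition a training test that expects every parameter to receive a gradient would flag.

```python
import torch
from torch import nn


class TinyTypeTokenModel(nn.Module):
    """Toy stand-in: two type tokens, only one is used per forward pass."""

    def __init__(self, dim=4):
        super().__init__()
        self.type_token_semantic = nn.Parameter(torch.zeros(1, dim))
        self.type_token_instance = nn.Parameter(torch.zeros(1, dim))
        self.proj = nn.Linear(dim, 1)

    def forward(self, x, embedding_type="instance"):
        if embedding_type == "semantic":
            token = self.type_token_semantic
        elif embedding_type == "instance":
            token = self.type_token_instance
        else:
            raise ValueError(f"unknown embedding_type: {embedding_type}")
        return self.proj(x + token)


def params_without_grad(model):
    """Names of trainable parameters that received no gradient."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = TinyTypeTokenModel()
loss = model(torch.randn(2, 4)).sum()  # default embedding_type="instance"
loss.backward()
unused = params_without_grad(model)  # the semantic token was never used
```

Running the same check with `embedding_type="semantic"` would report `type_token_instance` instead, which illustrates why a choice made per call is awkward here and why moving it to the config is one of the options raised above.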

c.c. @amyeroberts

amyeroberts

Thanks for working on this!

Have we verified the loss value with that of the original model?

src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

EduardoPach · 2024-04-23T16:46:28Z

Thanks for working on this!

Have we verified the loss value with that of the original model?

Yep, I also added some tests to make sure this won't be an issue again 🙂.

Regarding gradient_checkpointing while training, should this be resolved in this PR or a new one? It will probably involve re-uploading the checkpoint to the Hub, as we would need to take the type_token_semantic parameter from the weights.

amyeroberts · 2024-04-24T09:47:19Z

Regarding gradient_checkpointing while training, should this be resolved in this PR or a new one? It will probably involve re-uploading the checkpoint to the Hub, as we would need to take the type_token_semantic parameter from the weights.

This should be done in a separate PR.

amyeroberts

Thanks for fixing!

Fixed main train issues

5b001b8

EduardoPach force-pushed the fix-seggpt-loss branch from 3b78bdb to 5b001b8 on April 23, 2024 12:11

amyeroberts reviewed Apr 23, 2024


EduardoPach and others added 5 commits April 23, 2024 17:00

Added loss test

34be5d6

Update src/transformers/models/seggpt/modeling_seggpt.py

c274676

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Added missing labels arg in SegGptModel forward


b89941a

Fixed typo


db8adda

Added slow test to test loss calculation


8a74151

amyeroberts approved these changes Apr 24, 2024


amyeroberts merged commit d26c141 into huggingface:main Apr 24, 2024
17 checks passed
