Add JetMoE model #30005

Merged
merged 122 commits into from
May 14, 2024

yikangshen
Contributor

What does this PR do?

Adds support for the JetMoE architecture by Yikang Shen and MyShell AI.
JetMoE is a new sparsely activated architecture inspired by ModuleFormer. Each JetMoE block consists of two MoE layers: a Mixture of Attention Heads and a Mixture of MLP Experts. For each input token, JetMoE activates only a subset of its experts to process it. This sparse activation scheme gives JetMoE much higher training throughput than dense models of similar size.
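The sparse activation described above can be illustrated with a minimal, framework-free sketch of top-k expert routing for a single token. This is not JetMoE's implementation. The router, the toy experts, k=2, and renormalising the gate weights with a softmax over only the selected logits are all illustrative assumptions. The sketch models a generic expert layer and does not distinguish attention-head experts from MLP experts.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their gate weights.

    Assumption: weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return list(zip(top, softmax([logits[i] for i in top])))


def sparse_moe_forward(
    x: Vector,
    router: Callable[[Vector], List[float]],
    experts: Sequence[Expert],
    k: int,
) -> Tuple[Vector, List[int]]:
    """Weighted sum of the outputs of the k routed experts for one token."""
    out = [0.0] * len(x)
    chosen: List[int] = []
    for idx, weight in top_k_route(router(x), k):
        y = experts[idx](x)  # only the selected experts are ever evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
        chosen.append(idx)
    return out, chosen


# Hypothetical toy setup: 4 experts; expert i scales its input by (i + 1)
# and records that it was called, so the sparsity can be checked.
calls: List[int] = []


def make_expert(i: int) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(4)]


def router(x: Vector) -> List[float]:
    # Hypothetical router: fixed per-expert scores scaled by the token's sum.
    return [sum(x) * c for c in (0.1, 2.0, -1.0, 1.0)]


out, chosen = sparse_moe_forward([1.0, 1.0], router, experts, k=2)
print(chosen, calls)  # prints [1, 3] [1, 3]: only the two routed experts ran
```

Because only k of the E experts run for each token, the per-token expert computation is roughly k/E of evaluating every expert, while the model still holds all E experts' parameters. This is why sparse activation can raise training throughput relative to a dense model of similar size.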

Who can review?

@ArthurZucker and @younesbelkada

yikangshen and others added 8 commits May 9, 2024 20:53
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@ArthurZucker
Collaborator

Feel free to ping me whenever for another review! 🤗

@yikangshen
Contributor Author

Thanks @ArthurZucker. I have updated the code according to your suggestions. I hope the extra comments make the code clearer.

@ArthurZucker
Collaborator

Thanks! Having a look 😉

@ArthurZucker
Collaborator

🔥 Looks great! Thanks a lot for addressing all the comments and taking them into account! Left 2 nits, but good to merge!

@ArthurZucker
Collaborator

The failing test is unrelated; should I merge? 🔥

@yikangshen
Contributor Author

Yes! I have tested offline with the following command:
RUN_SLOW=1 python -m pytest tests/models/jetmoe/test_modeling_jetmoe.py -vv
and all the tests passed.

@ArthurZucker merged commit ccdabc5 into huggingface:main on May 14, 2024
22 of 24 checks passed
@ArthurZucker
Collaborator

Congrats on this great work! We'll do a release on Thursday!

@yikangshen
Contributor Author

Thanks a lot for the review and comments! @ArthurZucker @gante @younesbelkada

6 participants