Add JetMoE model #30005
Conversation
Feel free to ping me whenever for another review! 🤗
Thanks @ArthurZucker. I have updated the code according to your suggestions. I hope the extra comments will make the code clearer.
Thanks! Having a look 😉
🔥 Looks great! Thanks a lot for addressing all the comments and taking them into account! Left two nits, but good to merge!
The failing test is unrelated; should I merge? 🔥
Yes! I have tested offline with the following command:
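(The command itself did not survive the page capture. For reference, here is a sketch of how transformers slow model tests are typically run; the test file path is an assumption based on the repo's standard tests/models/<model_name>/ layout, not taken from this thread.)

```python
# Hypothetical reconstruction: the original command was not captured.
# transformers gates its heavy integration tests behind the RUN_SLOW env var.
import os

import pytest

os.environ["RUN_SLOW"] = "1"  # enable tests marked with @slow
# Assumed path, following the repo's usual tests/models/<model_name>/ layout.
pytest.main(["tests/models/jetmoe/test_modeling_jetmoe.py", "-v"])
```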
Congrats on this great work! We'll do a release on Thursday!
Thanks a lot for the review and comments! @ArthurZucker @gante @younesbelkada |
What does this PR do?
Add support for the JetMoE architecture by Yikang Shen and MyShell AI.
JetMoE is a new sparsely activated architecture inspired by ModuleFormer. Each JetMoE block consists of two MoE layers: a Mixture of Attention Heads and a Mixture of MLP Experts. Given the input tokens, JetMoE activates only a subset of its experts to process them. This sparse activation scheme lets JetMoE achieve much better training throughput than similar-sized dense models.
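For reference, a minimal usage sketch with the transformers API once this PR is in: the Hub checkpoint id `jetmoe/jetmoe-8b` is an assumption, since no checkpoint is named in this description.

```python
# Minimal usage sketch. The Hub id "jetmoe/jetmoe-8b" is assumed here;
# substitute the actual released checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained("jetmoe/jetmoe-8b")

inputs = tokenizer("JetMoE is a sparsely activated model that", return_tensors="pt")
# Routing happens inside the model: per token, only the top-scoring
# attention-head and MLP experts are activated, so compute stays well
# below that of a dense model with the same parameter count.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the expert routing is internal to the model, the public API is the same as for any other causal LM in the library.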
Who can review?
@ArthurZucker and @younesbelkada