Add ViTamin models #2169
Conversation
@Beckschen thanks, there will probably be a few more changes before the tests pass. If you get stuck I can help in a few days. For starters, the current failure: the dataclass init needs to use the default-factory pattern, as here: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/maxxvit.py#L137
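For context, the default-factory pattern referenced above avoids Python's mutable-default-argument pitfall in dataclasses. A minimal sketch (the field names here are illustrative, not timm's actual ViTamin config):

```python
from dataclasses import dataclass, field
from typing import Tuple

@dataclass
class VitCfg:
    # Hypothetical config fields for illustration only.
    # Immutable defaults (tuples, ints) can be assigned directly:
    embed_dim: Tuple[int, ...] = (64, 128, 256, 512)
    # Mutable defaults (dict, list, nested dataclass) must go through
    # default_factory, otherwise dataclass raises ValueError at class
    # definition time (and would share one object across instances):
    head_cfg: dict = field(default_factory=lambda: dict(pool_type='avg'))

a = VitCfg()
b = VitCfg()
a.head_cfg['pool_type'] = 'max'  # each instance gets its own dict
print(b.head_cfg['pool_type'])   # avg
```

Each instance gets a fresh `head_cfg`, so mutating one config never leaks into another.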
Thanks very much, Ross @rwightman! I've fixed the issue with the dataclass initialization. Could you please review it before proceeding with the merge? Thanks again!
@Beckschen this required more changes, so I've continued in another PR #2193 (which pulls these commits and adds my own), including an addition to the base vit model for xlarge (disable pos embed). I think it's working now but haven't done extensive checks... can add support to OpenCLIP now fairly easily, easier to verify it's correct there.
I'm truly grateful for your help, @rwightman! I saw the changes regarding compatibility with vision_transformer.py and vision_transformer_hybrid.py. This version is designed to support both timm and OpenCLIP, so thanks as well for merging the model configs in OpenCLIP. Best regards,
Add the ViTamin model, which is trained on the public DataComp-1B dataset using the OpenCLIP framework and obtains 82.9% zero-shot ImageNet-1K accuracy with 436M parameters. It achieves state-of-the-art performance on zero-shot image classification, multi-modal retrieval, open-vocabulary detection and segmentation, and large multi-modal models.
The code for the ViTamin models is adapted from vision_transformer_hybrid.py in the timm codebase.
This ViTamin work has been accepted to CVPR 2024 (https://arxiv.org/pdf/2404.02132).