Describe the question.
Hello, NVIDIA DALI team,
For training multimodal models such as BLIP-2, CLIP, and SEED, an image-text pair dataset is essential. However, it appears that DALI currently lacks a reader for MS COCO caption annotations (coco_karpathy_train.json etc.), which is needed when preprocessing such datasets.
Considering how frequently paired datasets are used in multimodal tasks, this feature would be extremely valuable. Are there any plans to develop such a reader? Or has it already been developed and I simply missed it in the documentation?
Thank you for your attention.
Best regards,
Daehan
Check for duplicates
I have searched the open bugs/issues and have found no duplicates for this bug report
Thank you for reaching out. As we see more and more custom data-reading patterns and an ever-growing variety of dataset formats, it is not feasible to cover them all with dedicated readers. That is why we developed, and recommend using, the external_source operator, which lets you define data reading efficiently in Python.
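For reference, here is a minimal sketch of how an image-text pair source could be wired up with external_source. The annotation layout (a JSON list with "image" and "caption" fields, as in the Karpathy split), the file paths, and the CocoCaptionSource helper are assumptions for illustration, not part of DALI itself:

```python
import json
import numpy as np
from nvidia.dali import pipeline_def, fn

# Hypothetical per-sample callable over a Karpathy-style annotation file;
# the "image"/"caption" field names and the paths below are assumptions.
class CocoCaptionSource:
    def __init__(self, annotation_file, image_root):
        with open(annotation_file) as f:
            self.samples = json.load(f)
        self.image_root = image_root

    def __call__(self, sample_info):
        if sample_info.idx_in_epoch >= len(self.samples):
            raise StopIteration  # signals the end of an epoch to DALI
        sample = self.samples[sample_info.idx_in_epoch]
        with open(f"{self.image_root}/{sample['image']}", "rb") as f:
            encoded_image = np.frombuffer(f.read(), dtype=np.uint8)
        # DALI has no string type, so the caption travels as a uint8 byte
        # tensor and is decoded back to str on the framework side.
        caption = np.frombuffer(sample["caption"].encode("utf-8"), dtype=np.uint8)
        return encoded_image, caption

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def caption_pipeline(source):
    # batch=False: DALI calls the source once per sample with a SampleInfo
    images, captions = fn.external_source(source=source, num_outputs=2, batch=False)
    images = fn.decoders.image(images, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, captions

pipe = caption_pipeline(CocoCaptionSource("coco_karpathy_train.json", "images"))
pipe.build()
images, captions = pipe.run()
```

The images go through DALI's GPU-accelerated decoding and resizing, while the captions pass through untouched as byte tensors for your tokenizer to consume afterwards.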