Describe the question.
Hello, NVIDIA DALI team,
For training multimodal models such as BLIP-2, CLIP, and SEED, an image-text pair dataset is essential. However, it appears that DALI currently lacks a reader for MS COCO caption annotations (coco_karpathy_train.json etc.), which is needed when preprocessing such datasets.
Considering how frequently paired datasets are used in multimodal tasks, this feature would be extremely valuable. Are there any plans to develop such a reader? Or has it already been developed and I simply missed it in the documentation?
Thank you for your attention.
Best regards,
Daehan
Check for duplicates
I have searched the open bugs/issues and have found no duplicates for this bug report
Thank you for reaching out. As we see more and more custom data-reading patterns and an ever-growing variety of dataset formats, it is not feasible to cover them all with dedicated readers. That is why we developed, and recommend using, the external_source operator, which lets you define data reading efficiently in Python.
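For reference, here is a minimal sketch of how an image-text pair source could be wired up with external_source. The annotation layout (a JSON list with "image" and "caption" fields, as in the Karpathy split), the file paths, and the CocoCaptionSource helper are assumptions for illustration, not part of DALI itself:

```python
import json
import numpy as np
from nvidia.dali import pipeline_def, fn

# Hypothetical per-sample callable over a Karpathy-style annotation file;
# the "image"/"caption" field names and the paths below are assumptions.
class CocoCaptionSource:
    def __init__(self, annotation_file, image_root):
        with open(annotation_file) as f:
            self.samples = json.load(f)
        self.image_root = image_root

    def __call__(self, sample_info):
        if sample_info.idx_in_epoch >= len(self.samples):
            raise StopIteration  # signals the end of an epoch to DALI
        sample = self.samples[sample_info.idx_in_epoch]
        with open(f"{self.image_root}/{sample['image']}", "rb") as f:
            encoded_image = np.frombuffer(f.read(), dtype=np.uint8)
        # DALI has no string type, so the caption travels as a uint8 byte
        # tensor and is decoded back to str on the framework side.
        caption = np.frombuffer(sample["caption"].encode("utf-8"), dtype=np.uint8)
        return encoded_image, caption

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def caption_pipeline(source):
    # batch=False: DALI calls the source once per sample with a SampleInfo
    images, captions = fn.external_source(source=source, num_outputs=2, batch=False)
    images = fn.decoders.image(images, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, captions

pipe = caption_pipeline(CocoCaptionSource("coco_karpathy_train.json", "images"))
pipe.build()
images, captions = pipe.run()
```

The images go through DALI's GPU-accelerated decoding and resizing, while the captions pass through untouched as byte tensors for your tokenizer to consume afterwards.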