
OpenDataLab

OpenDataLab provides access to numerous significant open-source datasets.
 
OpenDataLab website | OpenXLab platform
 


OpenDataLab provides an ecosystem of high-quality datasets for the community. It offers:

Extensive open data resources

● A fast, simple way to access open datasets
● Large-scale open dataset resources
● 1,200+ open datasets for computer vision and large models
● 200+ open datasets from CVPR
● Datasets categorized by hot topic

Open-source data processing toolkits

● Data acquisition toolkits that support large datasets
● Data acquisition toolkits that support many kinds of tasks
● An open-source intelligent toolbox for labeling

Dataset description language

● Format standardization
● DSDL: Dataset Description Language
● Define a CV dataset with DSDL
● 100+ CV datasets standardized by OpenDataLab

Check out our tutorial videos (in Chinese) to get started.


In September this year, we launched an upgraded feature that lets authors upload datasets themselves. We invite you to use it to promote your open-source datasets, AI research results, and more, so that more people can find, obtain, and use your datasets.

For an introduction to the self-service dataset upload feature, see the help doc. You can create and share your dataset by following our guidelines.

If you have any questions or run into problems, please feel free to contact us at OpenDataLab@pjlab.org.cn.

Popular repositories

  1. WanJuan1.0 Public

    WanJuan 1.0 multimodal corpus

    408 23

  2. labelU Public

    Data annotation toolbox that supports image, audio, and video data.

    Python 180 25

  3. VIGC Public

    AAAI 2024: Visual Instruction Generation and Correction

    Python 68 3

  4. laion5b-downloader Public

    Python 58 5

  5. opendatalab-python-sdk Public

    SDK of OpenDataLab - https://opendatalab.org.cn

    Python 52 4

  6. CLIP-Parrot-Bias Public

    Parrot Captions Teach CLIP to Spot Text

    Python 50 2

Repositories

Showing 10 of 24 repositories
  • LabelLLM Public
    0 Apache-2.0 0 0 0 Updated May 14, 2024
  • Python 19 Apache-2.0 3 2 0 Updated May 13, 2024
  • UniMERNet Public

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Jupyter Notebook 34 Apache-2.0 2 0 0 Updated May 6, 2024
  • labelU-Kit Public

    Data annotation component library, provided as NPM packages

    TypeScript 35 Apache-2.0 10 1 0 Updated Apr 25, 2024
  • labelU Public

    Data annotation toolbox that supports image, audio, and video data.

    Python 180 25 4 0 Updated Apr 23, 2024
  • WanJuan2.0-WanJuan-CC Public

    WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.

    5 0 0 0 Updated Apr 18, 2024
  • H2RSVLM Public

    H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

    32 Apache-2.0 1 1 0 Updated Apr 1, 2024
  • MLS-BRN Public

    [CVPR 2024] 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

    17 Apache-2.0 0 1 0 Updated Mar 28, 2024
  • CHARM Public

    Chinese commonsense benchmark for LLMs

    9 Apache-2.0 1 0 0 Updated Mar 22, 2024
  • VIGC Public

    AAAI 2024: Visual Instruction Generation and Correction

    Python 68 Apache-2.0 3 1 0 Updated Feb 4, 2024