Spatial-Temporal Enhanced Transformer Towards Multi-Frame 3D Object Detection

This repository implements the paper "Spatial-Temporal Enhanced Transformer Towards Multi-Frame 3D Object Detection" (STEMD). The code is built mainly on EFG, an Efficient, Flexible, and General deep learning framework.

Overview

Installation
Data
Get Started
Main Results
Citation

1. Installation

1.1 Prerequisites

gcc 5 - 7
python >= 3.6
cuda >= 10.2
pytorch >= 1.6

# spconv
spconv_cu11{X} (set X according to your cuda version)

# waymo_open_dataset
## python 3.6
waymo-open-dataset-tf-2-1-0==1.2.0

## python 3.7, 3.8
waymo-open-dataset-tf-2-4-0==1.3.1
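
The version constraints above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `waymo_requirement` are hypothetical, and the package mapping only covers the Python versions listed above.

```python
import re
import sys

# Mapping copied from the prerequisites list; other Python versions are not covered there.
WAYMO_PACKAGES = {
    (3, 6): "waymo-open-dataset-tf-2-1-0==1.2.0",
    (3, 7): "waymo-open-dataset-tf-2-4-0==1.3.1",
    (3, 8): "waymo-open-dataset-tf-2-4-0==1.3.1",
}


def parse_version(text):
    """Turn a string such as '1.10.0+cu113' into (1, 10, 0)."""
    core = text.split("+")[0]  # drop a local build suffix if present
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """True if the version string is at least the given (major, minor) tuple."""
    return parse_version(version_text) >= tuple(minimum)


def waymo_requirement(python_version=None):
    """Return the pip requirement listed for this Python version, or None."""
    key = tuple((python_version or sys.version_info)[:2])
    return WAYMO_PACKAGES.get(key)


if __name__ == "__main__":
    print("python >= 3.6:", sys.version_info[:2] >= (3, 6))
    print("waymo package:", waymo_requirement())
    try:
        import torch  # only available once PyTorch is installed
        print("pytorch >= 1.6:", meets_minimum(torch.__version__, (1, 6)))
    except ImportError:
        print("pytorch is not installed yet")
```

A `None` result from `waymo_requirement` means the README does not list a package for the running interpreter.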

1.2 Build from source

git clone https://github.com/Eaphan/STEMD.git
cd STEMD
pip install -v -e .
# set logging path to save model checkpoints, training logs, etc.
echo "export EFG_CACHE_DIR=/path/to/your/logs/dir" >> ~/.bashrc
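
Before launching a long run, it can help to confirm that `EFG_CACHE_DIR` is visible to Python and points at a writable directory. This is a small sketch under assumptions: `resolve_cache_dir` and its fallback path are hypothetical, and how EFG itself reads the variable is not shown in this README.

```python
import os
from pathlib import Path


def resolve_cache_dir(default="~/efg_logs"):
    """Return the directory named by EFG_CACHE_DIR, creating it if missing.

    The fallback default is a hypothetical choice made for this sketch only.
    """
    raw = os.environ.get("EFG_CACHE_DIR") or default
    path = Path(raw).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    cache_dir = resolve_cache_dir()
    print("logs and checkpoints directory:", cache_dir)
    print("writable:", os.access(cache_dir, os.W_OK))
```

Remember that the `echo ... >> ~/.bashrc` line only takes effect in new shells, or after running `source ~/.bashrc`.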

2. Data

2.1 Data Preparation - Waymo

# download waymo dataset v1.2.0 (or v1.3.2, etc)
gsutil -m cp -r \
  "gs://waymo_open_dataset_v_1_2_0_individual_files/testing" \
  "gs://waymo_open_dataset_v_1_2_0_individual_files/training" \
  "gs://waymo_open_dataset_v_1_2_0_individual_files/validation" \
  .

# extract frames from tfrecord to pkl
CUDA_VISIBLE_DEVICES=-1 python cli/data_preparation/waymo/waymo_converter.py --record_path "/path/to/waymo/training/*.tfrecord" --root_path "/path/to/waymo/train/"
CUDA_VISIBLE_DEVICES=-1 python cli/data_preparation/waymo/waymo_converter.py --record_path "/path/to/waymo/validation/*.tfrecord" --root_path "/path/to/waymo/val/"
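
The two converter calls differ only in the split name, so they can be generated from one mapping. The sketch below builds the same argument lists as the commands above and counts the matching tfrecord files. The helper `converter_commands` is hypothetical, and the flags are copied from the commands above rather than checked against the script.

```python
import glob
import os

# Raw Waymo folder name -> output folder name, as used in the commands above.
SPLITS = {"training": "train", "validation": "val"}

CONVERTER = "cli/data_preparation/waymo/waymo_converter.py"


def converter_commands(raw_root, out_root):
    """Build one converter invocation per split without running anything."""
    commands = []
    for raw_name, out_name in SPLITS.items():
        pattern = os.path.join(raw_root, raw_name, "*.tfrecord")
        argv = [
            "python", CONVERTER,
            "--record_path", pattern,  # passed unexpanded, like the quoted shell form
            "--root_path", os.path.join(out_root, out_name) + "/",
        ]
        commands.append({
            "argv": argv,
            "env": {"CUDA_VISIBLE_DEVICES": "-1"},  # keep the conversion on CPU
            "n_records": len(glob.glob(pattern)),
        })
    return commands


if __name__ == "__main__":
    for command in converter_commands("/path/to/waymo", "/path/to/waymo"):
        print(command["n_records"], "tfrecords:", " ".join(command["argv"]))
```

To execute a command, pass `command["argv"]` to `subprocess.run` with `env={**os.environ, **command["env"]}` and `check=True`.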

# create softlink to datasets
cd /path/to/STEMD/datasets; ln -s /path/to/waymo/dataset/root waymo; cd ..
# create data summary and gt database from extracted frames
python cli/data_preparation/waymo/create_data.py --root-path datasets/waymo --split train --nsweeps 4
python cli/data_preparation/waymo/create_data.py --root-path datasets/waymo --split val --nsweeps 4
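
A quick sanity check of the softlink can save a failed `create_data.py` run. The sketch below assumes that the linked root contains the `train` and `val` folders written by the converter step. That layout is inferred from the paths above, and `missing_splits` is a hypothetical helper.

```python
from pathlib import Path


def missing_splits(dataset_root, splits=("train", "val")):
    """Return the split folders that do not exist under dataset_root.

    Path.is_dir() follows symlinks, so a dangling softlink reports every split.
    """
    root = Path(dataset_root)
    return [name for name in splits if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_splits("datasets/waymo")
    if missing:
        print("missing split folders:", ", ".join(missing))
    else:
        print("datasets/waymo looks ready for create_data.py")
```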

2.2 Data Preparation - nuScenes

# create softlink to datasets
cd /path/to/STEMD/datasets; ln -s /path/to/nuscenes/dataset/root nuscenes; cd ..
python cli/data_preparation/nuscenes/create_data.py --root-path datasets/nuscenes --version v1.0-trainval --nsweeps 31  # 1 sample frame + 30 sweep frames (1.5 s)
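
The `--nsweeps 31` value follows from the nuScenes lidar rate of 20 Hz: 30 sweeps span 1.5 s, and the annotated sample frame adds one more. A short sketch of that arithmetic, with hypothetical helper names:

```python
LIDAR_HZ = 20  # nuScenes lidar sweep rate in sweeps per second


def nsweeps_for_window(seconds, lidar_hz=LIDAR_HZ):
    """Number of frames: one sample frame plus the sweeps inside the time window."""
    return 1 + round(seconds * lidar_hz)


def window_for_nsweeps(nsweeps, lidar_hz=LIDAR_HZ):
    """Time window in seconds covered by the sweeps, excluding the sample frame."""
    return (nsweeps - 1) / lidar_hz


if __name__ == "__main__":
    print(nsweeps_for_window(1.5))
    print(window_for_nsweeps(31))
```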

3. Get Started

3.1 Training & Evaluation

cd playground/detection.3d/waymo/stemd/STEMD.waymo.resnet18.cdn.epoch12

efg_run --num-gpus x  # default 1
efg_run --num-gpus x task [train | val | test]
efg_run --num-gpus x --resume
efg_run --num-gpus x dataloader.num_workers 0  # dynamically change options in config.yaml
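
Trailing arguments such as `dataloader.num_workers 0` or `task val` read as key/value pairs, where a dotted key addresses a nested option in config.yaml. The sketch below illustrates that idea on a plain dictionary. It is not EFG's actual implementation, and `apply_overrides` is a hypothetical helper.

```python
import ast
import copy


def parse_value(text):
    """Interpret '0' as int, 'True' as bool, and so on; keep other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, pairs):
    """Return a copy of config with ['a.b', 'value', ...] pairs applied."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come as key/value pairs")
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections on the way
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    base = {"task": "train", "dataloader": {"num_workers": 4}}
    print(apply_overrides(base, ["dataloader.num_workers", "0", "task", "val"]))
```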

Models are evaluated automatically at the end of training. To run evaluation on its own:

efg_run --num-gpus x task val

4. Main Results

All models are trained and evaluated on 8 x NVIDIA A100 GPUs.

Waymo Open Dataset - 3D Object Detection (val set, L2 mAP/mAPH)

Method	Frames	Schedule (epochs)	VEHICLE	PEDESTRIAN	CYCLIST
STEMD	4	12	72.4/72.0	78.0/74.7	78.0/76.9
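
Each cell in the row above packs two numbers as mAP/mAPH. The sketch below splits them and computes an unweighted mean over the three classes. That mean is derived arithmetic for illustration only, not a figure reported in this README, and the helper names are hypothetical.

```python
# Values copied from the STEMD row of the table above.
STEMD_ROW = {
    "VEHICLE": "72.4/72.0",
    "PEDESTRIAN": "78.0/74.7",
    "CYCLIST": "78.0/76.9",
}


def split_metric(cell):
    """Split a 'mAP/mAPH' cell into two floats."""
    map_value, maph_value = (float(part) for part in cell.split("/"))
    return map_value, maph_value


def class_mean(row):
    """Unweighted mean of mAP and mAPH over the classes in the row."""
    pairs = [split_metric(cell) for cell in row.values()]
    count = len(pairs)
    mean_map = sum(pair[0] for pair in pairs) / count
    mean_maph = sum(pair[1] for pair in pairs) / count
    return mean_map, mean_maph


if __name__ == "__main__":
    mean_map, mean_maph = class_mean(STEMD_ROW)
    print(f"mean L2 mAP: {mean_map:.1f}, mean L2 mAPH: {mean_maph:.1f}")
```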

5. Citation

@misc{zhu2023efg,
    title={EFG: An Efficient, Flexible, and General deep learning framework that retains minimal},
    author={EFG Contributors},
    howpublished = {\url{https://github.com/poodarchu/efg}},
    year={2023}
}
@article{zhang2023spatial,
  title={Spatial-Temporal Enhanced Transformer Towards Multi-Frame 3D Object Detection},
  author={Zhang, Yifan and Zhu, Zhiyu and Hou, Junhui and Wu, Dapeng},
  journal={arXiv preprint arXiv:2307.00347},
  year={2023}
}
@inproceedings{zhu2023conquer,
  title={Conquer: Query contrast voxel-detr for 3d object detection},
  author={Zhu, Benjin and Wang, Zhe and Shi, Shaoshuai and Xu, Hang and Hong, Lanqing and Li, Hongsheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9296--9305},
  year={2023}
}
