
Introduction

This repo is a tutorial on how to use a custom backbone with Panoptic-DeepLab in Detectron2.

Installation

  • Install Detectron2 following the official installation instructions.
  • Install panopticapi by: pip install git+https://github.com/cocodataset/panopticapi.git.
  • Note: you will need a Detectron2 build newer than commit id fa1bc0. The v0.3 release of Detectron2 does not support DepthwiseSeparableConv2d or the COCO dataset. A quick import check is sketched below.
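
If you are unsure whether your build is new enough, one quick check (a minimal sketch) is to import the layer that the v0.3 release lacks:

```python
# Sanity check: DepthwiseSeparableConv2d is missing from Detectron2 v0.3,
# so this import fails on builds that are too old for this tutorial.
import detectron2
from detectron2.layers import DepthwiseSeparableConv2d  # noqa: F401

print("Detectron2", detectron2.__version__, "looks recent enough.")
```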

Demo

Visualization of Panoptic-DeepLab predictions from demo.py.
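
If you prefer to run inference programmatically instead of through demo.py, a minimal sketch follows. The panoptic_deeplab import path, image path, and checkpoint path are assumptions, not a verbatim excerpt from demo.py:

```python
# Hedged inference sketch: assumes the repo's panoptic_deeplab package is on
# PYTHONPATH and exposes add_panoptic_deeplab_config, as the Detectron2
# project of the same name does.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from panoptic_deeplab import add_panoptic_deeplab_config  # repo-local package

cfg = get_cfg()
add_panoptic_deeplab_config(cfg)
cfg.merge_from_file("config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml")
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"

predictor = DefaultPredictor(cfg)
panoptic_seg, segments_info = predictor(cv2.imread("input.jpg"))["panoptic_seg"]
print(f"predicted {len(segments_info)} segments")
```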

Dataset

Detectron2 has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

$DETECTRON2_DATASETS/
  coco/
  lvis/
  cityscapes/
  VOC20{07,12}/

You can set the location of the builtin datasets by exporting DETECTRON2_DATASETS=/path/to/datasets. If left unset, the default is ./datasets relative to your current working directory.
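
To confirm the datasets were picked up, you can list what Detectron2 has registered; a small sketch (the exact registered names depend on your Detectron2 version):

```python
# Set the dataset root before importing detectron2.data, since the builtin
# datasets are registered at import time; the path here is a placeholder.
import os
os.environ.setdefault("DETECTRON2_DATASETS", "/path/to/datasets")

from detectron2.data import DatasetCatalog

print([name for name in sorted(DatasetCatalog.list()) if "cityscapes" in name])
```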

First, prepare the Cityscapes dataset following this expected dataset structure:

cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/

Install cityscapesScripts by:

pip install git+https://github.com/mcordts/cityscapesScripts.git

Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesScripts with:

CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py

Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:

CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
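
After both scripts finish, a quick sanity check that the generated annotations are in place (a rough sketch; the root path is a placeholder):

```python
# Check for the outputs of createTrainIdLabelImgs.py and createPanopticImgs.py.
from pathlib import Path

root = Path("/path/to/abovementioned/cityscapes")  # placeholder path
assert any((root / "gtFine" / "val").rglob("*labelTrainIds.png")), \
    "labelTrainIds.png missing: run createTrainIdLabelImgs.py"
assert (root / "gtFine" / "cityscapes_panoptic_val.json").is_file(), \
    "panoptic annotations missing: run createPanopticImgs.py"
print("Cityscapes annotations look complete.")
```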

Backbone pre-trained weights

You probably need to use convert-pretrain-model-to-d2.py to convert your pre-trained backbone weights to the Detectron2 format first.

For Xception-65:

# download your pretrained model:
wget https://github.com/LikeLy-Journey/SegmenTron/releases/download/v0.1.0/tf-xception65-270e81cf.pth -O x65.pth
# run the conversion
python convert-pretrain-model-to-d2.py x65.pth x65.pkl

For HRNet-48:

# download your pretrained model:
wget https://optgaw.dm.files.1drv.com/y4mWNpya38VArcDInoPaL7GfPMgcop92G6YRkabO1QTSWkCbo7djk8BFZ6LK_KHHIYE8wqeSAChU58NVFOZEvqFaoz392OgcyBrq_f8XGkusQep_oQsuQ7DPQCUrdLwyze_NlsyDGWot0L9agkQ-M_SfNr10ETlCF5R7BdKDZdupmcMXZc-IE3Ysw1bVHdOH4l-XEbEKFAi6ivPUbeqlYkRMQ -O h48.pth
# run the conversion
python convert-pretrain-model-to-d2.py h48.pth h48.pkl
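
For reference, such conversion scripts usually just re-serialize the PyTorch checkpoint into the pickle format Detectron2 loads. The sketch below is modeled on Detectron2's tools/convert-torchvision-to-d2.py, not on the repo's exact script, which additionally renames state-dict keys:

```python
# Rough .pth -> .pkl conversion sketch in the format Detectron2 expects.
import pickle
import sys

import torch

obj = torch.load(sys.argv[1], map_location="cpu")
# Some checkpoints nest the weights under a "state_dict" key.
state_dict = obj.get("state_dict", obj) if isinstance(obj, dict) else obj

out = {
    "model": {k: v.detach().numpy() for k, v in state_dict.items()
              if isinstance(v, torch.Tensor)},
    "__author__": "third_party",
    "matching_heuristics": True,  # lets Detectron2 fuzzy-match parameter names
}
with open(sys.argv[2], "wb") as f:
    pickle.dump(out, f)
```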

Panoptic-DeepLab example

Note: the only difference from the reference Detectron2 project is that train_net.py is renamed to train_panoptic_deeplab.py.

Training

To train a model with 8 GPUs, run:

python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 8

Evaluation

Model evaluation can be done similarly:

python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint

Benchmark network speed

If you want to benchmark the network speed without post-processing, you can run the evaluation script with MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True:

python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True
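
For an independent measurement, you can also time whole-model inference yourself. A rough sketch follows; note it times the full forward pass, so use the BENCHMARK_NETWORK_SPEED flag above when you specifically want post-processing excluded:

```python
# Rough GPU timing loop; batched_inputs is a list of dicts in Detectron2's
# standard input format.
import time

import torch

@torch.no_grad()
def time_inference(model, batched_inputs, warmup=10, iters=50):
    model.eval()
    for _ in range(warmup):
        model(batched_inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(batched_inputs)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters  # seconds per batch
```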

Detectron2 code structure

The decoder for Panoptic-DeepLab is defined in this file: https://github.com/facebookresearch/detectron2/blob/master/projects/Panoptic-DeepLab/panoptic_deeplab/panoptic_seg.py.
It includes both the semantic branch and the instance branch.
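
Since the point of this tutorial is swapping in custom backbones, it also helps to see how a backbone plugs into Detectron2. The skeleton below uses a hypothetical TinyBackbone; the repo's Xception-65 and HRNet-48 implementations follow the same registry pattern:

```python
# Minimal custom-backbone skeleton; TinyBackbone is a hypothetical stand-in,
# not the repo's Xception/HRNet code.
import torch.nn as nn

from detectron2.layers import ShapeSpec
from detectron2.modeling import BACKBONE_REGISTRY, Backbone

class TinyBackbone(Backbone):
    def __init__(self, cfg, input_shape: ShapeSpec):
        super().__init__()
        self.stem = nn.Conv2d(input_shape.channels, 64,
                              kernel_size=7, stride=4, padding=3)

    def forward(self, x):
        # Return named feature maps; decoders select features by these names.
        return {"res2": self.stem(x)}

    def output_shape(self):
        return {"res2": ShapeSpec(channels=64, stride=4)}

@BACKBONE_REGISTRY.register()
def build_tiny_backbone(cfg, input_shape: ShapeSpec):
    # Selected via MODEL.BACKBONE.NAME: "build_tiny_backbone" in a config.
    return TinyBackbone(cfg, input_shape)
```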

Cityscapes Panoptic Segmentation

Cityscapes models are trained with ImageNet pretraining.

Regular Conv2d in ASPP and Decoder

| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download |
| ------ | -------- | ----------------- | -- | -- | -- | ---- | -- | -------- |
| Panoptic-DeepLab | X65-DC5 | 1024×2048 | 62.6 | 81.5 | 75.7 | 79.4 | 32.8 | model |
| Panoptic-DeepLab | HRNet-48 | 1024×2048 | 63.3 | 82.2 | 76.0 | 80.3 | 35.9 | model |

Note:

  • X65: Xception-65. It is converted from the original TensorFlow model. You need to convert it with convert-pretrain-model-to-d2.py first.
  • DC5 means using dilated convolution in res5.
  • HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with convert-pretrain-model-to-d2.py first.
  • This implementation currently uses a much heavier head (with regular Conv2d) than the original paper.
  • This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.

DepthwiseSeparableConv2d in ASPP and Decoder

| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download |
| ------ | -------- | ----------------- | -- | -- | -- | ---- | -- | -------- |
| Panoptic-DeepLab (DSConv) | X65-DC5 | 1024×2048 | 61.4 | 81.4 | 74.3 | 79.8 | 32.6 | model |
| Panoptic-DeepLab (DSConv) | HRNet-48 | 1024×2048 | 63.4 | 81.9 | 76.4 | 80.6 | 36.2 | model |

Note:

  • This implementation uses DepthwiseSeparableConv2d (DSConv) in the ASPP module and decoder, the same as the original paper.
  • This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.

COCO Panoptic Segmentation

COCO models are trained with ImageNet pretraining.

DepthwiseSeparableConv2d in ASPP and Decoder

| Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | download |
| ------ | -------- | ----------------- | -- | -- | -- | ------ | ------- | -------- |
| Panoptic-DeepLab (DSConv) | X65-DC5 | 640×640 | 36.7 | 77.4 | 45.8 | 19.9 | 20.5 | model |
| Panoptic-DeepLab (DSConv) | HRNet-48 | 640×640 | 37.8 | 78.1 | 46.9 | 21.6 | 22.3 | model |

Note:

  • These results were trained with old COCO config files (with MAX_SIZE_TRAIN set to 640 instead of 960); I will try to update these numbers as soon as I have machines to train the models.
  • This implementation uses DepthwiseSeparableConv2d (DSConv) in the ASPP module and decoder, the same as the original paper.
  • This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
  • The reproduced numbers are still lower than those in the original paper; this is probably due to slightly different data preprocessing.

DeepLab example

Note: the only difference from the reference Detectron2 project is that train_net.py is renamed to train_deeplab.py.

Training

To train a model with 8 GPUs, run:

python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --num-gpus 8

Evaluation

Model evaluation can be done similarly:

python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint

Cityscapes Semantic Segmentation

Cityscapes models are trained with ImageNet pretraining.

| Method | Backbone | Output resolution | mIoU | download |
| ------ | -------- | ----------------- | ---- | -------- |
| DeepLabV3+ | X65-DC5 | 1024×2048 | 80.1 | model |
| DeepLabV3+ | HRNet-48 | 1024×2048 | 80.9 | model |

Note:

  • X65: Xception-65. It is converted from the original TensorFlow model. You need to convert it with convert-pretrain-model-to-d2.py first.
  • DC5 means using dilated convolution in res5.
  • HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with convert-pretrain-model-to-d2.py first.