GitHub - yangsenius/TransPose: PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.

Introduction

TransPose is a human pose estimation model based on a CNN feature extractor, a Transformer Encoder, and a prediction head. Given an image, the attention layers built in Transformer can efficiently capture long-range spatial relationships between keypoints and explain what dependencies the predicted keypoints locations highly rely on.

[arxiv 2012.14214] [paper] [demo-notebook]

TransPose: Keypoint Localization via Transformer, Sen Yang, Zhibin Quan, Mu Nie, Wankou Yang, ICCV 2021

Model Zoo

We choose two types of CNNs as the backbone candidates: ResNet and HRNet. The derived convolutional blocks are ResNet-Small, HRNet-Small-W32, and HRNet-Small-W48.

Model	Backbone	#Attention layers	d	h	#Heads	#Params	AP (coco val gt bbox)	Download
TransPose-R-A3	ResNet-S	3	256	1024	8	5.2Mb	73.8	model
TransPose-R-A4	ResNet-S	4	256	1024	8	6.0Mb	75.1	model
TransPose-H-S	HRNet-S-W32	4	64	128	1	8.0Mb	76.1	model
TransPose-H-A4	HRNet-S-W48	4	96	192	1	17.3Mb	77.5	model
TransPose-H-A6	HRNet-S-W48	6	96	192	1	17.5Mb	78.1	model

Quick use

Try out the Web Demo:

You can directly load TransPose-R-A4 or TransPose-H-A4 models with pretrained weights on COCO train2017 dataset from Torch Hub, simply by:

import torch

tpr = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
tph = torch.hub.load('yangsenius/TransPose:main', 'tph_a4_256x192', pretrained=True)

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Model	Input size	FPS*	GFLOPs	AP	Ap .5	AP .75	AP (M)	AP (L)	AR	AR .5	AR .75	AR (M)	AR (L)
TransPose-R-A3	256x192	141	8.0	0.717	0.889	0.788	0.680	0.786	0.771	0.930	0.836	0.727	0.835
TransPose-R-A4	256x192	138	8.9	0.726	0.891	0.799	0.688	0.798	0.780	0.931	0.845	0.735	0.844
TransPose-H-S	256x192	45	10.2	0.742	0.896	0.808	0.706	0.810	0.795	0.935	0.855	0.752	0.856
TransPose-H-A4	256x192	41	17.5	0.753	0.900	0.818	0.717	0.821	0.803	0.939	0.861	0.761	0.865
TransPose-H-A6	256x192	38	21.8	0.758	0.901	0.821	0.719	0.828	0.808	0.939	0.864	0.764	0.872

Note:

we computed the average FPS* of testing 100 samples from coco val dataset (with batchsize=1) on a single NVIDIA 2080Ti GPU. The FPS may fluctuate up and down at different tests.
We trained our different models on different hardware platforms: 1 x RTX2080Ti GPUs (TP-R-A4), 4 x TiTan XP GPUs (TP-H-S, TP-H-A4), and 4 x Tesla P40 GPUs (TP-H-A6).

Results on COCO test-dev2017 with detector having human AP of 60.9 on COCO test-dev2017 dataset

Model	Input size	#Params	GFLOPs	AP	Ap .5	AP .75	AP (M)	AP (L)	AR	AR .5	AR .75	AR (M)	AR (L)
TransPose-H-S	256x192	8.0M	10.2	0.734	0.916	0.811	0.701	0.793	0.786	0.950	0.856	0.745	0.843
TransPose-H-A4	256x192	17.3M	17.5	0.747	0.919	0.822	0.714	0.807	0.799	0.953	0.866	0.758	0.854
TransPose-H-A6	256x192	17.5M	21.8	0.750	0.922	0.823	0.713	0.811	0.801	0.954	0.867	0.759	0.859

Visualization

Jupyter Notebook Demo

Given an input image, a pretrained TransPose model, and the predicted locations, we can visualize the spatial dependencies of the predicted locations with threshold for the attention scores.

TransPose-R-A4 with threshold=0.00

TransPose-R-A4 with threshold=0.01

TransPose-H-A4 with threshold=0.00

TransPose-H-A4 with threshold=0.00075

Getting started

Installation

Clone this repository, and we'll call the directory that you cloned as ${POSE_ROOT}
```
git clone https://github.com/yangsenius/TransPose.git
```
Install PyTorch>=1.6 and torchvision>=0.7 from the PyTorch official website
Install package dependencies. Make sure the python environment >=3.7
```
pip install -r requirements.txt
```
Make output (training models and files) and log (tensorboard log) directories under ${POSE_ROOT} & Make libs
```
mkdir output log
cd ${POSE_ROOT}/lib
make
```

Download pretrained models from the releases of this repo to the specified directory

${POSE_ROOT}
 `-- models
     `-- pytorch
         |-- imagenet
         |   |-- hrnet_w32-36af842e.pth
         |   |-- hrnet_w48-8ef0771d.pth
         |   |-- resnet50-19c8e357.pth
         |-- transpose_coco
         |   |-- tp_r_256x192_enc3_d256_h1024_mh8.pth
         |   |-- tp_r_256x192_enc4_d256_h1024_mh8.pth
         |   |-- tp_h_32_256x192_enc4_d64_h128_mh1.pth
         |   |-- tp_h_48_256x192_enc4_d96_h192_mh1.pth
         |   |-- tp_h_48_256x192_enc6_d96_h192_mh1.pth

Data Preparation

We follow the steps of HRNet to prepare the COCO train/val/test dataset and the annotations. The detected person results are downloaded from OneDrive or GoogleDrive. Please download or link them to ${POSE_ROOT}/data/coco/, and make them look like this:

${POSE_ROOT}/data/coco/
|-- annotations
|   |-- person_keypoints_train2017.json
|   `-- person_keypoints_val2017.json
|-- person_detection_results
|   |-- COCO_val2017_detections_AP_H_56_person.json
|   `-- COCO_test-dev2017_detections_AP_H_609_person.json
`-- images
	|-- train2017
	|   |-- 000000000009.jpg
	|   |-- ... 
	`-- val2017
		|-- 000000000139.jpg
		|-- ...

Traing & Testing

Testing on COCO val2017 dataset

python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml TEST.USE_GT_BBOX True

Training on COCO train2017 dataset

python tools/train.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml

Acknowledgements

Great thanks for these papers and their open-source codes：HRNet, DETR, DarkPose

License

This repository is released under the MIT LICENSE.

Citation

If you find this repository useful please give it a star 🌟 or consider citing our work:

@inproceedings{yang2021transpose,
  title={TransPose: Keypoint Localization via Transformer},
  author={Yang, Sen and Quan, Zhibin and Nie, Mu and Yang, Wankou},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
experiments/coco		experiments/coco
lib		lib
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
attention_map_image_dependency_transposeh_thres_0.0.jpg		attention_map_image_dependency_transposeh_thres_0.0.jpg
attention_map_image_dependency_transposeh_thres_0.00075.jpg		attention_map_image_dependency_transposeh_thres_0.00075.jpg
attention_map_image_dependency_transposer_thres_0.0.jpg		attention_map_image_dependency_transposer_thres_0.0.jpg
attention_map_image_dependency_transposer_thres_0.01.jpg		attention_map_image_dependency_transposer_thres_0.01.jpg
demo.ipynb		demo.ipynb
hubconf.py		hubconf.py
requirements.txt		requirements.txt
transpose_architecture.png		transpose_architecture.png
visualize.py		visualize.py

License

yangsenius/TransPose

Folders and files

Latest commit

History

Repository files navigation

Introduction

Model Zoo

Quick use

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Results on COCO test-dev2017 with detector having human AP of 60.9 on COCO test-dev2017 dataset

Visualization

Getting started

Installation

Data Preparation

Traing & Testing

Testing on COCO val2017 dataset

Training on COCO train2017 dataset

Acknowledgements

License

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Languages