
Deepstream-YOLO-Pose

Multistream_4_YOLOv8s-pose-3.PNG
YOLO-Pose accelerated with TensorRT and multi-streaming with Deepstream SDK


System Requirements

  • Python 3.8
    • Should be already installed with Ubuntu 20.04
  • Ubuntu 20.04
  • CUDA 11.4 (Jetson)
  • TensorRT 8+

DeepStream 6.x on x86 platform

DeepStream 6.x on Jetson platform

DeepStream Python Bindings

Gst-python and GstRtspServer

  • Installing GstRtspServer and introspection typelib

    sudo apt update
    sudo apt install python3-gi python3-dev python3-gst-1.0 -y
    sudo apt-get install libgstrtspserver-1.0-0 gstreamer1.0-rtsp
    

    For gst-rtsp-server (and other GStreamer components) to be accessible in Python through gi.require_version(), it needs to be built with gobject-introspection enabled (libgstrtspserver-1.0-0 already is). You still need to install the introspection typelib packages:

    sudo apt-get install libgirepository1.0-dev
    sudo apt-get install gobject-introspection gir1.2-gst-rtsp-server-1.0
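    To confirm that the typelibs are visible from Python, a quick check like the one below (a minimal sketch, assuming the packages above installed cleanly) should run without errors:

      # check_gst_rtsp.py - fails with "Namespace GstRtspServer not available"
      # if the gir1.2-gst-rtsp-server-1.0 typelib is missing
      import gi
      gi.require_version('Gst', '1.0')
      gi.require_version('GstRtspServer', '1.0')
      from gi.repository import Gst, GstRtspServer

      Gst.init(None)
      server = GstRtspServer.RTSPServer.new()   # instantiating proves the binding works
      print('GStreamer :', Gst.version_string())
      print('GstRtspServer RTSPServer created :', server is not None)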
    

Prepare YOLO-Pose Model

netron_yolov8s-pose_dy_onnx.PNG
YOLO-pose architecture

source : YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Prepare YOLOv8 TensorRT Engine

  • Choose yolov8-pose: its exported ONNX model has better-optimized operators

  • Based on triple-Mu/YOLOv8-TensorRT/Pose.md

  • The yolov8-pose model conversion route is : YOLOv8 PyTorch model -> ONNX -> TensorRT Engine

    Notice !!! ⚠️ This repository does not support building the engine with the TensorRT API !!!

0. Get yolov8s-pose.pt

https://github.com/ultralytics/ultralytics

Benchmark of YOLOv8-Pose

See Pose Docs for usage examples with these models.

| Model | size (pixels) | mAP pose 50-95 | mAP pose 50 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLOv8n-pose | 640 | 50.4 | 80.1 | 131.8 | 1.18 | 3.3 | 9.2 |
| YOLOv8s-pose | 640 | 60.0 | 86.2 | 233.2 | 1.42 | 11.6 | 30.2 |
| YOLOv8m-pose | 640 | 65.0 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
| YOLOv8l-pose | 640 | 67.6 | 90.0 | 784.5 | 2.59 | 44.4 | 168.6 |
| YOLOv8x-pose | 640 | 69.2 | 90.2 | 1607.1 | 3.73 | 69.4 | 263.2 |
| YOLOv8x-pose-p6 | 1280 | 71.6 | 91.2 | 4088.7 | 10.04 | 99.1 | 1066.4 |
  • mAPval values are for single-model single-scale on COCO Keypoints val2017 dataset.
    Reproduce by yolo val pose data=coco-pose.yaml device=0

  • Speed averaged over COCO val images using an Amazon EC2 P4d instance.
    Reproduce by yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu

  • Source : ultralytics

wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt

1. PyTorch Model to ONNX Model

  • Export the ONNX model with ultralytics. You can leave this repo and use the original ultralytics repo for ONNX export.

  • CLI tool (the yolo command from ultralytics)

    yolo export model=yolov8s-pose.pt format=onnx device=0 \
                imgsz=640 \
                dynamic=true \
                simplify=true

    After executing the above command, you will get an ONNX model named yolov8s-pose.onnx. (A Python-API alternative to the CLI export is sketched after this list.)

  • Move your ONNX model to the edge device under the expected path

    • Put the model on your edge device
      sudo chmod u+rwx -R /opt/nvidia/deepstream/deepstream/samples/models  # add write and execute permissions
      cd /opt/nvidia/deepstream/deepstream/samples/models
      sudo mkdir -p tao_pretrained_models/YOLOv8-TensorRT
      sudo chmod u+rwx -R tao_pretrained_models/YOLOv8-TensorRT

      mv -v <path_of_your_yolov8-pose_model> /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT/yolov8s-pose-dy-sim-640.onnx
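If you prefer the Python API over the yolo CLI, the same export can be done with the ultralytics package (a minimal sketch mirroring the CLI flags above; run it wherever you export the model, then move the resulting file as shown):

    # Export yolov8s-pose.pt to a dynamic, simplified ONNX model via the ultralytics Python API
    from ultralytics import YOLO

    model = YOLO('yolov8s-pose.pt')
    # mirrors: yolo export model=yolov8s-pose.pt format=onnx imgsz=640 dynamic=true simplify=true
    onnx_path = model.export(format='onnx', imgsz=640, dynamic=True, simplify=True)
    print('Exported ONNX model :', onnx_path)   # e.g. yolov8s-pose.onnx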

[Optional] Execute netron yolov8s-pose.onnx to view the model architecture

  • Check Model Outputs
    • Note that the anchor (per-prediction attribute) dimension of YOLOv8-Pose is 56
      • bbox(4) + confidence(1) + cls(0) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
    • The anchor dimension of YOLOv7-Pose is 57
      • bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
  • Model registration information of YOLOv8s-Pose
    • INPUTS : (batch, channel, height, width)
    • OUTPUTS : (batch, anchors, max_outputs)
netron_yolov8s-pose_dy-sim-640_onnx.PNG
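If netron is not handy, the registered input/output names and shapes can also be read with the onnx Python package (a small sketch, assuming the file was renamed to yolov8s-pose-dy-sim-640.onnx as above):

    # Print the registered input/output names and (symbolic) shapes of the exported ONNX model
    import onnx

    model = onnx.load('yolov8s-pose-dy-sim-640.onnx')
    for tensor in list(model.graph.input) + list(model.graph.output):
        dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
        print(tensor.name, dims)
    # expected per-anchor output dimension: bbox(4) + confidence(1) + keypoints(3*17) = 56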

2. ONNX to TensorRT Engine with Dynamic Batch

  • ⚠️ The engine is bound to the hardware it is built on, so build it on your edge device (it's a long wait ⌛)
  • Specify the --minShapes, --optShapes and --maxShapes parameters to enable dynamic batch processing.
cd /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT 
sudo /usr/src/tensorrt/bin/trtexec --verbose \
    --onnx=yolov8s-pose-dy-sim-640.onnx \
    --fp16 \
    --workspace=4096 \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:12x3x640x640 \
    --maxShapes=images:16x3x640x640 \
    --saveEngine=yolov8s-pose-dy-sim-640.engine

3. Test and Check the TensorRT Engine

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose-dy-sim-640.engine
  • or test with multiple batches for a dynamic-shape ONNX model

    • --shapes=spec sets input shapes for dynamic-shape inference inputs.
    /usr/src/tensorrt/bin/trtexec  \
        --loadEngine=yolov8s-pose-dy-sim-640.engine \
        --shapes=images:12x3x640x640 
    
    • Performance on Jetson (AGX Xavier / AGX Orin) for the TensorRT engine

| model | device | size | batch | fps | ms |
|---|---|---|---|---|---|
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
  • * yolov7w-pose uses the YOLO layer TensorRT plugin from nanmi/yolov7-pose. Single batch and image_size 960 only.
  • .engine (TensorRT) models were tested with the trtexec command.
  • The .pt model was tested with PyTorch (on a 15 s video) as a baseline.
  • NMS is not included in any test.
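In addition to trtexec, the engine's bindings and dynamic batch range can be inspected from Python with the TensorRT runtime (a sketch using the TensorRT 8.x binding API; the engine path is the one saved in step 2):

    # Deserialize the built engine and list its bindings and shapes (TensorRT 8.x API)
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    with open('yolov8s-pose-dy-sim-640.engine', 'rb') as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_bindings):
        kind = 'input ' if engine.binding_is_input(i) else 'output'
        # -1 in the shape marks the dynamic batch dimension
        print(kind, engine.get_binding_name(i), tuple(engine.get_binding_shape(i)))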

Basic usage

Download Repository

git clone https://github.com/YunghuiHsu/deepstream-yolo-pose.git

To run the app with default settings:


  • NVInfer with RTSP inputs

    python3 deepstream_YOLOv8-Pose_rtsp.py \
       -i  rtsp://sample_1.mp4 \
           rtsp://sample_2.mp4 \
           rtsp://sample_N.mp4
  • e.g. loop with local file inputs

    python3 deepstream_YOLOv8-Pose_rtsp.py \
        -i file:///home/ubuntu/video1.mp4 file:///home/ubuntu/video2.mp4 \
        -config dstest1_pgie_YOLOv8-Pose_config.txt \
        --file-loop
  • Default RTSP streaming location:

Note:

  1. -g/--pgie : selects the inference plugin; nvinfer is used by default (choices: ['nvinfer', 'nvinferserver']).
  2. -config/--config-file : must be provided for custom models.
  3. --file-loop : loops input files after EOS.
  4. --conf-thres : object confidence threshold.
  5. --iou-thres : IoU threshold for NMS.

This sample app is derived from NVIDIA-AI-IOT/deepstream_python_apps/apps and adds custom features.

  • Includes the following:

    • Accepts multiple sources

    • Dynamic batch model (YOLO-Pose)

    • Accepts RTSP stream as input and gives out inference as RTSP stream

    • NVInfer GPU inference engine

    • NVInferserver GPU inference engine (not yet tested)

    • MultiObjectTracker(NVTracker)

    • Automatically adapts to the input and output tensor shapes of the loaded model (NvDsInferTensorMeta)

    • Extracts the stream metadata and image data from the batched buffer of Gst-nvinfer (see the probe sketch after this list)

      imagedata-app-block-diagram.png

      source : deepstream-imagedata-multistream
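The metadata extraction follows the buffer-probe pattern used in the NVIDIA deepstream_python_apps samples; a minimal sketch of such a probe (illustrative only, not this repo's exact callback) looks like this:

    # Minimal pad-probe sketch: walk the NvDsBatchMeta attached by the pipeline
    # and print per-frame stream information (pattern from deepstream_python_apps)
    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst
    import pyds

    def buffer_probe(pad, info, u_data):
        gst_buffer = info.get_buffer()
        if not gst_buffer:
            return Gst.PadProbeReturn.OK
        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            print('source', frame_meta.pad_index, 'frame', frame_meta.frame_num,
                  'objects', frame_meta.num_obj_meta)
            try:
                l_frame = l_frame.next
            except StopIteration:
                break
        return Gst.PadProbeReturn.OK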


Acknowledgements

Reference