
HOI-Learning-List

Some recent (2015-now) Human-Object Interaction (HOI) learning studies. If you find any errors or problems, please don't hesitate to open an issue.

A list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.

Image Dataset/Benchmark

More...

Video HOI Datasets

3D HOI Datasets

Survey

  • Human object interaction detection: Design and survey (Image and Vision Computing 2022), [Paper]

Method

HOI Image Generation

  • VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing (arXiv 2024.05) [Paper] [Code]

  • Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [Paper]

  • Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [Paper]

HOI Recognition: image-based, i.e., recognizing all the HOIs in one image. A minimal task sketch follows the list below.

More...
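For orientation, here is a minimal sketch of the task as 600-way multi-label classification over the HICO categories. The backbone, classifier, and training snippet are illustrative placeholders, not a method from this list:

```python
# A minimal sketch (not any listed method): image-level HOI recognition
# as multi-label classification over the 600 HICO verb-object categories.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_HOI_CLASSES = 600  # HICO defines 600 HOI categories

class HOIRecognizer(nn.Module):
    def __init__(self, num_classes: int = NUM_HOI_CLASSES):
        super().__init__()
        backbone = models.resnet50(weights=None)  # placeholder backbone
        backbone.fc = nn.Identity()               # keep the 2048-d pooled feature
        self.backbone = backbone
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # One logit per HOI category; an image may contain several HOIs,
        # so training uses a per-class sigmoid (BCE), not a softmax.
        return self.classifier(self.backbone(images))

model = HOIRecognizer()
logits = model(torch.randn(2, 3, 224, 224))  # two dummy images
targets = torch.zeros(2, NUM_HOI_CLASSES)    # multi-hot HOI labels
loss = nn.BCEWithLogitsLoss()(logits, targets)
```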

Unseen or zero-shot learning (image-level recognition).

  • HTS (ICIP 2023) [Paper]

  • ICompass (ICCV2021) [Paper], [Code]

  • Compositional Learning for Human Object Interaction (ECCV2018) [Paper]

  • Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]

More...

HOI Detection: instance-based, i.e., detecting the human-object pairs and classifying their interactions. A minimal pipeline sketch follows the list below.

More...
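For orientation, here is a minimal sketch of the classic two-stage pipeline behind many entries above: run an off-the-shelf detector, enumerate every (human, object) pair, then score verbs per pair. All names and the detection format are illustrative assumptions:

```python
# A minimal sketch of the two-stage HOI detection pipeline:
# detect instances, enumerate (human, object) pairs, score verbs per pair.
from itertools import product

def enumerate_pairs(detections):
    """detections: dicts like {'box': [x1, y1, x2, y2], 'label': str, 'score': float}."""
    humans = [d for d in detections if d["label"] == "person"]
    objects = [d for d in detections if d["label"] != "person"]
    # Every human is tentatively paired with every object; non-interactive
    # pairs are pruned later (e.g., by an interactiveness score).
    return list(product(humans, objects))

def score_pair(human, obj, verb_classifier):
    # verb_classifier stands in for any pairwise model (union-box features,
    # spatial layout, pose, ...). The final HOI score is commonly the product
    # of the two detection scores and the verb probability.
    verb_probs = verb_classifier(human, obj)  # e.g., {'ride': 0.9, ...}
    return {v: human["score"] * obj["score"] * p for v, p in verb_probs.items()}
```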

Unseen or zero/low-shot or weakly-supervised learning (instance-level detection).

More...

Video HOI methods

  • SPDTP (arXiv, Jun 2022), [Paper]

  • V-HOI (arXiv, Jun 2022), [Paper]

  • Detecting Human-Object Relationships in Videos (ICCV2021) [Paper]

  • STIGPN (Aug 2021), [Paper], [Code]

  • VidHOI (May 2021), [Paper]

  • LIGHTEN (ACMMM2020) [Paper] [Code]

  • Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN, [Paper]

  • Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [Code] [Paper]

  • GPNN (ECCV2018) [Code] [Paper]

More...

3D HOI Reconstruction/Generation/Understanding

Results

Proposed in the TPAMI version of TIN (Transferable Interactiveness Network). It is built on HAKE data and includes 110K+ images and 520 HOIs (the 80 "no_interaction" HOIs of HICO-DET are excluded to avoid incomplete labeling). Its long-tailed data distribution is more severe, which makes it more difficult.
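For reference, the mAP numbers in the result tables below follow the standard HOI-detection matching rule (as in the HICO-DET evaluation protocol): a predicted triplet counts as a true positive only if its HOI class is correct and both boxes overlap the ground truth with IoU >= 0.5. A minimal sketch, with an assumed prediction/annotation layout:

```python
# A minimal sketch of the matching rule behind the mAP numbers below
# (standard HICO-DET protocol): a predicted <human, verb, object> triplet
# is a true positive iff its HOI class is correct AND both the human box
# and the object box overlap their ground-truth counterparts with IoU >= 0.5.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def is_true_positive(pred, gt, thresh=0.5):
    # pred/gt dict layout is an illustrative assumption.
    return (pred["hoi_class"] == gt["hoi_class"]
            and iou(pred["human_box"], gt["human_box"]) >= thresh
            and iou(pred["object_box"], gt["object_box"]) >= thresh)
```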

Detector: COCO pre-trained

| Method | mAP |
|---|---|
| iCAN | 11.00 |
| iCAN+NIS | 13.13 |
| TIN | 15.38 |

HICO-DET:

1) Detector: COCO pre-trained

| Method | Pub | Full (def) | Rare (def) | Non-Rare (def) | Full (ko) | Rare (ko) | Non-Rare (ko) |
|---|---|---|---|---|---|---|---|
| Shen et al. | WACV2018 | 6.46 | 4.24 | 7.12 | - | - | - |
| HO-RCNN | WACV2018 | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
| InteractNet | CVPR2018 | 9.94 | 7.16 | 10.77 | - | - | - |
| Turbo | AAAI2019 | 11.40 | 7.30 | 12.60 | - | - | - |
| GPNN | ECCV2018 | 13.11 | 9.34 | 14.23 | - | - | - |
| Xu et al. | ICCV2019 | 14.70 | 13.26 | 15.13 | - | - | - |
| iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
| Wang et al. | ICCV2019 | 16.24 | 11.16 | 17.75 | 17.73 | 12.78 | 19.21 |
| Lin et al. | IJCAI2020 | 16.63 | 11.30 | 18.22 | 19.22 | 14.56 | 20.61 |
| Functional (suppl) | AAAI2020 | 16.96 | 11.73 | 18.52 | - | - | - |
| Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
| No-Frills | ICCV2019 | 17.18 | 12.17 | 18.68 | - | - | - |
| RPNN | ICCV2019 | 17.35 | 12.78 | 18.71 | - | - | - |
| PMFNet | ICCV2019 | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
| SIGN | ICME2020 | 17.51 | 15.31 | 18.53 | 20.49 | 17.53 | 21.51 |
| Interactiveness-optimized | CVPR2019 | 17.54 | 13.80 | 18.65 | 19.75 | 15.70 | 20.96 |
| Liu et al. | arXiv | 17.55 | 20.61 | - | - | - | - |
| Wang et al. | ECCV2020 | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
| UnionDet | arXiv2023 | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
| In-GraphNet | IJCAI-PRICAI 2020 | 17.72 | 12.93 | 19.31 | - | - | - |
| HOID | CVPR2020 | 17.85 | 12.85 | 19.34 | - | - | - |
| MLCNet | ICMR2020 | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
| SAG | arXiv | 18.26 | 13.40 | 19.71 | - | - | - |
| Sarullo et al. | arXiv | 18.74 | - | - | - | - | - |
| DRG | ECCV2020 | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
| Analogy | ICCV2019 | 19.40 | 14.60 | 20.90 | - | - | - |
| VCL | ECCV2020 | 19.43 | 16.55 | 20.29 | 22.00 | 19.09 | 22.87 |
| VS-GATs | arXiv | 19.66 | 15.79 | 20.81 | - | - | - |
| VSGNet | CVPR2020 | 19.80 | 16.05 | 20.91 | - | - | - |
| PFNet | CVM | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
| ATL (w/ COCO) | CVPR2021 | 20.08 | 15.57 | 21.43 | - | - | - |
| FCMNet | ECCV2020 | 20.41 | 17.34 | 21.56 | 22.04 | 18.97 | 23.12 |
| ACP | ECCV2020 | 20.59 | 15.92 | 21.98 | - | - | - |
| PD-Net | ECCV2020 | 20.81 | 15.90 | 22.28 | 24.78 | 18.88 | 26.54 |
| SG2HOI | ICCV2021 | 20.93 | 18.24 | 21.78 | 24.83 | 20.52 | 25.32 |
| TIN-PAMI | TPAMI2021 | 20.93 | 18.95 | 21.32 | 23.02 | 20.96 | 23.42 |
| ATL | CVPR2021 | 21.07 | 16.79 | 22.35 | - | - | - |
| PMN | arXiv | 21.21 | 17.60 | 22.29 | - | - | - |
| IPGN | TIP2021 | 21.26 | 18.47 | 22.07 | - | - | - |
| DJ-RN | CVPR2020 | 21.34 | 18.53 | 22.18 | 23.69 | 20.64 | 24.60 |
| OSGNet | IEEE Access | 21.40 | 18.12 | 22.38 | - | - | - |
| K-BAN | arXiv2022 | 21.48 | 16.85 | 22.86 | 24.29 | 19.09 | 25.85 |
| SCG+ODM | ECCV2022 | 21.50 | 17.59 | 22.67 | - | - | - |
| DIRV | AAAI2021 | 21.78 | 16.38 | 23.39 | 25.52 | 20.84 | 26.92 |
| SCG | ICCV2021 | 21.85 | 18.11 | 22.97 | - | - | - |
| HRNet | TIP2021 | 21.93 | 16.30 | 23.62 | 25.22 | 18.75 | 27.15 |
| ConsNet | ACMMM2020 | 22.15 | 17.55 | 23.52 | 26.57 | 20.8 | 28.3 |
| SKGHOI | arXiv2023 | 22.61 | 15.87 | 24.62 | - | - | - |
| IDN | NeurIPS2020 | 23.36 | 22.47 | 23.63 | 26.43 | 25.01 | 26.85 |
| QAHOI-Res50 | arXiv2021 | 24.35 | 16.18 | 26.80 | - | - | - |
| DOQ | CVPR2022 | 25.97 | 26.09 | 25.93 | - | - | - |
| STIP | CVPR2022 | 28.81 | 27.55 | 29.18 | 32.28 | 31.07 | 32.64 |

2) Detector: pre-trained on COCO and fine-tuned on the HICO-DET train set (with GT human-object pair boxes), or a one-stage detector (point-based, transformer-based)

The fine-tuned detector learns to detect only the interactive humans and objects (i.e., those with interactiveness), suppressing many incorrect pairings (non-interactive human-object pairs) and thus boosting performance. A minimal sketch of this suppression follows.
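The sketch below shows interactiveness-style pair suppression in the spirit of TIN's non-interaction suppression; the head, threshold, and data layout are illustrative assumptions, not TIN's exact implementation:

```python
# A minimal sketch of interactiveness-style pair suppression: a binary head
# predicts whether a human-object pair interacts at all, and low-scoring
# pairs are discarded before verb classification. Threshold is illustrative.
def suppress_non_interactive(pairs, interactiveness_head, thresh=0.1):
    kept = []
    for human, obj in pairs:
        p_interact = interactiveness_head(human, obj)  # scalar in [0, 1]
        if p_interact >= thresh:
            kept.append((human, obj, p_interact))
    # Downstream HOI scores are typically also multiplied by p_interact,
    # so borderline pairs are down-weighted as well as filtered.
    return kept
```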

| Method | Pub | Full (def) | Rare (def) | Non-Rare (def) | Full (ko) | Rare (ko) | Non-Rare (ko) |
|---|---|---|---|---|---|---|---|
| UniDet | ECCV2020 | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
| IP-Net | CVPR2020 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
| RR-Net | arXiv | 20.72 | 13.21 | 22.97 | - | - | - |
| PPDM (paper) | CVPR2020 | 21.10 | 14.46 | 23.09 | - | - | - |
| PPDM (github-hourglass104) | CVPR2020 | 21.73/21.94 | 13.78/13.97 | 24.10/24.32 | 24.58/24.81 | 16.65/17.09 | 26.84/27.12 |
| Functional | AAAI2020 | 21.96 | 16.43 | 23.62 | - | - | - |
| SABRA-Res50 | arXiv | 23.48 | 16.39 | 25.59 | 28.79 | 22.75 | 30.54 |
| VCL | ECCV2020 | 23.63 | 17.21 | 25.55 | 25.98 | 19.12 | 28.03 |
| ATL | CVPR2021 | 23.67 | 17.64 | 25.47 | 26.01 | 19.60 | 27.93 |
| PST | ICCV2021 | 23.93 | 14.98 | 26.60 | 26.42 | 17.61 | 29.05 |
| SABRA-Res50FPN | arXiv | 24.12 | 15.91 | 26.57 | 29.65 | 22.92 | 31.65 |
| ATL (w/ COCO) | CVPR2021 | 24.50 | 18.53 | 26.28 | 27.23 | 21.27 | 29.00 |
| IDN | NeurIPS2020 | 24.58 | 20.33 | 25.86 | 27.89 | 23.64 | 29.16 |
| FCL | CVPR2021 | 24.68 | 20.03 | 26.07 | 26.80 | 21.61 | 28.35 |
| HOTR | CVPR2021 | 25.10 | 17.34 | 27.42 | - | - | - |
| FCL+VCL | CVPR2021 | 25.27 | 20.57 | 26.67 | 27.71 | 22.34 | 28.93 |
| OC-Immunity | AAAI2022 | 25.44 | 23.03 | 26.16 | 27.24 | 24.32 | 28.11 |
| ConsNet-F | ACMMM2020 | 25.94 | 19.35 | 27.91 | 30.34 | 23.4 | 32.41 |
| SABRA-Res152 | arXiv | 26.09 | 16.29 | 29.02 | 31.08 | 23.44 | 33.37 |
| QAHOI-Res50 | arXiv2021 | 26.18 | 18.06 | 28.61 | - | - | - |
| Zou et al. | CVPR2021 | 26.61 | 19.15 | 28.84 | 29.13 | 20.98 | 31.57 |
| SKGHOI | arXiv2023 | 26.95 | 21.28 | 28.56 | - | - | - |
| RGBM | arXiv2022 | 27.39 | 21.34 | 29.20 | 30.87 | 24.20 | 32.87 |
| GTNet | arXiv | 28.03 | 22.73 | 29.61 | 29.98 | 24.13 | 31.73 |
| K-BAN | arXiv2022 | 28.83 | 20.29 | 31.31 | 31.05 | 21.41 | 33.93 |
| AS-Net | CVPR2021 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |
| QPIC-Res50 | CVPR2021 | 29.07 | 21.85 | 31.23 | 31.68 | 24.14 | 33.93 |
| GGNet | CVPR2021 | 29.17 | 22.13 | 30.84 | 33.50 | 26.67 | 34.89 |
| SCG | ICCV2021 | 29.26 | 24.61 | 30.65 | 32.87 | 27.89 | 34.35 |
| QPIC-CPC | CVPR2022 | 29.63 | 23.14 | 31.57 | - | - | - |
| MHOI | TCSVT2022 | 29.67 | 24.37 | 31.25 | 31.87 | 27.28 | 33.24 |
| QPIC-Res101 | CVPR2021 | 29.90 | 23.92 | 31.69 | 32.38 | 26.06 | 34.27 |
| PhraseHOI | AAAI2022 | 30.03 | 23.48 | 31.99 | 33.74 | 27.35 | 35.64 |
| CDT | TNNLS 2023 | 30.48 | 25.48 | 32.37 | - | - | - |
| SQAB | Displays2023 | 30.82 | 24.92 | 32.58 | 33.58 | 27.19 | 35.49 |
| MSTR | CVPR2022 | 31.17 | 25.31 | 32.92 | 34.02 | 28.83 | 35.57 |
| SSRT | CVPR2022 | 31.34 | 24.31 | 33.32 | - | - | - |
| OCN | AAAI2022 | 31.43 | 25.80 | 33.11 | - | - | - |
| SCG+ODM | ECCV2022 | 31.65 | 24.95 | 33.65 | - | - | - |
| DT | CVPR2022 | 31.75 | 27.45 | 33.03 | 34.50 | 30.13 | 35.81 |
| ParSe (COCO) | NeurIPS2022 | 31.79 | 26.36 | 33.41 | - | - | - |
| CATN (w/ Bert) | CVPR2022 | 31.86 | 25.15 | 33.84 | 34.44 | 27.69 | 36.45 |
| SQA | ICASSP2023 | 31.99 | 29.88 | 32.62 | 35.12 | 32.74 | 35.84 |
| CDN | NeurIPS2021 | 32.07 | 27.19 | 33.53 | 34.79 | 29.48 | 36.38 |
| STIP | CVPR2022 | 32.22 | 28.15 | 33.43 | 35.29 | 31.43 | 36.45 |
| DEFR | arXiv2021 | 32.35 | 33.45 | 32.02 | - | - | - |
| PQNet-L | MMAsia2022 | 32.45 | 27.80 | 33.84 | 35.28 | 30.72 | 36.64 |
| CDN-s+HQM | ECCV2022 | 32.47 | 28.15 | 33.76 | - | - | - |
| UPT | CVPR2022 | 32.62 | 28.62 | 33.81 | 36.08 | 31.41 | 37.47 |
| OpenCat | CVPR2023 | 32.68 | 28.42 | 33.75 | - | - | - |
| Iwin | ECCV2022 | 32.79 | 27.84 | 35.40 | 35.84 | 28.74 | 36.09 |
| RLIP-ParSe (VG+COCO) | NeurIPS2022 | 32.84 | 26.85 | 34.63 | - | - | - |
| PR-Net | arXiv2023 | 32.86 | 28.03 | 34.30 | - | - | - |
| MUREN | CVPR2023 | 32.87 | 28.67 | 34.12 | 35.52 | 30.88 | 36.91 |
| SDT | arXiv2022 | 32.97 | 28.49 | 34.31 | 36.32 | 31.90 | 37.64 |
| HODN | TMM2023 | 33.14 | 28.54 | 34.52 | 35.86 | 31.18 | 37.26 |
| SG2HOI | arXiv2023 | 33.14 | 29.27 | 35.72 | 35.73 | 32.01 | 36.43 |
| PDN | PR2023 | 33.18 | 27.95 | 34.75 | 35.86 | 30.57 | 37.43 |
| DOQ | CVPR2022 | 33.28 | 29.19 | 34.50 | - | - | - |
| IF | CVPR2022 | 33.51 | 30.30 | 34.46 | 36.28 | 33.16 | 37.21 |
| ICDT | ICANN2023 | 34.01 | 27.60 | 35.92 | 36.29 | 29.88 | 38.21 |
| PSN | arXiv2023 | 34.02 | 29.44 | 35.39 | - | - | - |
| KI2HOI | arXiv2024 | 34.20 | 32.26 | 36.10 | 37.85 | 35.89 | 38.78 |
| VIL+ | ACMMM2023 | 34.21 | 30.58 | 35.30 | 37.67 | 34.88 | 38.50 |
| Multi-Step | ACMMM2023 | 34.42 | 30.03 | 35.73 | 37.71 | 33.74 | 38.89 |
| OBPA-Net | PRCV2023 | 34.63 | 32.83 | 35.16 | 36.78 | 35.38 | 38.04 |
| MLKD | WACV2024 | 34.69 | 31.12 | 35.74 | - | - | - |
| HOICLIP | CVPR2023 | 34.69 | 31.12 | 35.74 | 37.61 | 34.47 | 38.54 |
| PViC w/ detr | ICCV2023 | 34.69 | 32.14 | 35.45 | 38.14 | 35.38 | 38.97 |
| GEN-VLKT+SCA | arXiv2023 | 34.79 | 31.80 | 35.68 | - | - | - |
| SBM | PRCV2023 | 34.92 | 31.67 | 35.85 | 38.79 | 35.43 | 39.60 |
| GEN-VLKT (w/ CLIP) | CVPR2022 | 34.95 | 31.18 | 36.08 | 38.22 | 34.36 | 39.37 |
| SOV-STG (res101) | arXiv2023 | 35.01 | 30.63 | 36.32 | 37.60 | 32.77 | 39.05 |
| PartMap | ECCV2022 | 35.15 | 33.71 | 35.58 | 37.56 | 35.87 | 38.06 |
| GFIN | NN2023 | 35.28 | 31.91 | 36.29 | 38.80 | 35.48 | 39.79 |
| CLIP4HOI | NeurIPS2023 | 35.33 | 33.95 | 35.74 | 37.19 | 35.27 | 37.77 |
| LOGICHOI | NeurIPS2023 | 35.47 | 32.03 | 36.22 | 38.21 | 35.29 | 39.03 |
| QAHOI-Swin-Large-ImageNet-22K | arXiv2021 | 35.78 | 29.80 | 37.56 | 37.59 | 31.66 | 39.36 |
| DPADN | AAAI2024 | 35.91 | 35.82 | 35.94 | 38.99 | 39.61 | 38.80 |
| GEN-VLKT-L + CQL | CVPR2023 | 36.03 | 33.16 | 36.89 | 38.82 | 35.51 | 39.81 |
| HOICLIP+DP-HOI | CVPR2024 | 36.56 | 34.36 | 37.22 | - | - | - |
| AGER | ICCV2023 | 36.75 | 33.53 | 37.71 | 39.84 | 35.58 | 40.23 |
| FGAHOI | arXiv2023 | 37.18 | 30.71 | 39.11 | 38.93 | 31.93 | 41.02 |
| ViPLO | CVPR2023 | 37.22 | 35.45 | 37.75 | 40.61 | 38.82 | 41.15 |
| RmLR | ICCV2023 | 37.41 | 28.81 | 39.97 | 38.69 | 31.27 | 40.91 |
| HCVC | arXiv2023 | 37.54 | 37.01 | 37.78 | 39.98 | 39.01 | 40.32 |
| ADA-CM | ICCV2023 | 38.40 | 37.52 | 38.66 | - | - | - |
| UniVRD w/ extra data+VLM | arXiv2023 | 38.61 | 33.39 | 40.16 | - | - | - |
| SCTC | AAAI2024 | 39.12 | 36.09 | 39.87 | - | - | - |
| UniHOI | NeurIPS2023 | 40.95 | 40.27 | 41.32 | 43.26 | 43.12 | 43.25 |
| DiffHOI w/ syn data | arXiv2023 | 41.50 | 39.96 | 41.96 | 43.62 | 41.41 | 44.28 |
| SOV-STG (swin-l) | arXiv2023 | 43.35 | 42.25 | 43.69 | 45.53 | 43.62 | 46.11 |
| PViC w/ h-detr (swin-l) | ICCV2023 | 44.32 | 44.61 | 44.24 | 47.81 | 48.38 | 47.64 |
| RLIPv2-ParSeDA w/ extra data | ICCV2023 | 45.09 | 43.23 | 45.64 | - | - | - |

3) Ground Truth human-object pair boxes (only evaluating HOI recognition)

| Method | Pub | Full (def) | Rare (def) | Non-Rare (def) |
|---|---|---|---|---|
| iCAN | BMVC2018 | 33.38 | 21.43 | 36.95 |
| Interactiveness | CVPR2019 | 34.26 | 22.90 | 37.65 |
| Analogy | ICCV2019 | 34.35 | 27.57 | 36.38 |
| ATL | CVPR2021 | 43.32 | 33.84 | 46.15 |
| IDN | NeurIPS2020 | 43.98 | 40.27 | 45.09 |
| ATL (w/ COCO) | CVPR2021 | 44.27 | 35.52 | 46.89 |
| FCL | CVPR2021 | 45.25 | 36.27 | 47.94 |
| GTNet | arXiv | 46.45 | 35.10 | 49.84 |
| SCG | ICCV2021 | 51.53 | 41.01 | 54.67 |
| K-BAN | arXiv2022 | 52.99 | 34.91 | 58.40 |
| ConsNet | ACMMM2020 | 53.04 | 38.79 | 57.3 |
| ViPLO | CVPR2023 | 62.09 | 59.26 | 62.93 |

4) Interactiveness detection (interactive or not + pair box detection):

| Method | Pub | HICO-DET | V-COCO |
|---|---|---|---|
| TIN++ | TPAMI2022 | 14.35 | 29.36 |
| PPDM | CVPR2020 | 27.34 | - |
| QPIC | CVPR2021 | 32.96 | 38.33 |
| CDN | NeurIPS2021 | 33.55 | 40.13 |
| PartMap | ECCV2022 | 38.74 | 43.61 |

5) Enhanced with HAKE:

| Method | Pub | Full (def) | Rare (def) | Non-Rare (def) | Full (ko) | Rare (ko) | Non-Rare (ko) |
|---|---|---|---|---|---|---|---|
| iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
| iCAN + HAKE-HICO-DET | CVPR2020 | 19.61 (+4.77) | 17.29 | 20.30 | 22.10 | 20.46 | 22.59 |
| Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
| Interactiveness + HAKE-HICO-DET | CVPR2020 | 22.12 (+5.09) | 20.19 | 22.69 | 24.06 | 22.19 | 24.62 |
| Interactiveness + HAKE-Large | CVPR2020 | 22.66 (+5.63) | 21.17 | 23.09 | 24.53 | 23.00 | 24.99 |

6) Zero-Shot HOI detection:

Unseen action-object combination scenario (UC)

| Method | Pub | Detector | Unseen (def) | Seen (def) | Full (def) |
|---|---|---|---|---|---|
| Shen et al. | WACV2018 | COCO | 5.62 | - | 6.26 |
| Functional | AAAI2020 | HICO-DET | 11.31 ± 1.03 | 12.74 ± 0.34 | 12.45 ± 0.16 |
| ConsNet | ACMMM2020 | COCO | 16.99 ± 1.67 | 20.51 ± 0.62 | 19.81 ± 0.32 |
| CDT | TNNLS 2023 | - | 18.06 | 23.34 | 20.72 |
| EoID | AAAI2023 | - | 23.01 ± 1.54 | 30.39 ± 0.40 | 28.91 ± 0.27 |
| HOICLIP | CVPR2023 | - | 25.53 | 34.85 | 32.99 |
| KI2HOI | arXiv2024 | - | 27.43 | 35.76 | 34.56 |
| CLIP4HOI | NeurIPS2023 | - | 27.71 | 33.25 | 32.11 |
| VCL (NF-UC) | ECCV2020 | HICO-DET | 16.22 | 18.52 | 18.06 |
| ATL (w/ COCO) (NF-UC) | CVPR2021 | HICO-DET | 18.25 | 18.78 | 18.67 |
| FCL (NF-UC) | CVPR2021 | HICO-DET | 18.66 | 19.55 | 19.37 |
| RLIP-ParSe (NF-UC) | NeurIPS2022 | COCO, VG | 20.27 | 27.67 | 26.19 |
| SCL | arXiv | HICO-DET | 21.73 | 25.00 | 24.34 |
| OpenCat (NF-UC) | CVPR2023 | HICO-DET | 23.25 | 28.04 | 27.08 |
| GEN-VLKT* (NF-UC) | CVPR2022 | HICO-DET | 25.05 | 23.38 | 23.71 |
| EoID (NF-UC) | AAAI2023 | HICO-DET | 26.77 | 26.66 | 26.69 |
| HOICLIP (NF-UC) | CVPR2023 | HICO-DET | 26.39 | 28.10 | 27.75 |
| LOGICHOI (NF-UC) | NeurIPS2023 | - | 26.84 | 27.86 | 27.95 |
| Wu et al. (NF-UC) | AAAI2024 | - | 27.35 | 22.09 | 23.14 |
| UniHOI (NF-UC) | NeurIPS2023 | - | 28.45 | 32.63 | 31.79 |
| KI2HOI (NF-UC) | arXiv2024 | - | 28.89 | 28.31 | 27.77 |
| DiffHOI w/ syn data (NF-UC) | arXiv2023 | HICO-DET + syn data | 29.45 | 31.68 | 31.24 |
| HCVC (NF-UC) | arXiv2023 | - | 28.44 | 31.35 | 30.77 |
| CLIP4HOI (NF-UC) | NeurIPS2023 | - | 31.44 | 28.26 | 28.90 |
| VCL (RF-UC) | ECCV2020 | HICO-DET | 10.06 | 24.28 | 21.43 |
| ATL (w/ COCO) (RF-UC) | CVPR2021 | HICO-DET | 9.18 | 24.67 | 21.57 |
| FCL (RF-UC) | CVPR2021 | HICO-DET | 13.16 | 24.23 | 22.01 |
| SCL (RF-UC) | arXiv | HICO-DET | 19.07 | 30.39 | 28.08 |
| RLIP-ParSe (RF-UC) | NeurIPS2022 | COCO, VG | 19.19 | 33.35 | 30.52 |
| GEN-VLKT* (RF-UC) | CVPR2022 | HICO-DET | 21.36 | 32.91 | 30.56 |
| OpenCat (RF-UC) | CVPR2023 | HICO-DET | 21.46 | 33.86 | 31.38 |
| Wu et al. (RF-UC) | AAAI2024 | - | 23.32 | 30.09 | 28.53 |
| HOICLIP (RF-UC) | CVPR2023 | HICO-DET | 25.53 | 34.85 | 32.99 |
| LOGICHOI (RF-UC) | NeurIPS2023 | - | 25.97 | 34.93 | 33.17 |
| KI2HOI (RF-UC) | arXiv2024 | - | 26.33 | 35.79 | 34.10 |
| CLIP4HOI (RF-UC) | NeurIPS2023 | - | 28.47 | 35.48 | 34.08 |
| UniHOI (RF-UC) | NeurIPS2023 | - | 28.68 | 33.16 | 32.27 |
| DiffHOI w/ syn data (RF-UC) | arXiv2023 | HICO-DET + syn data | 28.76 | 38.01 | 36.16 |
| HCVC (RF-UC) | arXiv2023 | - | 30.95 | 37.16 | 35.87 |
| RLIPv2-ParSeDA (RF-UC) | ICCV2023 | VG, COCO, O365 | 31.23 | 45.01 | 42.26 |

  • * indicates large vision-language model pre-training, e.g., CLIP.
  • For details of each setting, please refer to the corresponding publications (a minimal split-construction sketch follows below). This table is compiled unofficially and may miss some works.
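The sketch below shows how UC splits are commonly constructed (following the RF-UC/NF-UC protocol of VCL/FCL, which holds out a subset of the 600 verb-object combinations, commonly 120, as unseen); the annotation format is an illustrative assumption:

```python
# A minimal sketch of UC split construction: hold out verb-object
# combinations as "unseen" at training time. RF-UC removes the rarest
# combinations; NF-UC removes non-rare (head) ones, which is harder.
from collections import Counter

def make_uc_split(annotations, num_unseen=120, rare_first=True):
    """annotations: iterable of (verb, object) pairs over the train set."""
    freq = Counter(annotations)
    # Ascending frequency for RF-UC (rarest first), descending for NF-UC.
    combos = sorted(freq, key=freq.get, reverse=not rare_first)
    unseen = set(combos[:num_unseen])  # held out from training entirely
    seen = set(freq) - unseen
    return seen, unseen
```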
Zero-shot* HOI detection without fine-tuning (NF)

| Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |
|---|---|---|---|---|---|---|---|
| RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 13.92 | 11.20 | 14.73 |
| RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 15.40 | 15.08 | 15.50 |
| RLIPv2-ParSeDA | ICCV2023 | Swin-L | VG+COCO+O365 | DDETR | 23.29 | 27.97 | 21.90 |

  • * indicates a formulation that assesses how a pre-trained model generalizes to unseen distributions, as proposed in RLIP; "zero-shot" here follows the terminology of CLIP.
Unseen object scenario (UO)

| Method | Pub | Detector | Full (def) | Seen (def) | Unseen (def) |
|---|---|---|---|---|---|
| Functional | AAAI2020 | HICO-DET | 13.84 | 14.36 | 11.22 |
| FCL | CVPR2021 | HICO-DET | 19.87 | 20.74 | 15.54 |
| ConsNet | ACMMM2020 | COCO | 20.71 | 20.99 | 19.27 |
| Wu et al. | AAAI2024 | - | 27.73 | 27.87 | 27.05 |
| ATL | CVPR2021 | - | 20.47 | 21.54 | 15.11 |
| GEN-VLKT | CVPR2022 | - | 25.63 | 28.92 | 10.51 |
| LOGICHOI | NeurIPS2023 | - | 28.23 | 30.42 | 15.67 |
| HOICLIP | CVPR2023 | - | 28.53 | 30.99 | 16.20 |
| KI2HOI | arXiv2024 | - | 28.84 | 31.70 | 16.50 |
| HCVC | arXiv2023 | - | 30.53 | 33.31 | 16.78 |
| CLIP4HOI | NeurIPS2023 | - | 32.58 | 32.73 | 31.79 |
Unseen action scenario (UA)

| Method | Pub | Detector | Full (def) | Seen (def) | Unseen (def) |
|---|---|---|---|---|---|
| ConsNet | ACMMM2020 | COCO | 19.04 | 20.02 | 14.12 |
| CDT | TNNLS 2023 | - | 19.68 | 21.45 | 15.17 |
| Wu et al. | AAAI2024 | - | 26.43 | 28.13 | 17.92 |
| EoID | AAAI2023 | - | 29.22 | 30.46 | 23.04 |
Unseen verb scenario (UV), results from EoID

| Method | Pub | Detector | Unseen (def) | Seen (def) | Full (def) |
|---|---|---|---|---|---|
| GEN-VLKT | CVPR2022 | - | 20.96 | 30.23 | 28.74 |
| EoID | AAAI2023 | - | 22.71 | 30.73 | 29.61 |
| HOICLIP | CVPR2023 | - | 24.30 | 32.19 | 31.09 |
| LOGICHOI | NeurIPS2023 | - | 24.57 | 31.88 | 30.77 |
| HCVC | arXiv2023 | - | 24.69 | 36.11 | 34.51 |
| KI2HOI | arXiv2024 | - | 25.20 | 32.95 | 31.85 |
| CLIP4HOI | NeurIPS2023 | - | 26.02 | 31.14 | 30.42 |
| UniHOI | NeurIPS2023 | - | 26.05 | 36.78 | 34.68 |
UniHOI NeurIPS2023 - 26.05 36.78 34.68
Another setting

| Method | Pub | Unseen | Seen | Full |
|---|---|---|---|---|
| Shen et al. | WACV2018 | 5.62 | - | 6.26 |
| Functional | AAAI2020 | 10.93 | 12.60 | 12.26 |
| VCL | ECCV2020 | 10.06 | 24.28 | 21.43 |
| ATL | CVPR2021 | 9.18 | 24.67 | 21.57 |
| FCL | CVPR2021 | 13.16 | 24.23 | 22.01 |
| THID (w/ CLIP) | CVPR2022 | 15.53 | 24.32 | 22.96 |
| EoID | AAAI2023 | 22.04 | 31.39 | 29.52 |
| GEN-VLKT | CVPR2022 | 21.36 | 32.91 | 30.56 |

7) Few-Shot HOI detection:

1% of HICO-DET training data used in fine-tuning

| Method | Pub | Backbone | Dataset | Detector | Data | Full | Rare | Non-Rare |
|---|---|---|---|---|---|---|---|---|
| RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 1% | 18.30 | 16.22 | 18.92 |
| RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 1% | 18.46 | 17.47 | 18.76 |
| RLIPv2-ParSeDA | ICCV2023 | Swin-L | VG+COCO+O365 | DDETR | 1% | 32.22 | 31.89 | 32.32 |

10% of HICO-DET training data used in fine-tuning

| Method | Pub | Backbone | Dataset | Detector | Data | Full | Rare | Non-Rare |
|---|---|---|---|---|---|---|---|---|
| RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 10% | 22.09 | 15.89 | 23.94 |
| RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 10% | 22.59 | 20.16 | 23.32 |
| RLIPv2-ParSeDA | ICCV2023 | Swin-L | VG+COCO+O365 | DDETR | 10% | 37.46 | 34.75 | 38.27 |
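A minimal sketch of the few-shot protocol above: fine-tune a pre-trained model on a small random fraction of the HICO-DET train set. The sampling below is purely illustrative, not the authors' released split:

```python
# Illustrative subsampling for few-shot fine-tuning (1% or 10% of the
# training images), with a fixed seed for reproducibility.
import random

def sample_fraction(train_images, fraction=0.01, seed=0):
    rng = random.Random(seed)
    k = max(1, int(len(train_images) * fraction))
    return rng.sample(train_images, k)
```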

8) Weakly-supervised HOI detection:

| Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |
|---|---|---|---|---|---|---|---|
| Explanation-HOI | ECCV2020 | ResNeXt101 | COCO | FRCNN | 10.63 | 8.71 | 11.20 |
| MX-HOI | WACV2021 | ResNet-101 | COCO | FRCNN | 16.14 | 12.06 | 17.50 |
| PPR-FCN (from Weakly-HOI-CLIP) | ICCV2017 | ResNet-50, CLIP | COCO | FRCNN | 17.55 | 15.69 | 18.41 |
| Align-Former | BMVC2021 | ResNet-101 | - | - | 20.85 | 18.23 | 21.64 |
| Weakly-HOI-CLIP | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 25.70 | 24.52 | 26.05 |
| OpenCat | CVPR2023 | DETR | - | - | 25.82 | 24.35 | 26.19 |

Ambiguous-HOI

Detector: COCO pre-trained

| Method | mAP |
|---|---|
| iCAN | 8.14 |
| Interactiveness | 8.22 |
| Analogy (reproduced) | 9.72 |
| DJ-RN | 10.37 |
| OC-Immunity | 10.45 |
SWiG-HOI

| Method | Pub | Non-Rare | Unseen | Seen | Full |
|---|---|---|---|---|---|
| JSR | ECCV2020 | 10.01 | 6.10 | 2.34 | 6.08 |
| CHOID | ICCV2021 | 10.93 | 6.63 | 2.64 | 6.64 |
| QPIC | CVPR2021 | 16.95 | 10.84 | 6.21 | 11.12 |
| THID (w/ CLIP) | CVPR2022 | 17.67 | 12.82 | 10.04 | 13.26 |

V-COCO: Scenario 1

1) Detector: COCO pre-trained or one-stage detector

| Method | Pub | AP (role) |
|---|---|---|
| Gupta et al. | arXiv | 31.8 |
| InteractNet | CVPR2018 | 40.0 |
| Turbo | AAAI2019 | 42.0 |
| GPNN | ECCV2018 | 44.0 |
| UniVRD w/ extra data+VLM | arXiv2023 | 45.19 |
| iCAN | BMVC2018 | 45.3 |
| Xu et al. | CVPR2019 | 45.9 |
| Wang et al. | ICCV2019 | 47.3 |
| UniDet | ECCV2020 | 47.5 |
| Interactiveness | CVPR2019 | 47.8 |
| Lin et al. | IJCAI2020 | 48.1 |
| VCL | ECCV2020 | 48.3 |
| Zhou et al. | CVPR2020 | 48.9 |
| In-GraphNet | IJCAI-PRICAI 2020 | 48.9 |
| Interactiveness-optimized | CVPR2019 | 49.0 |
| TIN-PAMI | TPAMI2021 | 49.1 |
| IP-Net | CVPR2020 | 51.0 |
| DRG | ECCV2020 | 51.0 |
| RGBM | arXiv2022 | 51.7 |
| VSGNet | CVPR2020 | 51.8 |
| PMN | arXiv | 51.8 |
| PMFNet | ICCV2019 | 52.0 |
| Liu et al. | arXiv | 52.28 |
| FCL | CVPR2021 | 52.35 |
| PD-Net | ECCV2020 | 52.6 |
| Wang et al. | ECCV2020 | 52.7 |
| PFNet | CVM | 52.8 |
| Zou et al. | CVPR2021 | 52.9 |
| SIGN | ICME2020 | 53.1 |
| ACP | ECCV2020 | 52.98 (53.23) |
| FCMNet | ECCV2020 | 53.1 |
| HRNet | TIP2021 | 53.1 |
| SGCN4HOI | IEEESMC2022 | 53.1 |
| ConsNet | ACMMM2020 | 53.2 |
| IDN | NeurIPS2020 | 53.3 |
| SG2HOI | ICCV2021 | 53.3 |
| OSGNet | IEEE Access | 53.43 |
| SABRA-Res50 | arXiv | 53.57 |
| K-BAN | arXiv2022 | 53.70 |
| IPGN | TIP2021 | 53.79 |
| AS-Net | CVPR2021 | 53.9 |
| RR-Net | arXiv | 54.2 |
| SCG | ICCV2021 | 54.2 |
| HOKEM | arXiv2023 | 54.6 |
| SABRA-Res50FPN | arXiv | 54.69 |
| GGNet | CVPR2021 | 54.7 |
| MLCNet | ICMR2020 | 55.2 |
| HOTR | CVPR2021 | 55.2 |
| DIRV | AAAI2021 | 56.1 |
| UnionDet | arXiv2023 | 56.2 |
| SABRA-Res152 | arXiv | 56.62 |
| PhraseHOI | AAAI2022 | 57.4 |
| GTNet | arXiv | 58.29 |
| QPIC-Res101 | CVPR2021 | 58.3 |
| ADA-CM | ICCV2023 | 58.57 |
| QPIC-Res50 | CVPR2021 | 58.8 |
| ICDT | ICANN2023 | 59.4 |
| CATN (w/ fastText) | CVPR2022 | 60.1 |
| FGAHOI | arXiv2023 | 60.5 |
| Iwin | ECCV2022 | 60.85 |
| UPT-ResNet-101-DC5 | CVPR2022 | 61.3 |
| CDT | TNNLS 2023 | 61.43 |
| SBM | PRCV2023 | 61.5 |
| SDT | arXiv2022 | 61.8 |
| OpenCat | CVPR2023 | 61.9 |
| MSTR | CVPR2022 | 62.0 |
| ViPLO | CVPR2023 | 62.2 |
| Multi-Step | ACMMM2023 | 62.4 |
| DPADN | AAAI2024 | 62.62 |
| PViC w/ detr | ICCV2023 | 62.8 |
| PR-Net | arXiv2023 | 62.9 |
| IF | CVPR2022 | 63.0 |
| PartMap | ECCV2022 | 63.0 |
| QPIC-CPC | CVPR2022 | 63.1 |
| DOQ | CVPR2022 | 63.5 |
| HOICLIP | CVPR2023 | 63.5 |
| GEN-VLKT (w/ CLIP) | CVPR2022 | 63.58 |
| SG2HOI | arXiv2023 | 63.6 |
| QPIC+HQM | ECCV2022 | 63.6 |
| SOV-STG | arXiv2023 | 63.9 |
| KI2HOI | arXiv2024 | 63.9 |
| CDN | NeurIPS2021 | 63.91 |
| PViC w/ h-detr (swin-l) | ICCV2023 | 64.1 |
| OBPA-Net | PRCV2023 | 64.1 |
| RmLR | ICCV2023 | 64.17 |
| RLIP-ParSe (COCO+VG) | NeurIPS2022 | 64.2 |
| LOGICHOI | NeurIPS2023 | 64.4 |
| MHOI | TCSVT2022 | 64.5 |
| GEN-VLKT+SCA | arXiv2023 | 64.5 |
| PDN | PR2023 | 64.7 |
| ParSe (COCO) | NeurIPS2022 | 64.8 |
| SSRT | CVPR2022 | 65.0 |
| SQAB | Displays2023 | 65.0 |
| OCN | AAAI2022 | 65.3 |
| SQA | ICASSP2023 | 65.4 |
| AGER | ICCV2023 | 65.68 |
| DiffHOI | arXiv2023 | 65.7 |
| PSN | arXiv2023 | 65.9 |
| STIP | CVPR2022 | 66.0 |
| DT | CVPR2022 | 66.2 |
| CLIP4HOI | NeurIPS2023 | 66.3 |
| GENs+DP-HOI | CVPR2024 | 66.6 |
| GEN-VLKT-L + CQL | CVPR2023 | 66.8 |
| HODN | TMM2023 | 67.0 |
| VIL+DisTR | ACMMM2023 | 67.6 |
| UniHOI | NeurIPS2023 | 68.05 |
| SCTC | AAAI2024 | 68.2 |
| HCVC | arXiv2023 | 68.4 |
| MUREN | CVPR2023 | 68.8 |
| GFIN | NN2023 | 70.1 |
| RLIPv2-ParSeDA w/ extra data | ICCV2023 | 72.1 |

2) Enhanced with HAKE:

| Method | Pub | AP (role) |
|---|---|---|
| iCAN | BMVC2018 | 45.3 |
| iCAN + HAKE-Large (transfer learning) | CVPR2020 | 49.2 (+3.9) |
| Interactiveness | CVPR2019 | 47.8 |
| Interactiveness + HAKE-Large (transfer learning) | CVPR2020 | 51.0 (+3.2) |

3) Weakly-supervised HOI detection:

| Method | Pub | Backbone | Dataset | Detector | AP (role)-S1 | AP (role)-S2 |
|---|---|---|---|---|---|---|
| Weakly-HOI-CLIP | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 44.74 | 49.97 |

4) Zero-shot HOI detection based on V-COCO:

| Method | Pub | Full | Seen | Unseen |
|---|---|---|---|---|
| VCL | ECCV2020 | 23.53 | 8.29 | 35.36 |
| ATL (w/ COCO) | CVPR2021 | 23.40 | 8.01 | 35.34 |

HICO

1) Default

| Method | mAP |
|---|---|
| R*CNN | 28.5 |
| Girdhar et al. | 34.6 |
| Mallya et al. | 36.1 |
| RAM++ LLM | 37.6 |
| Pairwise | 39.9 |
| RelViT | 40.12 |
| DEFR-base | 44.1 |
| OpenTAP | 51.7 |
| DEFR-CLIP | 60.5 |
| HTS | 60.5 |
| DEFR/16 CLIP | 65.6 |

2) Enhanced with HAKE:

| Method | mAP |
|---|---|
| Mallya et al. | 36.1 |
| Mallya et al. + HAKE-HICO | 45.0 (+8.9) |
| Pairwise | 39.9 |
| Pairwise + HAKE-HICO | 45.9 (+6.0) |
| Pairwise + HAKE-Large | 46.3 (+6.4) |