jacobmarks/image-deduplication-plugin

Image Deduplication Plugin

This is a Python plugin for FiftyOne that streamlines image deduplication workflows.

With this plugin, you can:

  • Find exact duplicate images using a hash function
  • Find near-duplicate images using an embedding model and similarity threshold
  • View and interact with duplicate images in the App
  • Remove all duplicates, or keep a representative image from each duplicate set

Installation

fiftyone plugins download https://github.com/jacobmarks/image-deduplication-plugin

Operators

find_approximate_duplicate_images

This operator finds near-duplicate images in a dataset using a specified similarity index paired with either a distance threshold or a fraction of samples to mark as duplicates.
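The distance-threshold variant of this idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the plugin's implementation: the function names `cosine_distance` and `find_near_duplicates` are hypothetical, cosine distance is an assumed metric, and the embeddings are assumed to be nonzero vectors already computed by some model. The plugin itself relies on a similarity index rather than this brute-force loop.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, thresh):
    """Greedily flag samples that lie within `thresh` of an earlier kept sample.

    embeddings: dict mapping a sample id to its embedding vector.
    Returns a dict mapping each representative id to the list of ids
    flagged as its near duplicates.
    """
    groups = {}
    for sample_id, vector in embeddings.items():
        for rep_id in groups:
            if cosine_distance(vector, embeddings[rep_id]) <= thresh:
                # Close enough to an existing representative: mark as duplicate
                groups[rep_id].append(sample_id)
                break
        else:
            # No representative is close enough: this sample starts a new group
            groups[sample_id] = []
    # Report only representatives that actually have duplicates
    return {rep: dups for rep, dups in groups.items() if dups}
```

A smaller threshold flags fewer samples. The comparison here is quadratic in the worst case, which is one reason a prebuilt similarity index is preferable on large datasets.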

find_exact_duplicate_images

This operator finds exact duplicate images in a dataset using a hash function.
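As a rough sketch of hash-based exact matching: files with identical bytes produce identical digests, so grouping by digest reveals the duplicate sets. The choice of SHA-256 and the helper names `content_digest`, `group_by_digest`, and `find_exact_duplicates` are assumptions made for illustration; the plugin may use a different hash function and data structures.

```python
import hashlib
from collections import defaultdict


def content_digest(data):
    """Return a hex digest of raw file bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def group_by_digest(digests):
    """Group ids that share a digest.

    digests: dict mapping a sample id (or filepath) to its hex digest.
    Returns a list of groups, each a list of two or more ids.
    """
    buckets = defaultdict(list)
    for sample_id, digest in digests.items():
        buckets[digest].append(sample_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def find_exact_duplicates(filepaths):
    """Hash each file on disk and return groups of byte-identical files."""
    digests = {}
    for path in filepaths:
        with open(path, "rb") as f:
            digests[path] = content_digest(f.read())
    return group_by_digest(digests)
```

Note that this only catches byte-identical files. The same picture re-encoded or resized produces a different digest, which is the case the near-duplicate operators are meant to cover.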

display_approximate_duplicate_groups

This operator displays the images in a dataset that are near-duplicates of each other, grouped together.

display_exact_duplicate_groups

This operator displays the images in a dataset that are exact duplicates of each other, grouped together.

remove_all_approximate_duplicates

This operator removes all near-duplicate images from a dataset.

remove_all_exact_duplicates

This operator removes all exact duplicate images from a dataset.

deduplicate_approximate_duplicates

This operator removes near-duplicate images from a dataset, keeping a representative image from each duplicate set.

deduplicate_exact_duplicates

This operator removes exact duplicate images from a dataset, keeping a representative image from each duplicate set.
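The difference between the remove_all_* and deduplicate_* operators comes down to whether one image per duplicate set survives. A minimal sketch, assuming duplicate sets are given as lists of sample ids and that the first id in each list is treated as the representative (both are assumptions for illustration, as is the helper name `ids_to_remove`):

```python
def ids_to_remove(duplicate_groups, keep_representative):
    """Return the sample ids that would be deleted.

    duplicate_groups: list of lists; each inner list is one duplicate set.
    keep_representative: if True, the first id of each set is kept
    (deduplicate behavior); if False, every id in every set is removed
    (remove-all behavior).
    """
    start = 1 if keep_representative else 0
    to_remove = []
    for group in duplicate_groups:
        to_remove.extend(group[start:])
    return to_remove


groups = [["a", "b", "c"], ["d", "e"]]
print(ids_to_remove(groups, keep_representative=True))
print(ids_to_remove(groups, keep_representative=False))
```

Images that do not belong to any duplicate set are untouched in both modes.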
