Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing with all existing hashes in DB #45

Open
rw4lll opened this issue Jun 10, 2019 · 3 comments
Open

Comparing with all existing hashes in DB #45

rw4lll opened this issue Jun 10, 2019 · 3 comments

Comments

@rw4lll
Copy link

rw4lll commented Jun 10, 2019

Hi! It is works fine if I need to compare exactly two images. But what if I have 3 millions hashes in DB and need to compare my current hash with each other in database (find all similar from images that already exists). Is there any solution except direct comparing? May be it's possible to use something similar to distance function right in database?

@Alex-Gramm
Copy link

You can use query like this:

select * from images where BIT_COUNT(X'3c3e0e1a3a1e1e1e' ^ images.hash) < 5

@rw4lll
Copy link
Author

rw4lll commented Jun 14, 2019

Thanks!

@rw4lll
Copy link
Author

rw4lll commented Jun 15, 2019

We are using postgres for our project, so it doesn't have bit_count method. I did custom function for that:

CREATE OR REPLACE FUNCTION hash_distance(hash1 varchar, hash2 varchar) RETURNS integer AS $$
        DECLARE
			bit_hash1 bit varying := hash1;
			bit_hash2 bit varying := hash2;
			length_hash1 int:= length(hash1);
			length_hash2 int:= length(hash2);
			max_length int:= GREATEST(length_hash1, length_hash2);
		BEGIN
				IF length_hash1 < max_length THEN
    				bit_hash1:=LPAD(bit_hash1::text, max_length, '0')::bit varying;
				END IF;
				
				IF length_hash2 < max_length THEN
    				bit_hash2:=LPAD(bit_hash2::text, max_length, '0')::bit varying;
				END IF;
				
				RETURN length(replace((bit_hash1 # bit_hash2)::text, '0', ''));
        END;
$$ LANGUAGE plpgsql;

select hash_distance('100000', '11010011');
It's based on logic of difference hash. Not sure how fast is it, but may be it helps for somebody in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants