
Implement Content-Code for binary data. #89

Open
titusz opened this issue Jul 10, 2020 · 1 comment

Comments

@titusz
Member

titusz commented Jul 10, 2020

We could extract printable strings (in different encodings) from all kinds of binary data, such as executables or custom binary formats, with https://github.com/getreu/stringsext and then create a text similarity signature from them.

The question is whether we still call this Content-ID-Text or create a custom Content-ID-Binary that signals the text was extracted from a binary format without any format-specific structured parsing.
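A minimal sketch of the extract-then-fingerprint idea, assuming only the Python standard library. It does not call stringsext. A regex scan for printable ASCII and UTF-16LE runs stands in for its multi-encoding extraction, and a MinHash over character 5-grams stands in for the "text similarity signature". All names (`extract_strings`, `binary_signature`, `similarity`), the 4-character minimum, the 5-gram size, and the 64-slot signature length are illustrative assumptions, not the project's Content-Code algorithm.

```python
from __future__ import annotations

import hashlib
import re

# Illustrative assumptions, not part of any spec: minimum run length,
# shingle (n-gram) size and signature length.
MIN_LEN = 4
NGRAM = 5
NUM_HASHES = 64

# Runs of printable ASCII bytes (similar to the Unix `strings` tool).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of printable ASCII characters encoded as UTF-16LE (char byte + 0x00).
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return printable strings found in raw bytes, ASCII runs first."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
    return found


def _hash(shingle: str, seed: int) -> int:
    # One seeded 64-bit hash per signature slot; the seed goes into BLAKE2b's salt.
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(text: str) -> list[int]:
    """MinHash signature over lowercase character n-grams of `text`."""
    text = " ".join(text.split()).lower()  # collapse whitespace, ignore case
    shingles = {text[i : i + NGRAM] for i in range(max(len(text) - NGRAM + 1, 1))}
    return [min(_hash(s, seed) for s in shingles) for seed in range(NUM_HASHES)]


def binary_signature(data: bytes) -> list[int]:
    """Hypothetical pipeline: extract strings, then fingerprint them as text."""
    return minhash(" ".join(extract_strings(data)))


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching slots, an estimate of the shingle-set Jaccard index."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

For two builds of the same program (placeholders `old_build` and `new_build`), `similarity(binary_signature(old_build), binary_signature(new_build))` should come out close to 1.0 when most embedded strings are shared. MinHash is used here because it yields a fixed-length signature that can be compared slot by slot without access to the original data.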

@lrosenthol

What about defining the algorithm to be used instead of a specific implementation? It would help to look at various binary formats and see which aspects are most important to capture in the ID.

@titusz changed the title from "Implement Content-ID for binary data." to "Implement Content-Code for binary data." on Feb 18, 2022