Tuning for specific bodies of text #1

Open
alexose opened this issue Mar 27, 2022 · 1 comment

Comments

@alexose

alexose commented Mar 27, 2022

First off, very cool work! I spent this morning spinning up a clientside demo to show some friends.

Second, I'm hoping to use this library to compress English text messages. Since most text messages are short phrases (e.g., "Be there soon", "How are you", "Where are you"), I thought I could feed these common phrases into the algorithm. I noticed this line in the docs:

Please refer to unishox_compress_lines and unishox_decompress_lines functions in the library to make use of these. However, in most cases, the default Simple API provides optimum compression.

These functions don't appear to exist. Any tips for me?

@siara-cc
Owner

Hi, thank you for sharing this. I am glad that you find it useful!
I am sorry if the documentation is a bit misleading: it refers to the C API, and the _lines part of those function names refers to compressing a C array, not to compressing common lines.
However, Unishox2 allows for 6 strings such as those you have mentioned to be supplied through the usx_freq_seq parameter:

function unishox2_compress(input, len, out, usx_hcodes, usx_hcode_lens, usx_freq_seq, usx_templates)

Even without defining such common phrases, Unishox2 will compress them to the extent possible.

Unishox2 is designed for low-memory systems such as microcontrollers and does not use a codebook as smaz does, but I hope to add one in Unishox3 to achieve the best compression possible.
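
As a rough sketch of the usx_freq_seq approach described in this reply, the snippet below shows one way to assemble a six-entry phrase list and pass it through the signature quoted above. Only the parameter order of unishox2_compress comes from this thread. The helper names (buildFreqSeq, compressWithPhrases), the shape of the presets object, the idea of filling unused slots from a fallback list, and the use of text.length for len are all assumptions made for illustration, so check them against the library source before relying on them.

```javascript
// Number of frequent-sequence slots mentioned in the reply above.
const FREQ_SEQ_SLOTS = 6;

// Hypothetical helper (not part of the library): take up to six custom
// phrases and fill any unused slots from a caller-supplied fallback list.
function buildFreqSeq(phrases, fallback) {
  if (phrases.length > FREQ_SEQ_SLOTS) {
    throw new RangeError(`at most ${FREQ_SEQ_SLOTS} phrases are supported`);
  }
  const seq = phrases.concat(fallback.slice(phrases.length, FREQ_SEQ_SLOTS));
  if (seq.length !== FREQ_SEQ_SLOTS) {
    throw new RangeError(`fallback must supply ${FREQ_SEQ_SLOTS} entries`);
  }
  return seq;
}

// Hypothetical wrapper: `compress` is assumed to follow the signature
// quoted in this thread:
//   (input, len, out, usx_hcodes, usx_hcode_lens, usx_freq_seq, usx_templates)
// `presets` is an assumed container for the tables that are left unchanged.
function compressWithPhrases(compress, text, out, phrases, presets) {
  const freqSeq = buildFreqSeq(phrases, presets.freqSeq);
  // Assumption: `len` is the length of the input string.
  return compress(
    text, text.length, out,
    presets.hcodes, presets.hcodeLens, freqSeq, presets.templates
  );
}
```

In use, compress would be the library's unishox2_compress, and presets would hold whatever code tables and templates you would otherwise pass to it. The wrapper simply returns whatever that function returns.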
