
Performance Questions #202

Open
unphased opened this issue Apr 8, 2024 · 5 comments

Comments


unphased commented Apr 8, 2024

I'm testing this tool out and I saw some behavior I didn't expect:

  • I manually imported around 200k entries from one of my custom history files. Two of these entries correspond to each actual command-history item, but that's alright; it makes for a bigger test case anyway.
  • Performance seems troublesome. After typing a query and then deleting it in the ctrl+r interface, all of the CPU threads that were spawned continue to chew up CPU, and I don't think I will use this because of that. I'd also note that piping a plain-text file containing 200k or 1M lines into fzf and fuzzy-searching it interactively is a lot faster and more responsive than what I saw with hiSHtory (see the sketch after this list).
  • After doing the import, I saw a process that appeared to be uploading the data over the network. I get that it's encrypted, but I don't like that this is the default behavior.
  • After that, I ran a simple hishtory query command. It eventually returned, but not before downloading an estimated 50 MB from the internet. I want to understand why it would re-download my whole dataset when I had just imported it; shouldn't that data be cached locally?
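
For concreteness, the fzf baseline from the second bullet can be reproduced with synthetic data along these lines (assumes fzf is installed; the generated lines are placeholders, not a real history export):

```sh
# Generate a synthetic 1M-line "history" and fuzzy-search it interactively.
# This is the fzf comparison point, not a hishtory command.
seq 1000000 | sed 's/^/echo command number /' | fzf
```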

I don't want to give up just yet, so I'm curious whether anyone can contextualize the behavior I observed.

@ddworken (Owner)

Thank you for the feedback!

> Performance seems troublesome. After typing a query and then deleting it in the ctrl+r interface, all of the CPU threads that were spawned continue to chew up CPU, and I don't think I will use this because of that. I'd also note that piping a plain-text file containing 200k or 1M lines into fzf and fuzzy-searching it interactively is a lot faster and more responsive than what I saw with hiSHtory.

Thanks for raising this; it's something I personally haven't run into, since my total history size is relatively small (~30k commands). I'll spend some time setting up benchmarks for this and will see whether I can improve performance here.
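
As a starting point, a crude end-to-end timing sketch could look like the following; using hyperfine and the placeholder query 'foo' are assumptions on my part, not an existing benchmark setup in this repo:

```sh
# Crude timing sketch for non-interactive search performance. Assumes
# hishtory is installed and on PATH, and that the hyperfine benchmarking
# tool is available; 'foo' is an arbitrary placeholder query.
hyperfine --warmup 1 'hishtory query foo'
```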

> After doing the import, I saw a process that appeared to be uploading the data over the network. I get that it's encrypted, but I don't like that this is the default behavior.

Yeah, it is all end-to-end encrypted, so nothing else can read it. But if you'd rather not have this, see the "Offline Install Without Syncing" section in the README: that installs hiSHtory in a 100% offline mode without any syncing support whatsoever.
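
For reference, the offline install path is roughly the following sketch; the `--offline` flag shown here is an assumption, so verify the exact invocation against that README section:

```sh
# Sketch of an offline install, assuming the documented installer script
# supports an --offline flag; check the "Offline Install Without Syncing"
# section of the README for the exact command.
curl https://hishtory.dev/install.py | python3 - --offline
```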

> After that, I ran a simple hishtory query command. It eventually returned, but not before downloading an estimated 50 MB from the internet. I want to understand why it would re-download my whole dataset when I had just imported it; shouldn't that data be cached locally?

Hmm, interesting. This is unexpected, so I'll also plan on taking a look at this.

ddworken added a commit that referenced this issue Apr 14, 2024
ddworken added a commit that referenced this issue Apr 15, 2024
…#202 (#204)

* Fix double-syncing error where devices receive entries from themselves

* Fix incorrect error message

* Add TODO

* Update TestESubmitThenQuery after making query more efficient

* Update TestDeletionRequests and remove unnecessary asserts

* Swap server_test.go to using require

* Fix incorrect require due to typo
@ddworken (Owner)

Looping back on this, I'm happy to say that:

  1. Searching performance should be significantly improved by ba21e1c. Technically this only improves performance in cases where there are many results (so it will still be slow if you're searching through 200k entries for a single matching result), but it should significantly improve the UX. I'm also planning on experimenting with SQLite's FTS/trigram support to see whether we can improve this further (see the sketch after this list).
  2. The issue of re-downloading entries that originated from the given device is fixed by "Fix double-syncing error where devices receive entries from themselves" (#204).
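
To illustrate the FTS/trigram idea from point 1, here is a minimal sketch of SQLite's trigram tokenizer; the table and column names are hypothetical, not hishtory's actual schema:

```sh
# Requires the sqlite3 CLI with SQLite >= 3.34 (which added the trigram
# tokenizer). history_fts/command are made-up names for illustration.
sqlite3 demo.db <<'EOF'
CREATE VIRTUAL TABLE history_fts USING fts5(command, tokenize='trigram');
INSERT INTO history_fts(command) VALUES ('git commit -m "initial"'), ('ls -la /tmp');
-- With the trigram tokenizer, MATCH performs indexed substring lookups:
SELECT command FROM history_fts WHERE command MATCH 'commit';
EOF
```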

@unphased (Author)

That's awesome! Thank you.

ddworken changed the title from "couple questions" to "Performance Questions" on Apr 16, 2024
@hongyi-zhao

PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

@ddworken (Owner)

> PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

Since hishtory runs entirely client-side and is end-to-end encrypted, Postgres isn't a great fit. Postgres is generally meant to run on a server that holds all of the data, which doesn't match the hishtory use case, where the server only stores encrypted blobs and has no visibility into the data.
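
To make that concrete, what leaves the device is an opaque blob along the lines of this toy illustration (openssl here is purely illustrative and is not hishtory's actual encryption scheme):

```sh
# Toy illustration only: the server stores ciphertext like this and cannot
# read the underlying command.
echo 'ls -la /tmp' \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:device-local-secret -base64
```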
