
Performance Questions #202

Open
unphased opened this issue Apr 8, 2024 · 5 comments

Comments


unphased commented Apr 8, 2024

I'm testing this tool out and I saw some behavior I didn't expect:

  • I manually imported around 200k entries from one of my custom history files. Two of these entries correspond to each actual command-history item, but that's alright; it makes for a bigger test case anyway.
  • Performance seems troublesome. After typing a query and then deleting it in the ctrl+r interface, all of the CPU threads that were spawned continue to chew up CPU, and I don't think I will use this because of that. I'd also note that piping a plain-text file containing 200k or 1M lines into fzf and fuzzy-searching it interactively is a lot faster and more responsive than what I saw with hiSHtory (see the sketch after this list).
  • After doing the import, I saw a process that appeared to be uploading the data over the network. I get that it's encrypted, but I don't like that this is the default behavior.
  • After that, I ran a simple hishtory query command. It eventually returned, but not before downloading an estimated 50 MB from the internet. I want to understand why it would re-download my whole dataset when I had just imported it; shouldn't that data be cached locally?
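
For concreteness, the fzf baseline from the second bullet can be reproduced with synthetic data along these lines (assumes fzf is installed; the generated lines are placeholders, not a real history export):

```sh
# Generate a synthetic 1M-line "history" and fuzzy-search it interactively.
# This is the fzf comparison point, not a hishtory command.
seq 1000000 | sed 's/^/echo command number /' | fzf
```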

I don't want to give up just yet, so I'm curious whether anyone can contextualize the behavior I observed.

@ddworken (Owner)

Thank you for the feedback!

> Performance seems troublesome. After typing a query and then deleting it in the ctrl+r interface, all of the CPU threads that were spawned continue to chew up CPU, and I don't think I will use this because of that. I'd also note that piping a plain-text file containing 200k or 1M lines into fzf and fuzzy-searching it interactively is a lot faster and more responsive than what I saw with hiSHtory.

Thanks for raising this; it's something I personally haven't run into, since my total history size is relatively small (~30k commands). I'll spend some time setting up benchmarks for this and will see whether I can improve performance here.
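
As a starting point, a crude end-to-end timing sketch could look like the following; using hyperfine and the placeholder query 'foo' are assumptions on my part, not an existing benchmark setup in this repo:

```sh
# Crude timing sketch for non-interactive search performance. Assumes
# hishtory is installed and on PATH, and that the hyperfine benchmarking
# tool is available; 'foo' is an arbitrary placeholder query.
hyperfine --warmup 1 'hishtory query foo'
```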

> After doing the import, I saw a process that appeared to be uploading the data over the network. I get that it's encrypted, but I don't like that this is the default behavior.

Yeah, it is all end-to-end encrypted, so nothing else can read it. But if you'd rather not have this, see the "Offline Install Without Syncing" section in the README: that installs hiSHtory in a 100% offline mode without any syncing support whatsoever.
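
For reference, the offline install path is roughly the following sketch; the `--offline` flag shown here is an assumption, so verify the exact invocation against that README section:

```sh
# Sketch of an offline install, assuming the documented installer script
# supports an --offline flag; check the "Offline Install Without Syncing"
# section of the README for the exact command.
curl https://hishtory.dev/install.py | python3 - --offline
```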

> After that, I ran a simple hishtory query command. It eventually returned, but not before downloading an estimated 50 MB from the internet. I want to understand why it would re-download my whole dataset when I had just imported it; shouldn't that data be cached locally?

Hmm, interesting. This is unexpected, so I'll also plan on taking a look at this.

ddworken added a commit that referenced this issue Apr 14, 2024
ddworken added a commit that referenced this issue Apr 15, 2024
…#202 (#204)

* Fix double-syncing error where devices receive entries from themselves

* Fix incorrect error message

* Add TODO

* Update TestESubmitThenQuery after making query more efficient

* Update TestDeletionRequests and remove unnecessary asserts

* Swap server_test.go to using require

* Fix incorrect require due to typo
@ddworken (Owner)

Looping back on this, I'm happy to say that:

  1. Searching performance should be significantly improved by ba21e1c. Technically this only improves performance in cases where there are many results (so it will still be slow if you're searching through 200k entries for a single matching result), but it should significantly improve the UX. I'm also planning on experimenting with SQLite's FTS/trigram support to see whether we can improve this further (see the sketch after this list).
  2. The issue of re-downloading entries that originated from the given device is fixed by "Fix double-syncing error where devices receive entries from themselves" (#204).
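
To illustrate the FTS/trigram idea from point 1, here is a minimal sketch of SQLite's trigram tokenizer; the table and column names are hypothetical, not hishtory's actual schema:

```sh
# Requires the sqlite3 CLI with SQLite >= 3.34 (which added the trigram
# tokenizer). history_fts/command are made-up names for illustration.
sqlite3 demo.db <<'EOF'
CREATE VIRTUAL TABLE history_fts USING fts5(command, tokenize='trigram');
INSERT INTO history_fts(command) VALUES ('git commit -m "initial"'), ('ls -la /tmp');
-- With the trigram tokenizer, MATCH performs indexed substring lookups:
SELECT command FROM history_fts WHERE command MATCH 'commit';
EOF
```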

@unphased (Author)

That's awesome! Thank you.

ddworken changed the title from "couple questions" to "Performance Questions" on Apr 16, 2024
@hongyi-zhao

PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

@ddworken (Owner)

> PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

Since hishtory runs entirely client-side and is end-to-end encrypted, Postgres isn't a great fit. Postgres is generally meant to run on a server that holds all of the data, which doesn't match the hishtory use case, where the server only stores encrypted blobs and has no visibility into the data.
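
To make that concrete, what leaves the device is an opaque blob along the lines of this toy illustration (openssl here is purely illustrative and is not hishtory's actual encryption scheme):

```sh
# Toy illustration only: the server stores ciphertext like this and cannot
# read the underlying command.
echo 'ls -la /tmp' \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:device-local-secret -base64
```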
