-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More options for history completion entries order #5481
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I like the relative simplicity of it. I've got a few comments, and I think we could use some more tests in test_histcategory.py
.
Thanks. I'm definitely adding tests later on if we all agree on this approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'll let @The-Compiler give the final sign-off.
Hmm, some of the environments now fail because of the UPSERT syntax. It seems I checked all the sqlite versions while thinking "we need 3.8 for upsert", when actually we need 3.24... I even said that myself in the previous comment! 🤦 So it turns out we can't use it after all. Back to insert+update it is... |
@The-Compiler I believe UPSERT would avoid looking up the record twice (once to find the duplicate on insert, then again to update the row on the update). Hopefully that isn't a big deal on a primary key lookup. |
Usually there is a performance improvement when talking to a RDBMS over the network, but I wasn't sure about SQLite so I ran some tests: >>> def insert_update():
... q = c.execute("insert or ignore into CompletionHistory values ('url', 'title', 1, 1, 1)")
... if q.rowcount == 0:
... c.execute("update CompletionHistory set url='url', title='title', last_atime=1, visits=100, frecency=1 where url='url'")
...
>>> def upsert():
... c.execute("insert into CompletionHistory values ('url', 'title', 1, 1, 1) on conflict(url) do update set url='url', title='title', last_atime=1, visits=1, frecency=1 where url='url'")
...
>>> from timeit import timeit
>>> import sqlite3
>>> conn=sqlite3.connect("/tmp/qutebrowser-basedir/data/history.sqlite")
>>> c=conn.cursor()
>>> timeit('insert_update()', globals=globals())
7.934143277000658
>>> timeit('upsert()', globals=globals())
5.431534209999882 >>> from qutebrowser.misc import sql
>>> def insert_update():
... q = sql.Query("insert or ignore into CompletionHistory values ('url', 'title', 1, 1, 1)").run()
... if q.rows_affected() == 0:
... sql.Query("update CompletionHistory set url='url', title='title', last_atime=1, visits=100, frecency=1 where url='url'").run()
...
>>> def upsert():
... sql.Query("insert into CompletionHistory values ('url', 'title', 1, 1, 1) on conflict(url) do update set url='url', title='title', last_atime=1, visits=1, frecency=1 where url='url'").run()
...
>>> from timeit import timeit
>>> timeit('insert_update()', globals=globals(), setup='from qutebrowser.misc import sql;sql.init("/tmp/qutebrowser-basedir/data/history.sqlite")')
QSqlDatabasePrivate::addDatabase: duplicate connection name 'qt_sql_default_connection', old connection removed.
289.1367021160004
>>> timeit('upsert()', globals=globals(), setup='from qutebrowser.misc import sql;sql.init("/tmp/qutebrowser-basedir/data/history.sqlite")')
QSqlDatabasePrivate::addDatabase: duplicate connection name 'qt_sql_default_connection', old connection removed.
209.05686401799994 So it seems there is some improvement. By the way, the gain is proportionally the same in both cases (~30% faster) but oh boy does it take longer when using
I'm not sure this is the case because the |
Since I was around I fixed the conflicts here. I've been using this patch since writing it by the way, and so far everything works perfectly. @treeshateorcs if you are interested in this feature (I saw your comment in the old PR) and you are able to apply custom patches, it would be great if you could give this a try. |
thank you, i just applied the patch. i need to give it a few days |
hmm, for some reason the patch can be applied against git, but not release? |
@treeshateorcs It happens - |
been working great for me so far |
@lufte please update this to current master, i tried myself but failed |
You got it @treeshateorcs! |
is there a chance this will be merged? it broke again. i can't believe i'm the only one interested in this feature. to me it's essential p.s. also it seems like this patch alters the database or something, because i'm not able to use qutebrowser with my current data and config files without it. it just crashes |
I'll go back to merging older PRs once v2.0.0 is released - but this isn't exactly on the top of my priorities, see the PR backlog project board. |
Glad to hear this feature is moving. Lack of it force me to use my own build of QB v1.5.2. I got tired to constantly rebase my patch, so I just live with very old (2 years) version or QB. :) |
The upgrade to v2.0 forced me to refactor a good part of this work so I decided to squash all the old commits into one and re-apply it on top of master. Sorry about the loss of history (note to self: I left the old branch in https://github.com/lufte/qutebrowser/tree/issue601old) but this makes it easier to keep the patch up to date every time a new conflict is introduced, without having merge commits everywhere. @treeshateorcs You should be able to apply the patch on top of master or v2.0.2. Indeed, this patch alters the database, but it only adds columns and indexes. I'm not sure why the same database wouldn't work with upstream. If you have more details on this please let me know, because it affects one of the changes I had to make now. @The-Compiler While working on this I noticed a potential problem with the new way of setting |
That should be fixed in 2b47bd0 already, no? |
I was talking about re-creating the actual table (to add the new columns), not only re-creating its records. Unless we make it so |
Ah, I see... I'm not sure if this is really fixable in a sane way before #6039, given that everything is scattered around |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things I noticed in a first pass. Now that I got some context again, this indeed seems a very nice and simple solution for the problem - great work! 👍
If you want, it'd be great if you could add some type annotations, especially for arguments and return values in any new/changed functions. It's by no means required though, it's just something I started doing for new code.
I'll answer to some of the earlier discussion comments about upsert and #6039 too, but let me do that in a separate comment.
qutebrowser/config/configdata.yml
Outdated
- recency: Entries are sorted from most recently visited to least | ||
recently visited. | ||
- frequency: Entries are sorted from most visited to least visited. | ||
- frecency: Entries are sorted using a combination of recency and | ||
frequency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While those terms are probably the correct technical terms, I wonder if they aren't a bit confusing (and also hard to read/distinguish), especially for non-English users.
Maybe they should be something like most-recent
, most-frequent
, combined
or so instead?
@mschilli87 any opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they are too bad, unless for what I imagine is the typical qutebrowser user, but I'm happy to hear more opinions.
Related issues:
For #5551, I still think it'd be nice to use As for #6039, like you said:
I guess it'd be nice to have that fixed, especially since the rebuild logic is getting a bit more complex here. However, I don't think I want to block this PR based on that. Is this something you can see yourself working on, @lufte? If so, maybe let's focus on getting this in first, but make sure #6039 is fixed before the next release? Also, I'm guessing this will potentially conflict with URL and Title changes trigger history entry updates by andrewcarlotti · Pull Request #5337 · qutebrowser/qutebrowser (cc @andrewcarlotti). I'm afraid I still haven't had a chance to take a closer look at that one. I'd prefer to merge this first, since I've already looked over it now. If you're still interested to get yours in, @andrewcarlotti, I guess it might make sense for me to look at that one next. If not, I totally understand and I'll pick up the loose ends, but that'll probably take a bit longer then. |
Agreed. I'm not paricularly worried about this specially because this occurs when updating history instead of querying it, which is when performance problems would be more noticeable (as in big history + typing quickly). I say let's leave this for after.
This one looks more important though. I'm happy to work on #6039 first. Not sure how much of an effort it is though, I would need to take a look at the code. Do you think it's a lot?
If I start with #6039 we also leave some time to maybe get this one in. That's another conflict that I can deal with later along with the others, in one go. |
#6039 is in now - so I'm assuming you'll take care of updating this accordingly and fixing the conflicts. Let me know once this is updated and I'll take a look - however, I'll be away for most of next week. |
Correct! I'm working on it now. I'll ping you when it's ready. |
@@ -177,7 +183,6 @@ def __init__(self, database: sql.Database, progress: HistoryProgress, | |||
# Store the last saved url to avoid duplicate immediate saves. | |||
self._last_url = None | |||
|
|||
self.completion = CompletionHistory(database, parent=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to move this to the bottom because otherwise it will try to create indexes on columns it doesn't yet have. Maybe index creation should happen inside SqlTable.__init__
instead of it being a separate method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we need something like that, otherwise now every time we re-create CompletionHistory without user version changing, its indexes wont be re-created. I'll come back to this during the week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a force
parameter to create_index
, with the same behavior as the one in _create_table
. I think this is enough for now.
@@ -1156,6 +1156,30 @@ completion.cmd_history_max_items: | |||
|
|||
0: no history / -1: unlimited | |||
|
|||
completion.web_history.sort_criterion: | |||
default: recency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we set frecency as the new default?
I went through all the old changes and re-applied them on top of the new master. I left a few open discussions that still need solving, and afterwards this would be ready for review again. |
Thanks for the update! I'll take another look soon. There's a lot going on in the next month (including some desperately needed 1.5 weeks of holidays, at least) so it might still take a bit of time I'm afraid, but this definitely is the next non-trivial PR I want to look at. |
thank you for still working on this! @lufte |
Just keeping my forked version up to date 😅 |
this patch no longer applies against master (nor a previous version of this patch that i had saved previously) :( edit: nevermind, i had messed up the previous version of the patch, now it works alright |
Yes, according to this PR, there are many conflicts against master at the moment. The patch is valid with the latest release tags though, which is what I apply it against. I will update it once again when Qt 6 is merged and we're ready to merge this into master. |
Hi @lufte, P. S. |
I've been lazy but I should do it. There is a rain forecast for tomorrow here, so maybe that will be a good opportunity :) I'll try to apply it against 3.1. |
@lufte, thanks for updating PR. Here is my case. Year ago Web Telegram link has been changed from I've tweaked your formula to limit number of visits down to
Is it worth to add one more configuration property next to |
@vchimishuk ideally that's what frecency_bonus is for. If you feel URLs with too many old visits are still too relevant, maybe the default value for frecency_bonus is too high. If you halve it, does the new URL rank higher? |
@lufte, nope. Even 6 hours bonus gives old URL 10 years of credits, which means it will be hard to beat it by any new URL. Limiting number of visits in the formula works the best for me. Probably it can add extra flexibility and improve user experience. |
This is based on the work of @vchimishuk in #4403.
I started by reviewing that pull request and testing it, but I figured that if we used a simpler formula for frecency we wouldn't need the periodic calculation nor the threading stuff. So this approach uses the last visit timestamp plus a bonus for each visit as the frecency. This way the frecency doesn't depend on the current time and can be calculated just once when inserting (or updating) the record in
CompletionHistory
.I also added
frequency
as an option as well since it was trivial. I will test this for a bit and add unit tests afterwards.Fixes #601.