More options for history completion entries order #5481

lufte · 2020-06-02T03:43:36Z

This is based on the work of @vchimishuk in #4403.

I started by reviewing that pull request and testing it, but I figured that if we used a simpler formula for frecency we wouldn't need the periodic calculation nor the threading stuff. So this approach uses the last visit timestamp plus a bonus for each visit as the frecency. This way the frecency doesn't depend on the current time and can be calculated just once when inserting (or updating) the record in CompletionHistory.

I also added frequency as an option as well since it was trivial. I will test this for a bit and add unit tests afterwards.

Fixes #601.

rcorre

Looks good! I like the relative simplicity of it. I've got a few comments, and I think we could use some more tests in test_histcategory.py.

qutebrowser/browser/history.py

qutebrowser/completion/models/histcategory.py

lufte · 2020-06-03T13:59:32Z

Looks good! I like the relative simplicity of it. I've got a few comments, and I think we could use some more tests in test_histcategory.py.

Thanks. I'm definitely adding tests later on if we all agree on this approach.

rcorre

LGTM! I'll let @The-Compiler give the final sign-off.

tests/unit/browser/test_history.py

lufte · 2020-06-07T21:29:15Z

Hmm, some of the environments now fail because of the UPSERT syntax. It seems I checked all the sqlite versions while thinking "we need 3.8 for upsert", when actually we need 3.24... I even said that myself in the previous comment! 🤦

So it turns out we can't use it after all. Back to insert+update it is...

The-Compiler · 2020-06-10T14:56:29Z

@lufte @rcorre Could you please enlighten me on what benefits UPSERT / fa2b280 would bring? Is it just slightly nicer code, or are there other benefits (performance?) as well?

rcorre · 2020-06-11T02:18:26Z

@The-Compiler I believe UPSERT would avoid looking up the record twice (once to find the duplicate on insert, then again to update the row on the update). Hopefully that isn't a big deal on a primary key lookup.

lufte · 2020-06-11T03:15:30Z

Usually there is a performance improvement when talking to a RDBMS over the network, but I wasn't sure about SQLite so I ran some tests:

>>> def insert_update():
...  q = c.execute("insert or ignore into CompletionHistory values ('url', 'title', 1, 1, 1)")
...  if q.rowcount == 0:
...   c.execute("update CompletionHistory set url='url', title='title', last_atime=1, visits=100, frecency=1 where url='url'")
... 
>>> def upsert():
...  c.execute("insert into CompletionHistory values ('url', 'title', 1, 1, 1) on conflict(url) do update set url='url', title='title', last_atime=1, visits=1, frecency=1 where url='url'")
... 
>>> from timeit import timeit
>>> import sqlite3
>>> conn=sqlite3.connect("/tmp/qutebrowser-basedir/data/history.sqlite")
>>> c=conn.cursor()
>>> timeit('insert_update()', globals=globals())
7.934143277000658
>>> timeit('upsert()', globals=globals())
5.431534209999882

>>> from qutebrowser.misc import sql
>>> def insert_update():                                                                                                                               
...  q = sql.Query("insert or ignore into CompletionHistory values ('url', 'title', 1, 1, 1)").run()                                                   
...  if q.rows_affected() == 0:                                                                                                                        
...    sql.Query("update CompletionHistory set url='url', title='title', last_atime=1, visits=100, frecency=1 where url='url'").run()
...    
>>> def upsert():
...  sql.Query("insert into CompletionHistory values ('url', 'title', 1, 1, 1) on conflict(url) do update set url='url', title='title', last_atime=1, visits=1, frecency=1 where url='url'").run()
...  
>>> from timeit import timeit
>>> timeit('insert_update()', globals=globals(), setup='from qutebrowser.misc import sql;sql.init("/tmp/qutebrowser-basedir/data/history.sqlite")')
QSqlDatabasePrivate::addDatabase: duplicate connection name 'qt_sql_default_connection', old connection removed.
289.1367021160004
>>> timeit('upsert()', globals=globals(), setup='from qutebrowser.misc import sql;sql.init("/tmp/qutebrowser-basedir/data/history.sqlite")')
QSqlDatabasePrivate::addDatabase: duplicate connection name 'qt_sql_default_connection', old connection removed.
209.05686401799994

So it seems there is some improvement. By the way, the gain is proportionally the same in both cases (~30% faster) but oh boy does it take longer when using QtSql. What could be going on there? I believe both tests run with autocommit so that shouldn't be it.

@The-Compiler I believe UPSERT would avoid looking up the record twice (once to find the duplicate on insert, then again to update the row on the update). Hopefully that isn't a big deal on a primary key lookup.

I'm not sure this is the case because the DO UPDATE part requires its own WHERE clause, meaning that the update it's not done automatically on the record that failed and could even update multiple records.

lufte · 2020-12-04T01:07:13Z

Since I was around I fixed the conflicts here. I've been using this patch since writing it by the way, and so far everything works perfectly. @treeshateorcs if you are interested in this feature (I saw your comment in the old PR) and you are able to apply custom patches, it would be great if you could give this a try.

treeshateorcs · 2020-12-04T02:16:16Z

thank you, i just applied the patch. i need to give it a few days

treeshateorcs · 2020-12-04T20:05:40Z

hmm, for some reason the patch can be applied against git, but not release?

The-Compiler · 2020-12-04T20:08:31Z

@treeshateorcs It happens - master is heading towards v2.0.0 so there are a lot of bigger changes in there which aren't in any release so far.

treeshateorcs · 2020-12-15T09:39:38Z

been working great for me so far

treeshateorcs · 2021-01-03T20:11:02Z

@lufte please update this to current master, i tried myself but failed

lufte · 2021-01-06T19:38:16Z

You got it @treeshateorcs!

treeshateorcs · 2021-01-18T07:59:29Z

is there a chance this will be merged? it broke again. i can't believe i'm the only one interested in this feature. to me it's essential

p.s. also it seems like this patch alters the database or something, because i'm not able to use qutebrowser with my current data and config files without it. it just crashes

The-Compiler · 2021-01-18T09:49:45Z

I'll go back to merging older PRs once v2.0.0 is released - but this isn't exactly on the top of my priorities, see the PR backlog project board.

vchimishuk · 2021-01-28T21:00:56Z

Glad to hear this feature is moving. Lack of it force me to use my own build of QB v1.5.2. I got tired to constantly rebase my patch, so I just live with very old (2 years) version or QB. :)
Thank you, @lufte.

lufte · 2021-02-15T19:08:44Z

The upgrade to v2.0 forced me to refactor a good part of this work so I decided to squash all the old commits into one and re-apply it on top of master. Sorry about the loss of history (note to self: I left the old branch in https://github.com/lufte/qutebrowser/tree/issue601old) but this makes it easier to keep the patch up to date every time a new conflict is introduced, without having merge commits everywhere.

@treeshateorcs You should be able to apply the patch on top of master or v2.0.2. Indeed, this patch alters the database, but it only adds columns and indexes. I'm not sure why the same database wouldn't work with upstream. If you have more details on this please let me know, because it affects one of the changes I had to make now.

@The-Compiler While working on this I noticed a potential problem with the new way of setting PRAGMA user_version: if something goes wrong while re-creating CompletionHistory and qutebrowser crashes, user_version is already set to the newer number so the user will be left in an inconsistent state, and user_version will need to be set back manually to reproduce the issue.

The-Compiler · 2021-02-15T19:15:46Z

@The-Compiler While working on this I noticed a potential problem with the new way of setting PRAGMA user_version: if something goes wrong while re-creating CompletionHistory and qutebrowser crashes, user_version is already set to the newer number so the user will be left in an inconsistent state, and user_version will need to be set back manually to reproduce the issue

That should be fixed in 2b47bd0 already, no?

lufte · 2021-02-15T19:23:25Z

I was talking about re-creating the actual table (to add the new columns), not only re-creating its records. Unless we make it so force_rebuild always drops the table too... it shouldn't even be noticeable compared to the time it takes to rebuild all the rows.

The-Compiler · 2021-02-15T19:26:12Z

Ah, I see... I'm not sure if this is really fixable in a sane way before #6039, given that everything is scattered around history.py and sql.py at the moment...

The-Compiler

A couple of things I noticed in a first pass. Now that I got some context again, this indeed seems a very nice and simple solution for the problem - great work! 👍

If you want, it'd be great if you could add some type annotations, especially for arguments and return values in any new/changed functions. It's by no means required though, it's just something I started doing for new code.

I'll answer to some of the earlier discussion comments about upsert and #6039 too, but let me do that in a separate comment.

qutebrowser/browser/history.py

qutebrowser/completion/models/histcategory.py

The-Compiler · 2021-06-15T12:56:29Z

qutebrowser/config/configdata.yml

+      - recency: Entries are sorted from most recently visited to least
+            recently visited.
+      - frequency: Entries are sorted from most visited to least visited.
+      - frecency: Entries are sorted using a combination of recency and
+            frequency.


While those terms are probably the correct technical terms, I wonder if they aren't a bit confusing (and also hard to read/distinguish), especially for non-English users.

Maybe they should be something like most-recent, most-frequent, combined or so instead?

@mschilli87 any opinion on this?

I don't think they are too bad, unless for what I imagine is the typical qutebrowser user, but I'm happy to hear more opinions.

qutebrowser/misc/sql.py

tests/unit/completion/test_histcategory.py

The-Compiler · 2021-06-15T14:30:20Z

Related issues:

Minimal sqlite version / compatibility layer for UPSERT? · Issue #5551 · qutebrowser/qutebrowser
Use classes for the sql database · Issue #6039 · qutebrowser/qutebrowser
Qt SQL takes much longer compared to Python sqlite? · Issue #6548 · qutebrowser/qutebrowser

For #5551, I still think it'd be nice to use UPSERT here and have the logic in the SQL layer somehow, instead of the ignore=True thing we have now. I'll let you decide though, @lufte - if you believe it's simpler to keep things as-is for this PR and move to upsert (where available) later, that's fine by me as well. As long as #6548 is still around, I guess taking a stab at that would yield a more significant result than insert/update vs. upsert anyways.

As for #6039, like you said:

While working on this I noticed a potential problem with the new way of setting PRAGMA user_version: if something goes wrong while re-creating CompletionHistory and qutebrowser crashes, user_version is already set to the newer number so the user will be left in an inconsistent state, and user_version will need to be set back manually to reproduce the issue.

I guess it'd be nice to have that fixed, especially since the rebuild logic is getting a bit more complex here. However, I don't think I want to block this PR based on that. Is this something you can see yourself working on, @lufte? If so, maybe let's focus on getting this in first, but make sure #6039 is fixed before the next release?

Also, I'm guessing this will potentially conflict with URL and Title changes trigger history entry updates by andrewcarlotti · Pull Request #5337 · qutebrowser/qutebrowser (cc @andrewcarlotti). I'm afraid I still haven't had a chance to take a closer look at that one. I'd prefer to merge this first, since I've already looked over it now. If you're still interested to get yours in, @andrewcarlotti, I guess it might make sense for me to look at that one next. If not, I totally understand and I'll pick up the loose ends, but that'll probably take a bit longer then.

lufte · 2021-06-15T23:55:59Z

Related issues:

Minimal sqlite version / compatibility layer for UPSERT? · Issue #5551 · qutebrowser/qutebrowser

Use classes for the sql database · Issue #6039 · qutebrowser/qutebrowser

Qt SQL takes much longer compared to Python sqlite? · Issue #6548 · qutebrowser/qutebrowser

For #5551, I still think it'd be nice to use UPSERT here and have the logic in the SQL layer somehow, instead of the ignore=True thing we have now. I'll let you decide though, @lufte - if you believe it's simpler to keep things as-is for this PR and move to upsert (where available) later, that's fine by me as well. As long as #6548 is still around, I guess taking a stab at that would yield a more significant result than insert/update vs. upsert anyways.

Agreed. I'm not paricularly worried about this specially because this occurs when updating history instead of querying it, which is when performance problems would be more noticeable (as in big history + typing quickly). I say let's leave this for after.

As for #6039, like you said:

While working on this I noticed a potential problem with the new way of setting PRAGMA user_version: if something goes wrong while re-creating CompletionHistory and qutebrowser crashes, user_version is already set to the newer number so the user will be left in an inconsistent state, and user_version will need to be set back manually to reproduce the issue.

I guess it'd be nice to have that fixed, especially since the rebuild logic is getting a bit more complex here. However, I don't think I want to block this PR based on that. Is this something you can see yourself working on, @lufte? If so, maybe let's focus on getting this in first, but make sure #6039 is fixed before the next release?

This one looks more important though. I'm happy to work on #6039 first. Not sure how much of an effort it is though, I would need to take a look at the code. Do you think it's a lot?
If I start with this one but we keep it blocked until #6039 is done, then it sounds like I may need to get back here afterwards to take care of any potential conflict.

Also, I'm guessing this will potentially conflict with URL and Title changes trigger history entry updates by andrewcarlotti · Pull Request #5337 · qutebrowser/qutebrowser (cc @andrewcarlotti). I'm afraid I still haven't had a chance to take a closer look at that one. I'd prefer to merge this first, since I've already looked over it now. If you're still interested to get yours in, @andrewcarlotti, I guess it might make sense for me to look at that one next. If not, I totally understand and I'll pick up the loose ends, but that'll probably take a bit longer then.

If I start with #6039 we also leave some time to maybe get this one in. That's another conflict that I can deal with later along with the others, in one go.

The-Compiler · 2021-07-09T15:07:48Z

#6039 is in now - so I'm assuming you'll take care of updating this accordingly and fixing the conflicts. Let me know once this is updated and I'll take a look - however, I'll be away for most of next week.

lufte · 2021-07-10T14:30:34Z

#6039 is in now - so I'm assuming you'll take care of updating this accordingly and fixing the conflicts. Let me know once this is updated and I'll take a look - however, I'll be away for most of next week.

Correct! I'm working on it now. I'll ping you when it's ready.

lufte · 2021-07-11T02:07:23Z

qutebrowser/browser/history.py

@@ -177,7 +183,6 @@ def __init__(self, database: sql.Database, progress: HistoryProgress,
        # Store the last saved url to avoid duplicate immediate saves.
        self._last_url = None

-        self.completion = CompletionHistory(database, parent=self)


Had to move this to the bottom because otherwise it will try to create indexes on columns it doesn't yet have. Maybe index creation should happen inside SqlTable.__init__ instead of it being a separate method?

Actually, I think we need something like that, otherwise now every time we re-create CompletionHistory without user version changing, its indexes wont be re-created. I'll come back to this during the week.

I added a force parameter to create_index, with the same behavior as the one in _create_table. I think this is enough for now.

lufte · 2021-07-11T02:09:20Z

qutebrowser/config/configdata.yml

@@ -1156,6 +1156,30 @@ completion.cmd_history_max_items:

    0: no history / -1: unlimited

+completion.web_history.sort_criterion:
+  default: recency


Do we set frecency as the new default?

lufte · 2021-07-11T02:16:12Z

I went through all the old changes and re-applied them on top of the new master. I left a few open discussions that still need solving, and afterwards this would be ready for review again.

The-Compiler · 2021-07-19T17:30:50Z

Thanks for the update! I'll take another look soon. There's a lot going on in the next month (including some desperately needed 1.5 weeks of holidays, at least) so it might still take a bit of time I'm afraid, but this definitely is the next non-trivial PR I want to look at.

treeshateorcs · 2023-03-02T14:55:33Z

thank you for still working on this! @lufte

lufte · 2023-03-02T21:43:15Z

thank you for still working on this! @lufte

Just keeping my forked version up to date 😅

treeshateorcs · 2023-03-12T17:39:52Z

this patch no longer applies against master (nor a previous version of this patch that i had saved previously) :(

edit: nevermind, i had messed up the previous version of the patch, now it works alright

lufte · 2023-03-12T20:26:20Z

Yes, according to this PR, there are many conflicts against master at the moment. The patch is valid with the latest release tags though, which is what I apply it against. I will update it once again when Qt 6 is merged and we're ready to merge this into master.

vchimishuk · 2024-05-11T21:21:45Z

Hi @lufte,
Do you still have any plans to update the branch?

P. S.
It is sad, that such a great and important feature is still in review after many years.

lufte · 2024-05-11T21:53:12Z

Hi @lufte, Do you still have any plans to update the branch?

I've been lazy but I should do it. There is a rain forecast for tomorrow here, so maybe that will be a good opportunity :)

I'll try to apply it against 3.1.

vchimishuk · 2024-05-20T21:15:13Z

@lufte, thanks for updating PR.
I've started using your solution and it works great in general. However, I've found one issue with the proposed formula.
The issue is that links do not die for a very long time when they are not in use (for a long time) any more.

Here is my case. Year ago Web Telegram link has been changed from https://web.telegram.org/z/ (8K visits) to https://web.telegram.org/a/ (5K visits). So, /z/ is still higher than /a/ despite it has not been used for a year now.
Original Mozilla's formula handles such situations pretty good.

I've tweaked your formula to limit number of visits down to 100. In this case it seems to work better for me.

update CompletionHistory set frecency = min((visits - 1), 100) * 43200 + last_atime;

Is it worth to add one more configuration property next to frecency_bonus to limit the number of visits it takes into account?

lufte · 2024-05-20T21:36:24Z

@vchimishuk ideally that's what frecency_bonus is for. If you feel URLs with too many old visits are still too relevant, maybe the default value for frecency_bonus is too high. If you halve it, does the new URL rank higher?

vchimishuk · 2024-05-21T13:18:15Z

@vchimishuk ideally that's what frecency_bonus is for. If you feel URLs with too many old visits are still too relevant, maybe the default value for frecency_bonus is too high. If you halve it, does the new URL rank higher?

@lufte, nope. Even 6 hours bonus gives old URL 10 years of credits, which means it will be hard to beat it by any new URL.
Using 5 minutes bonus works good for a 2-3 top rows, but then fresh one-visitors beats regular time-to-time used pages.

Limiting number of visits in the formula works the best for me. Probably it can add extra flexibility and improve user experience.
Anyway, your solution works great in general, especially comparing to what we have available in QB master. Just want to share with you my experience.

lufte requested a review from rcorre as a code owner June 2, 2020 03:43

project-bot bot added this to Inbox in Pull request backlog Jun 2, 2020

rcorre reviewed Jun 3, 2020

View reviewed changes

qutebrowser/browser/history.py Outdated Show resolved Hide resolved

qutebrowser/browser/history.py Outdated Show resolved Hide resolved

qutebrowser/completion/models/histcategory.py Show resolved Hide resolved

lufte changed the title ~~More options for completion entries order~~ More options for history completion entries order Jun 5, 2020

Pull request backlog automation moved this from Inbox to Focus Jun 7, 2020

rcorre approved these changes Jun 7, 2020

View reviewed changes

tests/unit/browser/test_history.py Show resolved Hide resolved

The-Compiler moved this from Focus to Backlog in Pull request backlog Jun 10, 2020

The-Compiler mentioned this pull request Jun 26, 2020

Minimal sqlite version / compatibility layer for UPSERT? #5551

Open

The-Compiler mentioned this pull request Nov 19, 2020

Sort history completion results by frecency score. #4403

Closed

The-Compiler mentioned this pull request Jan 6, 2021

History/SQL cleanups for v2.0.0 #6019

Merged

lufte force-pushed the issue601 branch from e4d5da5 to 764896c Compare February 15, 2021 18:54

The-Compiler moved this from Inbox to Focus in Pull request backlog Jun 15, 2021

The-Compiler requested changes Jun 15, 2021

View reviewed changes

Pull request backlog automation moved this from Focus to WIP Jun 15, 2021

The-Compiler mentioned this pull request Jun 15, 2021

Qt SQL takes much longer compared to Python sqlite? #6548

Open

lufte force-pushed the issue601 branch from 764896c to d350620 Compare July 11, 2021 01:50

lufte commented Jul 11, 2021

View reviewed changes

lufte requested a review from The-Compiler July 11, 2021 02:16

project-bot bot moved this from WIP to Inbox in Pull request backlog Jul 11, 2021

The-Compiler moved this from Inbox to Focus in Pull request backlog Jul 19, 2021

lufte force-pushed the issue601 branch from 1ef15a7 to 08a60b4 Compare May 15, 2022 20:43

The-Compiler mentioned this pull request Jun 13, 2022

Qutedmenu sort history by frequency #7169

Open

lufte force-pushed the issue601 branch from 08a60b4 to 281dfb9 Compare March 2, 2023 14:29

Implement frecency

196f08b

lufte force-pushed the issue601 branch from 281dfb9 to 196f08b Compare May 12, 2024 19:51

More options for history completion entries order #5481

Are you sure you want to change the base?

More options for history completion entries order #5481

Conversation

lufte commented Jun 2, 2020

rcorre left a comment

Choose a reason for hiding this comment

lufte commented Jun 3, 2020

rcorre left a comment

Choose a reason for hiding this comment

lufte commented Jun 7, 2020

The-Compiler commented Jun 10, 2020

rcorre commented Jun 11, 2020

lufte commented Jun 11, 2020

lufte commented Dec 4, 2020

treeshateorcs commented Dec 4, 2020

treeshateorcs commented Dec 4, 2020

The-Compiler commented Dec 4, 2020

treeshateorcs commented Dec 15, 2020

treeshateorcs commented Jan 3, 2021

lufte commented Jan 6, 2021

treeshateorcs commented Jan 18, 2021 • edited

The-Compiler commented Jan 18, 2021

vchimishuk commented Jan 28, 2021

lufte commented Feb 15, 2021 • edited

The-Compiler commented Feb 15, 2021

lufte commented Feb 15, 2021

The-Compiler commented Feb 15, 2021

The-Compiler left a comment

Choose a reason for hiding this comment

The-Compiler Jun 15, 2021

Choose a reason for hiding this comment

lufte Jul 11, 2021

Choose a reason for hiding this comment

The-Compiler commented Jun 15, 2021

lufte commented Jun 15, 2021

The-Compiler commented Jul 9, 2021

lufte commented Jul 10, 2021

lufte Jul 11, 2021

Choose a reason for hiding this comment

lufte Jul 11, 2021 • edited

Choose a reason for hiding this comment

lufte Jul 13, 2021

Choose a reason for hiding this comment

lufte Jul 11, 2021

Choose a reason for hiding this comment

lufte commented Jul 11, 2021

The-Compiler commented Jul 19, 2021

treeshateorcs commented Mar 2, 2023

lufte commented Mar 2, 2023

treeshateorcs commented Mar 12, 2023 • edited

lufte commented Mar 12, 2023

vchimishuk commented May 11, 2024

lufte commented May 11, 2024

vchimishuk commented May 20, 2024

lufte commented May 20, 2024

vchimishuk commented May 21, 2024

treeshateorcs commented Jan 18, 2021 •

edited

lufte commented Feb 15, 2021 •

edited

lufte Jul 11, 2021 •

edited

treeshateorcs commented Mar 12, 2023 •

edited