Consider clustering all.requests table by page or rank #263

Open
tunetheweb opened this issue Apr 9, 2024 · 1 comment

tunetheweb (Member) commented Apr 9, 2024

One thing I find really handy for the all.pages table is filtering on rank = 1000 as a quick way to get results and save costs while still seeing real data (often the more interesting data too, to be honest!).

We can't do that with the all.requests table. We also can't quickly look up the requests for a single site, and we can't work around this via the all.pages table either. It would be handy to be able to do either of these by clustering the all.requests table by page or rank.
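To illustrate why clustering by rank would help, here is a simplified Python model of block pruning: rows are split into fixed-size blocks, each block records the min and max rank it holds, and a filter only needs to read blocks whose range could match. This is a sketch under stated assumptions, not how BigQuery actually lays out storage; the rank buckets other than 1000 and the equal row count per bucket are made up for illustration.

```python
# Simplified model of block pruning on a clustered table.
# ASSUMPTION: each block stores the min/max of the filter column;
# real BigQuery storage is more complex than this.

# Hypothetical data: 4,000 requests spread evenly over four rank buckets.
RANK_BUCKETS = [1000, 10000, 100000, 1000000]
rows = [
    {"page": f"https://site{i}.example/", "rank": RANK_BUCKETS[i % len(RANK_BUCKETS)]}
    for i in range(4000)
]


def make_blocks(rows, cluster_key=None, block_size=100):
    """Split rows into blocks of block_size, sorting by cluster_key first if given.

    Returns one (min_rank, max_rank) pair per block.
    """
    if cluster_key is not None:
        rows = sorted(rows, key=lambda r: r[cluster_key])
    blocks = []
    for start in range(0, len(rows), block_size):
        ranks = [r["rank"] for r in rows[start:start + block_size]]
        blocks.append((min(ranks), max(ranks)))
    return blocks


def blocks_scanned(blocks, rank):
    """Count blocks whose [min, max] rank range could contain the filter value."""
    return sum(1 for lo, hi in blocks if lo <= rank <= hi)


unclustered = make_blocks(rows)
clustered = make_blocks(rows, cluster_key="rank")
print(blocks_scanned(unclustered, 1000), "of", len(unclustered))  # 40 of 40
print(blocks_scanned(clustered, 1000), "of", len(clustered))      # 10 of 40
```

Without clustering, every block mixes all rank buckets, so a rank = 1000 filter still reads everything; once rows are sorted by rank, the same filter touches only the blocks holding that bucket. Clustering by page would prune single-site lookups in the same way, since each page's rows would sit together.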

However, BigQuery allows a maximum of four clustering columns, and we're already using all four for all.requests:

  • client
  • is_root_page
  • is_main_document
  • type

These are all useful, so we'd need to drop one to add a new column.

I think is_main_document is useful, but queries on it can mostly get the same benefit by filtering on type = 'html' AND is_main_document (not entirely, but in 99.8% of cases, and the most useful ones!), so I'd prefer to replace it with either page or rank. I'm leaning towards page, since we can use that to get rank, but I'm open to ideas.
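A small Python sketch of the proposed change: it swaps is_main_document for page in the current clustering column list and renders a CLUSTER BY clause, rejecting more than four columns. The helper names are hypothetical and the output is only a DDL fragment string; nothing here is run against BigQuery.

```python
# Hypothetical helpers for planning the all.requests clustering change.
# Only the column names come from the issue; the four-column cap mirrors
# BigQuery's limit on clustering columns.
MAX_CLUSTERING_COLUMNS = 4

CURRENT_CLUSTERING = ["client", "is_root_page", "is_main_document", "type"]


def swap_clustering_column(columns, drop, add):
    """Return a new column list with `drop` replaced by `add` in the same position."""
    if drop not in columns:
        raise ValueError(f"{drop!r} is not a clustering column")
    if add in columns:
        raise ValueError(f"{add!r} is already a clustering column")
    return [add if c == drop else c for c in columns]


def cluster_by_clause(columns):
    """Render a CLUSTER BY fragment, enforcing the four-column limit."""
    if not 1 <= len(columns) <= MAX_CLUSTERING_COLUMNS:
        raise ValueError(
            f"expected 1-{MAX_CLUSTERING_COLUMNS} clustering columns, got {len(columns)}"
        )
    return "CLUSTER BY " + ", ".join(columns)


proposed = swap_clustering_column(CURRENT_CLUSTERING, "is_main_document", "page")
print(cluster_by_clause(proposed))
# CLUSTER BY client, is_root_page, page, type
```

BigQuery sorts data by the clustering columns in the order they are listed, so where page goes is itself a trade-off (placing it before type could weaken pruning on type, for example); keeping it in the slot is_main_document vacated is just one option.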

tunetheweb (Member, Author) commented:

Or maybe we should have wptid in the requests table to allow joins on that instead of page?
