fix bing returning same results, page numbering, minor refactor #3416

Draft
wants to merge 2 commits into
base: master

glanham-jr
Contributor

@glanham-jr glanham-jr commented Apr 24, 2024

What does this PR do?

The 'sc' parameter, whatever it means, needs to be present; without it, Bing returns the same results for every page.

Bing page numbering doesn't increase by 10 each time. The first page returns 10 results, and all pages thereafter return 14 results. This means we need to update the page numbering to account for this. This also seems to be the case when running on searxng.
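The offset arithmetic described above can be sketched as follows. This is only an illustration of the numbers reported in this PR (10 results on the first page, 14 on each later page), not the engine's actual code; the function names `bing_first_offset` and `fixed_step_offset` are hypothetical.

```python
def bing_first_offset(pageno: int, first_page_size: int = 10, later_page_size: int = 14) -> int:
    """Return the 1-based index of the first result on page ``pageno``.

    Assumes the behavior observed in this PR: page 1 holds
    ``first_page_size`` results and every later page holds
    ``later_page_size`` results.
    """
    if pageno < 1:
        raise ValueError("pageno starts at 1")
    if pageno == 1:
        return 1
    # Skip the whole first page, then the full pages between it and pageno.
    return first_page_size + (pageno - 2) * later_page_size + 1


def fixed_step_offset(pageno: int, page_size: int = 10) -> int:
    """Offset under the classic assumption of a fixed number of results per page."""
    return (pageno - 1) * page_size + 1


for page in range(1, 5):
    print(page, fixed_step_offset(page), bing_first_offset(page))
```

Under these assumptions the two schemes agree for pages 1 and 2 (offsets 1 and 11) and diverge from page 3 onward (21 versus 25).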

Finally, the code to check the page had some duplicate checks, so I refactored the code in this section which seems low-risk. I can undo this if we want a dedicated PR for this.

Why is this change important?

Fixes #3402

How to test this PR locally?

  1. Run searxng
  2. Use !bing {...}
  3. Validate that each page's results are unique
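The uniqueness check in the last step could be automated along these lines. This is a rough sketch that assumes you have already collected the result URLs of each page into a list of lists; `overlapping_results` is a hypothetical helper, not part of SearXNG.

```python
def overlapping_results(pages):
    """Map each URL that appears on more than one page to those page numbers.

    ``pages`` is a list where item 0 holds the result URLs of page 1,
    item 1 those of page 2, and so on. An empty return value means every
    page returned unique results.
    """
    seen = {}
    for pageno, urls in enumerate(pages, start=1):
        # set() so a URL repeated within one page is counted once for it.
        for url in set(urls):
            seen.setdefault(url, []).append(pageno)
    return {url: nums for url, nums in seen.items() if len(nums) > 1}


# Made-up URLs, only to show the shape of the input and output.
pages = [
    ["https://example.org/a", "https://example.org/b"],
    ["https://example.org/b", "https://example.org/c"],
]
print(overlapping_results(pages))
```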

Author's checklist

n/a

Related issues

Closes #3402

Local Testing

Page 1

image

Page 2

image

Page 3

image

Bing page numbering doesn't increase by 10 each time. The first page returns 10 results, and all pages thereafter return 14 results. This means we need to update the page numbering

Next, the 'sc' parameter, whatever it means, needs to be present in order to not return the same results.

Finally, the code to check the page had some duplicate checks, so I refactored the code in this section which is low-risk.
Member

@Bnyro Bnyro left a comment

It's Bing time again 😢

  • The changes don't work for me unfortunately
  • Page starts are 1, 11, 21, 31, ... for me, not as you changed it here
  • The sc parameter is 11-3 for me - I tried changing it in SearXNG but pagination still doesn't work oddly

It pretty much seems like Bing behaves very differently depending on your location (mine is Germany) or other circumstances, which makes it pretty hard to get things to work properly...

@return42
Member

original posted in #3358 (comment)

Yeah, bing is a source of constant errors / I tagged some of them: bing-engine

Personally, I gave up trying to fix this engine. We will probably never find a stable solution for it, because bing itself is already full of quirks #2641 (comment) (it seems bing does not work over a VPN).

Anyway, I'm very happy to see that the community is addressing the issue ... the way the bing engine is currently running, any improvement (no matter how small) is welcome 👍

@glanham-jr
Contributor Author

glanham-jr commented Apr 25, 2024

It's Bing time again 😢

  • The changes don't work for me unfortunately

  • Page starts are 1, 11, 21, 31, ... for me, not as you changed it here

  • The sc parameter is 11-3 for me - I tried changing it in SearXNG but pagination still doesn't work oddly

It pretty much seems like Bing behaves very differently depending on your location (mine is Germany) or other circumstances, which makes it pretty hard to get things to work properly...

Hmmmmm, there is definitely some strange non-deterministic behavior occurring then. I'll try setting my VPN to Germany to see what's working.

As for the sc parameter, it seemed to vary in value, so I set it to a default, which solved the issue for me.

@glanham-jr
Contributor Author

glanham-jr commented Apr 26, 2024

@Bnyro Good news 🥳 I was able to reproduce the issue by tunneling requests through Germany via a VPN. So I can investigate at least Germany soon.

But now that I'm seeing issues that depend on region, I'm hoping I can make a patch that doesn't cause actual regressions. I'll test various regions and see what sticks.

My current plan of action is to...

  1. Include all HTTP parameters. If that fails, then...
  2. Start investigating differences in cookies between regions.

For anyone else's info, here are the HTTP parameters I'm currently investigating.

{
	"GET": {
		"scheme": "https",
		"host": "www.bing.com",
		"filename": "/search",
		"query": {
			"q": "{query}",
			"sp": "-1",
			"lq": "0",
			"pq": "",
			"sc": "0-0",
			"qs": "n",
			"sk": "",
			"cvid": "{random string 32 chars long}",
			"ghsh": "0",
			"ghacc": "0",
			"ghpl": "",
			"FPIG": "{random string 32 chars long}",
			"first": "{page offset}",
			"FORM": "{weird bing specific page offset}"
		}
	}
}
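For experimenting with the parameter set above, a request URL could be assembled roughly like this. It is a sketch, not the engine's implementation: `build_bing_url` is a hypothetical helper, using `uuid4().hex` for the two 32-character random strings is an assumption, and the `FORM` value is left to the caller because its exact scheme is not established in this thread.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlsplit


def build_bing_url(query: str, first: int, form: str, sc: str = "0-0") -> str:
    """Assemble a Bing search URL from the parameters listed above."""
    params = {
        "q": query,
        "sp": "-1",
        "lq": "0",
        "pq": "",
        "sc": sc,
        "qs": "n",
        "sk": "",
        # Assumption: any random 32-character string is accepted here.
        "cvid": uuid.uuid4().hex,
        "ghsh": "0",
        "ghacc": "0",
        "ghpl": "",
        "FPIG": uuid.uuid4().hex,
        "first": str(first),
        # Supplied by the caller; "FORMVALUE" below is only a placeholder.
        "FORM": form,
    }
    return "https://www.bing.com/search?" + urlencode(params)


# Round-trip check: the generated URL carries the requested offset.
example = build_bing_url("searxng", first=11, form="FORMVALUE")
assert parse_qs(urlsplit(example).query)["first"] == ["11"]
print(example)
```

Keeping `sc` as an argument makes it easy to compare the default of `0-0` against other values seen in different regions.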

Seems like this Stack Overflow answer may have some insight.

In the Bing search context, cvid represents the JavaScript parameter ConversationId. Bing uses this key to identify your search result collection as its reply to your query, q. Similarly, pq is PartialQuery. These and other parameters may also apply to different kinds of searches, such as image or video searches.

Next, qs is your query's SuggestionType, sc shows your SuggestionCount, and from the suggestion list (dropped down, if enabled), sp shows the SuggestionPosition you chose. In your case, you did not select a suggestion, so &sp=-1. Toward the end of your string, sk is the SkipValue, because you might skip through your result pages, first tells the issuer how many results belong on the first page, and I'll let you figure out what FORM means. ;)

Lastly, while we aren't using the Bing API, there may be some information we can cross reference? Here is a link to the REST Documentation.


Also, now the page numbering is back to 1, 11, 21, ..., so who knows what's happening there. I'm going to revert those changes and assume Bing was having a moment. I don't want to cause regressions either, so less is more in that respect. However, the page numbering may depend on some strange combination of these other parameters, so I'll dig into this as well.

Going to put this in draft - anyone is welcome to add their inputs or testing, but I'll need to do more extensive testing by region.

@glanham-jr glanham-jr marked this pull request as draft April 26, 2024 02:50
@glanham-jr
Contributor Author

Given Bing seems to be causing region-specific issues now, we should probably add documentation (whether in code or somewhere else) with major regions to check when making changes to this engine. But let me validate this first with extensive testing.

@return42
Member

@glanham-jr Thank you for your research, your #3416 (comment) is very interesting 👍

Going to put this in draft - anyone is welcome to add their inputs or testing, but I'll need to do more extensive testing by region.

I may have time next weekend, but to answer your question ...

we should probably add documentation (whether in code or somewhere else) with major regions to check when making changes to this engine.

Let's document this in comments and doc-strings in the engine. The doc-strings are used in the online documentation --> https://docs.searxng.org/dev/engines/online/bing.html

@glanham-jr
Contributor Author

glanham-jr commented May 5, 2024

Tried testing this last night again. Germany is still having this issue despite supplying all HTTP Parameters. From what I saw, it appears to be an issue with the headers or the cookies?

I copied the exact same URL for page 2 that searxng emitted (that returned page 1 results) and tried it on a regular browser, where I got actual page 2 results.

It seems there are some slightly different behaviors even from HTTP parameters / cookies / headers. I didn't take exact notes, but I can try documenting the deltas of what I am seeing between the US and Germany.
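One way to record those deltas is to capture the parameters, cookies, or headers of a request from each region as plain dicts and diff them. The sketch below assumes exactly that; `dict_delta` is a hypothetical helper and the sample values are illustrative only, not captured data.

```python
MISSING = "<missing>"


def dict_delta(left: dict, right: dict) -> dict:
    """Return ``{key: (left_value, right_value)}`` for every key that differs.

    A key present on only one side is reported with ``MISSING`` on the other.
    """
    delta = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if a != b:
            delta[key] = (a, b)
    return delta


# Illustrative values only.
us_params = {"sc": "0-0", "first": "11"}
de_params = {"sc": "11-3", "first": "11", "extra": "x"}
print(dict_delta(us_params, de_params))
```

The same helper works for cookies and headers once they are flattened to name/value pairs.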

@glanham-jr
Contributor Author

I'm also thinking this PR partially fixes the problem for the US region, so maybe let's acknowledge this PR is a partial patch and let's dedicate a new issue specifically for Germany.

Successfully merging this pull request may close these issues.

bing engine paging always return same result
3 participants