Enhancement: Minimum votes for update #118

Open
jobrien2001 opened this issue Mar 1, 2024 · 4 comments

Comments

@jobrien2001

Hello,

It would be good to have an environment variable that sets a minimum number of votes required before a rating is updated.

The problem is that a movie can have a 10 rating from only a few votes, which makes the rating unreliable.

Too many new movies have high ratings, which makes sorting movies by rating useless.

Thanks

@mynttt
Owner

mynttt commented Mar 1, 2024

I agree that this is a useful feature. Searching IMDB with a minimum vote count below 1000 yields some really bad results.

One question is how to handle ratings that have already been updated but fall below the threshold. Should they be reverted to 0 as a signal that they have too few votes to be significant? Should that be an additional option?

@jobrien2001
Author

jobrien2001 commented Mar 2, 2024

I'm not sure.

Some of the data from the file in updatetool seems to be off from the actual rating. Maybe it's cached and not refreshed very often.

Maybe set the rating to null and trigger a refresh so it gets the actual rating from the default agent? I'm not sure whether the agents already handle this problem.

This would need one env variable for a Plex token and another for the host to send a curl request to.

May I ask how you get the ratings? If you scrape them yourself, maybe scrape more frequently for new records or records with a low vote count for a period of time.
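
The refresh idea in this comment could look roughly like the sketch below. It only builds the request and does not send it. The thread does not confirm any of this: the refresh path and the `X-Plex-Token` header reflect how the Plex Media Server HTTP API is commonly called, and the host, token, and rating key values are placeholders.

```python
import urllib.request


def build_refresh_request(host, token, rating_key):
    """Build (but do not send) a metadata refresh request for one library item.

    Assumption: the server accepts PUT /library/metadata/<ratingKey>/refresh
    authenticated with an X-Plex-Token header. Verify against your server.
    """
    url = f"{host.rstrip('/')}/library/metadata/{rating_key}/refresh"
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"X-Plex-Token": token},
    )


# Placeholder values; in practice host and token would come from env variables.
req = build_refresh_request("http://localhost:32400/", "secret", 42)
# urllib.request.urlopen(req) would actually send the request.
```

Keeping the request construction separate from sending makes it easy to check the URL before pointing it at a real server.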

@mynttt
Owner

mynttt commented Mar 3, 2024

@jobrien2001

Can you provide examples of data that is wrong? Data is sourced from the daily updated IMDB data sets, or scraped from their website in the very rare case that the data is not included in IMDB's data set (scraped data is then cached for 7 days). Completely wrong data would indicate that something in the ID matching process is off, which would mean the tool possibly has a bug.

Data set: https://datasets.imdbws.com/title.ratings.tsv.gz

Scraper: https://github.com/mynttt/UpdateTool/blob/master/src/main/java/updatetool/imdb/ImdbScraper.java
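
For readers unfamiliar with the data set linked above, here is a minimal sketch of reading it with a vote filter. It is written in Python for brevity and is not UpdateTool's actual Java code. The column names `tconst`, `averageRating`, and `numVotes` are those of the IMDB ratings file; the function names and the two sample rows are made up for illustration.

```python
import csv
import gzip
import io


def read_ratings(lines, min_votes=0):
    """Yield (tconst, rating, votes) for rows with at least min_votes votes."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        votes = int(row["numVotes"])
        if votes >= min_votes:
            yield row["tconst"], float(row["averageRating"]), votes


def read_ratings_gz(path, min_votes=0):
    """Same as read_ratings, but reads a downloaded title.ratings.tsv.gz file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from read_ratings(handle, min_votes)


# Invented sample rows in the file's layout, not real IMDB data.
SAMPLE = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t9.8\t12\n"
    "tt0000002\t7.4\t25310\n"
)
kept = list(read_ratings(io.StringIO(SAMPLE), min_votes=1000))
```

With a threshold of 1000, only the second sample row survives; the 9.8 rating backed by 12 votes is dropped.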

@jobrien2001
Author

jobrien2001 commented Mar 3, 2024

Hello,

I'm not sure the data is wrong. Since you say it's cached for 7 days, maybe it was right at some earlier point (unreliable because of the low vote count, but right).

I see the way the data is presented.

Perhaps your suggestion is the best solution: set 0 or NULL (I haven't looked at the db) for any record below a given number of votes, and also skip updating if below that same number.

An env variable would be great so users can set that number to their liking.
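
The behaviour discussed so far could be sketched as follows. This is an illustration only, not the tool's implementation: the variable name `MIN_VOTES`, the function names, and the `reset_below_threshold` switch (standing in for the "additional option" raised earlier) are all hypothetical.

```python
import os

ENV_NAME = "MIN_VOTES"  # hypothetical variable name, not an existing option


def min_votes_from_env(environ=os.environ, default=0):
    """Read the vote threshold; an unset or empty variable disables the filter."""
    raw = environ.get(ENV_NAME, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{ENV_NAME} must not be negative, got {value}")
    return value


def resolve_rating(new_rating, votes, min_votes, reset_below_threshold=True):
    """Return (should_write, value) for one record.

    At or above the threshold: write the new rating.
    Below it: either clear the stored rating (value None, i.e. NULL)
    or leave the record untouched, depending on reset_below_threshold.
    """
    if votes >= min_votes:
        return True, new_rating
    if reset_below_threshold:
        return True, None
    return False, None
```

For example, `resolve_rating(10.0, 5, 1000)` asks the caller to clear the rating, while passing `reset_below_threshold=False` leaves an already stored value alone.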
