OscarsPredictions

Oscars predictions using a Bradley-Terry model and Stan

If you like OscarsPredictions, give it a star, or fork it and contribute!

Installation/Usage

Requires R version 3.2.0 or higher.

To install the required libraries in an R session:

install.packages("data.table")
install.packages("ggplot2")
install.packages("rstan")

Clone repository:

git clone https://github.com/makeyourownmaker/OscarsPredictions
cd OscarsPredictions

Then run the file(s) in an R session:

setwd("OscarsPredictions")
source("oscars-bradley-terry-teams-multiyear.R", echo = TRUE)

Details

Oscars

Oscar nominees are difficult to rank. The Oscars are an annual event with a changing electorate and frequent rule changes which do not apply uniformly to all award categories. For example, during most of the history of the Oscars there have been five nominations in each category. Starting in 2009, the Academy of Motion Picture Arts and Sciences expanded the Best Picture nominations to more than 5 films (up to 10). The Best Picture prize is decided by instant-runoff voting instead of the first-past-the-post voting used for the other categories. A winner is announced for each Oscar category but no other ranking information is explicitly provided, i.e. no 2nd or 3rd place medals and no runners-up.

I'm not aware of anyone polling the more than 5,000 academy members eligible to vote, or of anyone investigating the amounts studios spend on lobbying for their nominees. We can, however, look at current and past Oscar nominations plus press and guild award winners. There is likely to be considerable overlap between the academy membership and the more career-specific guilds such as the Screen Actors Guild, Directors Guild, Producers Guild, etc.

Model

Bradley-Terry (BT) models rank pairs of competitors. They give the probability that a competition will result in a win or loss for either competitor. The BT likelihood can be paired with a prior to produce a Bayesian model which allows inference for future competitions. A paired-comparison model is an obvious choice for the Oscars competition. I am not aware of anyone who has used the BT model to predict the Oscars. More thought may be required for the Best Picture award given the increase in nominees.
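
In its simplest form, a competitor with ability alpha_i beats a competitor with ability alpha_j with probability logit^-1(alpha_i - alpha_j), where abilities are on the log odds scale. A minimal sketch in base R (the ability values are illustrative, not estimated from data):

# Bradley-Terry win probability: inverse logit of the ability difference
bt_win_prob <- function(alpha_i, alpha_j) {
  plogis(alpha_i - alpha_j)  # plogis is the logistic CDF in base R
}

bt_win_prob(1.2, 0.4)  # ~0.69; competitor i is the favourite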

One potential disadvantage of BT models is that they do not permit draws. This does not affect Oscar rankings, where we do not want, and should not have, draws.

Here I present a BT model for "team"-based competitions where teams consist of multiple "players": teams are nominees and players are explanatory variables. Each player has an ability, and a team's ability is assumed to be additive on the log odds scale. Interaction effects are ignored for now. Both player and team rankings can be estimated using Stan.
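
A minimal sketch of such a team-based BT model in Stan, fitted from R via rstan (the data layout and the normal(0, 1) prior are illustrative assumptions, not necessarily the exact model used in this repository):

library(rstan)

bt_team_code <- "
data {
  int<lower=1> N;              // number of win/loss comparisons
  int<lower=1> K;              // number of players (explanatory variables)
  matrix[N, K] X1;             // player values for the first nominee (team)
  matrix[N, K] X2;             // player values for the second nominee
  int<lower=0, upper=1> y[N];  // 1 if the first nominee won
}
parameters {
  vector[K] beta;              // player abilities on the log odds scale
}
model {
  beta ~ normal(0, 1);                    // illustrative prior
  y ~ bernoulli_logit((X1 - X2) * beta);  // team ability is additive
}
"

# fit <- stan(model_code = bt_team_code,
#             data = list(N = N, K = K, X1 = X1, X2 = X2, y = y))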

The BT model(s) in this repository are in the early stages of development and initially cover only the Best Director category. They are based on the Stan Bradley-Terry example models.

A number of alternative models are briefly summarised in the Alternatives section.

Data

Data used come from Iain Pardoe, who has used a Bayesian discrete choice multinomial model to predict Oscar winners for the 4 main categories. The data set runs from 1938 to 2006.

Immediately relevant predictors for the Best Director category include:

Explanatory Variable  Description                            Data Availability
DPrN                  Total previous directing nominations   1939-2006
DP                    Best Picture Oscar nomination          1944-2006
Gd                    Golden Globe director winner           1943-2006
DGA                   Directors Guild award winner           1951-2006

These variables are centered and scaled to mean 0 and standard deviation 1. There are currently no missing data. Additional variables can be added later on.
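
For example, centering and scaling can be done with base R's scale within a data.table (the file name here is hypothetical):

library(data.table)

oscars <- fread("oscars_director.csv")  # hypothetical file name
preds  <- c("DPrN", "DP", "Gd", "DGA")

# centre and scale each predictor to mean 0, standard deviation 1
oscars[, (preds) := lapply(.SD, function(v) as.numeric(scale(v))), .SDcols = preds]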

The binary response variable is the same across all Oscars categories:

Response Variable  Description
y                  1 for win, 0 for loss

The response variable is zero-inflated: in most years only one of five nominees wins, so roughly 80% of responses are 0.

Initially, only 20 years (1987 to 2006) of data were considered. Runtime is short, but not negligible, so the number of years of data included will be increased in due course.

Limitations

Let's ignore the fact that films are vicarious socialising and, in the worst cases, supernormal stimuli. The real entertainment is building the model.

Roadmap

  • NEXT
    • More thorough diagnostic checks
    • Vectorise the Stan model - should decrease run time but not change results
    • Make predictions for 2007
    • Check how including older data modifies prediction accuracy
    • Include additional information:
      • Such as Golden Globe genre (drama, comedy, musical etc) nominations/wins
      • Consider interactions between variables
      • See Oscarmetrics by Ben Zauzmer
    • Expand to other categories:
      • Best Picture, Best Actor and Best Actress
        • More thought may be required for the Best Picture award given the increase in nominees
  • Update data
    • The number of Best Picture nominations expanded from 5 to between 8 and 10 starting in 2009
  • Improve documentation
    • Include some data set summary graphs
    • Expand the model description
      • Describe prior on player abilities
      • Describe hierarchical nature of model
      • Describe predictor selection
      • Describe validation approach
  • Learning to rank models are interesting in their own right so deserve a separate repository

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Data updates are especially welcome.

Alternatives

See Also

License

GPL-2
