📺 Netflix analysis 📺

This project was completed by Chisimnulia Okoye, Sofia Kauser, and Shannon Watts.

Research proposal

Our project focuses on two datasets sourced from Kaggle. We chose these datasets because we love to watch Netflix, but we had run out of TV shows and films to watch. We wanted to see how many of IMDB's top 1000 films and TV series were on Netflix. Surely the top-rated and top-grossing films are the most interesting? So we set out to identify these films and TV shows and to find out how many of them were not on Netflix.

Research Questions:

How many of IMDB’s top 1000 films are currently on Netflix?

What is the corresponding IMDB score for these films? Has Netflix missed any major top-rated films?

Which release year is most common in IMDB’s top 1000? Which films would we suggest adding next month?

The Datasets:

We chose these datasets because we thought they would best illustrate what we wanted to find.

IMDB Movies Dataset https://www.kaggle.com/datasets/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows

Netflix Movies and TV Shows https://www.kaggle.com/datasets/shivamb/netflix-shows

Analysis 🔍

Our ETL on the Netflix and IMDB data allowed us to draw some very interesting conclusions...

How many of IMDB’s top 1000 films are currently on Netflix?

We found that there were 8807 titles currently on Netflix that WERE NOT in the IMDB top 1000...

We found that the average rating for films and TV shows on Netflix that were also in the IMDB top 1000 was 7.9, and that these titles grossed approximately 68,000,000. Not too bad, but we have provided some examples below of how Netflix could increase its viewer ratings by adding higher-rated IMDB films.
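One way such an overlap count could be produced is a left join with an indicator column. The following is a minimal sketch, not the notebook's actual code: the titles and ratings are made up, and it assumes both frames share a 'title' column after renaming.

```python
import pandas as pd

# Hypothetical sample data; titles and ratings are illustrative only.
netflix = pd.DataFrame({"title": ["Film A", "Film B", "Film C"]})
imdb = pd.DataFrame({"title": ["Film A", "Film D"], "IMDB_Rating": [8.0, 9.0]})

# Left join keeps every Netflix title; _merge records whether IMDB had a match.
merged = netflix.merge(imdb, on="title", how="left", indicator=True)
on_both = merged[merged["_merge"] == "both"]
netflix_only = merged[merged["_merge"] == "left_only"]

print(len(on_both), len(netflix_only))
# Mean rating of the titles that appear in both sources.
print(on_both["IMDB_Rating"].mean())
```

Matching on the exact title is a simplification: remakes that share a title would be counted as matches, and titles spelled differently in the two sources would be missed.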

What is the corresponding IMDB score for these films? Has Netflix missed any major top-rated films?

We looked into the top five highest-rated IMDB films that were not on Netflix. They had an IMDB rating of at least 9. These included The Shawshank Redemption, The Godfather Part I and II, The Dark Knight, and 12 Angry Men. We would argue that Netflix is missing out by not showing these films. No wonder people have decided to spend their weekends outside again...
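A list like this can be obtained with an anti-join followed by a sort on rating. The sketch below uses made-up titles and ratings and assumes a shared 'title' column; it illustrates the idea rather than reproducing the project's query.

```python
import pandas as pd

# Hypothetical sample data; titles and ratings are illustrative only.
imdb = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "IMDB_Rating": [9.2, 8.4, 9.0, 8.8],
})
netflix = pd.DataFrame({"title": ["Film B", "Film E"]})

# Anti-join: keep IMDB rows whose title does not appear in the Netflix data.
missing = imdb[~imdb["title"].isin(netflix["title"])]

# Highest-rated titles among those missing from Netflix.
top_missing = missing.nlargest(2, "IMDB_Rating")
print(top_missing)
```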

Which release year is most common in IMDB’s top 1000? Which films would we suggest adding next month?

We found that 2018 was the release year with the highest count of films and TV shows that were both on Netflix and in the IMDB top 1000. We then decided to look at the top-rated IMDB films from 2018 that are not currently showing on Netflix. We found that Capharnaum, Spider-Man: Into the Spider-Verse, Avengers: Infinity War, Tumbbad, and Andhadhun were missing from Netflix... We definitely agree that these should be added next month!
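The two steps described here (find the most common release year among the overlapping titles, then list the best-rated titles from that year that Netflix lacks) could be sketched as follows. All rows are hypothetical, and the 'title' and 'release_year' column names are assumed to match across both frames.

```python
import pandas as pd

# Hypothetical sample data; titles, years and ratings are illustrative only.
imdb = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"],
    "release_year": [2018, 2018, 2015, 2018, 2018, 2010],
    "IMDB_Rating": [8.0, 7.9, 8.1, 8.5, 8.3, 9.0],
})
netflix = pd.DataFrame({"title": ["Film A", "Film B", "Film C", "Film G"]})

on_netflix = imdb["title"].isin(netflix["title"])

# Most common release year among titles present in both sources.
top_year = imdb.loc[on_netflix, "release_year"].value_counts().idxmax()

# Best-rated IMDB titles from that year that are not on Netflix.
suggestions = (
    imdb[~on_netflix & (imdb["release_year"] == top_year)]
    .sort_values("IMDB_Rating", ascending=False)
)
print(top_year)
print(suggestions)
```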

Extract, Transform & Load: how we came to our conclusions

Extract 📂

We decided to extract the two CSV files and examine both separately to see what we were working with.
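As a rough illustration of this step, the sketch below reads two CSV sources and inspects each one separately. The in-memory strings stand in for the Kaggle files, the rows are made up, and the raw header names (such as 'Series_Title' and 'Released_Year') are assumptions about the source files.

```python
import io
import pandas as pd

# In-memory stand-ins for the two CSV files (hypothetical rows and headers).
imdb_csv = io.StringIO(
    "Series_Title,Released_Year,IMDB_Rating,Gross\n"
    'Film A,1999,8.1,"1,234,567"\n'
    "Film B,PG,7.6,\n"
)
netflix_csv = io.StringIO(
    "show_id,type,title,release_year\n"
    "s1,Movie,Film A,1999\n"
    "s2,TV Show,Film C,2018\n"
)

def extract(source):
    """Read one CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)

imdb_raw = extract(imdb_csv)
netflix_raw = extract(netflix_csv)

# First look at each frame on its own: size, column types, missing values.
for name, frame in [("imdb", imdb_raw), ("netflix", netflix_raw)]:
    print(name, frame.shape)
    print(frame.dtypes)
    print(frame.isna().sum())
```

Checking the column types at this stage is what tends to surface problems such as text in a year column or numbers stored with thousands separators.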

Transform 🧹

Many of the columns were not needed, so we dropped them. We then renamed the column headings in both DataFrames so that we could concatenate the two DataFrames.
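A minimal sketch of the drop, rename and concatenate sequence is shown here. Which columns were dropped and the raw header names are assumptions; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames; raw column names are assumed.
imdb = pd.DataFrame({
    "Poster_Link": ["http://example.com/a.jpg"],
    "Series_Title": ["Film A"],
    "Released_Year": ["1999"],
    "IMDB_Rating": [8.1],
})
netflix = pd.DataFrame({
    "show_id": ["s1"],
    "title": ["Film B"],
    "release_year": [2018],
    "description": ["..."],
})

# Drop columns that are not needed (the choice of columns is an assumption).
imdb = imdb.drop(columns=["Poster_Link"])
netflix = netflix.drop(columns=["show_id", "description"])

# Rename the IMDB headings to match the Netflix ones so shared columns line up.
imdb = imdb.rename(columns={"Series_Title": "title", "Released_Year": "release_year"})

# Stack the two frames; columns missing from one source are filled with NaN.
combined = pd.concat([netflix, imdb], ignore_index=True)
print(combined)
```

Without the rename, pd.concat would keep 'title' and 'Series_Title' as two separate, half-empty columns.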

Lastly, we wanted to change the null values in the columns 'IMDB_Rating', 'Meta_score', 'No_of_Votes' and 'Gross'. There were many null values because not many shows and films on Netflix are also in the IMDB top 1000 list. We changed these to 'not currently in IMDB top 1000' to make this clear.
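Replacing nulls with a text label could look like the sketch below, which uses hypothetical rows. Note that mixing text into a numeric column changes its type to a generic object column, which matters for any later arithmetic.

```python
import numpy as np
import pandas as pd

imdb_cols = ["IMDB_Rating", "Meta_score", "No_of_Votes", "Gross"]

# Hypothetical sample data: the second row has no IMDB match.
df = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "IMDB_Rating": [8.1, np.nan],
    "Meta_score": [75.0, np.nan],
    "No_of_Votes": [1000.0, np.nan],
    "Gross": [5000000.0, np.nan],
})

label = "not currently in IMDB top 1000"
for col in imdb_cols:
    # Cast to object so numbers and text can share the column,
    # then keep existing values and put the label where the value was null.
    df[col] = df[col].astype(object).where(df[col].notna(), label)

print(df)
```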

Some errors encountered whilst Transforming Data

We ran into one main issue when we tried to load the data into PostgreSQL, so we had to return to the transform stage and figure out what the issue was. We kept receiving an operational error related to 'PG'. The error prompted us to look at the row with the title 'Apollo 13'. Upon further examination we found that there was an error in the original CSV file: the 'certificate', which was 'PG', had been listed in the 'release_year' column. We rectified this by using .loc to find the exact row with the error.

We changed the value to 0 - we recognise that this is an anomaly.
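The .loc fix described above might look roughly like this. The second row is hypothetical, and the sketch assumes the column has already been renamed to 'release_year' and still holds text values.

```python
import pandas as pd

# The Apollo 13 row mirrors the error described above; Film A is hypothetical.
df = pd.DataFrame({
    "title": ["Apollo 13", "Film A"],
    "release_year": ["PG", "1999"],
})

# Flag rows whose release_year cannot be read as a number.
bad = pd.to_numeric(df["release_year"], errors="coerce").isna()
print(df.loc[bad, ["title", "release_year"]])

# Overwrite the single bad cell with a placeholder, then convert the column.
df.loc[df["title"] == "Apollo 13", "release_year"] = "0"
df["release_year"] = df["release_year"].astype(int)
print(df)
```

The coerce-and-check step is a general way to find such rows without relying on the database error message to point at them.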

Another major error we encountered whilst loading was caused by commas in the 'Gross' column. To eliminate this, we removed the commas from the column's values.
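One common way to handle thousands separators in pandas is sketched below with hypothetical values; it is not necessarily the approach used in the notebook.

```python
import pandas as pd

# Hypothetical sample data: Gross stored as text with thousands separators.
df = pd.DataFrame({"title": ["Film A", "Film B"], "Gross": ["28,341,469", None]})

# Remove the commas as plain text (regex=False), then convert to numbers.
# Missing values stay missing (NaN) rather than raising an error.
df["Gross"] = pd.to_numeric(
    df["Gross"].str.replace(",", "", regex=False), errors="coerce"
)
print(df)
```

pd.read_csv also accepts a thousands="," argument, which may make this clean-up unnecessary if it is applied when the file is first read.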

Lastly, our third major error occurred because of the change to the null values in the columns 'IMDB_Rating', 'Meta_score', 'No_of_Votes' and 'Gross'. We needed to perform analysis using these columns, so we had to revert to NaN values, which allowed us to perform the analysis we required.
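The problem is that a text label turns a numeric column into a mixed one, so aggregates such as a mean no longer work. A minimal sketch of reverting to NaN, with hypothetical values, is:

```python
import pandas as pd

label = "not currently in IMDB top 1000"

# Hypothetical sample data: Film B carries the text label instead of a number.
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "IMDB_Rating": [8.0, label, 7.0],
    "Gross": [60000000, label, 76000000],
})

for col in ["IMDB_Rating", "Gross"]:
    # errors="coerce" turns anything non-numeric (the label) back into NaN.
    df[col] = pd.to_numeric(df[col], errors="coerce")

# NaN is skipped by default, so the mean uses only the rows with a rating.
print(df["IMDB_Rating"].mean())
```

Keeping NaN in the data and applying a display label only at presentation time avoids having to undo the change.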

Load 📠

We chose to load our DataFrames into PostgreSQL. We chose a relational database rather than a non-relational database (such as MongoDB) because we wanted to load our data into a fixed schema and to visualise and manage the table easily. We also used a relatively small dataset (around 10,000 rows), which meant that PostgreSQL could handle our data and queries. We also wished to run queries on the data and view the results in tabular form.
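The load step can be sketched with DataFrame.to_sql. To keep the example runnable without a database server, it uses an in-memory SQLite connection as a stand-in; for PostgreSQL the connection would instead be a SQLAlchemy engine. The table name, the sample rows and the connection URL in the comment are all placeholders.

```python
import sqlite3
import pandas as pd

def load(frame, table, con):
    """Write a DataFrame to a SQL table, replacing any existing table of that name."""
    frame.to_sql(table, con, if_exists="replace", index=False)

# Hypothetical sample data.
df = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "release_year": [1999, 2018],
    "IMDB_Rating": [8.0, None],
})

# SQLite stand-in so the sketch runs anywhere. For PostgreSQL, `con` would be
# something like sqlalchemy.create_engine("postgresql://user:password@host:5432/dbname")
# (placeholder URL).
con = sqlite3.connect(":memory:")
load(df, "movies", con)

# Query the loaded table and view the result in tabular form.
result = pd.read_sql(
    "SELECT title, IMDB_Rating FROM movies WHERE IMDB_Rating IS NOT NULL", con
)
print(result)
```

Missing ratings are stored as NULL, so they can be filtered in SQL with IS NOT NULL rather than by comparing against a text label.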

Our data was now ready for analysis... A snapshot of which is below:
