Skip to content

Explore Titanic dataset to uncover survival insights. Analyze demographics, ticket class, and cabins. Understand factors influencing passenger survival.

Notifications You must be signed in to change notification settings

Manu-Abuya/Titanic-EDA-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

Titanic Exploratory Data Analysis (EDA) Project

Overview

This project involves an Exploratory Data Analysis (EDA) of the Titanic dataset to understand the factors that influenced survival rates among passengers. The analysis covers various aspects, including demographics, ticket class, and cabin locations.

Dataset

The Titanic dataset used in this project is sourced from Kaggle. The dataset includes information about passengers on the Titanic, such as their age, gender, ticket class, and survival status.

Steps Taken

1. Data Loading and Exploration

  • Loaded the dataset using Pandas and explored its basic structure using info() and head() functions.
  • Identified columns with missing values and decided on appropriate handling strategies.

2. Data Cleaning

  • Handled missing values in the 'Age' column by filling them with the median age.
  • Handled missing values in the 'Embarked' column by filling them with the most common port of embarkation.
  • Documented any additional steps taken to clean the data.

3. Visualization of Passenger Demographics

  • Visualized the gender distribution and passenger class distribution using countplots.
  • Extracted insights into the composition of passengers based on gender and class.

4. Analysis of Survival Rates Based on Gender

  • Analyzed survival rates based on gender using a countplot.
  • Noted any significant findings or trends, such as higher survival rates for a particular gender.

5. Visualization of Ticket Class and Cabin Locations

  • Plotted countplots to visualize survival rates based on passenger class and a heatmap for survival by cabin.
  • Explored insights into the relationship between ticket class, cabin location, and survival.

6. Documentation and Reporting

  • Documented the steps taken in the Jupyter Notebook and provided explanations for each code section.
  • Included insights and findings discovered during the analysis.
  • Added visualizations to support key points.
  • Created a README file to guide users on how to reproduce the analysis.

Conclusion

The EDA provided valuable insights into the factors influencing survival rates on the Titanic. Gender and ticket class emerged as significant factors, with women having higher survival rates and passengers in higher classes having better chances of survival. The analysis also considered cabin locations, although the high number of missing values in the 'Cabin' column limited the depth of exploration in this aspect.

Future Work

  • Further exploration of the 'Cabin' variable, perhaps grouping cabins based on their locations.
  • Analysis of age groups to identify specific demographics with higher or lower survival rates.
  • Consideration of interactions between different variables and their combined impact on survival.

Dependencies

  • Python 3.x
  • Libraries: pandas, numpy, seaborn, matplotlib

Usage

  1. Clone the repository: git clone https://github.com/your-username/Titanic-EDA-Analysis.git
  2. Open and run the Jupyter Notebook: titanic_analysis.ipynb
  3. Explore the code, visualizations, and documentation to understand the analysis.

Contribution

Contributions are welcome! Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License.

About

Explore Titanic dataset to uncover survival insights. Analyze demographics, ticket class, and cabins. Understand factors influencing passenger survival.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published