
USD Probability and Statistics for Artificial Intelligence (AAI-500-02) Final Project

In this repository, we use a National Hospital Care Survey (NHCS) dataset on patient drug use to understand patterns of health care delivery and utilization in the U.S., with a focus on opioid and other drug overdoses from 2020 to 2023.

For our final project, we'll perform the following operations on this dataset:

  • Data Cleaning/Preparation
  • Exploratory Data Analysis
  • Model Selection
  • Model Analysis
  • Conclusion and Recommendations

Dataset Overview

The NHCS collects data on patient care in hospital-based settings to describe health care delivery patterns in the U.S. Although the 2020-2023 data is preliminary and not nationally representative, it can still shed light on the use of opioids and other drugs involved in overdoses.

Dataset Summary

  • Data from 25 hospitals for inpatient and 25 hospitals for emergency departments (ED).
  • Data spans from January 1, 2020, to May 27, 2023.
  • The dataset includes indicators related to drug use, such as overall drug use, comorbidities, and drug and polydrug overdose.
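
As a rough illustration of how the date range and indicators above might be used, the sketch below restricts a table to the stated study window and averages a rate per setting and indicator. The rows, the column names (week_end, setting, indicator, rate), and the helper restrict_to_study_window are all hypothetical and invented for this example; the real files in data/ may be laid out differently.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the NHCS files.
records = pd.DataFrame(
    {
        "week_end": ["2019-12-28", "2020-01-04", "2021-06-12", "2023-05-27", "2023-06-03"],
        "setting": ["ED", "ED", "Inpatient", "ED", "Inpatient"],
        "indicator": [
            "Drug overdose",
            "Drug overdose",
            "Polydrug overdose",
            "Drug overdose",
            "Comorbidities",
        ],
        "rate": [1.2, 1.4, 0.6, 1.1, 3.0],
    }
)
records["week_end"] = pd.to_datetime(records["week_end"])


def restrict_to_study_window(df, start="2020-01-01", end="2023-05-27"):
    """Keep rows dated inside the window stated in this README (inclusive)."""
    in_window = df["week_end"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[in_window].reset_index(drop=True)


study = restrict_to_study_window(records)

# Mean rate per care setting and indicator within the study window.
summary = study.groupby(["setting", "indicator"])["rate"].mean()
print(summary)
```

Filtering first keeps rows outside January 1, 2020 to May 27, 2023 from leaking into any later summary.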

Repository Structure

  • data/: Folder containing the raw data.
  • notebooks/: Jupyter notebooks for data analysis, model selection, and evaluation.
  • README: This file.

Report Sections

  1. Introduction

    • Background of the NHCS and its importance.
    • Overview of the opioid crisis and the relevance of the dataset.
  2. Data Cleaning/Preparation

    • Data wrangling steps.
    • Handling missing values, outliers, and data transformations.
  3. Exploratory Data Analysis

    • Data distributions, trends, and patterns.
    • Visualizations of key metrics and features.
  4. Model Selection

    • Criteria for model selection.
    • Comparisons of different models and their performance metrics.
  5. Model Analysis

    • Detailed analysis of the selected model.
    • Feature importance, model evaluation, and validation.
  6. Conclusion and Recommendations

    • Key findings from the analysis.
    • Recommendations for health care policy, hospital practices, and further research.
  7. Appendix

    • Output of code from the technical Jupyter Notebook.
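
The missing-value and outlier handling mentioned under Data Cleaning/Preparation could look something like the sketch below. It is one possible approach, not necessarily the one used in the project notebook: it assumes median imputation and clipping to the 1.5 × IQR fences, and the column name rate and the helper clean_numeric_column are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_numeric_column(df, column):
    """Fill missing values with the median, then clip to the 1.5 * IQR fences."""
    out = df.copy()  # leave the caller's DataFrame untouched

    # Median imputation is robust to the extreme values handled next.
    out[column] = out[column].fillna(out[column].median())

    # Tukey fences: values beyond 1.5 * IQR from the quartiles are clipped.
    q1, q3 = out[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    out[column] = out[column].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


# Made-up values: one missing entry and one obvious outlier.
raw = pd.DataFrame({"rate": [1.0, 1.2, np.nan, 1.1, 0.9, 25.0]})
cleaned = clean_numeric_column(raw, "rate")
print(cleaned)
```

Clipping keeps every row, which matters with a small hospital sample; dropping outlier rows instead would be an equally defensible choice depending on the analysis.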

Getting Started

  1. Clone this repository.
  2. Run jupyter notebook from the repository root.
  3. Navigate to the notebooks/ directory and open the project notebook.
  4. Follow the instructions in the notebook to run the analysis.

Tools and Technologies

  • Python
  • Jupyter Notebook
  • Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
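
To show how these libraries might fit together for the model selection step, here is a minimal sketch that compares two scikit-learn regressors with 5-fold cross-validation. The synthetic data, the choice of candidate models, and the R² metric are assumptions made for illustration; the project notebook may use different models, features, and metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic features and target, standing in for cleaned NHCS features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Candidate models to compare (an illustrative choice).
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Shuffled 5-fold cross-validation with a fixed seed for repeatability.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("selected:", best_name)
```

Scoring every candidate on the same folds makes the comparison fair, since each model sees identical train and validation splits.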


Authors

Brian Morris, Will Kencel


License

This project is licensed under the MIT License.


Final Project for USD course 500_501 fall 2023. EDA for a medical dataset





