
This repository contains the implementation of known formulas in the field of Data Mining / Machine Learning / Statistics using Python and the Numpy library.


MigeruDev/numpy-formulas

NUMPY FORMULAS

Implementation of well-known math formulas in NumPy


I made this repo to improve my mathematical Python skills while taking the Data Mining course at my university. In that course I learned a lot about distances, matrices, proximities, and more, and I took the opportunity to have a little fun with the NumPy library. Feel free to use it if you find it useful, or to improve it if you see a way! ✌

Logo

Image taken from realpython.com/numpy-tutorial/


If you like this repo, please click the ⭐

Contents

Distances

Distance measures play an important role in machine learning.

A distance measure is an objective score that summarizes the relative difference between two objects in a problem domain. Most commonly, the two objects are rows of data that describe a subject (such as a person, car, or house), or an event (such as a purchase, a claim, or a diagnosis).
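As an illustration of the idea above, here is a minimal sketch of three common distance measures between two rows of data. The function names and sample rows are hypothetical and chosen for this example; they are not necessarily the ones used in this repository.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))


def manhattan_distance(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))


def minkowski_distance(a, b, p=2):
    """Generalization of both: p=1 gives Manhattan, p=2 gives Euclidean."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)


# Two hypothetical rows of numeric data describing two subjects.
row_a = [1.0, 2.0, 3.0]
row_b = [4.0, 6.0, 3.0]

print(euclidean_distance(row_a, row_b))  # 5.0
print(manhattan_distance(row_a, row_b))  # 7.0
```

The inputs are converted with `np.asarray` so that plain Python lists work as well as NumPy arrays.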

Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.
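A minimal sketch of two widely used normalization schemes, min-max scaling and z-score standardization, applied column by column. The function names and the sample data are assumptions made for this example, and replacing a zero range or zero standard deviation with 1 is just one possible way to handle constant columns.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column of X to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: constant columns are mapped to 0 instead of dividing by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


def z_score_normalize(X):
    """Center each column at mean 0 and scale it to standard deviation 1."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    # Assumption: constant columns are left centered at 0.
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


# Hypothetical dataset: two numeric columns on very different scales.
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0]])

print(min_max_normalize(data))
print(z_score_normalize(data))
```

After min-max scaling, both columns span the same range from 0 to 1, so neither dominates a distance computation just because of its units.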

Proximity Measure

Proximity measures are measures of similarity and dissimilarity. They matter because a number of data mining techniques rely on them, such as clustering, nearest-neighbour classification, and anomaly detection.
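To make this concrete, here is a minimal sketch of two common similarity measures: cosine similarity for numeric vectors and the Jaccard coefficient for binary vectors. The function names are hypothetical, and returning 1.0 when both binary vectors are all zeros is a convention assumed for this example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def jaccard_similarity(a, b):
    """Shared 1s divided by positions where at least one vector has a 1."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 1.0
    return np.logical_and(a, b).sum() / union


print(cosine_similarity([1, 2], [2, 4]))               # parallel vectors
print(jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0]))  # 1 shared out of 3
```

A matching dissimilarity can be obtained from either measure as one minus the similarity.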

Impurity Measure

Impurity measures are very important for any tree-based algorithm. They are mainly used to decide which attribute to split on, starting with the root node.

Given a dataset whose predicted (dependent) variable takes class values (such as Yes, No, Neutral, etc.), we can measure the homogeneity or heterogeneity of the table based on those classes. We say a dataset is pure, or homogeneous, if it contains only a single class (either YES or NO). If a dataset contains several classes, we say that the table is impure, or heterogeneous (a combination of YES and NO). There are several ways to measure the degree of impurity. The best-known measures are given below:
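As a hedged illustration, the sketch below computes two widely used impurity measures, the Gini index and entropy, from a list of class labels. These two are chosen as familiar examples and are not necessarily the exact set implemented in this repository; the function names are hypothetical.

```python
import numpy as np


def class_probabilities(labels):
    """Relative frequency of each distinct class label."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return counts / counts.sum()


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    p = class_probabilities(labels)
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Shannon entropy in bits, written as sum of p * log2(1 / p)."""
    p = class_probabilities(labels)
    return np.sum(p * np.log2(1.0 / p))


pure = ["YES", "YES", "YES", "YES"]
mixed = ["YES", "YES", "NO", "NO"]

print(gini_index(pure), entropy(pure))    # 0.0 0.0
print(gini_index(mixed), entropy(mixed))  # 0.5 1.0
```

Both measures are 0 for a pure dataset and reach their maximum when the classes are evenly mixed.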

Contact

Miguel Ángel Macías - 👨‍💻Linkedin

My Personal Website: ✨mangelladev.com
