Common Data Mining Algorithms

1. Apriori Algorithm

This algorithm is used for mining transaction databases. It extracts knowledge about which items are bought together by a customer. I follow http://www.mathcs.emory.edu/~cheung/Courses/584-StreamDB/Syllabus/10-Mining/Apriori.html to speed up the algorithm.

The Apriori algorithm consists of 3 phases (a minimal end-to-end sketch follows this list):

  1. Generate the candidate set Ck
  2. Count each set c in Ck over the transactions T
  3. Remove every set c whose support count does not reach supmin
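
As a quick illustration, here is a minimal, self-contained sketch of the three phases in Python. It uses brute-force candidate generation (all k-item combinations), so it is slower than the optimised version described below; the transactions and supmin value are made-up examples.

```python
from itertools import combinations

def apriori(transactions, supmin):
    """Return every itemset whose support count reaches supmin."""
    items = sorted({i for t in transactions for i in t})
    frequent, k = {}, 1
    while True:
        # Phase 1: generate the candidate set Ck (brute force here)
        Ck = [frozenset(c) for c in combinations(items, k)]
        # Phase 2: count each candidate c over the transactions T
        counts = {c: sum(1 for t in transactions if c <= t) for c in Ck}
        # Phase 3: keep only the candidates whose count reaches supmin
        Fk = {c: n for c, n in counts.items() if n >= supmin}
        if not Fk:                      # no frequent k-itemsets left: stop
            return frequent
        frequent.update(Fk)
        k += 1

transactions = [{"bread", "milk"}, {"beer", "bread"}, {"beer", "bread", "milk"}]
print(apriori(transactions, supmin=2))
```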

Phase 1: A brute-force approach would generate every possible candidate by adding any item not yet in the set, which produces a huge candidate set and is slow. Instead, I generate Ck with a self-join of the frequent (k-1)-itemsets: F1 and F2 can be joined if and only if the first k-2 items of F1 and F2 are the same.
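
A sketch of this self-join (plus the standard subset-based pruning), assuming each frequent (k-1)-itemset is stored as a sorted tuple; generate_candidates is my own name for the helper.

```python
from itertools import combinations

def generate_candidates(prev_frequent, k):
    """Self-join: two sorted (k-1)-itemsets f1, f2 are merged into a
    k-item candidate only when their first k-2 items are identical."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            f1, f2 = prev[i], prev[j]
            if f1[:k - 2] == f2[:k - 2]:              # first k-2 items match
                cand = tuple(sorted(set(f1) | set(f2)))
                # prune: every (k-1)-subset of the candidate must itself be frequent
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

# frequent 2-itemsets -> candidate 3-itemsets
print(generate_candidates([("beer", "bread"), ("bread", "milk"), ("beer", "milk")], k=3))
```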

Phase 2: We could loop over every transaction and every candidate and increase a counter whenever a candidate appears in a transaction, but that requires too much computation. Instead, we store the candidates in a hash tree, generate the k-subsets of each transaction, and use each subset to increase the counters of the matching candidate sets.

Step 1: Build the k-item sub-transactions (subsets) of the original transaction.

Step 2: Build a hash tree containing the candidate sets.

Step 3: Iterate over all sub-transactions on the hash tree and increase the counter of each candidate that a sub-transaction reaches.
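
A simplified counting sketch for these steps: the hash tree is approximated here by a plain Python dict keyed by candidate, so it shows the subset-based counting idea rather than the tree structure itself.

```python
from itertools import combinations

def count_candidates(transactions, candidates, k):
    """Count candidate k-itemsets by enumerating the k-item subsets of each
    transaction (the hash tree is replaced by a dict lookup here)."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        # Step 1: build every k-item sub-transaction of the transaction
        for sub in combinations(sorted(t), k):
            # Steps 2-3: look the subset up among the candidates and count it
            if sub in counts:
                counts[sub] += 1
    return counts

transactions = [{"bread", "milk"}, {"beer", "bread", "milk"}, {"beer", "bread"}]
print(count_candidates(transactions, [("beer", "bread"), ("bread", "milk")], k=2))
```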

2. KNN Algorithm

Step 1: Determine the number K (how many neighbours to evaluate) and the data D (features and labels).

Step 2: Compute a similarity measure between the test sample and every training sample, and find the K training samples with the largest similarity.

Step 3: Find the most frequent label among those K neighbours and assign it to the test sample.
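
A minimal sketch of these three steps, assuming Euclidean distance as the (dis)similarity measure and a simple majority vote; the training data below is a made-up example.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Steps 2-3: rank training samples by distance to x (smaller distance =
    larger similarity), take the K nearest, and return their most frequent label."""
    distances = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Step 1: choose K and the labelled data D
train_X = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.0, 8.5)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))   # -> "A"
```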

Advantages:

  1. Simple; no training phase is needed
  2. Makes no assumption about the class distribution

Disadvantages:

  1. Because there is no training phase, all computation happens at test time, so the test phase is very slow
  2. Easily affected by noisy data
  3. The whole training set has to be kept in memory

3. Decision Tree - ID3

4. K-Means

5. HAC

6. Naive Bayes