farnazgh/Farsi-letters-distribution-and-Entropy


farsi-characters-distribution-and-Entropy

Dataset (Zabra): ISNA news articles covering 7 topics, with 10 documents per topic. The first part of the code creates one list per topic and adds that topic's documents to it.
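A minimal sketch of the per-topic loading step. The README does not show how the files are stored, so the layout `root/<topic>/<doc>.txt` and the function name `load_topics` are hypothetical.

```python
from pathlib import Path


def load_topics(root):
    """Build {topic: [document text, ...]} from a hypothetical layout root/<topic>/<doc>.txt."""
    topics = {}
    for topic_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # One list per topic, holding the text of every .txt document in that folder
        topics[topic_dir.name] = [
            f.read_text(encoding="utf-8") for f in sorted(topic_dir.glob("*.txt"))
        ]
    return topics
```

With the Zabra data stored this way, `load_topics("data")` would return 7 lists of 10 documents each.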

This code includes 6 phases:

### a

Assuming every Farsi letter is equally probable, the entropy per character is calculated.
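With $N$ equally likely symbols, every $p(c) = 1/N$, so $H = -\sum_{c} \frac{1}{N}\log_2\frac{1}{N} = \log_2 N$ bits. A short sketch, assuming the 32-letter standard Persian alphabet; the repository's own letter set is not listed in the README and may differ (for example, it may treat آ or ء as separate symbols).

```python
import math

# Assumed alphabet: the 32 letters of the standard Persian alphabet.
PERSIAN_LETTERS = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


def uniform_entropy(n_symbols):
    """Entropy in bits when each of n_symbols is equally likely: H = log2(N)."""
    return math.log2(n_symbols)


print(uniform_entropy(len(PERSIAN_LETTERS)))  # 5.0 for 32 letters
```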

### b

The entropy of Farsi letters together with punctuation marks (علائم سجاوندی) is calculated, using character probabilities estimated from the dataset.
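A minimal sketch of the empirical Shannon entropy $H = -\sum_c p(c)\log_2 p(c)$, where $p(c)$ is the relative frequency of character $c$ in the text. The letter and punctuation sets below are assumptions, since the README does not list the exact symbols used. Arabic-form characters such as ي and ك are not normalized to their Persian forms here.

```python
import math
from collections import Counter

# Assumed symbol sets; the repository's exact lists are not given in the README.
PERSIAN_LETTERS = set("ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی")
PUNCTUATION = set("،؛؟.!:«»()-")


def empirical_entropy(text, symbols):
    """Shannon entropy in bits over the characters of text that belong to symbols."""
    counts = Counter(ch for ch in text if ch in symbols)
    total = sum(counts.values())
    # p(c) = n(c) / total, estimated from the counts above
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

For this part the call would be `empirical_entropy(corpus_text, PERSIAN_LETTERS | PUNCTUATION)`, where `corpus_text` (a hypothetical name) is all documents joined together.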

### g

Same as part b, but for Farsi letters only (punctuation marks are excluded).
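Presumably, excluding punctuation changes the probabilities themselves and not only the set of terms summed: the counts are renormalized over letters alone. Writing $\mathcal{L}$ for the set of Farsi letters and $n(c)$ for the number of occurrences of character $c$ in the dataset (notation introduced here for illustration):

$$
p(c) = \frac{n(c)}{\sum_{c' \in \mathcal{L}} n(c')}, \quad c \in \mathcal{L}, \qquad
H = -\sum_{c \in \mathcal{L}} p(c)\,\log_2 p(c)
$$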

### d

Calculating the expected length of Persian words from the dataset.
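A minimal sketch of $E[L] = \sum_{l} l \cdot P(L = l)$, with $P$ estimated from the dataset. It assumes that words are whitespace-separated tokens and that a word's length is the number of Persian letters it contains, so punctuation, digits, and the zero-width non-joiner are not counted. The README does not specify either choice.

```python
from collections import Counter

# Assumed letter set: the 32 letters of the standard Persian alphabet.
PERSIAN_LETTERS = set("ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی")


def word_lengths(text):
    """Length of each whitespace-separated token, counting Persian letters only."""
    lengths = []
    for token in text.split():
        n = sum(1 for ch in token if ch in PERSIAN_LETTERS)
        if n > 0:  # skip tokens with no Persian letters (digits, Latin, bare punctuation)
            lengths.append(n)
    return lengths


def expected_word_length(text):
    """E[L] = sum_l l * P(L = l); returns 0 for text with no Persian words."""
    lengths = word_lengths(text)
    counts = Counter(lengths)
    total = len(lengths)
    return sum(l * c / total for l, c in counts.items())
```

Summing over the length distribution gives the same value as the plain mean of `lengths`. It is written this way only to mirror the expectation formula.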

### h

Normalized histogram of Persian letter frequencies (sorted).
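A minimal sketch of the data behind a sorted, normalized letter histogram. It assumes that "normal" means each count is divided by the total number of letters and that the sort is from most to least frequent. The letter set is also an assumption.

```python
from collections import Counter

# Assumed letter set: the 32 letters of the standard Persian alphabet.
PERSIAN_LETTERS = set("ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی")


def normalized_letter_histogram(text):
    """Relative frequency of each Persian letter in text, sorted from most to least frequent."""
    counts = Counter(ch for ch in text if ch in PERSIAN_LETTERS)
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]
```

The resulting pairs can be passed to matplotlib's `bar` to draw the plot. Persian tick labels may need a font that covers Arabic script.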

### v

Histogram of word lengths.
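A minimal text-mode sketch, assuming a list of word lengths has already been extracted from the dataset. Lengths that never occur are kept as zero bins so gaps stay visible. The function names are hypothetical.

```python
from collections import Counter


def length_histogram(lengths):
    """Map each word length from 1 to the maximum observed length to its count (zeros included)."""
    counts = Counter(lengths)
    return {l: counts.get(l, 0) for l in range(1, max(lengths) + 1)}


def print_histogram(hist):
    """Print one row per length with a bar of '#' characters proportional to its count."""
    for length, count in hist.items():
        print(f"{length:2d} | {'#' * count}")


print_histogram(length_histogram([4, 2, 4, 5]))
```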