albertusk95/nips-challenge-plagiarism-detection-vsm


Plagiarism Detection on Electronic Text Based Assignments Using Vector Space Model (iciafs14)

Global NIPS Paper Implementation Challenge

I implemented the paper by following its research methodology.

Original Paper

https://arxiv.org/pdf/1412.7782.pdf

Main Goal

Develop an effective plagiarism detection tool for text-based assignments by comparing unigram, bigram, and trigram vector space models using the cosine and Jaccard similarity measures

Programming Tools

  • Python 2.7
  • scikit-learn
  • NLTK

Files

Several important files / directories:

  • main.py

    Main file containing the whole source code

  • docs

    A directory containing the students' answers. Each answer is stored in a document with a fixed naming scheme, assignment_index: the word assignment is fixed, and index is an integer that is incremented each time a new student is added

  • combined_docs

    All student answers are combined into a single document called the MASTER DOCUMENT. The detection process runs on this combined document
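
As a rough illustration of how the docs directory could be merged into the MASTER DOCUMENT, here is a minimal sketch. It is not taken from main.py: the function names, the exact file-name pattern (assignment_ followed by an integer, with an optional extension), and the one-answer-per-line layout are all assumptions made for the example. It is written for Python 3 with the standard library only, whereas the repository itself targets Python 2.7.

```python
import os
import re

# Assumed file-name pattern: "assignment_<index>" with an optional extension.
NAME_PATTERN = re.compile(r"^assignment_(\d+)(\.\w+)?$")


def list_assignments(docs_dir):
    """Return (index, path) pairs for assignment files, ordered by index."""
    found = []
    for name in os.listdir(docs_dir):
        match = NAME_PATTERN.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(docs_dir, name)))
    return sorted(found)


def build_master_document(docs_dir, master_path):
    """Concatenate every assignment into one file, one answer per line.

    Returns the number of answers written.
    """
    assignments = list_assignments(docs_dir)
    with open(master_path, "w") as master:
        for _, path in assignments:
            with open(path) as handle:
                # Collapse whitespace so each answer occupies exactly one line.
                master.write(" ".join(handle.read().split()) + "\n")
    return len(assignments)
```

Sorting on the integer index rather than the file name keeps assignment_10 after assignment_2, so line k of the combined file always corresponds to the k-th student.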

To Run

To run the program, execute the following command:

python main.py

Methodology

  • Combine the students' answers into a single answer file (MASTER DOCUMENT)

  • Extract the unique terms (unigrams, bigrams, trigrams) from the MASTER DOCUMENT

  • Eliminate stopwords

  • Compute the Document Frequency (DF) and Inverse Document Frequency (IDF) of each term

  • Compute the TF-IDF weight vector of each document

  • Compare each pair of assignments using Cosine Similarity

  • Compare each pair of assignments using Jaccard Similarity
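
The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code in main.py, and several details are assumptions: the tiny stopword list stands in for a full list such as NLTK's, stopwords are dropped before n-grams are formed, the IDF uses the smoothed form 1 + ln(N / df), and Jaccard similarity is computed on the sets of terms present in each document. The paper and the repository may differ on any of these points. The sketch is written for Python 3.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (assumption); a full list such as
# NLTK's English stopwords could be substituted.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def ngrams(text, n):
    """Lowercase, split on whitespace, drop stopwords, then form n-grams."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Return one {term: weight} dict per document for n-gram size n."""
    term_lists = [ngrams(doc, n) for doc in docs]
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))  # count each term once per document
    total = len(docs)
    # Smoothed IDF (assumption): 1 + ln(N / df), so terms shared by every
    # document keep a non-zero weight.
    idf = {term: 1.0 + math.log(total / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for terms in term_lists
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(u, v):
    """Jaccard index on the sets of terms present in each document."""
    a, b = set(u), set(v)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

To compare every pair of assignments, build the vectors once per n-gram size (n = 1, 2, 3) and call both similarity functions on each pair, for example with itertools.combinations over the list returned by tfidf_vectors.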


Albertus Kelvin
Bandung Institute of Technology

Code was developed on January 20th, 2018
Code was made publicly available on January 31st, 2018
