Angel0726/Kaggle-Quora

Kaggle Quora Questions Pairs Competition

14th-place solution (my part of it). The code is uncleaned; the latest versions are uploaded. Not every feature that can be created with the feature notebooks was included in the final model: the idea of this repository is to give an overview of the methods used, and of those that could be used for similar problems.

Big thanks to the authors of all the kernels and posts, which were a great inspiration; some of the features were derived from them.

Features

  • Data Encoding:
    • Pipeline for text cleaning using Textacy
    • Lemmatization
    • Stemming
    • NER Encoding (based on a Kaggle kernel)
  • NLP Features:
    • Features based on Kaggle Kernels & Discussions posts by: Abhishek, SRK, Jared Turkewitz, the_1owl, Mephistopheles & more
    • Latent Semantic Analysis, Latent Dirichlet Allocation, truncated SVD (tSVD)
    • Word2Vec
    • Doc2Vec
    • Similarity measures: distances between transformed representations of the two questions
    • Textacy-based features
    • KNN-based features
  • Magic Features:
    • Jared Turkewitz's frequency features
    • NetworkX features

...and some more.
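The two "magic" feature groups can be illustrated with a minimal, standard-library-only sketch. It assumes questions are matched by exact raw text and that counts are taken over train and test combined, as was commonly done in the public kernels. The function name `magic_features` and the output keys are hypothetical; the notebooks may compute these differently (for example with pandas or a NetworkX graph). Frequency is the number of pairs a question appears in; the intersection is the number of distinct questions paired with both q1 and q2, i.e. their common neighbours in the question graph.

```python
from collections import defaultdict


def magic_features(pairs):
    """Frequency and common-neighbour features for question pairs (sketch).

    pairs: list of (question1, question2) tuples, ideally train + test combined.
    Returns one dict per pair with q1_freq, q2_freq and q1_q2_intersect.
    """
    freq = defaultdict(int)         # how many pairs each question occurs in
    neighbours = defaultdict(set)   # undirected graph: one edge per pair
    for q1, q2 in pairs:
        freq[q1] += 1
        freq[q2] += 1
        neighbours[q1].add(q2)
        neighbours[q2].add(q1)

    features = []
    for q1, q2 in pairs:
        features.append({
            "q1_freq": freq[q1],
            "q2_freq": freq[q2],
            # Questions that were paired with both q1 and q2.
            "q1_q2_intersect": len(neighbours[q1] & neighbours[q2]),
        })
    return features


pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
print(magic_features(pairs)[0])
# → {'q1_freq': 3, 'q2_freq': 2, 'q1_q2_intersect': 1}
```

Graph-based features such as k-core numbers can be derived from the same `neighbours` structure, or from an equivalent NetworkX graph.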

Models

  • XGB & LGBM models
    • Training
    • BayesianOptimization
    • Test Predictions
  • spaCy Decomposable Attention model on the Quora data
  • LSTM Experiments
  • MLP models
  • Stacking
    • Sklearn Models Ensemble
    • Stacking with LGBM
    • Finding in-fold ensemble weights with SciPy's minimize function
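Finding ensemble weights with SciPy's `minimize` can be sketched as follows. This assumes each model's out-of-fold probabilities for a fold are stacked column-wise and that the objective is log loss, the competition metric. The helper names `log_loss` and `find_ensemble_weights` are hypothetical, not taken from the notebooks. Weights are bounded to [0, 1] and constrained to sum to 1, so the blend stays a valid probability.

```python
import numpy as np
from scipy.optimize import minimize


def log_loss(y_true, p, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


def find_ensemble_weights(oof_preds, y_true):
    """Find convex blending weights that minimise log loss on one fold.

    oof_preds: array of shape (n_samples, n_models) with predicted probabilities.
    y_true: array of shape (n_samples,) with 0/1 labels.
    """
    n_models = oof_preds.shape[1]

    def objective(w):
        # Weighted average of the model predictions, scored with log loss.
        return log_loss(y_true, oof_preds @ w)

    start = np.full(n_models, 1.0 / n_models)  # begin from an equal-weight blend
    result = minimize(
        objective,
        start,
        method="SLSQP",  # supports both bounds and equality constraints
        bounds=[(0.0, 1.0)] * n_models,
        constraints=({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},),
    )
    return result.x
```

Fitting the weights on one fold and checking them on the held-out fold(s) helps keep the blend from overfitting the out-of-fold predictions.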
