
michaelmml/NLP-Information-Extraction


NLP-Information-Extraction

Automated PDF and text processing; information extraction from text based on grammatical structure

General NLP on Text (Applied to Company Transcripts)

pdfplumber extraction techniques; general data cleaning and boxplots of word counts/densities; centroid words with TF-IDF and extractive summarisation by sentence ranking; topic modelling and clustering; grammatical trends via dependency parsing and part-of-speech tags
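The centroid-based summarisation idea can be sketched in plain Python. This is a minimal sketch, not the repository's notebook code: it assumes a naive regex sentence splitter and tokeniser in place of spaCy/NLTK, uses a smoothed IDF, and takes the centroid to be the mean TF-IDF vector over all sentences. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `summarise`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a simplification of spaCy/NLTK tokenisation.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so terms present in every sentence keep a non-zero weight.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarise(text, k=2):
    """Pick the k sentences most similar to the TF-IDF centroid."""
    # Naive split after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    vectors = tfidf_vectors([tokenize(s) for s in sentences])
    # Centroid = mean TF-IDF vector over all sentences.
    centroid = Counter()
    for vec in vectors:
        for term, w in vec.items():
            centroid[term] += w / len(vectors)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid), reverse=True)
    # Keep the top-k sentences but restore their original order.
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    transcript = (
        "Revenue grew strongly this quarter. "
        "Margins improved as revenue grew. "
        "The weather was pleasant. "
        "We expect revenue and margins to keep growing."
    )
    for sentence in summarise(transcript, k=2):
        print(sentence)
```

Ranking by similarity to the centroid favours sentences that share the transcript's dominant vocabulary; returning the selected sentences in their original order keeps the summary readable.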

Keywords, Nouns and Topic Analysis (Applied to Patent Extracts)

Data preprocessing and word clouds over time periods; statistical analysis: keyword extraction with TF-IDF, compared against RAKE, Gensim and spaCy; topic modelling with Latent Dirichlet Allocation (LDA); Named Entity Recognition; noun extraction with spaCy's Matcher and frequency/momentum analysis; noun pairing and network graphs
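Noun pairing for a network graph can be illustrated by counting nouns that co-occur in the same sentence. In this sketch the per-sentence noun lists are hard-coded hypothetical examples standing in for the output of the Matcher step; `noun_pairs`, `build_graph` and the `min_count` cut-off are illustrative assumptions, and the result is a plain dict-of-dicts adjacency map rather than any particular graph library's object.

```python
from collections import Counter, defaultdict
from itertools import combinations


def noun_pairs(sentences_nouns):
    """Count unordered noun pairs that co-occur in the same sentence."""
    pairs = Counter()
    for nouns in sentences_nouns:
        # De-duplicate and sort so (a, b) and (b, a) map to one key.
        for a, b in combinations(sorted(set(nouns)), 2):
            pairs[(a, b)] += 1
    return pairs


def build_graph(pairs, min_count=1):
    """Weighted undirected adjacency map, keeping pairs seen >= min_count times."""
    graph = defaultdict(dict)
    for (a, b), count in pairs.items():
        if count >= min_count:
            graph[a][b] = count
            graph[b][a] = count
    return dict(graph)


if __name__ == "__main__":
    # Hypothetical nouns, one list per sentence of a patent extract.
    patent_nouns = [
        ["battery", "electrode", "cathode"],
        ["battery", "electrode"],
        ["sensor", "battery"],
    ]
    print(build_graph(noun_pairs(patent_nouns), min_count=2))
    # → {'battery': {'electrode': 2}, 'electrode': {'battery': 2}}
```

Edge weights are co-occurrence counts, so raising `min_count` prunes rare pairings before the graph is drawn.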

Generalised Research (Applied to Web3 Continuous News Extracts)

Exploratory data analysis: frequency-based histograms and subplots; summarisation with TF-IDF centroid vectors; text statistics with PCA and K-means clustering; word2vec embeddings; graph centrality; formation of n-grams/phrases
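Phrase formation can be sketched as scoring adjacent token pairs and merging those that appear together more often than chance would predict. This is a simplified stand-in, not Gensim's `Phrases` implementation: the observed-to-expected (PMI-style) score, the `min_count`/`threshold` defaults and the helper names `find_phrases` and `merge_phrases` are assumptions chosen for illustration.

```python
from collections import Counter


def find_phrases(token_lists, min_count=2, threshold=1.5):
    """Return bigrams whose observed/expected co-occurrence ratio exceeds threshold."""
    unigrams = Counter(t for toks in token_lists for t in toks)
    bigrams = Counter(
        (a, b) for toks in token_lists for a, b in zip(toks, toks[1:])
    )
    total = sum(unigrams.values())
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        # Ratio of observed adjacency to what independent tokens would give.
        score = n_ab * total / (unigrams[a] * unigrams[b])
        if n_ab >= min_count and score > threshold:
            phrases.add((a, b))
    return phrases


def merge_phrases(tokens, phrases, sep="_"):
    """Greedily join adjacent tokens that form a known phrase, left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    # Hypothetical tokenised Web3 news snippets.
    docs = [
        ["smart", "contract", "audit", "completed"],
        ["new", "smart", "contract", "deployed"],
        ["the", "audit", "was", "new"],
    ]
    phrases = find_phrases(docs)
    print(merge_phrases(docs[1], phrases))
    # → ['new', 'smart_contract', 'deployed']
```

Running the merge repeatedly on already-merged tokens would build longer n-grams (trigrams and beyond) from the bigram phrases.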


About

Automated PDF and text processing with spaCy and NLTK; information extraction from text based on grammatical structure; deployed on extracted raw search data
