NLP-Information-Extraction

Automated PDF and text processing; information extraction from text based on grammatical structure

General NLP on Text (Applied to Company Transcripts)

pdfplumber extraction techniques; general data cleaning and boxplots of word counts / densities; centroid words with TF-IDF and extractive summarisation by ranking; topic modelling and clustering; grammatical trends via dependencies and parts of speech
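
The centroid-based extractive summarisation mentioned above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in TFIDFSummarizer.py: the function names (`tokenize`, `split_sentences`, `tfidf_vectors`, `summarize`), the regex-based sentence splitter and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real pipeline would also remove stop words.
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    # Naive splitter on ., ! or ? followed by whitespace (assumption for the sketch).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences):
    # Treat each sentence as a document and weight raw term counts by a smoothed IDF.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, top_n=2):
    # Rank sentences by similarity to the mean TF-IDF vector (the centroid)
    # and return the top_n sentences in their original order.
    sentences = split_sentences(text)
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / len(vectors)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:top_n])]
```

Calling `summarize(transcript_text, top_n=3)` on a cleaned transcript string would return the three sentences closest to the centroid, kept in reading order so the summary stays coherent.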

Keywords, Nouns and Topic Analysis (Applied to Patent Extracts)

Data preprocessing and word clouds over time periods; statistical analysis: keyword extraction with TF-IDF; comparison against RAKE, Gensim, spaCy; topic modelling with Latent Dirichlet Allocation; Named Entity Recognition; nouns with Matcher and frequency/momentum analysis; noun pairing and network graphs
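
As a rough illustration of the noun pairing step, the sketch below builds a weighted co-occurrence edge list that could feed a network graph. It assumes the nouns have already been extracted per document (for example with a spaCy Matcher, which is not shown here); the function names `noun_pair_edges` and `weighted_degree` and the sample data are hypothetical and not taken from the notebook.

```python
from collections import Counter
from itertools import combinations


def noun_pair_edges(noun_lists):
    # Count how many documents each unordered noun pair appears in together.
    edges = Counter()
    for nouns in noun_lists:
        unique = sorted(set(nouns))  # sort so each pair has one canonical order
        edges.update(combinations(unique, 2))
    return edges


def weighted_degree(edges):
    # Sum of edge weights per noun: a simple measure of how connected a noun is.
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical pre-extracted nouns from three patent extracts.
docs = [
    ["battery", "electrode", "cell"],
    ["battery", "cell"],
    ["electrode", "coating"],
]
edges = noun_pair_edges(docs)
print(edges.most_common(3))
print(weighted_degree(edges).most_common())
```

The resulting `(noun_a, noun_b) -> weight` mapping can be loaded into a graph library as weighted edges if a visual network is needed.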

Generalised Research (Applied to Web3 Continuous News Extracts)

Exploratory Data Analysis: frequency-based histograms and subplots; summarisation with TF-IDF centroid vectors; text statistics with PCA, K-means clustering; word2vec; graph centrality; formation of n-grams / phrases
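
The formation of n-grams / phrases can be sketched with a plain frequency threshold, as below. This is a simplified stand-in, not the notebook's method: phrase detectors in libraries such as Gensim score candidate pairs rather than only counting them, and the names `ngrams` and `frequent_phrases`, the `min_count` parameter and the sample tokens are assumptions for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return list(zip(*(tokens[i:] for i in range(n))))


def frequent_phrases(token_lists, n=2, min_count=2):
    # Keep n-grams that occur at least min_count times across all documents.
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}


# Hypothetical tokenised news headlines.
docs = [
    ["smart", "contract", "audit"],
    ["smart", "contract", "exploit"],
    ["token", "launch"],
]
print(frequent_phrases(docs, n=2, min_count=2))
```

Raising `min_count` trades recall for precision: fewer phrases survive, but those that do are more likely to be genuine multi-word terms.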

CompanyKnowledgeGraph.ipynb
CompanyTranscript.ipynb
LICENSE
PatentKeywords.ipynb
README.md
ResearchTopicModelling.ipynb
TFIDFSummarizer.py
TextCleaner_Contractions.py
