
chaitanyakasaraneni/nlp_pipeline


Understanding NLP Pipeline

Link to Medium article: click here


Natural Language Processing Wordcloud (Source: Wootric)

This repository contains examples for the stages of a Natural Language Processing (NLP) pipeline. The pipeline involves the following stages:

  1. Text Processing
    • Cleaning
    • Normalization
    • Tokenization
    • Stop Word Removal
    • Part of Speech Tagging
    • Named Entity Recognition
    • Stemming and Lemmatization
  2. Feature Extraction
    • Bag of Words
    • TF-IDF
    • One-hot Encoding
    • Word Embeddings
  3. Modeling


Natural Language Processing Pipeline
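
As a rough illustration of the first few text processing stages listed above (cleaning, normalization, tokenization, and stop word removal), here is a minimal sketch using only the Python standard library. It is not taken from the notebooks in this repository. The function names and the tiny `STOP_WORDS` set are assumptions made for this example; real pipelines typically rely on a library such as NLTK or spaCy for tokenization and stop word lists.

```python
import re
import string

# Tiny illustrative stop word list (an assumption for this sketch;
# real pipelines use a much fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}


def clean(text):
    """Cleaning: strip HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Normalization: lowercase the text and remove ASCII punctuation."""
    text = text.lower()
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Tokenization: split on whitespace (a deliberately simple tokenizer)."""
    return text.split()


def remove_stop_words(tokens):
    """Stop word removal: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Run the four stages in order."""
    return remove_stop_words(tokenize(normalize(clean(text))))


raw = "<p>The cat is sitting on the mat, and the dog is barking!</p>"
print(preprocess(raw))
# ['cat', 'sitting', 'mat', 'dog', 'barking']
```

Part of speech tagging, named entity recognition, stemming, and lemmatization need linguistic resources or trained models, so they are left out of this standard-library sketch.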

Files

  • text_processing.ipynb: This file explains the various stages involved in text processing.

Stages in Text Processing

  • bow_tfidf_example.ipynb: This file contains examples of the Bag of Words (BoW) model and the Term Frequency-Inverse Document Frequency (TF-IDF) calculation.

  • beautiful_soup_example.ipynb: This file contains an example showing how to use the BeautifulSoup library to crawl data from the real estate agents section of the realtor.com website.
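
To give a feel for the Bag of Words and TF-IDF calculations mentioned for bow_tfidf_example.ipynb, here is a minimal sketch in plain Python. It is not the notebook's code. The three toy documents and the helper names are made up for this example, and the IDF used here is the plain unsmoothed form log(N / df); libraries often apply smoothed variants, so their numbers may differ.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

tokenized = [d.split() for d in docs]

# The vocabulary is the sorted set of all distinct tokens in the corpus.
vocab = sorted({t for doc in tokenized for t in doc})


def bag_of_words(tokens):
    """Return a count vector aligned with vocab."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def tf(term, tokens):
    """Term frequency: occurrences of term divided by document length."""
    return tokens.count(term) / len(tokens)


def idf(term):
    """Inverse document frequency: log(N / number of docs containing term)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)


def tf_idf(tokens):
    """TF-IDF weight for every distinct term in one document."""
    return {t: tf(t, tokens) * idf(t) for t in set(tokens)}


print(vocab)
print(bag_of_words(tokenized[0]))  # [0, 1, 0, 0, 0, 0, 1, 1, 1, 2]
print(tf_idf(tokenized[0]))
```

In the first document, "the" occurs twice but also appears in another document, so its TF-IDF weight ends up lower than that of "cat", which occurs once but only in this document.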

Note: I hope you gained some knowledge from reading this article. Please remember that it is only an overview, based on my understanding of the NLP pipeline from various online sources.

Note: Examples for more stages are coming soon.