
tokenization

Here are 799 public repositories matching this topic...

This project applies a machine learning model to consumer brand data. A preliminary model is built first and then refined through fine-tuning to improve results. A comprehensive testing suite validates the accuracy and reliability of the model's predictions.

  • Updated Feb 8, 2024
  • Jupyter Notebook

The project aims to build a search engine for EncyclEarthpedia by retrieving and processing content from Wikipedia articles, since their database and API are unavailable. Key tasks include retrieving Wikipedia content, cleaning and processing the text data, tokenizing the content, counting token frequency, and visualizing the most frequent tokens (a rough sketch of the token-counting step follows this entry).

  • Updated Aug 7, 2023
  • Jupyter Notebook
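
The repository's code is not reproduced on this page; as a rough illustration of the tokenizing and frequency-counting steps the description mentions, a minimal plain-Python sketch might look like the following (the regex tokenizer, function names, and sample text are illustrative assumptions, not taken from the project):

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())

def top_tokens(text: str, n: int = 10) -> list[tuple[str, int]]:
    # Count token occurrences and return the n most frequent ones.
    return Counter(tokenize(text)).most_common(n)

article = "Earth is the third planet from the Sun. Earth orbits the Sun once a year."
print(top_tokens(article, 5))  # prints the five most frequent tokens with their counts
```

The resulting (token, count) pairs can then be fed to any plotting library to visualize the most frequent tokens.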

[Tokenization, Topic Modeling, Sentiment Analysis, Network of Bigrams] The purpose of this project is to test whether text mining techniques can support categorizing movies from their descriptions alone, ignoring the genre column of the dataset IMDB_movies.csv, which is stored in the data frame variable movies_desc (a sketch of the bigram step follows this entry). Tokenization (TF…

  • Updated Oct 29, 2022
  • HTML
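
Likewise, a minimal sketch of how the tokenization and network-of-bigrams steps described above could be implemented in plain Python (the helper names and sample descriptions are illustrative assumptions, not code from the repository):

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase word tokenizer; a real pipeline would likely also drop stop words.
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(descriptions: list[str]) -> Counter:
    # Count adjacent token pairs; the pairs can serve as weighted edges
    # in a network-of-bigrams visualization.
    counts: Counter = Counter()
    for description in descriptions:
        tokens = tokenize(description)
        counts.update(zip(tokens, tokens[1:]))
    return counts

movie_descriptions = [
    "A small town detective investigates a mysterious murder.",
    "A detective in a small town uncovers a dark secret.",
]
print(bigram_counts(movie_descriptions).most_common(3))
```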
