Building applications with LLMs through composability, in Kotlin, Scala, ...
Updated Jun 4, 2024 - Kotlin
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
The small distributed language model toolkit: fine-tune state-of-the-art LLMs anywhere, rapidly
Use custom tokenizers in spacy-transformers
Small library that provides functions to tokenize a string into an array of words with or without punctuation
Use Hugging Face Transformers and Tokenizers as TensorFlow reusable SavedModels
A graphical user interface for the Elasticsearch Analyze API
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
Visualize some important concepts related to LLM architectures.
Fine-tuning pre-trained transformer models in TensorFlow and in PyTorch for question answering
Question-and-answer web application using fine-tuned and pre-trained T5 models. The application runs on Streamlit.
Package to align tokens from different tokenizations.
Kingchop ⚔️ is a JavaScript library for tokenizing (chopping) English text. It uses an extensive set of tokenization rules, which you can adjust easily.
Bachelor Thesis Repository. Wsm-tokenizer (word shape mapping) uses vocabulary comparisons to find probable morphemes in lexemic tokens.
NLP Dataset Creation and Semantic Search Demonstration