The collections of tools for testing and dumping LLMs
Updated May 23, 2024 - Python
A grammar describes the syntax of a programming language and is often defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure such as an abstract syntax tree (AST); the parser is concerned with context: does the sequence of tokens fit the grammar? A compiler front end combines a lexer and parser built for a specific grammar; later stages analyze and translate the resulting structure.
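The lexer-then-parser pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the repositories below: a lexer turns text into tokens, and a recursive-descent parser checks those tokens against a tiny BNF-style grammar (`expr ::= NUMBER (('+' | '-') NUMBER)*`) while building a nested-tuple AST.

```python
import re

# Matches optional whitespace, then either a number or a single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def lex(text):
    """Lexical analysis: turn text into (kind, value) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif op.strip():
            tokens.append(("OP", op))
    return tokens

def parse(tokens):
    """Build a nested-tuple AST, e.g. ('+', 1, 2), from the token list."""
    pos = 0

    def term():
        nonlocal pos
        kind, value = tokens[pos]
        if kind != "NUMBER":
            raise SyntaxError(f"expected number, got {value!r}")
        pos += 1
        return value

    node = term()
    while pos < len(tokens) and tokens[pos][1] in "+-":
        op = tokens[pos][1]
        pos += 1
        node = (op, node, term())
    return node

print(parse(lex("1 + 2 - 3")))  # ('-', ('+', 1, 2), 3)
```

The parser rejects token sequences that do not fit the grammar (e.g. `"1 + +"` raises `SyntaxError`), which is exactly the "does the sequence of tokens fit the grammar?" question above.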
⛄ Possibly the smallest Lua compiler ever
DOM-aware tokenization for Hugging Face language models
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
🧪 N-gram tools for 🙃 Phony Language, with features such as sanitizing, tokenization, n-gram extraction, and frequency mapping.
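The four features that description lists form a simple pipeline. As a hedged sketch (the function names here are illustrative, not the repository's actual API): sanitize the text, tokenize it, extract n-grams, then map each n-gram to its frequency.

```python
import re
from collections import Counter

def sanitize(text):
    """Lowercase and strip everything except letters, digits, and spaces."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())

def tokenize(text):
    """Whitespace tokenization of sanitized text."""
    return text.split()

def ngrams(tokens, n):
    """Extract all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_map(text, n=2):
    """Sanitize -> tokenize -> n-grams -> frequency map."""
    return Counter(ngrams(tokenize(sanitize(text)), n))

freqs = frequency_map("The cat sat. The cat ran!", n=2)
print(freqs[("the", "cat")])  # 2
```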
[READ ONLY] Locate available classes by parent, interface or trait. Subtree split of the Spiral Tokenizer component (see spiral/framework)
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
Taiwanese Hokkien Transliterator and Tokeniser
An elegant Math Parser written in Lua, featuring support for adding custom operators and functions
Tokenization utilities for building parsers in Rust
Lua Compiler, (De)Obfuscator, Minifier, Beautifier, And more
Oxide is a hybrid database and streaming messaging system (think Kafka + MySQL), supporting data access via REST and SQL.