Data Toolkit for Sailor Language Models
Updated May 15, 2024 - Python
PolyDeDupe: Multi-Lingual Data Deduplication
Fellow is a package for creating people that can be unified by their shared values via a singleton list on the class
A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables
Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.
Self-contained C# library for data deduplication using SQLite
Data deduplication engine, supporting optional compression and public key encryption.
A Java project that splits data using hashing techniques and removes duplicate blocks to save cloud storage. This project also uses the CloudSim framework for cloud storage simulation.
General deduping engine for JDBC sources with output to JDBC/CSV targets
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
A data deduplication implementation based on a client-server architecture
Practical backups. The Unix toolkit way.
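Several of the entries above describe splitting data into blocks and hashing each block so that duplicates are stored only once. The following is a minimal sketch of that general idea, not the method of any listed project: it uses fixed-size blocks (rather than content-defined chunking such as FastCDC) and SHA-256 digests, and the names `dedupe_blocks`, `restore`, and `block_size` are hypothetical.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep each distinct block once."""
    store = {}   # digest -> block bytes (one copy per unique block)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store only the first occurrence
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes by looking up each digest in order."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDxyz"
store, recipe = dedupe_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
```

The tiny block size is chosen only to keep the example readable. Fixed-size blocks are simple, but inserting a single byte shifts every later block boundary, which is the problem content-defined chunking is designed to address.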