Unify Efficient Fine-Tuning of 100+ LLMs
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
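A minimal sketch of how this method might be invoked, assuming the repository's `auto_round` package exposes an `AutoRound` class as its README describes; the model name, argument names, and save path are illustrative assumptions:

```python
# Sketch: weight-only quantization via signed-gradient rounding (auto_round).
# Treat the exact class/method names and arguments as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative small model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tune per-weight rounding decisions with signed gradient descent,
# targeting 4-bit weights grouped in blocks of 128.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4")  # assumed export method
```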
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
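As a hedged sketch of the typical flow, a 🤗 Transformers checkpoint can be exported to OpenVINO on load with `OVModelForCausalLM` (the model id is illustrative):

```python
# Sketch: accelerate a causal LM with OpenVINO through Optimum Intel.
# export=True converts the PyTorch checkpoint to OpenVINO IR at load time.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # illustrative model
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Quantization is", max_new_tokens=20)[0]["generated_text"])
```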
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for deployment on efficient, resource-constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
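A rough sketch of MCT's post-training quantization flow for PyTorch; the function path `mct.ptq.pytorch_post_training_quantization` and the representative-dataset contract follow the project's documented API, but treat the details as assumptions:

```python
# Sketch: post-training quantization with the Model Compression Toolkit.
import torch
import model_compression_toolkit as mct

# Toy FP32 model (illustrative).
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4)).eval()

def representative_data_gen():
    # Yield calibration batches matching the model's input shape
    # (format assumed: a list of input tensors per call).
    for _ in range(10):
        yield [torch.randn(8, 16)]

quantized_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen)
```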
A collection of hands-on notebooks for LLM practitioners
Fast inference engine for Transformer models
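A minimal sketch of running a converted translation model with CTranslate2, assuming the model directory and SentencePiece tokenizer files already exist (paths are illustrative) and were produced by one of the project's converters:

```python
# Sketch: quantized translation inference with CTranslate2.
import ctranslate2
import sentencepiece as spm

# Illustrative paths to a converted model and its tokenizer.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")
sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")

tokens = sp.encode("Hello world!", out_type=str)
results = translator.translate_batch([tokens])
print(sp.decode(results[0].hypotheses[0]))
```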
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
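A hedged sketch of static post-training INT8 quantization with the 2.x-style `fit` API; the toy model and calibration loader are illustrative stand-ins:

```python
# Sketch: static INT8 post-training quantization with Intel Neural Compressor.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Toy FP32 model and calibration data (illustrative).
fp32_model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                                 torch.nn.Linear(32, 4)).eval()
calib_loader = DataLoader(TensorDataset(torch.randn(64, 16)), batch_size=8)

conf = PostTrainingQuantConfig(approach="static")  # calibration-based PTQ
q_model = fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./int8_model")
```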
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Lightweight Python interface to PIL, libimagequant, and pngquant with automatic library lookup.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
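A rough sketch of AIMET's quantization-simulation flow; class and method names follow `aimet_torch`, but treat the exact signatures as assumptions of this sketch:

```python
# Sketch: simulate 8-bit quantization of a trained model with AIMET.
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4)).eval()  # toy model
dummy_input = torch.randn(1, 16)

# Wrap the model with fake-quantization ops (8-bit weights and activations).
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def forward_pass(model, _args):
    # Run representative data through the model to collect ranges.
    with torch.no_grad():
        model(torch.randn(32, 16))

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)
```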
On-device LLM Inference Powered by X-Bit Quantization
A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms
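A minimal sketch of the typical usage: wrap an eval-mode model with `ipex.optimize` to apply Intel-specific fusions and run bfloat16 inference (the toy model is illustrative):

```python
# Sketch: CPU inference speedup with Intel Extension for PyTorch.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4)).eval()  # toy model

# Apply operator fusion, memory-layout, and dtype optimizations.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(torch.randn(8, 16))
```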
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
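A short sketch using the package's Python entry point, equivalent to the `onnx2tf -i model.onnx` CLI; the file paths are illustrative:

```python
# Sketch: convert an NCHW ONNX model to an NHWC TensorFlow SavedModel.
import onnx2tf

onnx2tf.convert(
    input_onnx_file_path="model.onnx",  # illustrative input path
    output_folder_path="saved_model",   # SavedModel output directory
)
```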
Brevitas: neural network quantization in PyTorch
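A minimal sketch of a quantization-aware block built from Brevitas drop-in layers (the bit-widths and layer sizes are illustrative):

```python
# Sketch: quantization-aware layers with Brevitas.
import torch
from brevitas.nn import QuantLinear, QuantReLU

model = torch.nn.Sequential(
    QuantLinear(16, 32, bias=True, weight_bit_width=4),  # 4-bit weights
    QuantReLU(bit_width=4),                              # 4-bit activations
    QuantLinear(32, 4, bias=True, weight_bit_width=4),
)
out = model(torch.randn(8, 16))
```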
Dataflow compiler for QNN inference on FPGAs
Neural Network Compression Framework for enhanced OpenVINO™ inference
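A hedged sketch of NNCF's post-training quantization entry point `nncf.quantize`; the toy model, loader, and transform are illustrative:

```python
# Sketch: post-training quantization for OpenVINO with NNCF.
import torch
from torch.utils.data import DataLoader, TensorDataset
import nncf

fp32_model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                                 torch.nn.Linear(32, 4)).eval()
calib_loader = DataLoader(TensorDataset(torch.randn(64, 16)), batch_size=8)

# nncf.Dataset wraps the loader; the transform maps items to model inputs.
calibration_dataset = nncf.Dataset(calib_loader, lambda item: item[0])
quantized_model = nncf.quantize(fp32_model, calibration_dataset)
```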
Official implementation of Half-Quadratic Quantization (HQQ)
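A minimal sketch of quantizing one linear layer with HQQ; the import paths and `HQQLinear` constructor follow the repository's README, but treat the exact arguments as assumptions:

```python
# Sketch: half-quadratic quantization of a single linear layer with HQQ.
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

layer = torch.nn.Linear(1024, 1024)  # illustrative layer
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

# Replaces the FP layer with a 4-bit HQQ-quantized equivalent
# (device/dtype arguments assumed; default device is CUDA).
qlayer = HQQLinear(layer, quant_config, compute_dtype=torch.float16,
                   device="cuda")
```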