A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Jun 3, 2024 - Python
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute
Implementing a CNN from scratch for MNIST classification in C++ using CUDA
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.
OneDiff: An out-of-the-box acceleration library for diffusion models.
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
kaldi-asr/kaldi is the official location of the Kaldi project.
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
✨ Zero-code distributed tracing and profiling, observability via eBPF 🚀
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with full performance-portable GPU support
Radar Simulator built with Python and C++
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A 3D render engine from scratch, using CUDA/C++.
Created by NVIDIA
Released June 23, 2007