Kernel Tuner
Updated Jun 6, 2024 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
CUDA C++ Core Libraries
(REOS) Radar and Electro-Optical Simulation Framework written in Fortran.
Spiral's Machine Learning Library
Safe Rust wrapper around the CUDA toolkit.
A tool for examining GPU scheduling behavior.
🎉 CUDA notes / hand-written CUDA for large models / C++ notes, updated whenever: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
From zero to hero: CUDA for accelerating maths and machine learning on the GPU.
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Just a few CUDA kernels that can be called from Python as a DLL.
CUDA Kernel Benchmarking Library
Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Implement neural networks in CUDA from scratch.
Astrophysics program simulating the evolution of star systems using the fast multipole method on adaptive octrees.
Cross-platform C++ GPU SDK.
This repo contains a personal project created within one week. It can generate fractal pictures based on a Julia set and explore such a fractal in real time (zoom in and out, pan left, right, up, and down).