# CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

Here are 4,923 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving mlops llm inferentia llmops llm-serving trainium

Updated Jun 3, 2024
Python

LuisaGroup / LuisaRender

High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute

metal cpp gpu high-performance rendering cuda renderer ray-tracing optix path-tracing ispc siggraph-asia-2022

Updated Jun 3, 2024
C++

divyankachaudhari / CNN-with-CUDA

Implementing a CNN from scratch for MNIST classification in C++ using CUDA

machine-learning cpp cuda mnist mnist-classification

Updated Jun 3, 2024
Cuda

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated Jun 3, 2024
Python

siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

cuda pytorch lora lcm performance-optimization inference-engine diffusion-models stable-diffusion diffusers sd-webui comfyui sdxl aigc-serving lcm-lora stable-video-diffusion sdxl-turbo comfyui-workflow

Updated Jun 3, 2024
Python

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Updated Jun 3, 2024
Cuda

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8

Updated Jun 3, 2024
Python

b0nes164 / GPUSorting

OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

unity cuda hlsl d3d12 radix-sort compute-shader onesweep deviceradixsort

Updated Jun 3, 2024
Cuda

gridhead / nvidia-auto-installer-for-fedora-linux

A CLI tool which lets you install proprietary NVIDIA drivers and much more easily on Fedora Linux (32 or above and Rawhide)

fedora cuda nvidia optimus hacktoberfest rpmfusion

Updated Jun 3, 2024
Python

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

shell c-plus-plus cuda speech speech-recognition speech-to-text kaldi speaker-verification speaker-id

Updated Jun 3, 2024
Shell

YdrMaster / cuda-driver

A CUDA runtime environment based on the CUDA Driver API

Updated Jun 3, 2024
Rust

janhq / cortex

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan

ai cuda llama accelerated inference-engine openai-api llm stable-diffusion llms llamacpp llama2 gguf tensorrt-llm

Updated Jun 3, 2024
C++

deepflowio / deepflow

✨ Zero-code distributed tracing and profiling, observability via eBPF 🚀

kubernetes gpu cuda wasm apm profiling distributed-tracing service-map opentelemetry llm

Updated Jun 3, 2024
Go

rapidsai / cudf

cuDF - GPU DataFrame Library

python data-science cpp gpu arrow pydata cuda pandas data-analysis dask dataframe rapids cudf

Updated Jun 3, 2024
C++

wi2trier / gpu-server

System configuration for a CUDA-based GPU server using Nix

nix server ubuntu gpu cuda system-config

Updated Jun 3, 2024
Nix

DefTruth / CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

cuda cuda-kernels gemm softmax cuda-programming layernorm gemv elementwise rmsnorm flash-attention flash-attention-2 warp-reduce block-reduce

Updated Jun 3, 2024
Cuda

QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support

c-plus-plus hpc gpu mpi cuda high-performance-computing quantum-chemistry quantum-monte-carlo electronic-structure

Updated Jun 3, 2024
C++

radarsimx / radarsimpy

Radar Simulator built with Python and C++

simulation radar cuda raytracing

Updated Jun 3, 2024
Python

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

machine-learning compiler runtime tensorflow vulkan cuda pytorch spirv jax mlir

Updated Jun 3, 2024
C++

Sundance636 / Raster3D

A 3D render engine from scratch, using CUDA/C++.

graphics cuda gpu-acceleration

Updated Jun 2, 2024
Cuda

Created by NVIDIA

Released June 23, 2007

Followers: 202
Website: developer.nvidia.com/cuda-zone
Wikipedia: Wikipedia

Related Topics

nvcc