Skip to content

theHamsta/petalisp-cuda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

petalisp-cuda

A CUDA backend for Petalisp. The project is in an early experimental phase.

Usage

(quicklisp:quickload :petalisp-cuda)

;; use petalisp-cuda as petalisp:*backend*
(petalisp-cuda:use-cuda-backend)

;; when you want to the petalisp-cuda for a specific code section
;; with potential re-use of already allocated CUDA resources
(petalisp-cuda:with-cuda-backend
    ...)

;; with-cuda-backend-raii will free all CUDA resources
;; and destroy the backend after the calculations
(petalisp-cuda:with-cuda-backend-raii
    ...)

Requirements

  • CUDA toolkit
  • CUDNN (optional)

Installation

With quicklisp installed, clone this repository to your local projects folder:

git clone git@github.com:theHamsta/petalisp-cuda.git

It is recommended to use my fork of cl-cuda that performs disk caching to ensure that the same kernel is not compiled multiple times.

git clone git@github.com:theHamsta/cl-cuda.git

TODOs

  • [x] sort indices for fasted dimensions
  • [ ] hash CUDA array strides to recompile correctly with arrays not allocated in C-layout with alternative memory layouts
  • [x] fix super-slow lisp->native, native->lisp calls (for Lisp arrays of element-type single-float/double-float)
  • [x] compile kernels not only for fixed iteration spaces
  • [x] implement reductions with CUDNN (TODO: automatic integration into compute graph)
  • [x] implement convolutions with CUDNN (TODO: automatic integration into compute graph)
  • [ ] __restrict__ kernel parameters
  • [x] infer function parameters for generated __device__ functions
  • [x] Load scalars over __constant__ memory when loading from GPU RAM instead of Host RAM (transfered as kernel arguments)
  • [ ] fast interpolation using textures (?)
  • [ ] half float, bfloat support
  • [ ] optimized transposed memory accesses via cub.h
  • [ ] optimized memory accesses via stride tricks
  • [ ] static scheduler
  • [x] use cudartc library for faster JIT compilation

Debugging

You can activate logging in CUDNN by setting

export CUDNN_LOGDEST_DBG=stdout
export CUDNN_LOGINFO_DBG=1

or stderr or filename.txt

cl-cuda will print a lot of stuff to stdout. This behavior can be controlled with cl-cuda:*show-messages*.

Author

  • Stephan Seitz (stephan.seitz@fau.de)

Copyright

Copyright (c) 2020 Stephan Seitz (stephan.seitz@fau.de)

License

Licensed under the GPLv3 License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published