CUDA Bitonic Sort

If you're looking for an in-place, comparison-based GPU sorting library that requires zero extra GPU memory, this might be what you need. This repository contains a header-only implementation of bitonic sort. It is not the fastest sorting algorithm, but it may be good enough for applications where GPU memory is more precious than raw speed.

The code is adapted from https://github.com/darkobozidar/sequential-vs-parallel-sort and refactored to be concise and to support arbitrary types and comparators.
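To show why no extra memory is needed, here is a minimal sequential C++ sketch of a bitonic sorting network. It is not the library's code and runs on the host; `bitonic_sort_cpu` is a hypothetical name. Every step is a compare-and-swap between two positions of the same array, so no auxiliary buffer is required. The sketch uses the network variant in which every comparator puts the smaller element at the lower index, so a size that is not a power of two behaves as if the array were padded with +infinity, which amounts to skipping comparisons whose partner index is out of range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-side reference of an in-place bitonic sorting network.
// Not the library's API; it only illustrates the compare-and-swap pattern.
// Every comparator moves the "smaller" element (per comp) to the lower index,
// so a non-power-of-two n acts as if padded with +infinity: comparisons whose
// partner index is >= n are simply skipped.
template <typename T, typename Compare = std::less<T>>
void bitonic_sort_cpu(T* data, std::size_t n, Compare comp = Compare()) {
  std::size_t padded = 1;
  while (padded < n) padded <<= 1;  // next power of two >= n

  // Compare-and-swap positions i < partner, ignoring out-of-range partners.
  auto compare_swap = [&](std::size_t i, std::size_t partner) {
    if (partner > i && partner < n && comp(data[partner], data[i])) {
      std::swap(data[i], data[partner]);
    }
  };

  for (std::size_t k = 2; k <= padded; k <<= 1) {
    // Merge step: pair each element with its mirror inside its block of size k.
    for (std::size_t i = 0; i < n; ++i) compare_swap(i, i ^ (k - 1));
    // Half-cleaner steps at distances k/4, k/8, ..., 1.
    for (std::size_t j = k / 4; j > 0; j /= 2) {
      for (std::size_t i = 0; i < n; ++i) compare_swap(i, i ^ j);
    }
  }
}
```

On a GPU, each inner `for (i ...)` loop can become one parallel step: the pairs it compares are disjoint, so all of its compare-and-swaps can run concurrently, for example one thread per pair.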

Basic usage

#include "bitonic_sort.h"

...

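int* keys;       // num_items keys in GPU memory
int* values;     // num_items values in GPU memory (for sort_by_key)
int num_items;   // number of elements to sort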
...

// sort keys only
bitonic::sort(keys, num_items);

// sort key-value pairs
bitonic::sort_by_key(keys, values, num_items);

Because the algorithm operates in place, only raw pointer types are supported; fancy iterators are not.
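Sorting key-value pairs can stay in place for the same reason: whenever two keys are swapped, the values at the same two positions are swapped too. Below is a hypothetical host-side sketch of that idea; `bitonic_sort_by_key_cpu` is an illustrative name, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-side sketch: a bitonic network that permutes values
// alongside keys, still without any auxiliary buffer.
template <typename K, typename V, typename Compare = std::less<K>>
void bitonic_sort_by_key_cpu(K* keys, V* values, std::size_t n,
                             Compare comp = Compare()) {
  std::size_t padded = 1;
  while (padded < n) padded <<= 1;  // next power of two >= n

  auto compare_swap = [&](std::size_t i, std::size_t partner) {
    if (partner > i && partner < n && comp(keys[partner], keys[i])) {
      std::swap(keys[i], keys[partner]);
      std::swap(values[i], values[partner]);  // the value follows its key
    }
  };

  for (std::size_t k = 2; k <= padded; k <<= 1) {
    for (std::size_t i = 0; i < n; ++i) compare_swap(i, i ^ (k - 1));
    for (std::size_t j = k / 4; j > 0; j /= 2) {
      for (std::size_t i = 0; i < n; ++i) compare_swap(i, i ^ j);
    }
  }
}
```

Bitonic sort is not stable, so values attached to equal keys may end up in any relative order.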

Custom comparator

Example: sort pointers by their pointed-to value

struct Compare {
  __device__ __host__ bool operator()(float* a, float* b) const {
    return *a < *b;
  }
};
...
float** data;
int N;
...
// here 0 means we're using the default stream
bitonic::sort(data, N, 0, Compare());
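Because the comparator in the example is marked `__device__ __host__`, the same functor can also be called from host code, for instance to check results after copying them back from the GPU. A minimal sketch, assuming the library accepts any functor callable as `comp(a, b)` that returns true when `a` must come strictly before `b` (the same convention as `std::sort`). The `Particle` type, the `ByXDescending` comparator and the `is_sorted_by_x_desc` helper are hypothetical. When the file is not compiled by nvcc, the two qualifiers are defined away so the snippet still builds.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Outside nvcc, make the CUDA qualifiers expand to nothing so the same
// comparator also compiles with an ordinary host compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical element type (not part of the library).
struct Particle {
  float x;
  int id;
};

// Hypothetical comparator: orders particles by x, largest first.
struct ByXDescending {
  __device__ __host__ bool operator()(const Particle& a, const Particle& b) const {
    return a.x > b.x;
  }
};

// Host-side check, e.g. on data copied back from the GPU with cudaMemcpy.
bool is_sorted_by_x_desc(const std::vector<Particle>& host_copy) {
  return std::is_sorted(host_copy.begin(), host_copy.end(), ByXDescending());
}
```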

Performance benchmark

We compared the performance of bitonic sort with state-of-the-art sorting implementations in CUB on V100, A100 and H100 GPUs, and plotted the results below.

The main takeaway is that if performance is all you need, CUB/Thrust might be a better choice: bitonic sort is only competitive with the other sorting methods for small arrays (fewer than 10^5 elements). Its advantage is that, unlike CUB merge sort or radix sort, it does not require an extra copy of the keys and values during sorting, which halves the memory needed to hold them.
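Both sides of this trade-off can be made concrete with a little arithmetic. A standard bitonic network over n elements, padded to 2^p, runs p(p+1)/2 compare-and-swap stages over the whole array, so its work grows as O(n log^2 n), while the number of radix sort passes depends on key width rather than on n; this is consistent with bitonic sort falling behind as arrays grow. On the memory side, an in-place sort holds the keys and values once, whereas a sort that keeps an extra copy holds them twice. The helper names below are hypothetical, and CUB's own temporary storage is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compare-and-swap stages in a standard bitonic network for n elements:
// with n padded to 2^p, there are p * (p + 1) / 2 stages.
constexpr std::uint64_t bitonic_stages(std::uint64_t n) {
  std::uint64_t p = 0;
  while ((std::uint64_t{1} << p) < n) ++p;
  return p * (p + 1) / 2;
}

// Bytes occupied by n key/value pairs. copies = 1 for an in-place sort,
// copies = 2 for a sort that keeps an extra copy of keys and values.
constexpr std::uint64_t pair_bytes(std::uint64_t n, std::uint64_t key_size,
                                   std::uint64_t value_size,
                                   std::uint64_t copies) {
  return copies * n * (key_size + value_size);
}

static_assert(bitonic_stages(1000) == 55, "1000 pads to 2^10: 10 * 11 / 2");
static_assert(bitonic_stages(1u << 20) == 210, "p = 20: 20 * 21 / 2");
```

For example, 10^8 `int` keys with `int` values occupy 0.8 GB when sorted in place versus 1.6 GB with an extra copy.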

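V100 SXM2 32GB

[Benchmark plot: benchmark-v100.png]

A100 SXM4 80GB

[Benchmark plot: benchmark-a100.png]

H100 80GB

[Benchmark plot: benchmark-h100.png]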

Repository: hanzhi713/bitonic-sort