This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 2.1.0

@alliepiper released this 08 Mar 22:03

Breaking Changes

  • #553: Deprecate the CUB_USE_COOPERATIVE_GROUPS macro, as all supported CUDA Toolkit (CTK) distributions provide Cooperative Groups (CG). This macro will be removed in a future version of CUB.

New Features

  • #359: Add new DeviceBatchMemcpy algorithm.
  • #565: Add DeviceMergeSort::StableSortKeysCopy API. Thanks to David Wendt (@davidwendt) for this contribution.
  • #585: Add SM90 tuning policy for DeviceRadixSort. Thanks to Andy Adinets (@canonizer) for this contribution.
  • #586: Introduce a new mechanism to opt out of compiling CUDA Dynamic Parallelism (CDP) support in CUB algorithms by defining CUB_DISABLE_CDP.
  • #589: Support 64-bit indexing in DeviceReduce.
  • #607: Support 128-bit integers in radix sort.
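
As a rough illustration of why 64-bit indexing (#589) matters, the host-only C++ sketch below shows that an item count above the 32-bit signed limit cannot be represented by a 32-bit offset type. This is not CUB code: `fits_in_offset` and `sum_with_offset` are hypothetical helpers written for this example, and the loop only stands in for a reduction whose index type is a template parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of CUB): true when `num_items` can be
// represented by the signed offset type OffsetT.
template <typename OffsetT>
constexpr bool fits_in_offset(std::uint64_t num_items) {
    return num_items <=
           static_cast<std::uint64_t>(std::numeric_limits<OffsetT>::max());
}

// Hypothetical host-side sum whose loop index has type OffsetT.
// With a 32-bit OffsetT the loop could never address more than
// 2^31 - 1 items; a 64-bit OffsetT lifts that limit.
template <typename OffsetT, typename T>
T sum_with_offset(const T* data, OffsetT num_items) {
    T total{};
    for (OffsetT i = 0; i < num_items; ++i) {
        total += data[i];
    }
    return total;
}
```

Under these assumptions, a count such as 3,000,000,000 items passes the check only for `std::int64_t`, which is the situation a 64-bit index type is meant to handle.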

Bug Fixes

  • #547: Resolve several long-running issues resulting from using multiple versions of CUB within the same process. Adds an inline namespace that encodes CUB version and targeted PTX architectures.
  • #562: Fix bug in BlockShuffle resulting from an invalid thread offset. Thanks to @sjfeng1999 for this contribution.
  • #564: Fix bug in BlockRadixRank when used with block sizes that are not a multiple of 32 threads.
  • #579: Ensure that all threads in the logical warp participate in the index-shuffle for BlockRadixRank. Thanks to Andy Adinets (@canonizer) for this contribution.
  • #582: Fix reordering in CUB member initializer lists.
  • #589: Fix DeviceSegmentedSort when used with bool keys.
  • #590: Fix CUB’s CMake install rules. Thanks to Robert Maynard (@robertmaynard) for this contribution.
  • #592: Fix overflow in DeviceReduce.
  • #598: Fix DeviceRunLengthEncode when the first item is a NaN.
  • #611: Fix WarpScanExclusive for vector types.
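
The inline-namespace approach mentioned in #547 can be sketched in plain C++. The namespace names and tags below (`demo`, `v2_1_0_sm90`, `v2_0_0_sm80`, `build_tag`) are hypothetical and chosen only for illustration; the actual namespace CUB generates is not shown in these notes. The idea is that the inline namespace becomes part of each symbol's mangled name, so two builds of the same library can coexist in one process without their symbols colliding, while callers still use the short outer name.

```cpp
#include <cassert>
#include <string>

// Hypothetical library "demo", standing in for two builds of one library
// that end up in the same process.
namespace demo {

// Current build: the inline namespace encodes a (made-up) version and
// target architecture. Its members are reachable as demo::build_tag().
inline namespace v2_1_0_sm90 {
inline std::string build_tag() { return "2.1.0/sm90"; }
}  // namespace v2_1_0_sm90

// Older build: a regular nested namespace, so its members are only
// reachable through the fully qualified name.
namespace v2_0_0_sm80 {
inline std::string build_tag() { return "2.0.0/sm80"; }
}  // namespace v2_0_0_sm80

}  // namespace demo
```

In this sketch `demo::build_tag()` resolves to the function in the inline namespace, while the older definition stays a distinct symbol under `demo::v2_0_0_sm80`.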

Other Enhancements