Avoid using Double in HashTable implementation #73583

kubamracek · 2024-05-11T22:44:33Z

Using floating point in the HashTable implementation is (1) unfriendly to embedded environments, which might lack hardware floating-point support and/or would have to pay the code-size cost of software floating-point libraries, and (2) seemingly unnecessary.

kubamracek · 2024-05-11T22:44:51Z

@swift-ci please test

kubamracek · 2024-05-12T02:07:31Z

@swift-ci please test

stephentyrone · 2024-05-14T17:33:41Z

stdlib/public/core/HashTable.swift

  }

  internal static func capacity(forScale scale: Int8) -> Int {
    let bucketCount = (1 as Int) &<< scale
-    return Int(Double(bucketCount) * maxLoadFactor)
+    return bucketCount * maxLoadFactor.numerator / maxLoadFactor.denominator


On 32-bit and smaller systems, we might plausibly care about intermediate overflow here.
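To make the concern concrete, here is a minimal sketch that models a 32-bit `Int` with `Int32`. It assumes a load factor of 3/4, matching the 75% figure given elsewhere in this thread; the constants and the function name `capacityMultiplyFirst` are hypothetical and are not the stdlib's actual code.

```swift
// Assumption: the rational load factor is 3/4 (75%).
let numerator: Int32 = 3
let denominator: Int32 = 4

// Hypothetical helper. Int32 stands in for `Int` on a 32-bit platform.
// Returns nil in the cases where a plain `*` would trap on overflow.
func capacityMultiplyFirst(scale: Int32) -> Int32? {
  let bucketCount = Int32(1) &<< scale
  let (product, overflowed) = bucketCount.multipliedReportingOverflow(by: numerator)
  return overflowed ? nil : product / denominator
}

print(capacityMultiplyFirst(scale: 4) as Any)   // 16 buckets
print(capacityMultiplyFirst(scale: 29) as Any)  // product still fits in Int32
print(capacityMultiplyFirst(scale: 30) as Any)  // 2^30 * 3 exceeds Int32.max
```

Under these assumptions the multiplication first overflows at scale 30, that is, for a table of roughly one billion buckets.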

kubamracek · 2024-05-14

Any suggestions? We could ifdef this change so that it applies only under #if $Embedded; perhaps it would be reasonable to assume that 32-bit embedded systems will never use a hash table on the order of ~1 billion entries?

Or would using 64-bit arithmetic even on 32-bit systems be reasonable?
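The 64-bit option could look roughly like the sketch below. As an illustration only, `Int32` stands in for `Int` on a 32-bit platform, the load factor is assumed to be 3/4, and `capacityWidened` is a hypothetical name.

```swift
// Hypothetical helper: do the intermediate arithmetic in Int64, then narrow.
// Int32 stands in for `Int` on a 32-bit platform; 3/4 is the assumed load factor.
func capacityWidened(scale: Int32) -> Int32 {
  let bucketCount = Int64(1) &<< Int64(scale)
  // The product cannot overflow Int64 for any scale a 32-bit table can reach.
  let capacity = bucketCount * 3 / 4
  // Int32(_:) traps if the value does not fit; for scales up to 30 it fits,
  // because capacity <= bucketCount <= 2^30.
  return Int32(capacity)
}

print(capacityWidened(scale: 30))
```

Whether 64-bit multiplication and division are acceptable on a given 32-bit target, in code size or speed, is a separate question that this sketch does not answer.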

stephentyrone · 2024-05-14

Do we expect to want to tweak the load factor, @lorentey?

lorentey · 2024-05-30

Perhaps, but not on a whim! It'd be fine if it took some effort to tweak it.

(75% is very much on the lower end of the slider, so we're trading a bit of memory for lookup speed.)

capacity(forScale:) and scale(forCapacity:) are only ever called when allocating a new storage instance, so they do not need to be super quick. (Both the scale and the capacity are stored in the storage header.)
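As a sanity check on the 75% figure, the sketch below compares the old `Double` computation with an integer 3/4 computation for power-of-two bucket counts. The tuple used for `maxLoadFactor` is an assumption (only `.numerator` and `.denominator` appear in the diff), and both function names are hypothetical. `Int64` is used so the check does not depend on the platform's `Int` width.

```swift
// Assumption: the load factor is stored as a pair of integers. The tuple
// shape is hypothetical; only `.numerator`/`.denominator` appear in the diff.
let maxLoadFactor = (numerator: Int64(3), denominator: Int64(4))

// Mirrors the removed line: Int(Double(bucketCount) * maxLoadFactor).
func capacityWithDouble(bucketCount: Int64) -> Int64 {
  Int64(Double(bucketCount) * 0.75)
}

// Mirrors the added line: multiply by the numerator, divide by the denominator.
func capacityWithIntegers(bucketCount: Int64) -> Int64 {
  bucketCount * maxLoadFactor.numerator / maxLoadFactor.denominator
}

// Power-of-two bucket counts up to 2^39 are exactly representable as Double.
for scale in 0..<40 {
  let bucketCount = Int64(1) << Int64(scale)
  precondition(
    capacityWithDouble(bucketCount: bucketCount)
      == capacityWithIntegers(bucketCount: bucketCount))
}
```

Both versions truncate toward zero, so they also agree for the two smallest tables (1 and 2 buckets), where 0.75 and 1.5 round down to 0 and 1.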

kubamracek · 2024-05-30

Aha, so would it be reasonable to replace the multiply-then-divide (possible overflow) with divide-then-multiply (no overflow), handling the remainder separately?
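One way to read that suggestion, sketched under the assumption that the load factor is a ratio n/d with 0 < n <= d: writing bucketCount = q * d + r gives floor(bucketCount * n / d) = q * n + floor(r * n / d), and neither term can exceed the final result by much. The function names below are hypothetical, and `Int32` again stands in for a 32-bit `Int`.

```swift
// Hypothetical helper: divide first, then multiply, and fold the remainder
// back in. Assumes bucketCount >= 0 and 0 < numerator <= denominator.
func capacityDivideFirst(bucketCount: Int32, numerator: Int32, denominator: Int32) -> Int32 {
  let (quotient, remainder) = bucketCount.quotientAndRemainder(dividingBy: denominator)
  // quotient * numerator <= bucketCount, so this product cannot overflow.
  // remainder < denominator, so remainder * numerator < denominator * numerator,
  // which is tiny for a load factor such as 3/4.
  return quotient * numerator + remainder * numerator / denominator
}

// Reference: multiply-then-divide, valid only while the product fits.
func capacityMultiplyThenDivide(bucketCount: Int32, numerator: Int32, denominator: Int32) -> Int32 {
  bucketCount * numerator / denominator
}

// The two agree wherever the reference does not overflow (scales 0...29 for 3/4).
for scale in Int32(0)...29 {
  let bucketCount = Int32(1) &<< scale
  precondition(
    capacityDivideFirst(bucketCount: bucketCount, numerator: 3, denominator: 4)
      == capacityMultiplyThenDivide(bucketCount: bucketCount, numerator: 3, denominator: 4))
}

// At scale 30 the divide-first form still works on a 32-bit integer.
print(capacityDivideFirst(bucketCount: Int32(1) &<< 30, numerator: 3, denominator: 4))
```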

stephentyrone · 2024-05-14T17:34:17Z

stdlib/public/core/HashTable.swift

-      Int((Double(capacity) / maxLoadFactor).rounded(.up)),
+    func divideRoundingUp(n: Int, by: Int) -> Int { (n + (by - 1)) / by }
+    let minimumEntries = Swift.max(divideRoundingUp(
+      n: capacity * maxLoadFactor.denominator, by: maxLoadFactor.numerator),
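The rounding-up division in this hunk can be exercised on its own. The sketch below copies `divideRoundingUp` from the diff and adds a hypothetical variant, `divideRoundingUpNoOverflow`, that avoids the `n + (by - 1)` addition. It also checks that rounding up is what makes the two directions consistent, assuming a 3/4 load factor: a table sized for `capacity` entries must report at least that capacity. Note that the `capacity * maxLoadFactor.denominator` product in the diff is a separate possible overflow that this sketch does not address.

```swift
// Copied from the diff above.
func divideRoundingUp(n: Int, by: Int) -> Int { (n + (by - 1)) / by }

// Hypothetical variant for n >= 0 and by > 0: no intermediate addition,
// so it cannot overflow even when n is close to Int.max.
func divideRoundingUpNoOverflow(n: Int, by: Int) -> Int {
  let (quotient, remainder) = n.quotientAndRemainder(dividingBy: by)
  return remainder > 0 ? quotient + 1 : quotient
}

// Round trip with an assumed 3/4 load factor: the minimum entry count for a
// requested capacity must yield a capacity that is at least as large.
for capacity in 0...1000 {
  let minimumEntries = divideRoundingUp(n: capacity * 4, by: 3)
  precondition(minimumEntries * 3 / 4 >= capacity)
  precondition(minimumEntries == divideRoundingUpNoOverflow(n: capacity * 4, by: 3))
}
```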


Avoid using Double in HashTable implementation

bd64f50

kubamracek requested a review from lorentey May 11, 2024 22:44

kubamracek requested a review from a team as a code owner May 11, 2024 22:44

kubamracek requested review from phausler, rauhul, zoecarver and eeckstein May 11, 2024 22:44

rauhul reviewed May 11, 2024

stdlib/public/core/HashTable.swift (outdated)

NFC HashTable.swift it's 'numerator' and not 'nominator'

177163c

kubamracek added the embedded (Embedded Swift) label May 12, 2024

stephentyrone reviewed May 14, 2024

stephentyrone May 14, 2024

kubamracek May 14, 2024

stephentyrone May 14, 2024

lorentey May 30, 2024

kubamracek May 30, 2024

stephentyrone May 14, 2024
