-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid using Double in HashTable implementation #73583
base: main
Are you sure you want to change the base?
Conversation
@swift-ci please test |
@swift-ci please test |
} | ||
|
||
internal static func capacity(forScale scale: Int8) -> Int { | ||
let bucketCount = (1 as Int) &<< scale | ||
return Int(Double(bucketCount) * maxLoadFactor) | ||
return bucketCount * maxLoadFactor.numerator / maxLoadFactor.denominator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 32b and smaller systems, we plausibly might care about intermediate overflow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions? We could ifdef this change to apply just to #if $Embedded
, perhaps it would be reasonable to assume that 32-bit embedded systems will never have use a hashtable that's on the order of ~1 billion entries?
Or would using 64-bit arithmetics even on 32-bit systems be reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect to want to tweak the load factor, @lorentey?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but not on a whim! It'd be fine if it took some effort to tweak it.
(75% is very much on the lower end of the slider, so we're trading a bit of memory for lookup speed.)
capacity(forScale:)
and scale(forCapacity:)
are only ever called when allocating a new storage instance, so they do not need to be super quick. (Both the scale and the capacity are stored in the storage header.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, so would it be reasonable to replace the multiply-then-divide (possible overflow) with divide-then-multiply (no overflow) plus handling the remainder separately?
Int((Double(capacity) / maxLoadFactor).rounded(.up)), | ||
func divideRoundingUp(n: Int, by: Int) -> Int { (n + (by - 1)) / by } | ||
let minimumEntries = Swift.max(divideRoundingUp( | ||
n: capacity * maxLoadFactor.denominator, by: maxLoadFactor.numerator), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
Using floating point in the HashTable implementation is (1) unfriendly to embedded environments which might not have floating point support in hardware and/or would need to pay the codesize cost of software fp libraries, (2) seemingly not necessary.