-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StreamMap slow? #353
Comments
Thanks for filing this: I think the debug builds are vastly overestimating the cost of the
Incidentally, this is why |
One note: we could replace |
Just reading the header comment in It also dismisses dictionaries, due to the computational load of computing hashes. But couldn't we just make |
I've noticed release builds being way faster too, so maybe all of this is a non-issue. I'll check when I have time 👍 |
This totally works, but the tradeoff isn't straightforward. For small dictionaries (in this case, fewer than 100 elements), it's cheaper to do a linear array scan to find the element than it is to hash the key and do a jump. This is fairly non-intuitive but comes down to the way branch predictors and cache preloaders work: the memory subsystem can observe and optimise this access pattern, meaning that we're never waiting for memory, in a way that it cannot do with hashing. The data structure is therefore optimised for small size. For larger sizes, we do binary search instead, which is still extremely cheap. |
Thanks for the technical explanation! I'm glad a lot of thought went into this. Diving a bit deeper, I'm seeing a lot of the work is actually coming from |
The Concretely, those 24 bits store the length of the buffer such that we can detect if some code illegally stores and reuses buffer offsets across resizes. That's illegal because (like other many other data structures) the actual indexes get invalidates if you add/remove elements. |
This is a WIP issue as I plan to investigate further.
I'm working on a
swift-nio
project that has an emphasis on large file transfers. Getting a stream of bytes out of the server is very fast, but uploading a stream is around 3x slower - even when the incoming buffers aren't handled/stored at all.I've profiled in instruments and see a lot of traffic (around 15%) in
StreamMap
that's just trying to pick a struct out of a collection based on a stream ID. On first glance, it seems to me that we can probably do away withCircularBuffer
, in favour of something more simple. I think the fact that stream IDs start at zero, and increment from there (but may arrive out of order), potentially opens up some optimisation opportunities.Note that this performance bottleneck was noticed in debug builds. I'll report back with release build findings.
The text was updated successfully, but these errors were encountered: