Use function pointers for runtime dispatching #303
Comments
Is it your expectation that the code you are proposing is thread safe? I think you need atomic pointers. Otherwise: pull requests are invited.
Good point. Are indirect calls atomic in general? On ARM? On x86?
Even if the underlying hardware does it for free, you will still get flagged by the sanitizers. The
See CRoaring/include/roaring/isadetection.h Lines 170 to 198 in a5656ae
Note that we don't do runtime dispatching under ARM. With ARM, NEON is usually available by default, so you would not need it for now. For that reason, we may also hope that compilers do a good job of using NEON via autovectorization under ARM when the proper optimization flags are provided. We only do runtime dispatching for AVX2 support. That is where the gains matter most, since compilers won't emit AVX2 by default, yet most people have AVX2 support at this point. So there is a huge performance gap there.
Note that I had a terrible performance bug in my initial implementation. Fixed by 6403d44. I would call The

    static inline uint32_t croaring_detect_supported_architectures() {
      static std::atomic<int> buffer{CROARING_UNINITIALIZED};
      if (buffer == CROARING_UNINITIALIZED) {
        buffer = dynamic_croaring_detect_supported_architectures();
      }
      return buffer;
    }

It should be quite cheap. Note that, in CRoaring, we use runtime dispatch strategically. It is not used for processing small blocks of data; it is always used to process, e.g., a whole container or to aggregate two containers. Of course, if you have a ton of small containers, there is some overhead, but then you have other problems.
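The caching idea in that snippet can be tried in isolation. The sketch below is an illustration under assumptions, not the library's code: `kUninitialized`, `kHasAvx2`, `slow_detect`, and `slow_probe_calls` are hypothetical stand-ins, and the relaxed memory ordering is my choice rather than something stated in the thread.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins, not CRoaring's actual constants.
constexpr uint32_t kUninitialized = 0x8000;
constexpr uint32_t kHasAvx2 = 0x4;

// Counts how often the slow probe runs (for illustration only).
static std::atomic<int> slow_probe_calls{0};

// Stand-in for the expensive CPUID-based detection.
static uint32_t slow_detect() {
  slow_probe_calls.fetch_add(1, std::memory_order_relaxed);
  return kHasAvx2;
}

// Cached detection: after the first call this is one atomic load
// and one well-predicted branch.
static inline uint32_t detect_supported_architectures() {
  static std::atomic<uint32_t> cache{kUninitialized};
  uint32_t value = cache.load(std::memory_order_relaxed);
  if (value == kUninitialized) {
    value = slow_detect();
    cache.store(value, std::memory_order_relaxed);
  }
  return value;
}
```

If two threads race on the first call, both may run the probe, but they store the same value, so the result is still consistent and the sanitizers see only atomic accesses.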
Rather than branching each time, consider doing something like this:
This way after the first call all calls will be direct. It may give a tiny performance gain.
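The snippet from the original post is not preserved here. As a rough sketch of the general pattern it describes, the code below uses a function pointer that starts at a resolver and is overwritten on first use. All names (`sum`, `sum_impl`, `sum_resolve`, `cpu_has_avx2`, `sum_avx2_placeholder`) are hypothetical, and the pointer is a `std::atomic` to address the thread-safety concern raised in the comments.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

using sum_fn = uint64_t (*)(const uint32_t*, std::size_t);

// Portable fallback kernel.
static uint64_t sum_scalar(const uint32_t* data, std::size_t n) {
  uint64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Placeholder for a real AVX2 kernel; it simply forwards to the scalar one.
static uint64_t sum_avx2_placeholder(const uint32_t* data, std::size_t n) {
  return sum_scalar(data, n);
}

// Hypothetical probe; a real one would query the CPU.
static bool cpu_has_avx2() { return false; }

static uint64_t sum_resolve(const uint32_t* data, std::size_t n);

// Starts at the resolver; the first call swaps in the chosen kernel.
static std::atomic<sum_fn> sum_impl{sum_resolve};

static uint64_t sum_resolve(const uint32_t* data, std::size_t n) {
  sum_fn chosen = cpu_has_avx2() ? sum_avx2_placeholder : sum_scalar;
  sum_impl.store(chosen, std::memory_order_relaxed);
  return chosen(data, n);
}

// Public entry point: one atomic load, one indirect call, no feature check.
inline uint64_t sum(const uint32_t* data, std::size_t n) {
  return sum_impl.load(std::memory_order_relaxed)(data, n);
}
```

After the first call, `sum` no longer tests any CPU flag. The call still goes through a pointer, so whether this beats a cached, well-predicted branch would need to be measured.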