ENH: Reduce overhead of configurable data allocation strategy (NEP49) #21531
Conversation
@seberg Only the documentation build is failing. I cannot figure out what is wrong (I am trying to add a reference to NEP 49).
Hmmm, it looks right to me. The issue is just that we use intersphinx, so the link is broken until the NEPs get deployed.
The alternative would be to make a tiny PR just adding the
numpy/core/src/multiarray/alloc.c
_PyDataMem_GetHandler_Internal(PyObject *mem_handler)
{
#if (!defined(PYPY_VERSION_NUM) || PYPY_VERSION_NUM >= 0x07030600)
    if (default_allocator_policy)
Would it make sense to do PyDataMem_DefaultHandler == mem_handler here instead? That way it would keep working even if the default had been changed at some point.
Otherwise, I am OK with this (modulo possible style nitpicks), so long as @mattip doesn't mind the added complexity.
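For context, the handler identity discussed above is also observable from Python. A minimal sketch, assuming the NEP 49 introspection helper get_handler_name (added in NumPy 1.22); the numpy._core fallback path is an assumption for newer NumPy versions:

```python
import numpy as np

# Sketch: inspect which NEP 49 allocation policy backs an array's data.
# With no custom policy installed, the built-in default handler is used.
try:
    from numpy.core.multiarray import get_handler_name
except ImportError:  # newer NumPy moved numpy.core to numpy._core
    from numpy._core.multiarray import get_handler_name

a = np.arange(4)            # owns its data, so it has a handler
name = get_handler_name(a)  # None would mean `a` does not own its data
print(name)                 # typically "default_allocator"
```

This is the Python-level counterpart of the C comparison: the fast path in the PR only has to decide whether the current handler is still the default one.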
@mattip I rebased to main. Could you review?
I think it's fine. I will be happy if anyone comes along to remove the micro-optimization again, but I am happy to trust that it is worthwhile for now.
I made two suggestions for long lines, and expanded the comment to explicitly note that it is a micro-optimization.
If no one else comments, I will apply the suggestions and merge tomorrow (or so).
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
Sorry to come back to this after all. Could you check that these are truly worth it (e.g. show the benchmarks)?
Fair enough. Some numbers are in issue #21488. I recall that the improvement was a little different between Windows and Linux, so I will try to run them on both.
@seberg Benchmark on a Linux system for
Test script:
I also ran the NumPy ufunc benchmarks. They show both speedups and slowdowns, but I believe these are just random fluctuations (I see them every time I run the benchmarks). Benchmark results for ufunc
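For readers who want to reproduce this kind of measurement, a minimal micro-benchmark sketch; the loop counts and operations are illustrative, not the exact script used in this PR:

```python
import timeit
import numpy as np

# Scalar and small-array operations spend a comparatively large share of
# their time allocating the result, so they are the most sensitive to
# the allocator fast path discussed in this PR.
x = np.float64(1.0)
y = np.float64(2.0)
a = np.arange(10)

scalar_t = timeit.timeit(lambda: x + y, number=100_000)
array_t = timeit.timeit(lambda: a + a, number=100_000)

print(f"scalar add:      {scalar_t:.4f} s per 100k calls")
print(f"small-array add: {array_t:.4f} s per 100k calls")
```

Comparing these timings before and after the patch (and on both Windows and Linux, as noted above) is the kind of evidence being asked for here.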
In both the
Just posting for posterity: the "scalar" benchmarks are probably the clearest about the best-case speedups some operations can get. (The rest tends to have too much noise.)
I agree with @rossbar here; the gain looks to be marginal. I suppose scalars could be a bottleneck if used in a loop, but perhaps that could be fixed by refactoring the loop.
OK, sounds like we should draw the line here for now then, so closing. I have to admit, at least hacking around the
@seberg The scalar benchmarks you posted earlier show there are real improvements of about 5% for scalar operations (and somewhat less for small arrays). But 5% is not a lot, so I am fine with closing this for now.
This PR reduces the overhead introduced in #17582 when the default allocation strategy is used. Approaches to reduce the overhead are discussed in #21488.
Fixes #21488