ENH: Improve performance of `np.save` for small arrays #18657
Conversation
Removes the `_filter_header` call from `_write_array_header` in `np.save` for small arrays.
Thanks! Since this seems worth it... isn't there a version number that we can rely on to have stripped headers when reading? I am just curious whether we could skip this even on reading most of the time. IIRC, we do have different version numbers but only use the new one in the rare event that the old version is not able to write the array correctly? If that is the case, then it seems like it's not worth skipping the header cleanup on read.
I agree, this line was pointless on Python 3. I haven't checked to see if the CI failure is real.
Hi @seberg 👋
This could use a release note. See
Force-pushed from 4cc5340 to 26ec1c7
@seberg @eric-wieser Ping.
Happy with putting it in. Found something to nitpick, but there always is :).
We could also name the `improvement.rst` file `performance.rst`, since we actually have a "performance" category in the release notes right now.
@@ -590,7 +590,8 @@ def _read_array_header(fp, version):
     # "shape" : tuple of int
     # "fortran_order" : bool
     # "descr" : dtype.descr
-    header = _filter_header(header)
+    if version <= (2, 0):
+        header = _filter_header(header)
OK, I doubt it is very useful, but it's simple enough. If you are enthusiastic, maybe add a comment explaining why we only do this on older "versions", though?
(Would be happy to just skip it as well.)
It's useful because it lets me pair a `write_array(..., version=(3, 0))` with `read_array(..., version=(3, 0))` and get the nice perf boost 😎
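A quick round-trip of this pairing, as a sketch (note that in current NumPy, `np.lib.format.read_array` infers the format version from the file's magic string, so only the write side takes a `version` argument):

```python
import io

import numpy as np
from numpy.lib import format as npy_format

# A small array, like the 3D points discussed in this PR.
arr = np.array([1.0, 2.0, 3.0])

buf = io.BytesIO()
# Write with format version 3.0, whose header never needs the
# Python 2 `_filter_header` cleanup when read back.
npy_format.write_array(buf, arr, version=(3, 0))

buf.seek(0)
out = npy_format.read_array(buf)
assert np.array_equal(out, arr)
```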
Hah, fair enough :). Thanks @ohadravid!
     msg = "Header does not contain the correct keys: {!r}"
-    raise ValueError(msg.format(keys))
+    raise ValueError(msg.format(d.keys()))
Removing the `sorted` above looks correct to me. But I think `d.keys()` now prints slightly less nicely? OTOH, this is de facto an internal error, so I am OK with that.
Added the `sorted` back when generating the error :)
Improve performance of `np.save` by removing the `_filter_header` call when writing the header, as the header is known to be written by Python 3.
Improve performance of `np.load` for arrays with version >= (3, 0) by removing the `_filter_header` call, as such headers are known to have been written by Python 3.
Force-pushed from 3942cdf to 6fc8d44
Improve performance of `np.load`.
Improve performance of `np.save` for small arrays by removing the call to `_filter_header` when writing the header, as it is known to be done in Python 3.
We use `np.save` to save a lot of small arrays (which are 3D points, so just 3 ints/floats), and we found out that `np.save` is about 20 times slower than `pickle.dumps`. Profiling shows that most of the time is spent in `_filter_header`, which (as I understand it) is not really needed anymore.
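For context, `_filter_header` exists to clean up headers written by Python 2, where the repr of an int could carry an `L` long-literal suffix (e.g. `(3L,)`) that `ast.literal_eval` on Python 3 rejects. A minimal sketch of that kind of normalisation (`strip_long_suffix` is a hypothetical helper, not NumPy's actual tokenizer-based implementation):

```python
import ast
import re

def strip_long_suffix(header: str) -> str:
    # Hypothetical helper: drop Python 2 'L' long-integer suffixes
    # so the header can be parsed with ast.literal_eval on Python 3.
    return re.sub(r"\b(\d+)L\b", r"\1", header)

# A header as Python 2 might have written it.
py2_header = "{'descr': '<f8', 'fortran_order': False, 'shape': (3L,)}"
cleaned = strip_long_suffix(py2_header)
d = ast.literal_eval(cleaned)
assert d["shape"] == (3,)
```

Headers written by Python 3 never contain these suffixes, which is why the cleanup pass can be skipped on the write path.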
The result is about 10x improvement on my machine:
Before:
140 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
After:
14.4 µs ± 497 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For reference, using `pickle.dumps` is:
6.27 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
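Numbers like these can be reproduced with a small `timeit` harness along these lines (a sketch; absolute timings depend heavily on the machine and the NumPy version):

```python
import io
import pickle
import timeit

import numpy as np

arr = np.zeros(3)  # a tiny array, like a 3D point

def save_once():
    # np.save accepts any file-like object; an in-memory buffer
    # keeps disk I/O out of the measurement.
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

n = 2000
t_save = timeit.timeit(save_once, number=n) / n * 1e6
t_pickle = timeit.timeit(lambda: pickle.dumps(arr), number=n) / n * 1e6
print(f"np.save:      {t_save:.2f} µs per call")
print(f"pickle.dumps: {t_pickle:.2f} µs per call")
```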
Edit: This PR also changes reading an array to only call this function for older versions, which might have been created by Python 2 before this function was added in 2014 (and which are still used by default), resulting in a nice improvement as well:
Reading a v1 array (same as today):
141 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Reading a v3 array now:
56 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.3 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) (after more little improvements)