Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Improve performance of np.save for small arrays #18657

Merged
merged 4 commits into from Mar 24, 2021

Conversation

ohadravid
Copy link
Contributor

@ohadravid ohadravid commented Mar 21, 2021

Improve performance of np.savefor small arrays by removing the call to _filter_header
when writing the header, as it is known to be done in Python 3.


We use np.save to save a lot of small arrays (which are 3D point, so are just 3 ints/floats),
and we found out that np.save is about 20 times slower than pickle.dumps.

Profiling shows that most of the time is spent in _filter_header,
which (as I understand it) is not really needed anymore.

The result is about 10x improvement on my machine:

import numpy as np
from numpy.lib.format import write_array
import io
array = np.array([2, 3, 4])

%%timeit
buffer = io.BytesIO()
write_array(buffer, array, allow_pickle=False)

Before:
140 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

After:
14.4 µs ± 497 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For reference, using pickle.dumps is:
6.27 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Edit: This PR also changes reading an array to only call this function for older versions which might have been created by Python 2 before this function was added in 2014 (which are still used by default),
resulting in a nice improvement as well:

Reading a v1 array (same as today):
141 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Reading a v3 array now:
56 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.3 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) (after more little improvements)

@ohadravid ohadravid changed the title ENH: Remove call to _filter_header from _write_array_header. ENH: Improve performance of np.save for small arrays Mar 21, 2021
@seberg
Copy link
Member

seberg commented Mar 21, 2021

Thanks! The documentation of _filter_header already allures to Python 2 support in reading, so I think that is good as is.

Since this seems worth it... there isn't a version number that we can rely on having stripped headers when reading? I am just curious if we could skip this even on reading most of the time. IIRC, we do have different version numbers but only use the new one in the rare event that the old version is not able to write the array correctly? If that is the case, then it seems like its not worth skipping the header cleanup on read.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this line was pointless on Python3. I haven't checked to see if the CI failure is real.

@ohadravid
Copy link
Contributor Author

Hi @seberg 👋
Looking at the release and git history, I think that the header is filtered since 2014, but versions (1,0) and (2,0) existed before and might have included Ls.
Versions (3,0) comes after 1.16 which was the last release supporting Python 2.7, so we could check if version >= (3,0), but
since np.save defaults to saving with the lowest compat this won't help too much I think, but it won't hurt :)

@charris
Copy link
Member

charris commented Mar 21, 2021

This could use a release note. See doc/release/upcoming_changes/ for examples. If there are potential compatibility problems the release note should probably be in the compatibility section, otherwise improvements. Maybe two notes if both :)

@ohadravid ohadravid force-pushed the improve-np-save-perf branch 2 times, most recently from 4cc5340 to 26ec1c7 Compare March 22, 2021 09:56
@charris
Copy link
Member

charris commented Mar 23, 2021

@seberg @eric-wieser Ping.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy, with putting it in. Found something to nitpick, but there always is :).

We could name the improvement.rst also performance.rst, since we actually got a "performance" category in the release notes right now.

@@ -590,7 +590,8 @@ def _read_array_header(fp, version):
# "shape" : tuple of int
# "fortran_order" : bool
# "descr" : dtype.descr
header = _filter_header(header)
if version <= (2, 0):
header = _filter_header(header)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I doubt it is very useful, but its simple enough. If you are enthusiastic, maybe add a comment why we only do this on older "versions", though?
(Would be happy to just skip as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful because it lets me pair a write_array(..., version=(3,0) with read_array(..., version=(3,0)) and get the nice perf boost 😎

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, fair enough :). Thanks @ohadravid!

msg = "Header does not contain the correct keys: {!r}"
raise ValueError(msg.format(keys))
raise ValueError(msg.format(d.keys()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the sorted above looks correct to me. But I think d.keys() now prints slightly less nice? OTOH, this is de-facto an internal error so I am OK with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the sorted bash when generating the error :)

Improve performance of `np.save` by removing the call
when writing the header, as it is known to be done in Python 3.
Improve performance of `np.load` for arrays with version >= (3,0)
by removing the call, as it is known to be done in Python 3.
@ohadravid ohadravid force-pushed the improve-np-save-perf branch 2 times, most recently from 3942cdf to 6fc8d44 Compare March 24, 2021 09:07
@seberg seberg merged commit badbf70 into numpy:main Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants