ENH, SIMD: Extend universal intrinsics to support IBMZ #20913
Merged
+932
−262
It covers SIMD operations for all datatypes starting from z/Arch11 (a.k.a. IBM Z13), except for single-precision, which requires at least z/Arch12 (a.k.a. IBM Z14) to be dispatched.

This patch renames the path /simd/vsx to /simd/vec; the new path holds the definitions of universal intrinsics for both Power and Z architectures.

This patch also adds new preprocessor identifiers:
* NPY_SIMD_BIGENDIAN: 1 if the enabled SIMD extension is running in big-endian mode, otherwise 0.
* NPY_SIMD_F32: 1 if the enabled SIMD extension supports single-precision, otherwise 0.
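As a rough illustration of how the two identifiers could be consumed, here is a minimal sketch. It assumes the macros are supplied by NumPy's internal SIMD header when built inside the NumPy tree; the fallback definitions, `sum_f32`, and `simd_is_bigendian` are hypothetical and exist only so the snippet compiles on its own. The vector branch uses the universal intrinsics `npyv_zero_f32`, `npyv_load_f32`, `npyv_add_f32`, and `npyv_sum_f32`, and is compiled only when `NPY_SIMD_F32` is 1.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: inside NumPy these macros come from the internal SIMD header.
 * The fallbacks below exist only so this sketch builds standalone. */
#ifndef NPY_SIMD_F32
#define NPY_SIMD_F32 0
#endif
#ifndef NPY_SIMD_BIGENDIAN
#define NPY_SIMD_BIGENDIAN 0
#endif

/* Hypothetical helper: report whether the enabled SIMD extension
 * runs in big-endian mode (for example, to pick a lane order). */
static int simd_is_bigendian(void)
{
    return NPY_SIMD_BIGENDIAN;
}

/* Hypothetical helper: sum `len` floats. The vector loop is used only when
 * single-precision SIMD is available; the scalar loop handles the tail,
 * or the whole array when NPY_SIMD_F32 is 0 (e.g. VX without VXE). */
static float sum_f32(const float *src, size_t len)
{
    assert(src != NULL || len == 0);
    float total = 0.0f;
    size_t i = 0;
#if NPY_SIMD_F32
    npyv_f32 acc = npyv_zero_f32();
    for (; i + npyv_nlanes_f32 <= len; i += npyv_nlanes_f32) {
        acc = npyv_add_f32(acc, npyv_load_f32(src + i));
    }
    total = npyv_sum_f32(acc);
#endif
    for (; i < len; ++i) {
        total += src[i];
    }
    return total;
}
```

Guarding on `NPY_SIMD_F32` rather than on `NPY_SIMD` alone keeps the same source usable on targets where the enabled SIMD extension has no single-precision support.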
GitHub doesn't allow a large PR description body, so the rest follows:

VX

export NPY_DISABLE_CPU_FEATURES="VXE VXE2"
python runtests.py --bench-compare parent/main

before after ratio
[982fcd38] [47d54c6d]
<zsystem_sup~5> <zsystem_sup>
+ 35.8±0.2μs 141±0.7μs 3.95 bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 1000))
+ 79.8±0.5μs 264±2μs 3.31 bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 100))
+ 183±0.9μs 268±2μs 1.47 bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 10))
+ 7.37±0.1ms 10.1±0.05ms 1.37 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 1000000)
+ 74.3±0.2μs 101±1μs 1.36 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 10000)
+ 3.28±0.05ms 4.43±0.1ms 1.35 bench_reduce.AddReduceSeparate.time_reduce(1, 'float16')
+ 83.4±0.9μs 102±1μs 1.22 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'h')
+ 9.46±0.2ms 11.4±0.2ms 1.21 bench_reduce.AddReduce.time_axis_1
+ 127±0.6μs 153±3μs 1.20 bench_function_base.Sort.time_argsort('quick', 'int32', ('reversed',))
+ 85.1±1μs 102±0.4μs 1.20 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'h')
+ 83.6±0.9μs 99.9±0.9μs 1.20 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'h')
+ 84.4±0.3μs 101±0.3μs 1.19 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'h')
+ 84.1±1μs 99.9±0.4μs 1.19 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'h')
+ 84.5±0.6μs 100.0±0.6μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'h')
+ 84.1±1μs 99.4±0.3μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'h')
+ 84.7±3μs 100±0.4μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'b')
+ 84.6±1μs 100.0±1μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'h')
+ 560±20ms 661±20ms 1.18 bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'wrap')
+ 83.5±1μs 98.6±0.09μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'h')
+ 84.9±1μs 100±0.4μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'h')
+ 85.2±1μs 100±0.3μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'h')
+ 85.5±2μs 101±0.5μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'h')
+ 85.6±1μs 101±0.6μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'h')
+ 84.4±0.2μs 99.3±0.3μs 1.18 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'h')
+ 85.0±0.3μs 99.6±0.6μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'h')
+ 84.9±0.7μs 99.6±0.5μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'b')
+ 85.9±0.7μs 101±0.5μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'h')
+ 85.6±1μs 100±2μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'h')
+ 85.3±1μs 99.6±0.2μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'h')
+ 85.3±0.9μs 99.5±0.7μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'b')
+ 85.6±0.1μs 99.9±0.6μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'h')
+ 84.7±1μs 98.8±0.7μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'b')
+ 85.5±1μs 99.6±0.4μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'h')
+ 86.5±2μs 101±0.4μs 1.17 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'h')
+ 85.0±0.9μs 99.0±0.1μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'h')
+ 85.8±2μs 99.8±0.7μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'h')
+ 86.1±1μs 100±0.6μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'h')
+ 85.3±0.8μs 99.2±0.9μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'b')
+ 86.3±1μs 100±0.4μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'h')
+ 84.6±0.6μs 98.3±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'b')
+ 85.0±0.6μs 98.6±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'b')
+ 86.5±2μs 100±1μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'b')
+ 87.0±1μs 101±0.6μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'h')
+ 84.9±0.3μs 98.5±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'b')
+ 86.4±0.4μs 100±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'h')
+ 87.0±2μs 101±0.5μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'h')
+ 86.1±0.8μs 99.7±0.5μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'h')
+ 86.6±2μs 100±0.6μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'h')
+ 85.7±0.8μs 99.1±1μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'b')
+ 87.3±0.9μs 101±0.5μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'h')
+ 85.3±0.7μs 98.7±0.6μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'b')
+ 85.0±1μs 98.3±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'b')
+ 87.5±0.6μs 101±0.7μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'h')
+ 86.3±0.8μs 99.8±0.3μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'b')
+ 86.6±0.9μs 100±0.6μs 1.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'h')
+ 85.7±2μs 99.0±0.2μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'h')
+ 87.4±1μs 101±0.7μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'h')
+ 85.2±1μs 98.3±0.3μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'b')
+ 87.6±1μs 101±0.4μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'h')
+ 301±4μs 347±0.8μs 1.15 bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 1000))
+ 87.5±1μs 101±0.5μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'h')
+ 86.5±2μs 99.7±0.8μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'h')
+ 87.8±2μs 101±0.9μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'b')
+ 85.7±1μs 98.6±0.3μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'b')
+ 85.7±0.9μs 98.6±0.5μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'b')
+ 87.5±0.7μs 101±0.5μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'h')
+ 86.1±0.6μs 99.0±0.7μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'b')
+ 85.6±0.6μs 98.4±0.3μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'b')
+ 86.3±0.3μs 99.2±0.2μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'b')
+ 87.8±2μs 101±0.9μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'h')
+ 87.5±0.6μs 100±0.3μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'h')
+ 88.6±1μs 102±0.4μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'h')
+ 86.1±2μs 98.8±0.2μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'b')
+ 12.3±0.2μs 14.1±0.2μs 1.15 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'D')
+ 85.8±0.8μs 98.3±0.3μs 1.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'b')
+ 1.15±0.03μs 1.32±0.02μs 1.15 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 100)
+ 86.9±2μs 99.4±0.3μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'h')
+ 88.2±1μs 101±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'h')
+ 88.9±1μs 102±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'h')
+ 86.2±0.4μs 98.6±0.1μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'b')
+ 87.2±0.3μs 99.6±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'b')
+ 87.7±0.2μs 100±0.6μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'h')
+ 86.2±0.4μs 98.3±0.3μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'b')
+ 87.0±2μs 99.0±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'b')
+ 86.5±1μs 98.4±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'b')
+ 89.1±2μs 101±1μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'h')
+ 86.8±1μs 98.6±1μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'b')
+ 86.9±1μs 98.8±0.3μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'b')
+ 87.7±0.4μs 99.7±0.2μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'h')
+ 87.2±1μs 99.0±0.5μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'b')
+ 87.2±0.4μs 99.0±0.2μs 1.14 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'h')
+ 86.6±1μs 98.3±0.2μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'b')
+ 87.5±0.4μs 99.2±1μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'b')
+ 87.2±0.4μs 98.8±0.3μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'b')
+ 87.7±0.3μs 99.4±0.3μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'b')
+ 88.3±1μs 99.9±0.5μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'h')
+ 87.0±0.3μs 98.4±0.3μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'b')
+ 89.0±2μs 101±1μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'b')
+ 86.9±0.8μs 98.2±0.3μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'b')
+ 88.0±0.6μs 99.4±1μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'b')
+ 87.4±2μs 98.6±0.2μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'b')
+ 88.0±0.9μs 99.3±0.6μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'b')
+ 326±1μs 368±2μs 1.13 bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 1000))
+ 87.3±2μs 98.5±0.1μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'b')
+ 87.1±0.4μs 98.2±0.2μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'b')
+ 87.2±0.6μs 98.2±0.4μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'b')
+ 87.7±2μs 98.7±0.1μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'b')
+ 87.4±2μs 98.4±0.2μs 1.13 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'b')
+ 88.8±0.4μs 99.8±0.7μs 1.12 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'h')
+ 88.3±2μs 99.0±0.4μs 1.12 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'b')
+ 88.4±2μs 98.9±0.5μs 1.12 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'b')
+ 89.2±1μs 99.2±0.4μs 1.11 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'h')
+ 120±4μs 133±3μs 1.11 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'L')
+ 89.7±0.6μs 99.5±0.5μs 1.11 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'b')
+ 88.7±2μs 98.3±0.1μs 1.11 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'b')
+ 89.4±2μs 98.8±0.3μs 1.10 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'b')
+ 79.2±0.1μs 87.4±0.2μs 1.10 bench_function_base.Where.time_interleaved_zeros_x8
+ 133±2μs 147±3μs 1.10 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'Q')
+ 13.4±0.2μs 14.7±0.3μs 1.10 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'D')
+ 505±4μs 554±0.6μs 1.10 bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',))
+ 6.90±0.05ms 7.56±0.02ms 1.10 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 1000000)
+ 89.9±2μs 98.5±0.1μs 1.10 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'b')
+ 89.9±3μs 98.4±0.6μs 1.09 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'b')
+ 330±3μs 361±3μs 1.09 bench_function_base.Sort.time_argsort('quick', 'int32', ('sorted_block', 1000))
+ 134±3μs 146±4μs 1.09 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'q')
+ 70.1±0.3μs 76.6±0.6μs 1.09 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 10000)
+ 119±2μs 130±2μs 1.09 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'Q')
+ 409±2μs 443±0.5μs 1.08 bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 100))
+ 371±0.4μs 401±3μs 1.08 bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 100))
+ 2.68±0.03μs 2.89±0.08μs 1.08 bench_core.Core.time_hstack_l
+ 17.2±0.04μs 18.6±0.7μs 1.08 bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int32'>)
+ 86.8±0.3μs 93.1±0.5μs 1.07 bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 100))
+ 123±2μs 131±2μs 1.07 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'L')
+ 13.5±0.1μs 14.4±0.3μs 1.07 bench_ma.MA.time_masked_array_l100_t100
+ 139±0.6μs 148±2μs 1.07 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'Q')
+ 17.8±0.07μs 19.0±0.4μs 1.07 bench_lib.Nan.time_nanmean(200, 0)
+ 5.04±0.06ms 5.37±0.1ms 1.07 bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'wrap')
+ 10.3±0.06μs 11.0±0.3μs 1.07 bench_ma.MA.time_masked_array_l100
+ 19.3±0.06μs 20.6±0.6μs 1.07 bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float32'>)
+ 1.15±0.02μs 1.23±0.03μs 1.07 bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int64')
+ 1.20±0.01ms 1.28±0.05ms 1.06 bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int16'>)
+ 55.1±1ms 58.6±2ms 1.06 bench_ma.Concatenate.time_it('masked', 2000)
+ 128±2ms 136±2ms 1.06 bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'constant')
+ 21.0±0.1μs 22.3±0.4μs 1.06 bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float32'>)
+ 638±10μs 677±20μs 1.06 bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'numpy.int16'>)
+ 1.32±0.01μs 1.40±0.05μs 1.06 bench_core.Core.time_ones_100
+ 18.3±0.08μs 19.4±0.4μs 1.06 bench_ma.UFunc.time_2d(True, True, 10)
+ 64.8±1ms 68.7±1ms 1.06 bench_ma.Concatenate.time_it('unmasked+masked', 2000)
+ 108±2μs 115±0.4μs 1.06 bench_function_base.Sort.time_argsort('merge', 'int32', ('sorted_block', 100))
+ 335±3μs 355±5μs 1.06 bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'linear_ramp')
+ 27.9±0.09μs 29.5±0.5μs 1.06 bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'constant')
+ 492±1ns 520±8ns 1.06 bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs([1])
+ 9.82±0.04μs 10.4±0.3μs 1.06 bench_lib.Unique.time_unique(200, 90.0)
+ 565±2μs 597±3μs 1.06 bench_function_base.Sort.time_argsort('heap', 'float64', ('reversed',))
+ 110±1μs 116±4μs 1.06 bench_lib.Pad.time_pad((256, 128, 1), 1, 'reflect')
+ 60.7±0.6ms 64.1±2ms 1.06 bench_ma.Concatenate.time_it('ndarray+masked', 2000)
+ 4.27±0.01μs 4.50±0.1μs 1.06 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'numpy.int64'>)
+ 20.1±0.1μs 21.2±0.6μs 1.06 bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float64'>)
+ 14.6±0.2μs 15.4±0.04μs 1.06 bench_ma.UFunc.time_scalar_1d(False, False, 100)
+ 14.8±0.2μs 15.6±0.2μs 1.06 bench_ma.UFunc.time_scalar_1d(False, False, 1000)
+ 10.8±0.02μs 11.4±0.07μs 1.06 bench_lib.Unique.time_unique(200, 0)
+ 837±20ns 883±20ns 1.05 bench_io.Copy.time_memcpy('int8')
+ 110±0.4μs 116±1μs 1.05 bench_function_base.Sort.time_argsort('merge', 'float32', ('sorted_block', 100))
+ 21.2±0.2μs 22.4±0.8μs 1.05 bench_ma.UFunc.time_scalar_1d(False, True, 100)
+ 14.5±0.2μs 15.3±0.1μs 1.05 bench_ma.UFunc.time_scalar_1d(False, False, 10)
+ 2.18±0.01ms 2.30±0.04ms 1.05 bench_indexing.IndexingSeparate.time_mmap_fancy_indexing
+ 673±10ns 709±7ns 1.05 bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'numpy.int8'>)
+ 532±2μs 559±2μs 1.05 bench_function_base.Sort.time_argsort('heap', 'float32', ('ordered',))
+ 2.85±0.01μs 3.00±0.02μs 1.05 bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'str'>)
+ 1.52±0μs 1.60±0.05μs 1.05 bench_reduce.AnyAll.time_all_fast
+ 1.21±0.02ms 1.27±0.05ms 1.05 bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int64'>)
+ 56.2±0.2μs 59.1±0.9μs 1.05 bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 1000))
- 90.6±1μs 86.2±0.3μs 0.95 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'H')
- 344±5μs 327±1μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 1, 'd')
- 342±0.9μs 325±2μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 2, 'd')
- 1.41±0.01ms 1.34±0.02ms 0.95 bench_lib.Nan.time_nanvar(200000, 0)
- 215±2μs 204±2μs 0.95 bench_function_base.Sort.time_sort('quick', 'float32', ('reversed',))
- 1.07±0.02ms 1.02±0.01ms 0.95 bench_reduce.AddReduceSeparate.time_reduce(0, 'longfloat')
- 718±7μs 682±5μs 0.95 bench_indexing.Indexing.time_op('indexes_rand_', 'np.ix_(I, I)', '=1')
- 13.2±0.4ms 12.5±0.07ms 0.95 bench_linalg.Linalg.time_op('svd', 'complex128')
- 71.4±1μs 67.7±0.9μs 0.95 bench_function_base.Sort.time_sort('quick', 'int32', ('ordered',))
- 88.2±0.5μs 83.6±0.8μs 0.95 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'H')
- 273±6μs 259±6μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'd')
- 62.2±0.9μs 59.0±0.3μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 2, 'd')
- 74.8±0.4μs 70.8±0.1μs 0.95 bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 10000)
- 346±4μs 328±1μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 1, 'd')
- 92.7±0.7μs 87.8±1μs 0.95 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'H')
- 347±5μs 328±3μs 0.95 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 2, 'd')
- 88.3±1μs 83.6±0.4μs 0.95 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'H')
- 1.41±0.01ms 1.34±0.01ms 0.95 bench_lib.Nan.time_nanvar(200000, 0.1)
- 151±0.8μs 143±0.8μs 0.94 bench_function_base.Sort.time_argsort('quick', 'int16', ('reversed',))
- 246±2ms 232±4ms 0.94 bench_app.LaplaceInplace.time_it('normal')
- 89.0±1μs 83.8±0.6μs 0.94 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'H')
- 347±3μs 327±1μs 0.94 bench_ufunc.UFunc.time_ufunc_types('square')
- 1.54±0.01ms 1.45±0.01ms 0.94 bench_lib.Nan.time_nanargmin(200000, 50.0)
- 91.1±0.9μs 85.5±0.8μs 0.94 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'H')
- 90.3±1μs 84.7±0.9μs 0.94 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'H')
- 430±1μs 404±2μs 0.94 bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 100))
- 526±0.5μs 493±4μs 0.94 bench_function_base.Sort.time_sort('heap', 'float64', ('reversed',))
- 478±7μs 448±3μs 0.94 bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 10))
- 92.8±2μs 87.0±1μs 0.94 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'H')
- 63.4±2μs 59.5±0.1μs 0.94 bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 2, 1, 'f')
- 2.31±0.01ms 2.16±0.02ms 0.94 bench_lib.Nan.time_nanvar(200000, 90.0)
- 79.5±1μs 74.4±1μs 0.94 bench_function_base.Sort.time_argsort('quick', 'int32', ('ordered',))
- 272±0.7μs 255±3μs 0.94 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'd')
- 2.33±0.01ms 2.18±0.02ms 0.94 bench_lib.Nan.time_nanstd(200000, 90.0)
- 87.9±0.6μs 82.3±0.5μs 0.94 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'H')
- 273±2μs 255±2μs 0.94 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'd')
- 16.3±0.08μs 15.3±1μs 0.93 bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int64'>)
- 1.43±0.02ms 1.33±0.03ms 0.93 bench_lib.Pad.time_pad((1024, 1024), 1, 'mean')
- 89.1±1μs 83.1±0.5μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'H')
- 275±1μs 256±3μs 0.93 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'd')
- 725±3μs 675±3μs 0.93 bench_indexing.Indexing.time_op('indexes_', 'np.ix_(I, I)', '=1')
- 91.8±1μs 85.4±0.6μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'H')
- 91.0±2μs 84.6±0.5μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'H')
- 92.0±2μs 85.6±0.4μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'H')
- 88.0±0.8μs 81.8±0.3μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'H')
- 93.4±2μs 86.7±1μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'H')
- 91.9±0.6μs 85.3±1μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'H')
- 89.9±1μs 83.4±0.6μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'H')
- 92.0±1μs 85.3±0.5μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'H')
- 91.1±0.5μs 84.5±0.6μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'H')
- 276±4μs 256±4μs 0.93 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'd')
- 88.6±0.7μs 82.1±0.5μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'H')
- 92.5±0.8μs 85.6±0.9μs 0.93 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'H')
- 93.4±1μs 86.3±0.5μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'H')
- 90.8±2μs 83.9±0.6μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'H')
- 95.0±0.3μs 87.8±0.4μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'H')
- 89.6±1μs 82.7±0.2μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'H')
- 91.5±0.5μs 84.5±0.2μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'H')
- 89.4±1μs 82.4±0.3μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'H')
- 89.9±0.8μs 82.9±0.4μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'H')
- 92.4±0.9μs 85.1±0.4μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'H')
- 420±3μs 386±2μs 0.92 bench_ufunc.UFunc.time_ufunc_types('multiply')
- 91.9±2μs 84.6±1μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'H')
- 93.8±1μs 86.2±0.7μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'H')
- 1.38±0.04ms 1.27±0.03ms 0.92 bench_lib.Pad.time_pad((1024, 1024), 8, 'mean')
- 91.8±0.8μs 84.3±0.3μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'H')
- 91.5±1μs 83.9±0.7μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'H')
- 95.3±0.9μs 87.4±0.6μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'H')
- 265±3μs 243±4μs 0.92 bench_ufunc.UFunc.time_ufunc_types('minimum')
- 94.0±2μs 86.1±0.7μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'H')
- 93.6±1μs 85.8±0.5μs 0.92 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'H')
- 90.8±0.8μs 83.1±0.4μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'H')
- 91.5±2μs 83.6±0.3μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'H')
- 557±3μs 509±1μs 0.91 bench_function_base.Sort.time_sort('heap', 'float32', ('reversed',))
- 91.2±0.7μs 83.3±0.6μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'H')
- 91.4±1μs 83.4±0.3μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'H')
- 277±1μs 253±0.5μs 0.91 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'd')
- 6.03±0.07μs 5.49±0.02μs 0.91 bench_itemselection.PutMask.time_dense(False, 'complex256')
- 91.0±1μs 83.0±0.3μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'H')
- 277±1μs 253±2μs 0.91 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'd')
- 92.3±1μs 84.1±0.3μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'H')
- 94.2±0.8μs 85.6±0.8μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'H')
- 93.0±0.2μs 84.5±0.4μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'H')
- 87.4±0.9μs 79.4±0.5μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'q')
- 91.4±0.9μs 82.9±0.3μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'H')
- 89.9±0.5μs 81.4±0.2μs 0.91 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'H')
- 503±3μs 455±2μs 0.91 bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 100))
- 93.1±1μs 84.1±0.4μs 0.90 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'H')
- 92.9±2μs 84.0±0.5μs 0.90 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'H')
- 269±4μs 243±2μs 0.90 bench_ufunc.UFunc.time_ufunc_types('maximum')
- 315±3μs 284±3μs 0.90 bench_ufunc.UFunc.time_ufunc_types('subtract')
- 77.9±1μs 70.0±0.3μs 0.90 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'I')
- 362±4μs 324±2μs 0.90 bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 1000))
- 85.5±2μs 76.5±0.7μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'i')
- 78.6±2μs 70.2±0.3μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'i')
- 76.6±1μs 68.4±0.4μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'I')
- 107±6μs 95.4±1μs 0.89 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'd')
- 78.9±1μs 70.4±0.6μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'I')
- 76.6±0.9μs 68.3±0.1μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'i')
- 108±4μs 96.5±3μs 0.89 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
- 83.4±1μs 74.3±1μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'Q')
- 78.2±1μs 69.6±0.3μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'i')
- 79.6±2μs 70.8±0.9μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'i')
- 82.5±1μs 73.4±1μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'i')
- 82.0±1μs 72.9±0.5μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'i')
- 938±6μs 833±10μs 0.89 bench_lib.Nan.time_nanargmax(200000, 90.0)
- 413±1μs 367±3μs 0.89 bench_ufunc.UFunc.time_ufunc_types('rint')
- 79.3±1μs 70.4±0.9μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'I')
- 110±3μs 97.9±2μs 0.89 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'd')
- 391±5μs 348±2μs 0.89 bench_ufunc.UFunc.time_ufunc_types('fmin')
- 77.3±0.9μs 68.7±0.2μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'i')
- 78.5±2μs 69.7±0.3μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'i')
- 89.7±0.9μs 79.6±0.4μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'L')
- 88.9±2μs 78.9±0.5μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'Q')
- 77.7±0.6μs 68.9±0.5μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'I')
- 78.3±2μs 69.3±0.4μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'I')
- 89.4±0.4μs 79.2±2μs 0.89 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'q')
- 77.7±0.6μs 68.6±0.5μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'i')
- 88.8±2μs 78.5±2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'I')
- 82.8±0.6μs 73.2±0.6μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'i')
- 81.7±3μs 72.1±0.6μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'i')
- 6.63±0s 5.85±0s 0.88 bench_ufunc_strides.LogisticRegression.time_train(<class 'numpy.float32'>)
- 4.01±0.1ms 3.53±0.07ms 0.88 bench_core.VarComplex.time_var(1000000)
- 69.6±1ms 61.4±0.9ms 0.88 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
- 78.7±1μs 69.3±0.5μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'i')
- 89.1±1μs 78.5±1μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'Q')
- 77.1±2μs 67.9±0.2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'I')
- 83.2±1μs 73.2±0.9μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'Q')
- 78.5±1μs 69.0±0.2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'i')
- 88.4±1μs 77.7±0.4μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'l')
- 88.8±1μs 78.0±0.7μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'l')
- 81.8±0.9μs 71.8±0.7μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'I')
- 82.0±1μs 72.0±0.3μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'i')
- 81.5±2μs 71.6±0.6μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'I')
- 90.4±2μs 79.3±1μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'I')
- 82.3±1μs 72.2±1μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'I')
- 79.0±1μs 69.3±0.3μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'I')
- 78.0±1μs 68.4±0.2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'I')
- 77.6±1μs 68.1±0.2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'i')
- 89.2±2μs 78.2±0.6μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'l')
- 82.1±1μs 71.9±1μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'i')
- 79.7±1μs 69.9±0.3μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'I')
- 77.7±1μs 68.2±0.3μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'i')
- 81.5±0.2μs 71.4±0.9μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'i')
- 90.3±1μs 79.0±2μs 0.88 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'L')
- 473±3μs 414±2μs 0.88 bench_lib.Nan.time_nanargmax(200000, 0.1)
- 61.4±0.9μs 53.8±0.6μs 0.88 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 1, 'd')
- 78.1±0.2μs 68.3±0.2μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'i')
- 150±1μs 131±3μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
- 82.9±2μs 72.5±0.9μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'i')
- 81.5±0.6μs 71.3±0.9μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'I')
- 639±2μs 559±7μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'f')
- 79.7±2μs 69.7±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'i')
- 84.4±2μs 73.8±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'i')
- 473±3μs 413±2μs 0.87 bench_lib.Nan.time_nanargmax(200000, 0)
- 637±3μs 556±2μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'f')
- 82.9±1μs 72.4±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'i')
- 523±1μs 457±3μs 0.87 bench_lib.Nan.time_nanargmax(200000, 2.0)
- 89.5±0.8μs 78.1±0.8μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'Q')
- 89.1±0.8μs 77.8±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'q')
- 80.8±0.7μs 70.5±0.6μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'i')
- 62.7±0.9μs 54.7±0.7μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 2, 'd')
- 82.0±0.6μs 71.5±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'I')
- 636±1μs 555±4μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'f')
- 82.5±1μs 71.9±0.3μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'I')
- 86.1±2μs 75.1±0.8μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'I')
- 638±1μs 556±5μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'f')
- 78.0±2μs 67.9±0.3μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'I')
- 639±2μs 557±3μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'f')
- 108±1ms 93.9±3ms 0.87 bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'str'>)
- 1.49±0.01μs 1.29±0.01μs 0.87 bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int16')
- 641±2μs 558±5μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'f')
- 84.9±2μs 73.9±0.3μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'q')
- 90.8±2μs 79.0±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'i')
- 527±5μs 458±9μs 0.87 bench_lib.Nan.time_nanargmin(200000, 2.0)
- 85.5±1μs 74.4±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'I')
- 90.0±1μs 78.3±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'i')
- 90.3±2μs 78.6±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'I')
- 83.1±0.9μs 72.3±0.2μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'q')
- 641±1μs 557±7μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'f')
- 82.6±0.9μs 71.8±0.6μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'i')
- 642±3μs 557±7μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'f')
- 110±3μs 95.6±1μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'd')
- 78.8±0.6μs 68.4±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'i')
- 639±0.6μs 554±3μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'f')
- 1.49±0.01μs 1.30±0μs 0.87 bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float16')
- 90.3±2μs 78.4±0.4μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'i')
- 82.9±2μs 71.9±0.8μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'I')
- 112±2μs 97.2±2μs 0.87 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
- 640±3μs 555±3μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'f')
- 636±2μs 552±1μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'f')
- 642±3μs 557±3μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'f')
- 85.6±1μs 74.3±0.8μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'i')
- 641±6μs 555±1μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'f')
- 640±3μs 555±1μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'f')
- 642±2μs 557±5μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'f')
- 641±1μs 556±2μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'f')
- 490±2μs 425±1μs 0.87 bench_function_base.Sort.time_sort('heap', 'float64', ('ordered',))
- 89.9±1μs 77.9±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'L')
- 84.1±1μs 72.8±0.9μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'I')
- 82.2±0.9μs 71.2±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'i')
- 639±3μs 553±2μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'f')
- 638±3μs 552±2μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'f')
- 82.6±2μs 71.5±0.5μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'I')
- 86.7±1μs 75.0±0.5μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'I')
- 81.1±0.4μs 70.2±0.7μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'I')
- 638±1μs 552±0.6μs 0.87 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'f')
- 90.0±2μs 77.9±1μs 0.87 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'i')
- 642±2μs 555±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'f')
- 638±1μs 551±0.8μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'f')
- 88.7±1μs 76.7±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'I')
- 643±5μs 556±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'f')
- 83.8±1μs 72.5±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'I')
- 90.5±0.9μs 78.2±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'L')
- 644±2μs 557±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'f')
- 88.1±0.5μs 76.1±0.9μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'i')
- 642±2μs 555±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'f')
- 659±7μs 570±7μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'f')
- 641±2μs 554±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'f')
- 81.5±0.9μs 70.4±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'I')
- 84.3±0.9μs 72.8±0.6μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'i')
- 89.4±1μs 77.2±0.9μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'q')
- 652±8μs 563±5μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'f')
- 643±4μs 556±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'f')
- 82.8±2μs 71.5±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'i')
- 639±2μs 551±0.6μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'f')
- 80.8±1μs 69.8±0.6μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'i')
- 642±3μs 554±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'f')
- 83.6±0.7μs 72.2±0.9μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'L')
- 81.5±2μs 70.3±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'I')
- 639±3μs 552±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'f')
- 81.0±2μs 69.8±0.2μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'i')
- 79.2±2μs 68.3±0.3μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'I')
- 643±2μs 554±0.9μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'f')
- 83.3±1μs 71.8±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'i')
- 641±2μs 553±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'f')
- 640±2μs 552±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'f')
- 83.4±2μs 71.9±0.7μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'I')
- 639±2μs 551±0.5μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'f')
- 640±0.7μs 552±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'f')
- 83.8±1μs 72.3±0.7μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'Q')
- 84.9±2μs 73.2±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'L')
- 642±2μs 553±1μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'f')
- 641±3μs 552±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'f')
- 643±2μs 554±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'f')
- 641±3μs 552±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'f')
- 84.3±2μs 72.7±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'l')
- 645±4μs 555±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'f')
- 86.9±0.7μs 74.8±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'I')
- 89.7±1μs 77.3±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'l')
- 2.47±0.02μs 2.13±0.02μs 0.86 bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float16')
- 642±1μs 553±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'f')
- 641±5μs 552±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'f')
- 642±1μs 553±3μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'f')
- 90.6±1μs 77.9±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'Q')
- 82.6±1μs 71.0±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'i')
- 645±2μs 555±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'f')
- 83.5±0.8μs 71.8±0.6μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'l')
- 391±2μs 336±4μs 0.86 bench_ufunc.UFunc.time_ufunc_types('fmax')
- 83.4±1μs 71.7±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'L')
- 89.3±2μs 76.8±0.8μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'Q')
- 86.9±0.8μs 74.7±0.8μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'i')
- 85.3±1μs 73.3±0.8μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'l')
- 87.0±0.9μs 74.7±0.2μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'I')
- 108±2ms 93.0±2ms 0.86 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'str'>)
- 85.5±1μs 73.4±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'i')
- 645±6μs 553±1μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'f')
- 83.7±1μs 71.9±1μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'I')
- 644±3μs 553±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'f')
- 643±2μs 551±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'f')
- 91.5±2μs 78.4±0.7μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'i')
- 87.6±1μs 75.1±0.7μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'i')
- 84.7±1μs 72.6±0.5μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'q')
- 644±3μs 552±2μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'f')
- 69.4±2ms 59.5±1ms 0.86 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
- 482±6μs 413±6μs 0.86 bench_lib.Nan.time_nanargmin(200000, 0)
- 79.2±0.5μs 67.8±0.4μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'I')
- 645±5μs 552±1μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'f')
- 90.0±1μs 77.0±0.8μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'L')
- 649±4μs 556±4μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'f')
- 647±2μs 554±5μs 0.86 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'f')
- 521±3μs 446±1μs 0.86 bench_function_base.Sort.time_sort('heap', 'float32', ('ordered',))
- 79.7±0.4μs 68.2±0.2μs 0.86 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'I')
- 648±7μs 554±2μs 0.85 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'f')
- 85.0±2μs 72.7±0.8μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'l')
- 91.5±0.7μs 78.2±0.6μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'q')
- 86.0±1μs 73.4±1μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'Q')
- 149±4μs 127±3μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
- 84.9±1μs 72.4±0.6μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'L')
- 83.5±0.8μs 71.3±0.7μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'i')
- 86.5±0.8μs 73.8±1μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'Q')
- 85.3±2μs 72.8±0.3μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'L')
- 84.7±1μs 72.3±0.4μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'I')
- 83.0±0.9μs 70.7±0.4μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'I')
- 2.47±0.02μs 2.10±0.02μs 0.85 bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int16')
- 58.6±0.6μs 50.0±0.6μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
- 87.3±1μs 74.4±0.9μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'i')
- 109±3μs 92.6±4μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'd')
- 91.4±1μs 77.8±2μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'I')
- 653±5μs 556±3μs 0.85 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'f')
- 83.9±0.8μs 71.4±0.9μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'I')
- 57.8±0.6μs 49.2±0.5μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
- 84.7±0.8μs 72.0±0.6μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'i')
- 82.4±2μs 70.1±0.2μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'I')
- 82.7±0.8μs 70.3±0.5μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'i')
- 260±1μs 221±2μs 0.85 bench_ufunc.UFunc.time_ufunc_types('floor')
- 86.2±2μs 73.3±0.9μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'q')
- 84.4±2μs 71.7±0.4μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'I')
- 82.7±2μs 70.2±0.4μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'i')
- 87.8±1μs 74.5±0.8μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'I')
- 85.6±1μs 72.7±0.2μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'I')
- 151±2μs 128±3μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'd')
- 5.24±0.01μs 4.44±0.04μs 0.85 bench_lib.Nan.time_nanmin(200, 2.0)
- 91.8±1μs 77.8±0.8μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'L')
- 92.1±2μs 78.0±0.8μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'I')
- 84.1±0.8μs 71.1±0.3μs 0.85 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'I')
- 258±3μs 218±5μs 0.85 bench_ufunc.UFunc.time_ufunc_types('trunc')
- 327±4μs 277±10μs 0.85 bench_ufunc.UFunc.time_ufunc_types('add')
- 78.1±2μs 66.0±3μs 0.85 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'd')
- 90.4±1μs 76.3±0.7μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'Q')
- 91.4±0.8μs 77.1±1μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'l')
- 83.3±1μs 70.2±0.5μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'I')
- 85.3±0.3μs 71.9±0.8μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'q')
- 91.5±3μs 77.1±1μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'l')
- 483±10μs 407±5μs 0.84 bench_lib.Nan.time_nanargmin(200000, 0.1)
- 94.9±1μs 79.9±0.3μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'B')
- 94.3±0.6μs 79.3±0.5μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'B')
- 85.0±0.9μs 71.5±0.8μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'q')
- 84.3±0.7μs 70.9±0.2μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'I')
- 5.28±0.08μs 4.43±0.07μs 0.84 bench_lib.Nan.time_nanmin(200, 0.1)
- 85.8±0.5μs 72.1±0.4μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'L')
- 85.3±0.3μs 71.6±0.3μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'Q')
- 94.5±0.9μs 79.3±0.5μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'B')
- 993±60μs 833±9μs 0.84 bench_lib.Nan.time_nanargmin(200000, 90.0)
- 60.4±1μs 50.7±0.5μs 0.84 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'd')
- 91.5±2μs 76.8±0.7μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'q')
- 18.9±0.3ms 15.9±0.9ms 0.84 bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'str'>)
- 95.3±0.6μs 79.9±0.9μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'B')
- 1.24±0.02ms 1.04±0.06ms 0.84 bench_core.Temporaries.time_large2
- 94.5±0.5μs 79.1±0.4μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'B')
- 93.4±0.6μs 78.3±0.5μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'B')
- 59.3±1μs 49.7±0.5μs 0.84 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'd')
- 95.9±0.4μs 80.1±0.7μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'B')
- 95.3±0.9μs 79.7±0.8μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'B')
- 95.1±1μs 79.5±0.5μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'B')
- 259±2μs 217±3μs 0.84 bench_ufunc.UFunc.time_ufunc_types('ceil')
- 93.6±0.5μs 78.2±0.8μs 0.84 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'B')
- 113±2μs 94.8±3μs 0.84 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'd')
- 85.7±1μs 71.6±0.5μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'l')
- 93.6±1μs 78.0±0.9μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'B')
- 113±3μs 94.5±2μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'd')
- 94.0±1μs 78.3±0.4μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'B')
- 93.0±0.1μs 77.4±0.6μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'B')
- 5.32±0.01μs 4.43±0.04μs 0.83 bench_lib.Nan.time_nanmax(200, 0)
- 94.4±0.3μs 78.6±0.3μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'B')
- 5.31±0.07μs 4.42±0.04μs 0.83 bench_lib.Nan.time_nanmin(200, 0)
- 93.2±0.5μs 77.5±2μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'B')
- 76.7±2μs 63.7±2μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'd')
- 153±3μs 127±3μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
- 92.7±0.2μs 77.0±0.6μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'B')
- 95.1±0.8μs 79.0±0.5μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'B')
- 94.3±0.9μs 78.3±0.3μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'B')
- 94.4±0.09μs 78.4±0.6μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'B')
- 95.6±0.9μs 79.4±1μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'B')
- 94.6±1μs 78.5±0.2μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'B')
- 75.5±2μs 62.7±2μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'd')
- 94.6±1μs 78.4±0.5μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'B')
- 61.4±0.7μs 50.9±0.4μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 1, 'd')
- 86.9±0.9μs 72.0±0.4μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'l')
- 5.34±0.04μs 4.42±0.08μs 0.83 bench_lib.Nan.time_nanmax(200, 0.1)
- 95.1±1μs 78.8±0.5μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'B')
- 50.1±1μs 41.5±0.8μs 0.83 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
- 94.3±0.8μs 78.0±0.2μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'B')
- 94.7±0.8μs 78.2±0.3μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'B')
- 94.3±0.7μs 77.8±0.3μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'B')
- 94.6±1μs 78.1±0.4μs 0.83 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'B')
- 94.8±0.2μs 78.1±0.1μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'B')
- 94.6±0.4μs 77.8±0.6μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'B')
- 95.6±1μs 78.6±0.5μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'B')
- 92.7±0.3μs 76.2±0.8μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'B')
- 93.3±0.4μs 76.7±0.8μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'B')
- 92.7±0.4μs 76.2±0.4μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'B')
- 93.1±0.3μs 76.5±0.5μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'B')
- 93.5±0.5μs 76.8±0.4μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'B')
- 154±4μs 126±1μs 0.82 bench_function_base.Sort.time_argsort('quick', 'uint32', ('reversed',))
- 17.5±0.5μs 14.3±0.3μs 0.82 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'D')
- 1.23±0.03ms 1.01±0.04ms 0.82 bench_core.Temporaries.time_large
- 95.5±0.6μs 78.1±0.5μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'B')
- 51.8±0.4μs 42.3±1μs 0.82 bench_core.VarComplex.time_var(10000)
- 79.6±2μs 65.0±3μs 0.82 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'd')
- 94.0±0.8μs 76.7±0.6μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'B')
- 94.6±1μs 77.2±0.4μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'B')
- 192±2μs 157±0.3μs 0.82 bench_reduce.ArgMin.time_argmin(<class 'numpy.float32'>)
- 94.1±1μs 76.7±0.5μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'B')
- 5.40±0.03μs 4.41±0.01μs 0.82 bench_lib.Nan.time_nanmin(200, 90.0)
- 92.9±0.2μs 75.7±0.5μs 0.82 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'B')
- 5.43±0.03μs 4.43±0.07μs 0.81 bench_lib.Nan.time_nanmax(200, 2.0)
- 94.1±0.2μs 76.6±0.2μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'B')
- 92.8±0.3μs 75.5±0.4μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'B')
- 93.7±0.6μs 76.0±3μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'B')
- 196±3μs 159±0.8μs 0.81 bench_reduce.ArgMax.time_argmax(<class 'numpy.float32'>)
- 121±0.4ms 98.0±2ms 0.81 bench_app.LaplaceInplace.time_it('inplace')
- 95.3±0.9μs 77.1±0.3μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'B')
- 94.2±0.8μs 76.2±0.3μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'B')
- 93.8±0.4μs 75.8±0.6μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'B')
- 94.8±1μs 76.5±0.4μs 0.81 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'B')
- 5.43±0.02μs 4.37±0.02μs 0.81 bench_lib.Nan.time_nanmax(200, 90.0)
- 76.2±2μs 61.3±2μs 0.80 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'd')
- 119±4μs 95.4±2μs 0.80 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'd')
- 94.0±0.8μs 75.6±0.5μs 0.80 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'B')
- 120±2μs 96.3±3μs 0.80 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'd')
- 95.9±0.7μs 77.0±0.4μs 0.80 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'B')
- 95.7±2μs 76.6±0.2μs 0.80 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'B')
- 94.5±1μs 75.7±0.4μs 0.80 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'B')
- 95.3±0.6μs 76.1±0.3μs 0.80 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'B')
- 51.2±0.3μs 40.8±0.4μs 0.80 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'd')
- 50.9±2μs 40.6±0.4μs 0.80 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
- 52.0±1μs 41.3±0.6μs 0.79 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'd')
- 488±5μs 387±3μs 0.79 bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 1000))
- 371±6μs 291±5μs 0.79 bench_core.VarComplex.time_var(100000)
- 8.18±0.4ms 6.39±0.3ms 0.78 bench_ufunc.Broadcast.time_broadcast
- 42.6±0.1μs 32.5±0.3μs 0.76 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'd')
- 42.7±0.5μs 32.4±0.2μs 0.76 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
- 168±3μs 127±4μs 0.76 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'd')
- 4.95±0.03ms 3.71±0.06ms 0.75 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
- 4.88±0.02ms 3.63±0.01ms 0.74 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
- 5.90±0.04μs 4.38±0.03μs 0.74 bench_lib.Nan.time_nanmin(200, 50.0)
- 4.87±0.01ms 3.61±0ms 0.74 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
- 4.89±0.03ms 3.63±0.01ms 0.74 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
- 67.3±2μs 49.7±0.9μs 0.74 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
- 43.4±0.5μs 32.0±0.3μs 0.74 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
- 6.05±0.05μs 4.45±0.06μs 0.74 bench_lib.Nan.time_nanmax(200, 50.0)
- 43.5±1μs 32.0±0.2μs 0.73 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'd')
- 4.95±0.06ms 3.63±0.02ms 0.73 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
- 4.94±0.06ms 3.62±0.01ms 0.73 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
- 4.95±0.02ms 3.62±0.02ms 0.73 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
- 4.94±0.04ms 3.61±0.01ms 0.73 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
- 18.7±0.6μs 13.5±0.06μs 0.72 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'D')
- 5.04±0.03ms 3.63±0.05ms 0.72 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
- 11.3±0.07μs 8.02±0.05μs 0.71 bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float64'>)
- 133±2μs 94.0±1μs 0.71 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'd')
- 14.8±0.3μs 10.4±0.3μs 0.70 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'D')
- 58.7±0.2μs 40.7±0.7μs 0.69 bench_core.Temporaries.time_mid2
- 59.0±0.5μs 40.7±0.9μs 0.69 bench_core.Temporaries.time_mid
- 14.8±0.3μs 10.2±0.09μs 0.69 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'D')
- 540±2μs 371±1μs 0.69 bench_reduce.AddReduceSeparate.time_reduce(0, 'float64')
- 140±4μs 96.0±3μs 0.69 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'd')
- 92.0±3μs 63.1±3μs 0.69 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'd')
- 187±4μs 128±4μs 0.68 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'd')
- 21.0±0.3μs 14.4±0.2μs 0.68 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'F')
- 21.3±0.5μs 14.4±0.3μs 0.68 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'F')
- 12.9±0.2μs 8.68±0.2μs 0.67 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'D')
- 64.6±0.5μs 43.2±1μs 0.67 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'd')
- 81.7±0.5μs 54.6±0.6μs 0.67 bench_ufunc.CustomInplace.time_double_add_temp
- 65.4±1μs 43.1±0.9μs 0.66 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'd')
- 65.7±0.6μs 43.3±0.8μs 0.66 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'd')
- 1.23±0.05ms 803±10μs 0.65 bench_reduce.AddReduceSeparate.time_reduce(0, 'complex128')
- 73.7±0.4μs 47.6±0.2μs 0.65 bench_ufunc.CustomInplace.time_double_add
- 40.9±0.4μs 26.2±0.3μs 0.64 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'd')
- 64.4±1μs 40.9±0.5μs 0.63 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'd')
- 20.9±0.3μs 13.2±0.08μs 0.63 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
- 41.2±0.9μs 26.0±0.1μs 0.63 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'd')
- 68.9±2μs 43.2±0.7μs 0.63 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'd')
- 41.3±0.6μs 25.8±0.4μs 0.63 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'd')
- 69.3±2μs 43.3±0.8μs 0.62 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'd')
- 41.2±0.7μs 25.7±0.2μs 0.62 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'd')
- 20.9±0.4μs 12.8±0.8μs 0.61 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'F')
- 21.1±0.3μs 12.9±0.2μs 0.61 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'F')
- 21.0±0.4μs 12.6±0.4μs 0.60 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'F')
- 81.6±0.6μs 49.0±0.5μs 0.60 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'd')
- 13.6±0.4μs 7.80±0.3μs 0.57 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'D')
- 158±0.8μs 88.8±1μs 0.56 bench_reduce.ArgMax.time_argmax(<class 'numpy.float64'>)
- 157±1μs 88.1±0.3μs 0.56 bench_reduce.ArgMin.time_argmin(<class 'numpy.float64'>)
- 1.96±0.01ms 1.08±0ms 0.55 bench_reduce.AddReduceSeparate.time_reduce(0, 'complex64')
- 66.5±0.3μs 33.8±0.05μs 0.51 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 43)
- 66.7±0.09μs 33.9±0.1μs 0.51 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -43)
- 82.3±2μs 41.7±0.6μs 0.51 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'd')
- 71.7±0.4μs 35.0±0.1μs 0.49 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 8)
- 71.6±0.3μs 34.4±0.06μs 0.48 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -8)
- 66.2±1μs 31.4±0.1μs 0.47 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
- 88.1±3μs 41.3±1μs 0.47 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
- 78.7±0.7μs 35.7±0.5μs 0.45 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'Q')
- 79.1±1μs 35.8±0.7μs 0.45 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'l')
- 79.8±1μs 35.8±0.5μs 0.45 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'L')
- 79.4±1μs 35.5±0.5μs 0.45 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'q')
- 434±1μs 194±4μs 0.45 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 4, 'd')
- 80.5±1μs 35.6±0.5μs 0.44 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'L')
- 12.6±0.2μs 5.57±0.02μs 0.44 bench_reduce.MinMax.time_min(<class 'numpy.int64'> (1))
- 443±2μs 196±3μs 0.44 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 4, 'd')
- 12.6±0.2μs 5.57±0.01μs 0.44 bench_reduce.MinMax.time_min(<class 'numpy.uint64'>)
- 13.2±0.3μs 5.82±0.02μs 0.44 bench_reduce.MinMax.time_max(<class 'numpy.uint64'>)
- 81.0±0.5μs 35.6±0.6μs 0.44 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'Q')
- 81.5±0.6μs 35.7±0.4μs 0.44 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'l')
- 81.4±1μs 35.6±0.8μs 0.44 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'q')
- 12.8±0.2μs 5.54±0.02μs 0.43 bench_reduce.MinMax.time_min(<class 'numpy.int64'> (0))
- 12.9±0.3μs 5.61±0.05μs 0.43 bench_reduce.MinMax.time_max(<class 'numpy.int64'> (1))
- 13.2±0.3μs 5.66±0.02μs 0.43 bench_reduce.MinMax.time_max(<class 'numpy.int64'> (0))
- 505±1μs 216±1μs 0.43 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
- 506±3μs 213±3μs 0.42 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'f')
- 61.7±0.9μs 25.9±0.1μs 0.42 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'd')
- 505±1μs 209±1μs 0.41 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'f')
- 432±2μs 177±10μs 0.41 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 4, 'd')
- 520±9μs 209±5μs 0.40 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 4, 'f')
- 508±6μs 200±4μs 0.39 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'f')
- 436±2μs 170±3μs 0.39 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 2, 'd')
- 503±1μs 195±2μs 0.39 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'f')
- 438±4μs 170±5μs 0.39 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 2, 'd')
- 36.4±0.2μs 14.1±0.1μs 0.39 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 4, 'F')
- 432±3μs 167±3μs 0.39 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 4, 'd')
- 432±3μs 167±3μs 0.39 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 4, 'd')
- 505±2μs 195±4μs 0.39 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'f')
- 81.6±2μs 31.4±0.3μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'd')
- 509±3μs 195±2μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'f')
- 436±2μs 167±5μs 0.38 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 4, 'd')
- 436±3μs 166±3μs 0.38 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 4, 'd')
- 218±2μs 83.0±2μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'f')
- 12.0±0.2μs 4.56±0.02μs 0.38 bench_reduce.MinMax.time_max(<class 'numpy.uint32'>)
- 215±0.9μs 81.0±0.3μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'f')
- 220±2μs 82.5±1μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'f')
- 12.0±0.2μs 4.50±0.05μs 0.38 bench_reduce.MinMax.time_min(<class 'numpy.int32'>)
- 505±2μs 189±1μs 0.38 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'f')
- 36.3±0.1μs 13.6±0.08μs 0.37 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'F')
- 2.71±0.02ms 1.01±0ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
- 2.76±0.03ms 1.03±0.01ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
- 425±3μs 159±6μs 0.37 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 4, 'd')
- 439±1μs 163±10μs 0.37 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 4, 'd')
- 218±0.9μs 80.8±0.6μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'f')
- 217±1μs 80.5±0.5μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'f')
- 2.74±0.08ms 1.01±0.04ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
- 2.74±0.03ms 1.01±0.01ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
- 12.2±0.2μs 4.48±0.06μs 0.37 bench_reduce.MinMax.time_min(<class 'numpy.uint32'>)
- 217±0.5μs 79.9±0.5μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'f')
- 219±2μs 80.6±0.4μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'f')
- 2.78±0.03ms 1.02±0.01ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
- 2.76±0.04ms 1.02±0.01ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
- 2.74±0.04ms 1.01±0ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
- 216±0.5μs 79.2±0.4μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'f')
- 2.72±0.04ms 996±20μs 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
- 2.87±0.09ms 1.05±0.03ms 0.37 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
- 219±5μs 79.4±0.4μs 0.36 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'f')
- 432±2μs 156±3μs 0.36 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 4, 'd')
- 36.2±0.2μs 13.0±0.08μs 0.36 bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'F')
- 12.2±0.2μs 4.39±0.02μs 0.36 bench_reduce.MinMax.time_max(<class 'numpy.int32'>)
- 448±6μs 160±10μs 0.36 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 4, 'd')
- 455±10μs 161±10μs 0.35 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 4, 'd')
- 443±10μs 156±4μs 0.35 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 1, 'd')
- 428±2μs 150±1μs 0.35 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 2, 'd')
- 430±1μs 148±5μs 0.34 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 2, 'd')
- 433±1μs 149±7μs 0.34 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 1, 'd')
- 430±0.8μs 145±4μs 0.34 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 2, 'd')
- 103±0.3μs 34.5±0.06μs 0.34 bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float32'>)
- 435±5μs 146±2μs 0.34 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 2, 'd')
- 103±1μs 34.5±0.1μs 0.34 bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
- 426±0.3μs 143±5μs 0.33 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 2, 'd')
- 8.84±0.04μs 2.96±0.04μs 0.33 bench_reduce.MinMax.time_min(<class 'numpy.uint8'>)
- 430±3μs 143±3μs 0.33 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 1, 'd')
- 428±1μs 142±7μs 0.33 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 1, 'd')
- 442±10μs 147±6μs 0.33 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 4, 'd')
- 438±3μs 145±6μs 0.33 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 2, 'd')
- 8.83±0.03μs 2.92±0.03μs 0.33 bench_reduce.MinMax.time_max(<class 'numpy.uint8'>)
- 158±0.5μs 52.2±1μs 0.33 bench_reduce.ArgMax.time_argmax(<class 'numpy.int64'>)
- 75.5±0.9μs 24.9±0.4μs 0.33 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'I')
- 156±0.2μs 51.5±0.6μs 0.33 bench_reduce.ArgMin.time_argmin(<class 'numpy.uint64'>)
- 433±4μs 142±8μs 0.33 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 2, 'd')
- 77.3±0.5μs 25.2±0.4μs 0.33 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'I')
- 76.2±0.4μs 24.8±0.5μs 0.33 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'i')
- 433±4μs 141±6μs 0.32 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 1, 'd')
- 76.0±2μs 24.7±0.6μs 0.32 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'i')
- 447±9μs 145±0.5μs 0.32 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 4, 'd')
- 447±9μs 145±4μs 0.32 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 4, 'd')
- 437±0.9μs 141±7μs 0.32 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 2, 'd')
- 159±0.5μs 51.4±0.4μs 0.32 bench_reduce.ArgMax.time_argmax(<class 'numpy.uint64'>)
- 432±4μs 139±5μs 0.32 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 1, 'd')
- 158±0.4μs 50.6±0.1μs 0.32 bench_reduce.ArgMin.time_argmin(<class 'numpy.int64'>)
- 81.3±0.6μs 25.7±0.3μs 0.32 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'd')
- 443±9μs 140±5μs 0.32 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 4, 'd')
- 432±4μs 130±10μs 0.30 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 1, 'd')
- 157±0.1μs 46.8±0.4μs 0.30 bench_reduce.ArgMin.time_argmin(<class 'numpy.int32'>)
- 158±1μs 46.4±0.3μs 0.29 bench_reduce.ArgMax.time_argmax(<class 'numpy.uint32'>)
- 157±0.4μs 45.5±0.5μs 0.29 bench_reduce.ArgMin.time_argmin(<class 'numpy.uint32'>)
- 157±0.4μs 45.5±0.3μs 0.29 bench_reduce.ArgMax.time_argmax(<class 'numpy.int32'>)
- 434±6μs 123±3μs 0.28 bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 1, 'd')
- 426±0.9μs 120±3μs 0.28 bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 1, 'd')
- 199±2μs 55.5±2μs 0.28 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'f')
- 199±2μs 55.2±1μs 0.28 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
- 198±2μs 54.9±1μs 0.28 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'f')
- 439±9μs 120±3μs 0.27 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 4, 'd')
- 198±2μs 54.2±1μs 0.27 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
- 199±2μs 54.5±3μs 0.27 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'f')
- 441±9μs 119±2μs 0.27 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 4, 'd')
- 437±2μs 118±6μs 0.27 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 1, 'd')
- 40.3±0.3μs 10.9±0.03μs 0.27 bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float32'>)
- 424±0.8μs 114±4μs 0.27 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 2, 'd')
- 195±0.7μs 51.7±0.4μs 0.27 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
- 430±3μs 113±3μs 0.26 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 2, 'd')
- 195±0.7μs 50.7±0.4μs 0.26 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'f')
- 13.6±0.04μs 3.52±0.01μs 0.26 bench_reduce.MinMax.time_max(<class 'numpy.uint16'>)
- 195±1μs 50.3±0.3μs 0.26 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'f')
- 200±2μs 51.0±1μs 0.26 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'f')
- 197±3μs 50.3±0.3μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
- 13.7±0.09μs 3.48±0.03μs 0.25 bench_reduce.MinMax.time_min(<class 'numpy.uint16'>)
- 200±3μs 50.4±1μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'f')
- 201±3μs 50.5±1μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'f')
- 200±3μs 50.0±1μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'f')
- 200±3μs 49.4±0.7μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
- 200±3μs 49.1±0.6μs 0.25 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'f')
- 1.95±0ms 477±3μs 0.24 bench_reduce.AddReduceSeparate.time_reduce(0, 'float32')
- 597±4μs 146±1μs 0.24 bench_ufunc.CustomInplace.time_float_add_temp
- 12.5±0.05μs 2.98±0.02μs 0.24 bench_reduce.MinMax.time_min(<class 'numpy.int8'>)
- 584±1μs 137±1μs 0.24 bench_ufunc.CustomInplace.time_float_add
- 15.0±0.05μs 3.52±0.02μs 0.24 bench_reduce.MinMax.time_min(<class 'numpy.int16'>)
- 12.6±0.08μs 2.95±0.02μs 0.23 bench_reduce.MinMax.time_max(<class 'numpy.int8'>)
- 15.0±0.08μs 3.51±0.02μs 0.23 bench_reduce.MinMax.time_max(<class 'numpy.int16'>)
- 913±8μs 203±7μs 0.22 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'd')
- 881±3μs 195±4μs 0.22 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'd')
- 197±0.8μs 42.7±0.4μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'f')
- 198±2μs 42.8±0.5μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
- 199±3μs 43.1±0.1μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'f')
- 197±2μs 42.7±0.7μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'f')
- 199±3μs 43.0±0.6μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
- 197±1μs 42.5±0.4μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'f')
- 196±0.5μs 42.3±0.4μs 0.22 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'f')
- 200±3μs 43.0±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'f')
- 199±1μs 42.6±0.5μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'f')
- 199±0.8μs 42.4±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'f')
- 195±1μs 41.4±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'f')
- 196±0.9μs 41.4±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'f')
- 195±1μs 41.2±0.5μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'f')
- 195±2μs 41.2±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'f')
- 196±1μs 41.3±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'f')
- 195±0.5μs 41.0±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'f')
- 194±0.6μs 40.9±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'f')
- 195±0.8μs 41.0±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'f')
- 195±0.2μs 40.9±0.5μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'f')
- 196±0.6μs 40.9±0.4μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'f')
- 196±1μs 40.8±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'f')
- 197±2μs 40.9±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'f')
- 195±0.4μs 40.4±0.1μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'f')
- 197±1μs 40.8±0.1μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'f')
- 198±0.9μs 40.9±0.3μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'f')
- 196±0.4μs 40.4±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'f')
- 196±2μs 40.3±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'f')
- 196±2μs 40.3±0.2μs 0.21 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'f')
- 198±2μs 40.7±0.2μs 0.20 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'f')
- 199±3μs 40.6±0.5μs 0.20 bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'f')
- 894±1μs 173±6μs 0.19 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'd')
- 880±2μs 166±2μs 0.19 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'd')
- 878±1μs 164±4μs 0.19 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'd')
- 901±5μs 167±4μs 0.19 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'd')
- 899±5μs 167±7μs 0.19 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'd')
- 424±2μs 78.8±2μs 0.19 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 2, 'd')
- 428±1μs 78.3±2μs 0.18 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 2, 'd')
- 887±6μs 159±5μs 0.18 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'd')
- 423±1μs 74.6±2μs 0.18 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 1, 'd')
- 878±2μs 155±6μs 0.18 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'd')
- 427±2μs 75.2±1μs 0.18 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 2, 'd')
- 425±3μs 74.6±2μs 0.18 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 1, 'd')
- 422±0.8μs 73.6±0.5μs 0.17 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 2, 'd')
- 906±6μs 156±6μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'd')
- 434±4μs 74.7±2μs 0.17 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 1, 'd')
- 914±20μs 157±9μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'd')
- 877±1μs 151±5μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'd')
- 898±10μs 154±5μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'd')
- 931±20μs 159±10μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'd')
- 429±5μs 73.5±3μs 0.17 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 1, 'd')
- 881±1μs 149±10μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'd')
- 895±3μs 152±4μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'd')
- 896±3μs 150±5μs 0.17 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'd')
- 24.4±0.2μs 4.06±0.03μs 0.17 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 43)
- 904±5μs 149±4μs 0.16 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'd')
- 880±2μs 144±4μs 0.16 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'd')
- 24.9±0.2μs 4.02±0.01μs 0.16 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 8)
- 879±2μs 141±1μs 0.16 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'd')
- 421±1μs 67.1±1μs 0.16 bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 1, 'd')
- 420±2μs 66.5±1μs 0.16 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 2, 'd')
- 890±2μs 141±2μs 0.16 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'd')
- 82.5±0.3μs 12.9±0.07μs 0.16 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'h')
- 875±2μs 136±1μs 0.16 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'd')
- 83.1±0.7μs 12.9±0.2μs 0.16 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'h')
- 427±0.8μs 66.1±1μs 0.15 bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 1, 'd')
- 427±2μs 65.3±1μs 0.15 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 2, 'd')
- 935±20μs 141±4μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'd')
- 921±20μs 139±1μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'd')
- 86.9±0.6μs 13.1±0.1μs 0.15 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'H')
- 937±20μs 141±3μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'd')
- 898±3μs 135±3μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'd')
- 917±20μs 138±3μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'd')
- 892±2μs 134±5μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'd')
- 86.3±0.7μs 12.8±0.08μs 0.15 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'H')
- 884±6μs 131±5μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'd')
- 65.9±0.4μs 9.76±0.1μs 0.15 bench_reduce.FMinMax.time_min(<class 'numpy.float64'>)
- 898±4μs 133±5μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'd')
- 883±3μs 130±4μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'd')
- 881±4μs 129±3μs 0.15 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'd')
- 67.1±0.5μs 9.62±0.1μs 0.14 bench_reduce.FMinMax.time_max(<class 'numpy.float64'>)
- 892±1μs 119±3μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'd')
- 660±6μs 87.8±1μs 0.13 bench_lib.Nan.time_nanmax(200000, 0)
- 672±5μs 88.5±1μs 0.13 bench_lib.Nan.time_nanmin(200000, 0)
- 921±30μs 121±3μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'd')
- 671±5μs 88.2±2μs 0.13 bench_lib.Nan.time_nanmax(200000, 2.0)
- 874±1μs 115±2μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'd')
- 669±3μs 87.5±1μs 0.13 bench_lib.Nan.time_nanmax(200000, 0.1)
- 667±10μs 87.3±2μs 0.13 bench_lib.Nan.time_nanmin(200000, 0.1)
- 672±5μs 87.7±2μs 0.13 bench_lib.Nan.time_nanmin(200000, 2.0)
- 877±3μs 114±9μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'd')
- 931±20μs 120±2μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'd')
- 895±6μs 114±3μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'd')
- 880±4μs 112±2μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'd')
- 891±2μs 112±9μs 0.13 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'd')
- 79.7±0.1μs 9.53±0.2μs 0.12 bench_reduce.ArgMax.time_argmax(<class 'bool'>)
- 418±0.9μs 45.2±0.9μs 0.11 bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 1, 'd')
- 427±3μs 45.2±0.8μs 0.11 bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 1, 'd')
- 68.5±0.3μs 7.19±0.07μs 0.10 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'b')
- 68.3±0.4μs 7.14±0.08μs 0.10 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'b')
- 156±0.2μs 15.9±0.2μs 0.10 bench_reduce.ArgMin.time_argmin(<class 'numpy.int16'>)
- 157±0.3μs 15.8±0.1μs 0.10 bench_reduce.ArgMax.time_argmax(<class 'numpy.int16'>)
- 70.5±0.2μs 6.45±0.04μs 0.09 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 43)
- 70.6±0.2μs 6.40±0.06μs 0.09 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -43)
- 75.7±0.5μs 6.48±0.04μs 0.09 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -8)
- 186±2μs 15.9±0.2μs 0.09 bench_reduce.ArgMin.time_argmin(<class 'numpy.uint16'>)
- 75.9±0.4μs 6.42±0.06μs 0.08 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 8)
- 189±0.3μs 15.8±0.09μs 0.08 bench_reduce.ArgMax.time_argmax(<class 'numpy.uint16'>)
- 901±20μs 74.7±1μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'd')
- 881±5μs 71.7±0.7μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'd')
- 89.1±0.2μs 7.22±0.1μs 0.08 bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'B')
- 1.10±0.01ms 88.9±1μs 0.08 bench_lib.Nan.time_nanmax(200000, 90.0)
- 88.6±0.1μs 7.08±0.07μs 0.08 bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'B')
- 1.10±0ms 87.3±1μs 0.08 bench_lib.Nan.time_nanmin(200000, 90.0)
- 881±5μs 69.3±1μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'd')
- 893±2μs 69.9±0.8μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'd')
- 872±5μs 68.1±1μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'd')
- 901±4μs 70.3±1μs 0.08 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'd')
- 890±9μs 65.1±0.6μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'd')
- 889±2μs 64.8±0.8μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'd')
- 41.6±0.07μs 3.02±0.01μs 0.07 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 8)
- 880±5μs 63.7±2μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'd')
- 41.6±0.2μs 2.99±0.02μs 0.07 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 43)
- 903±5μs 62.9±1μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'd')
- 876±4μs 60.5±0.2μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'd')
- 907±6μs 61.8±1μs 0.07 bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'd')
- 156±0.4μs 9.44±0.2μs 0.06 bench_reduce.ArgMin.time_argmin(<class 'numpy.int8'>)
- 157±0.7μs 9.41±0.2μs 0.06 bench_reduce.ArgMin.time_argmin(<class 'numpy.uint8'>)
- 157±0.7μs 9.32±0.09μs 0.06 bench_reduce.ArgMax.time_argmax(<class 'numpy.int8'>)
- 157±0.6μs 9.25±0.03μs 0.06 bench_reduce.ArgMax.time_argmax(<class 'numpy.uint8'>)
- 1.53±0ms 87.5±2μs 0.06 bench_lib.Nan.time_nanmax(200000, 50.0)
- 1.55±0.01ms 87.0±1μs 0.06 bench_lib.Nan.time_nanmin(200000, 50.0)
- 69.9±0.2μs 3.75±0.02μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -43)
- 70.3±0.4μs 3.77±0.04μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 43)
- 41.3±0.4μs 2.18±0μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 8)
- 41.3±0.3μs 2.18±0.01μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 43)
- 885±10μs 45.9±0.6μs 0.05 bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'd')
- 892±5μs 45.3±0.5μs 0.05 bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'd')
- 74.5±0.8μs 3.76±0.03μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -8)
- 74.0±0.1μs 3.73±0.05μs 0.05 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 8)
- 71.3±0.3μs 2.43±0.02μs 0.03 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 43)
- 71.3±0.4μs 2.42±0.01μs 0.03 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -43)
- 75.6±0.6μs 2.42±0.04μs 0.03 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 8)
- 75.3±0.7μs 2.39±0.01μs 0.03 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -8)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
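For reading the comparison rows above: each row lists the `before` and `after` timings followed by `ratio`, which is `after / before`, so values below 1 mean the patched build is faster. A minimal Python sketch for converting a row into a speedup factor follows. The helper names `to_seconds` and `parse_row` are hypothetical (they are not part of asv or NumPy), and the parsing assumes the whitespace-separated layout shown in this output.

```python
import re

# Unit suffixes as they appear in the rows above, converted to seconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
UNITS = {"ns": 1e-9, "μs": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0}
TIMING = re.compile(r"([\d.]+)±[\d.]+(ns|μs|µs|ms|s)")


def to_seconds(token):
    """Convert a token such as '5.54±0.02μs' to seconds (the error term is ignored)."""
    match = TIMING.fullmatch(token)
    if match is None:
        raise ValueError(f"unrecognized timing: {token!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def parse_row(line):
    """Split one row into (before_s, after_s, ratio, benchmark_name)."""
    # Assumed layout: marker, before, after, ratio, then the benchmark name,
    # which may itself contain spaces and is therefore kept as one field.
    _marker, before, after, ratio, name = line.split(None, 4)
    return to_seconds(before), to_seconds(after), float(ratio), name


row = "- 75.3±0.7μs 2.39±0.01μs 0.03 bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -8)"
before, after, ratio, name = parse_row(row)
print(f"{name}: {before / after:.1f}x faster (reported ratio {ratio})")
```

Computing the speedup from the timings rather than from the printed ratio avoids the two-decimal rounding: a reported ratio of 0.03 corresponds to roughly a 31x speedup for the row used here.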
cc @mattip
Let's put this in now, at the beginning of the 1.24 release cycle, in hopes that some IBMZ users can chime in.
Labels
01 - Enhancement
56 - Needs Release Note.
Needs an entry in doc/release/upcoming_changes
component: SIMD
Issues in SIMD (fast instruction sets) code or machinery
Extend universal intrinsics to support IBMZ
It covers SIMD operations for all datatypes starting
from z/Arch11 (a.k.a. IBM Z13), except for single-precision,
which requires at least z/Arch12 (a.k.a. IBM Z14) to be dispatched.
This patch renames the branch /simd/vsx to /simd/vec; the new
path holds the definitions of universal intrinsics for
both Power and Z architectures.
This patch also adds new preprocessor identifiers:
NPY_SIMD_BIGENDIAN: 1 if the enabled SIMD extension
is running in big-endian mode, otherwise 0.
NPY_SIMD_F32: 1 if the enabled SIMD extension
supports single-precision, otherwise 0.
TODO:
Benchmark
The following benchmark is only indicative and does not accurately reflect the true change, since it was run on an unstable VM.
CPU
OS
Linux numpy 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:13 UTC 2022 s390x s390x s390x GNU/Linux
Python 3.8.10
gcc (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0
Benchmark
VXE2
unset NPY_DISABLE_CPU_FEATURES
python runtests.py --bench-compare parent/main
VXE