ENH, SIMD: Extend universal intrinsics to support IBMZ #20913

Merged
merged 6 commits into from Jun 12, 2022

@seiko2plus seiko2plus commented Jan 27, 2022

Extend universal intrinsics to support IBMZ

It covers SIMD operations for all data types starting
from z/Arch11 (a.k.a. IBM Z13), except for single-precision,
which requires at least z/Arch12 (a.k.a. IBM Z14) to be dispatched.

This patch renames the branch /simd/vsx to /simd/vec; the new
path holds the definitions of universal intrinsics for
both Power and Z architectures.

This patch also adds new preprocessor identifiers:

  • NPY_SIMD_BIGENDIAN: 1 if the enabled SIMD extension
    runs in big-endian mode, otherwise 0.

  • NPY_SIMD_F32: 1 if the enabled SIMD extension
    supports single-precision, otherwise 0.
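
As a rough illustration of how these identifiers might be used to guard code paths, here is a minimal sketch. The helpers `add_f32` and `simd_byte_order` are hypothetical names, not part of this patch. The fallback `#define`s are an assumption that lets the sketch compile outside the NumPy tree; inside NumPy the values would come from the universal intrinsics header, and only then is the vector branch compiled.

```c
#include <stddef.h>

/* Assumption: fallbacks so this sketch builds standalone. Inside NumPy,
 * the universal intrinsics header is expected to define these. */
#ifndef NPY_SIMD_F32
    #define NPY_SIMD_F32 0
#endif
#ifndef NPY_SIMD_BIGENDIAN
    #define NPY_SIMD_BIGENDIAN 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for single-precision data. */
void add_f32(const float *a, const float *b, float *dst, size_t len)
{
    size_t i = 0;
#if NPY_SIMD_F32
    /* Vector path: compiled only when the enabled SIMD extension
     * supports single-precision (e.g. not on a z13-only build). */
    const size_t vstep = (size_t)npyv_nlanes_f32;
    for (; i + vstep <= len; i += vstep) {
        npyv_f32 va = npyv_load_f32(a + i);
        npyv_f32 vb = npyv_load_f32(b + i);
        npyv_store_f32(dst + i, npyv_add_f32(va, vb));
    }
#endif
    /* Scalar path: handles the tail, or everything when NPY_SIMD_F32 is 0. */
    for (; i < len; ++i) {
        dst[i] = a[i] + b[i];
    }
}

/* Hypothetical helper: describes the byte order seen by the SIMD code. */
const char *simd_byte_order(void)
{
#if NPY_SIMD_BIGENDIAN
    return "big-endian";
#else
    return "not big-endian (or no SIMD extension enabled)";
#endif
}
```

Keeping the scalar loop after the guarded vector loop means the same function stays correct whether or not single-precision SIMD is available.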

TODO:

  • release note
  • benchmark

Benchmark

The following benchmark is only indicative and does not accurately reflect the true change, since it was run on an unstable VM.

CPU
Architecture:                    s390x
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Big Endian
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s) per book:              1
Book(s) per drawer:              1
Drawer(s):                       2
NUMA node(s):                    1
Vendor ID:                       IBM/S390
Machine type:                    8561
CPU dynamic MHz:                 5200
CPU static MHz:                  5200
BogoMIPS:                        3241.00
Hypervisor:                      z/VM 7.1.0
Hypervisor vendor:               IBM
Virtualization type:             full
Dispatching mode:                horizontal
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2d cache:                       8 MiB
L2i cache:                       8 MiB
L3 cache:                        256 MiB
L4 cache:                        960 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; etokens
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te
                                 vx vxd vxe gs vxe2 vxp sort dflt sie
OS
Linux numpy 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:13 UTC 2022 s390x s390x s390x GNU/Linux
Python 3.8.10
gcc (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0

Benchmark

VXE2
unset NPY_DISABLE_CPU_FEATURES
python runtests.py --bench-compare parent/main
    before           after         ratio
     [982fcd38]       [47d54c6d]
     <zsystem_sup~5>       <zsystem_sup>
+     8.41±0.03μs       10.2±0.3μs     1.21  bench_function_base.Sort.time_sort('merge', 'int64', ('uniform',))
+      82.8±0.4μs       99.0±0.3μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'h')
+      83.0±0.3μs       99.2±0.3μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'h')
+      83.5±0.7μs       99.7±0.7μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'h')
+      84.3±0.8μs        101±0.5μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'h')
+      83.1±0.2μs       99.2±0.2μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'h')
+      85.0±0.7μs        101±0.4μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'h')
+      83.1±0.4μs       99.0±0.3μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'h')
+     10.3±0.03μs      12.2±0.06μs     1.19  bench_function_base.Sort.time_sort('merge', 'float32', ('ordered',))
+      83.7±0.5μs       99.2±0.4μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'h')
+       127±0.4μs          149±3μs     1.17  bench_function_base.Sort.time_argsort('quick', 'int32', ('reversed',))
+         322±1μs          364±2μs     1.13  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 1000))
+        87.6±3μs       99.0±0.8μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'b')
+         249±1μs          281±2μs     1.13  bench_core.CountNonzero.time_count_nonzero(3, 10000, <class 'str'>)
+      87.4±0.8μs       98.6±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'b')
+         131±5μs          148±2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'q')
+     11.9±0.08μs       13.4±0.3μs     1.13  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'D')
+      87.6±0.5μs       98.5±0.2μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'b')
+      13.3±0.2μs       14.9±0.2μs     1.12  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'D')
+     1.47±0.02ms      1.64±0.05ms     1.12  bench_lib.Pad.time_pad((256, 128, 1), 8, 'edge')
+       167±0.9μs          186±4μs     1.11  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'str'>)
+         113±3μs          126±2μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'Q')
+      5.71±0.2ms       6.32±0.2ms     1.11  bench_lib.Pad.time_pad((4194304,), 1, 'mean')
+         283±7μs         312±10μs     1.10  bench_core.PackBits.time_packbits_axis0(<class 'bool'>)
+         329±3μs          363±2μs     1.10  bench_function_base.Sort.time_argsort('quick', 'int32', ('sorted_block', 1000))
+      70.6±0.5μs       77.9±0.7μs     1.10  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 10000)
+         114±1μs          126±3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'L')
+      87.7±0.9μs       96.6±0.2μs     1.10  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 100))
+       299±0.8μs          329±1μs     1.10  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 1000))
+      79.2±0.2μs       87.1±0.1μs     1.10  bench_function_base.Where.time_interleaved_zeros_x8
+      61.0±0.5μs       67.0±0.5μs     1.10  bench_function_base.Select.time_select
+         117±5μs          129±3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'q')
+      82.3±0.4μs         89.9±2μs     1.09  bench_core.CountNonzero.time_count_nonzero(1, 10000, <class 'str'>)
+      52.9±0.6μs       57.5±0.3μs     1.09  bench_function_base.Sort.time_argsort('merge', 'int16', ('sorted_block', 10))
+         195±1μs        212±0.6μs     1.09  bench_function_base.Sort.time_sort('merge', 'int32', ('sorted_block', 10))
+     3.33±0.04ms       3.60±0.3ms     1.08  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'edge')
+      7.01±0.1ms       7.57±0.2ms     1.08  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 1000000)
+      5.26±0.1ms       5.68±0.2ms     1.08  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
+         138±4μs          148±3μs     1.08  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'L')
+      20.5±0.4ms       22.0±0.3ms     1.07  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
+         408±2μs        438±0.8μs     1.07  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 100))
+         184±2μs          197±8μs     1.07  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'constant')
+        52.7±2ms         56.4±2ms     1.07  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
+         121±2μs          129±3μs     1.07  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'L')
+       104±0.7μs        111±0.3μs     1.07  bench_function_base.Sort.time_sort('quick', 'float32', ('uniform',))
+       368±0.7μs          393±1μs     1.07  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 100))
+         510±5μs          544±3μs     1.06  bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',))
+     1.80±0.02ms      1.91±0.03ms     1.06  bench_core.CountNonzero.time_count_nonzero(3, 1000000, <class 'numpy.int64'>)
+     3.44±0.03ms      3.65±0.07ms     1.06  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'numpy.int64'>)
+         534±2μs         566±10μs     1.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 4, 4, 'd')
+      49.3±0.4μs       52.3±0.4μs     1.06  bench_function_base.Sort.time_sort('merge', 'float32', ('sorted_block', 1000))
+         141±4μs          149±3μs     1.06  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'l')
+         121±3μs          129±4μs     1.06  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'q')
+      93.3±0.7μs         98.8±1μs     1.06  bench_function_base.Sort.time_argsort('merge', 'int64', ('sorted_block', 100))
+        942±30μs         997±20μs     1.06  bench_reduce.AddReduceSeparate.time_reduce(1, 'int16')
+         410±6μs        434±0.9μs     1.06  bench_function_base.Sort.time_argsort('quick', 'int32', ('sorted_block', 100))
+         531±2μs          561±2μs     1.06  bench_function_base.Sort.time_argsort('heap', 'float32', ('ordered',))
+         526±1μs         556±10μs     1.06  bench_function_base.Sort.time_argsort('merge', 'int64', ('random',))
+        75.5±1μs         79.8±2μs     1.06  bench_function_base.Sort.time_argsort('quick', 'uint32', ('ordered',))
+      57.0±0.7μs       60.2±0.4μs     1.06  bench_function_base.Sort.time_argsort('merge', 'int16', ('sorted_block', 100))
+         534±7μs          562±4μs     1.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'exp2'>, 4, 2, 'd')
+       727±0.9μs         765±30μs     1.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'log'>, 2, 2, 'd')
+        683±10μs          719±4μs     1.05  bench_lib.Pad.time_pad((256, 128, 1), 8, 'constant')
+        46.0±2μs       48.4±0.2μs     1.05  bench_function_base.Sort.time_argsort('heap', 'int64', ('uniform',))
+         121±1μs          128±4μs     1.05  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'Q')
+     2.10±0.02ms      2.21±0.03ms     1.05  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'mean')
+         237±1ms          249±2ms     1.05  bench_function_base.Histogram2D.time_fine_binning
+     1.20±0.01ms      1.26±0.01ms     1.05  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int64'>)
-      5.47±0.1μs      5.21±0.05μs     0.95  bench_ma.Indexing.time_1d(False, 1, 10)
-     1.50±0.02ms      1.43±0.01ms     0.95  bench_lib.Nan.time_nanargmin(200000, 50.0)
-      30.5±0.1μs       29.1±0.2μs     0.95  bench_function_base.Median.time_odd_inplace
-      88.8±0.7μs       84.6±0.4μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'H')
-      13.7±0.1μs      13.1±0.08μs     0.95  bench_ma.UFunc.time_2d(False, False, 10)
-         696±4μs        662±0.5μs     0.95  bench_function_base.Sort.time_sort('heap', 'float32', ('sorted_block', 10))
-      36.9±0.1μs       35.1±0.1μs     0.95  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'numpy.int8'>)
-      72.6±0.6μs         69.0±1μs     0.95  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 10000)
-      90.9±0.6μs       86.3±0.8μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'H')
-      88.3±0.3μs       83.9±0.4μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'H')
-       473±0.7μs          449±1μs     0.95  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 10))
-      88.3±0.3μs       83.8±0.5μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'H')
-      38.8±0.9μs      36.8±0.07μs     0.95  bench_linalg.Linalg.time_op('norm', 'complex64')
-      88.9±0.7μs         84.2±1μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'H')
-      92.0±0.1μs         87.1±1μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'H')
-      5.79±0.1μs      5.48±0.01μs     0.95  bench_ma.Indexing.time_1d(True, 1, 1000)
-      87.2±0.3μs       82.5±0.4μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'H')
-     7.23±0.05ms      6.85±0.03ms     0.95  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 1000000)
-     2.34±0.02ms      2.21±0.02ms     0.95  bench_lib.Nan.time_nanstd(200000, 90.0)
-      5.46±0.1μs      5.17±0.05μs     0.95  bench_ma.Indexing.time_1d(False, 2, 10)
-      5.46±0.1μs       5.16±0.1μs     0.95  bench_ma.Indexing.time_1d(False, 2, 1000)
-     5.86±0.05μs      5.54±0.02μs     0.95  bench_indexing.ScalarIndexing.time_assign(0)
-      5.45±0.1μs       5.15±0.1μs     0.95  bench_ma.Indexing.time_0d(False, 2, 10)
-      5.80±0.1μs      5.48±0.04μs     0.94  bench_ma.Indexing.time_0d(True, 1, 10)
-      88.5±0.6μs       83.6±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'H')
-      87.7±0.4μs       82.8±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'H')
-      5.83±0.1μs      5.50±0.04μs     0.94  bench_ma.Indexing.time_0d(True, 1, 100)
-      90.6±0.3μs       85.5±0.8μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'H')
-       151±0.8μs        143±0.5μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('reversed',))
-      89.5±0.5μs       84.4±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'H')
-     2.33±0.02ms      2.20±0.01ms     0.94  bench_lib.Nan.time_nanvar(200000, 90.0)
-      87.8±0.3μs       82.8±0.4μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'H')
-      18.6±0.2μs       17.5±0.2μs     0.94  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 4, 'D')
-      92.8±0.9μs         87.4±1μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'H')
-      89.9±0.5μs       84.7±0.6μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'H')
-      90.1±0.4μs       84.8±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'H')
-      89.9±0.4μs       84.6±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'H')
-      5.73±0.1μs      5.40±0.03μs     0.94  bench_ma.Indexing.time_1d(True, 1, 100)
-        89.3±1μs         83.9±1μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'H')
-      91.4±0.5μs       85.9±0.8μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'H')
-      5.87±0.1μs      5.52±0.05μs     0.94  bench_ma.Indexing.time_0d(True, 2, 10)
-      78.0±0.8μs       73.2±0.2μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int32', ('ordered',))
-        90.1±1μs       84.7±0.7μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'H')
-      88.1±0.4μs       82.7±0.2μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'H')
-         817±4μs          767±2μs     0.94  bench_ufunc.UFunc.time_ufunc_types('reciprocal')
-     1.42±0.01ms      1.33±0.01ms     0.94  bench_lib.Nan.time_nanvar(200000, 0.1)
-      90.2±0.3μs       84.6±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'H')
-         491±3μs          461±2μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 100))
-        91.9±1μs         86.1±1μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'H')
-      5.89±0.2μs      5.52±0.06μs     0.94  bench_ma.Indexing.time_1d(True, 2, 100)
-      5.49±0.2μs      5.15±0.05μs     0.94  bench_ma.Indexing.time_1d(False, 1, 100)
-        91.4±1μs       85.6±0.2μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'H')
-         355±6μs          332±2μs     0.94  bench_ufunc.UFunc.time_ufunc_types('square')
-      87.8±0.5μs       82.2±0.2μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'H')
-      90.2±0.8μs       84.5±0.4μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'H')
-     5.46±0.05μs       5.11±0.1μs     0.94  bench_ma.Indexing.time_0d(False, 1, 100)
-     1.50±0.01ms      1.41±0.01ms     0.94  bench_lib.Nan.time_nanstd(200000, 2.0)
-     5.82±0.04μs      5.44±0.02μs     0.94  bench_ma.Indexing.time_0d(True, 1, 1000)
-      87.6±0.3μs       81.9±0.2μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'H')
-      89.7±0.2μs       83.8±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'H')
-     87.6±0.09μs       81.9±0.1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'H')
-      91.2±0.9μs       85.2±0.9μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'H')
-      90.5±0.9μs       84.5±0.8μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'H')
-      91.2±0.7μs         85.2±1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'H')
-     5.46±0.07μs      5.10±0.09μs     0.93  bench_ma.Indexing.time_0d(False, 2, 1000)
-     1.42±0.01ms      1.33±0.02ms     0.93  bench_lib.Nan.time_nanvar(200000, 0)
-      92.3±0.5μs       86.1±0.7μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'H')
-     5.50±0.07μs      5.13±0.08μs     0.93  bench_ma.Indexing.time_0d(False, 2, 100)
-      91.3±0.3μs       85.1±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'H')
-        89.8±1μs       83.7±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'H')
-         466±4μs          434±1μs     0.93  bench_function_base.Sort.time_sort('merge', 'uint32', ('random',))
-         277±3μs          258±1μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'd')
-      90.9±0.5μs       84.6±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'H')
-      94.1±0.8μs       87.6±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'H')
-      89.1±0.6μs       82.9±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'H')
-      17.8±0.2μs      16.5±0.09μs     0.93  bench_core.VarComplex.time_var(1000)
-         275±3μs          256±3μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'd')
-       273±0.5μs          254±2μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'd')
-      90.0±0.2μs       83.6±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'H')
-      89.4±0.6μs       83.1±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'H')
-      93.0±0.7μs       86.4±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'H')
-      89.9±0.4μs       83.4±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'H')
-         272±2μs          253±5μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'd')
-     1.42±0.01ms      1.32±0.01ms     0.93  bench_lib.Nan.time_nanstd(200000, 0.1)
-      88.4±0.4μs       82.0±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'H')
-      5.95±0.1μs      5.52±0.08μs     0.93  bench_ma.Indexing.time_1d(True, 2, 10)
-      91.4±0.5μs       84.8±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'H')
-         274±2μs          254±2μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'd')
-     5.48±0.04μs      5.07±0.04μs     0.93  bench_ma.Indexing.time_0d(False, 1, 10)
-     5.94±0.08μs      5.49±0.02μs     0.93  bench_ma.Indexing.time_0d(True, 2, 1000)
-      90.0±0.6μs       83.1±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'H')
-      93.6±0.6μs       86.4±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'H')
-      92.6±0.5μs       85.5±0.9μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'H')
-         276±3μs        253±0.8μs     0.92  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'd')
-     1.51±0.01ms      1.39±0.01ms     0.92  bench_lib.Nan.time_nanvar(200000, 2.0)
-     1.86±0.01μs         1.71±0μs     0.92  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'int64')
-      5.59±0.2μs      5.14±0.04μs     0.92  bench_ma.Indexing.time_1d(False, 1, 1000)
-     1.87±0.01μs      1.72±0.03μs     0.92  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'float32')
-      5.91±0.1μs      5.42±0.05μs     0.92  bench_ma.Indexing.time_1d(True, 1, 10)
-      5.97±0.1μs      5.47±0.04μs     0.92  bench_ma.Indexing.time_0d(True, 2, 100)
-      76.1±0.3μs       69.7±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'I')
-         248±2ms          227±1ms     0.92  bench_app.LaplaceInplace.time_it('normal')
-     6.08±0.02μs      5.56±0.04μs     0.92  bench_itemselection.PutMask.time_dense(False, 'complex256')
-     1.87±0.01μs      1.70±0.01μs     0.91  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'float64')
-     1.89±0.01μs      1.72±0.02μs     0.91  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'int32')
-     4.47±0.08μs      4.08±0.02μs     0.91  bench_ma.MA.time_masked_array
-        93.7±2μs       85.4±0.5μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'H')
-         357±2μs          325±1μs     0.91  bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 1000))
-      75.7±0.4μs       69.0±0.8μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'i')
-      77.6±0.8μs       70.6±0.4μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'I')
-      75.4±0.3μs       68.6±0.2μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'I')
-      82.5±0.7μs         74.9±3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'Q')
-     81.9±0.09μs         74.3±1μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'Q')
-      75.4±0.8μs       68.4±0.5μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'I')
-      75.6±0.3μs       68.6±0.6μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'i')
-     1.91±0.02μs      1.73±0.02μs     0.91  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'int16')
-      75.4±0.5μs       68.2±0.4μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'I')
-     1.16±0.03μs      1.05±0.01μs     0.90  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 100)
-        944±20μs          853±5μs     0.90  bench_lib.Nan.time_nanargmax(200000, 90.0)
-     3.28±0.03μs         2.96±0μs     0.90  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'float64')
-        87.9±1μs         79.3±1μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'l')
-      75.6±0.3μs       68.2±0.2μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'i')
-     4.02±0.04ms      3.63±0.04ms     0.90  bench_core.VarComplex.time_var(1000000)
-     1.88±0.01μs      1.70±0.02μs     0.90  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'float16')
-      76.8±0.5μs       69.1±0.2μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'I')
-      76.7±0.3μs       69.0±0.4μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'I')
-      75.8±0.5μs       68.1±0.5μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'i')
-      76.4±0.3μs       68.7±0.2μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'I')
-         563±4μs        506±0.7μs     0.90  bench_function_base.Sort.time_sort('heap', 'float32', ('reversed',))
-     1.89±0.02μs      1.69±0.01μs     0.90  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'float32')
-         107±4μs         96.4±2μs     0.90  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
-        89.7±2μs         80.6±2μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'L')
-     1.90±0.02μs         1.71±0μs     0.90  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'complex64')
-        76.7±1μs       68.8±0.2μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'I')
-         936±5μs         839±10μs     0.90  bench_lib.Nan.time_nanargmin(200000, 90.0)
-     1.45±0.06ms      1.30±0.04ms     0.90  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'mean')
-      79.2±0.6μs       71.0±0.8μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'i')
-         480±2μs          430±2μs     0.90  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 1000))
-        74.1±2μs         66.3±3μs     0.90  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'd')
-      77.4±0.4μs       69.2±0.3μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'i')
-        78.6±2μs       70.3±0.6μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'I')
-      4.78±0.2ms      4.28±0.08ms     0.89  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'linear_ramp')
-        82.8±1μs         74.0±2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'l')
-        61.6±1μs       55.1±0.9μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 2, 'd')
-        78.5±1μs       70.2±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'i')
-      79.9±0.6μs       71.5±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'I')
-      84.6±0.8μs         75.6±2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'I')
-      88.1±0.9μs         78.7±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'i')
-        88.4±1μs         79.0±2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'q')
-     1.90±0.02μs      1.70±0.01μs     0.89  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'int32')
-      79.0±0.8μs       70.5±0.8μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'I')
-      76.5±0.6μs       68.3±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'i')
-         269±2μs          240±1μs     0.89  bench_ufunc.UFunc.time_ufunc_types('maximum')
-      76.9±0.2μs       68.6±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'i')
-      60.6±0.6μs       54.1±0.3μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 1, 'd')
-      77.5±0.5μs       69.1±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'i')
-      80.3±0.6μs       71.7±0.8μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'i')
-        81.3±1μs         72.5±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'I')
-        77.4±1μs       69.0±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'I')
-      89.7±0.9μs       79.9±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'i')
-      82.9±0.5μs         73.8±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'q')
-         430±8μs          383±3μs     0.89  bench_ufunc.UFunc.time_ufunc_types('multiply')
-      79.6±0.4μs       70.9±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'i')
-      80.9±0.2μs       72.0±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'i')
-      81.5±0.2μs         72.6±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'i')
-      88.6±0.9μs       78.8±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'q')
-        89.3±2μs       79.5±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'l')
-     1.48±0.01μs      1.31±0.02μs     0.89  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int16')
-     3.25±0.02μs      2.89±0.02μs     0.89  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'int32')
-      83.1±0.7μs       73.8±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'l')
-        79.3±2μs       70.4±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'I')
-      78.1±0.7μs       69.3±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'i')
-         263±3μs        234±0.6μs     0.89  bench_ufunc.UFunc.time_ufunc_types('minimum')
-      79.5±0.3μs       70.6±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'I')
-        78.8±2μs       69.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'i')
-        81.3±1μs       72.1±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'I')
-        89.2±1μs         79.1±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'I')
-        88.4±1μs       78.4±0.8μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'q')
-      89.2±0.5μs       79.1±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'Q')
-      81.2±0.8μs       71.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'i')
-        84.0±1μs       74.4±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'i')
-      78.0±0.7μs       69.1±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'I')
-      80.5±0.6μs       71.3±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'I')
-     3.28±0.01μs      2.90±0.02μs     0.89  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'int64')
-      81.4±0.5μs       72.1±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'I')
-        83.1±1μs       73.6±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'L')
-        82.8±1μs         73.3±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'i')
-        80.9±2μs       71.6±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'I')
-      84.8±0.8μs         75.0±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'I')
-      78.1±0.4μs       69.1±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'i')
-      82.2±0.5μs       72.7±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'q')
-        87.8±1μs       77.7±0.8μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'I')
-      88.7±0.9μs         78.4±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'L')
-     3.29±0.02μs      2.91±0.04μs     0.88  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'complex64')
-      80.6±0.5μs       71.3±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'i')
-      57.8±0.6μs       51.1±0.8μs     0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
-      89.0±0.8μs       78.6±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'I')
-      82.6±0.5μs       73.0±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'i')
-      81.1±0.5μs       71.7±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'I')
-      88.7±0.6μs       78.3±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'I')
-      81.4±0.8μs         71.8±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'I')
-      80.2±0.6μs       70.8±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'i')
-      81.0±0.9μs       71.5±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'I')
-         325±9μs          287±5μs     0.88  bench_ufunc.UFunc.time_ufunc_types('add')
-     3.26±0.03μs      2.87±0.02μs     0.88  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'float32')
-      57.7±0.9μs       50.9±0.8μs     0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
-        82.3±1μs         72.5±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'I')
-      83.5±0.7μs         73.6±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'Q')
-      89.0±0.6μs         78.5±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'i')
-        89.7±1μs         79.0±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'i')
-      89.0±0.8μs         78.5±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'Q')
-      82.4±0.7μs       72.6±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'q')
-      81.7±0.3μs       72.0±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'i')
-      81.6±0.6μs       71.8±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'I')
-      85.5±0.6μs       75.2±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'i')
-      82.3±0.6μs       72.4±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'i')
-      81.6±0.7μs       71.7±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'I')
-        81.1±1μs       71.3±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'I')
-      85.3±0.5μs       75.0±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'I')
-      81.0±0.3μs       71.1±0.8μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'i')
-        89.3±2μs       78.4±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'Q')
-      82.6±0.3μs       72.5±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'I')
-        81.8±1μs         71.8±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'I')
-      89.2±0.5μs         78.3±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'I')
-        81.5±1μs       71.5±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'i')
-        83.5±1μs         73.3±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'Q')
-      86.1±0.4μs       75.5±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'i')
-        81.2±1μs       71.2±0.8μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'i')
-        82.5±1μs       72.3±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'l')
-        86.2±1μs       75.5±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'I')
-        80.0±2μs       70.1±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'i')
-      83.1±0.8μs       72.9±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'I')
-      83.7±0.3μs       73.4±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'L')
-      83.2±0.5μs       72.9±0.8μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'Q')
-      81.3±0.4μs       71.2±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'I')
-        84.3±1μs         73.8±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'l')
-        83.0±1μs       72.7±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'i')
-        81.9±1μs         71.7±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'i')
-      87.8±0.7μs       76.9±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'l')
-         109±1μs         95.8±3μs     0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'd')
-        89.4±1μs       78.2±0.8μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'L')
-        83.9±1μs       73.3±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'L')
-        90.1±1μs         78.7±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'l')
-         110±3μs         96.5±3μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
-        83.2±1μs       72.7±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'I')
-      85.9±0.8μs         75.0±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'I')
-      86.1±0.7μs         75.2±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'i')
-      82.5±0.6μs       72.1±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'i')
-        81.4±1μs       71.1±0.7μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'i')
-        85.2±1μs       74.3±0.2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'i')
-      83.1±0.8μs       72.5±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'L')
-      89.5±0.7μs       78.1±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'i')
-     1.41±0.05ms      1.23±0.03ms     0.87  bench_lib.Pad.time_pad((1024, 1024), 8, 'mean')
-     1.48±0.01μs      1.29±0.01μs     0.87  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float16')
-        83.0±1μs       72.4±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'L')
-      80.8±0.9μs       70.4±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'I')
-         487±2μs          425±2μs     0.87  bench_function_base.Sort.time_sort('heap', 'float64', ('ordered',))
-        90.1±3μs         78.5±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'Q')
-      86.6±0.5μs       75.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'i')
-        89.5±1μs         77.9±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'q')
-      83.2±0.7μs       72.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'Q')
-        89.0±1μs         77.4±2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'Q')
-      85.1±0.6μs       74.0±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'I')
-         150±3μs          131±4μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'd')
-        83.0±2μs       72.2±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'I')
-        83.6±1μs       72.7±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'i')
-      83.2±0.4μs       72.3±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'q')
-     11.7±0.05μs       10.2±0.1μs     0.87  bench_reduce.MinMax.time_max(<class 'numpy.float64'>)
-      83.4±0.4μs       72.4±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'l')
-      88.3±0.6μs       76.7±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'L')
-      84.3±0.5μs       73.2±0.7μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'I')
-        77.3±3μs         67.0±1μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'd')
-        90.4±2μs       78.5±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'i')
-      89.3±0.3μs       77.4±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'Q')
-      81.7±0.7μs       70.7±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'i')
-        91.4±2μs         79.1±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'L')
-        81.8±1μs       70.8±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'i')
-        83.9±2μs       72.6±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'i')
-        82.2±1μs       71.1±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'i')
-     1.24±0.03ms      1.07±0.06ms     0.86  bench_core.Temporaries.time_large2
-        321±10μs          277±2μs     0.86  bench_ufunc.UFunc.time_ufunc_types('subtract')
-        90.8±2μs         78.4±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'q')
-        90.9±1μs       78.5±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'L')
-         415±5μs          358±4μs     0.86  bench_ufunc.UFunc.time_ufunc_types('rint')
-         475±3μs         410±10μs     0.86  bench_lib.Nan.time_nanargmin(200000, 0)
-       111±0.7μs         96.0±3μs     0.86  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'd')
-        89.1±1μs       76.7±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'q')
-         522±4μs          449±9μs     0.86  bench_lib.Nan.time_nanargmax(200000, 2.0)
-        84.7±1μs         72.8±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'q')
-         474±5μs          407±6μs     0.86  bench_lib.Nan.time_nanargmax(200000, 0.1)
-        91.3±2μs         78.4±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'l')
-         108±3μs         92.8±2μs     0.86  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'd')
-      84.5±0.6μs         72.4±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'l')
-        86.2±2μs         73.8±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'i')
-        84.4±2μs       72.2±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'L')
-        91.2±1μs       78.0±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'l')
-         385±2μs          329±5μs     0.86  bench_ufunc.UFunc.time_ufunc_types('fmin')
-      94.0±0.5μs       80.4±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'B')
-        90.4±2μs       77.3±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'I')
-         480±2μs         410±10μs     0.85  bench_lib.Nan.time_nanargmax(200000, 0)
-         112±2μs         95.8±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'd')
-      58.5±0.6μs       49.9±0.4μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'd')
-        84.4±1μs       72.0±0.5μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'I')
-     2.51±0.02μs      2.14±0.04μs     0.85  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float16')
-         113±3μs         96.5±2μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'd')
-         150±2μs          128±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
-      60.5±0.3μs       51.5±0.7μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 1, 'd')
-     11.8±0.03μs      10.1±0.08μs     0.85  bench_reduce.MinMax.time_min(<class 'numpy.float64'>)
-      1.51±0.08s       1.29±0.02s     0.85  bench_io.Savez.time_vb_savez_squares
-      93.7±0.4μs       79.6±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'B')
-        94.5±1μs       80.3±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'B')
-      94.3±0.3μs         80.0±1μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'B')
-         479±5μs          405±4μs     0.85  bench_lib.Nan.time_nanargmin(200000, 0.1)
-         522±2μs          442±3μs     0.85  bench_lib.Nan.time_nanargmin(200000, 2.0)
-         254±2μs          214±2μs     0.84  bench_ufunc.UFunc.time_ufunc_types('floor')
-      93.5±0.4μs       79.0±0.2μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'B')
-      93.5±0.2μs         79.0±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'B')
-      94.5±0.8μs       79.8±0.9μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'B')
-     2.50±0.02μs      2.11±0.01μs     0.84  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int16')
-      94.0±0.6μs       79.0±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'B')
-      94.1±0.2μs       79.0±0.3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'B')
-      95.2±0.4μs         79.9±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'B')
-      94.8±0.7μs       79.6±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'B')
-         110±3μs         91.9±3μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'd')
-      93.8±0.4μs       78.5±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'B')
-      49.8±0.9μs         41.6±1μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
-         388±4μs          324±2μs     0.84  bench_ufunc.UFunc.time_ufunc_types('fmax')
-      94.6±0.9μs         79.1±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'B')
-      93.4±0.2μs       78.1±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'B')
-      93.3±0.5μs       78.0±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'B')
-      92.5±0.8μs         77.2±3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'B')
-         258±2μs          215±1μs     0.84  bench_ufunc.UFunc.time_ufunc_types('ceil')
-         151±3μs          126±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
-         527±5μs        439±0.7μs     0.83  bench_function_base.Sort.time_sort('heap', 'float32', ('ordered',))
-      93.4±0.4μs       77.8±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'B')
-         258±8μs          215±4μs     0.83  bench_ufunc.UFunc.time_ufunc_types('trunc')
-        95.3±1μs         79.3±1μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'B')
-      58.7±0.9μs       48.8±0.7μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'd')
-      93.8±0.2μs       78.0±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'B')
-      93.7±0.4μs       78.0±0.7μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'B')
-      94.2±0.9μs       78.3±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'B')
-      93.9±0.9μs       78.1±0.7μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'B')
-      94.3±0.3μs       78.4±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'B')
-      95.0±0.7μs       79.0±0.8μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'B')
-      94.6±0.8μs       78.6±0.8μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'B')
-      93.6±0.6μs         77.8±1μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'B')
-     5.28±0.04μs      4.38±0.04μs     0.83  bench_lib.Nan.time_nanmin(200, 0.1)
-         153±2μs        127±0.6μs     0.83  bench_function_base.Sort.time_argsort('quick', 'uint32', ('reversed',))
-      93.8±0.2μs       77.9±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'B')
-      94.9±0.5μs       78.7±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'B')
-      92.8±0.9μs       76.9±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'B')
-      93.0±0.3μs       77.1±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'B')
-      93.4±0.8μs       77.3±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'B')
-      93.2±0.3μs       77.0±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'B')
-      93.9±0.4μs       77.3±0.6μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'B')
-      94.9±0.3μs       78.0±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'B')
-     5.24±0.03μs      4.31±0.04μs     0.82  bench_lib.Nan.time_nanmin(200, 2.0)
-      94.5±0.3μs       77.7±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'B')
-        93.2±2μs       76.6±0.7μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'B')
-        92.8±1μs       76.2±0.8μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'B')
-      95.3±0.4μs       78.3±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'B')
-      92.9±0.5μs       76.2±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'B')
-      93.5±0.4μs       76.7±0.9μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'B')
-         150±2μs          123±2μs     0.82  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
-     1.24±0.02ms      1.02±0.04ms     0.82  bench_core.Temporaries.time_large
-      92.9±0.5μs       76.1±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'B')
-      94.0±0.4μs       76.9±0.3μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'B')
-      94.0±0.1μs       76.9±0.9μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'B')
-      92.6±0.2μs       75.6±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'B')
-     5.31±0.02μs      4.33±0.01μs     0.82  bench_lib.Nan.time_nanmin(200, 0)
-      93.4±0.3μs       76.2±0.3μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'B')
-      50.9±0.8μs       41.5±0.7μs     0.81  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
-      94.5±0.4μs       76.9±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'B')
-        93.9±1μs       76.2±0.5μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'B')
-      94.6±0.9μs       76.6±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'B')
-      94.5±0.6μs         76.3±6μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'B')
-        94.5±2μs       76.2±0.5μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'B')
-      93.9±0.8μs       75.7±0.8μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'B')
-      5.42±0.1μs      4.36±0.03μs     0.80  bench_lib.Nan.time_nanmax(200, 0.1)
-      51.8±0.3μs       41.7±0.2μs     0.80  bench_core.VarComplex.time_var(10000)
-       122±0.3ms         97.9±1ms     0.80  bench_app.LaplaceInplace.time_it('inplace')
-        94.3±2μs       75.6±0.7μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'B')
-        51.6±1μs       41.3±0.9μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'd')
-      17.3±0.1μs      13.9±0.08μs     0.80  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'D')
-     5.40±0.04μs      4.31±0.06μs     0.80  bench_lib.Nan.time_nanmin(200, 90.0)
-     5.40±0.09μs      4.32±0.04μs     0.80  bench_lib.Nan.time_nanmax(200, 2.0)
-      5.47±0.1μs      4.32±0.04μs     0.79  bench_lib.Nan.time_nanmax(200, 0)
-        52.1±1μs       41.1±0.8μs     0.79  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'd')
-      5.54±0.1μs      4.33±0.02μs     0.78  bench_lib.Nan.time_nanmax(200, 90.0)
-         164±4μs          127±2μs     0.78  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'd')
-         119±1μs         91.4±3μs     0.77  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'd')
-         372±2μs          282±3μs     0.76  bench_core.VarComplex.time_var(100000)
-        21.6±5μs      16.3±0.05μs     0.75  bench_scalar.ScalarMath.time_power_of_two('complex64')
-        66.9±1μs       50.1±0.4μs     0.75  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
-      43.4±0.5μs       32.4±0.3μs     0.75  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'd')
-         129±2μs         95.8±2μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'd')
-      43.3±0.8μs       32.2±0.4μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
-        43.6±1μs       32.4±0.2μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
-        90.6±2μs         67.0±4μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'd')
-      43.0±0.3μs       31.8±0.3μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'd')
-         136±3μs         99.0±3μs     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'd')
-      5.98±0.1μs      4.35±0.02μs     0.73  bench_lib.Nan.time_nanmax(200, 50.0)
-      19.0±0.1μs       13.7±0.2μs     0.72  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'D')
-     6.00±0.08μs      4.31±0.03μs     0.72  bench_lib.Nan.time_nanmin(200, 50.0)
-      14.5±0.1μs       10.4±0.2μs     0.72  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'D')
-     11.3±0.05μs      8.02±0.04μs     0.71  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float64'>)
-         180±2μs          127±2μs     0.70  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'd')
-        542±10μs          380±2μs     0.70  bench_reduce.AddReduceSeparate.time_reduce(0, 'float64')
-     1.24±0.01ms         848±20μs     0.69  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex128')
-      59.0±0.4μs       40.2±0.6μs     0.68  bench_core.Temporaries.time_mid2
-      15.0±0.3μs       10.2±0.2μs     0.68  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'D')
-      59.2±0.3μs       39.9±0.4μs     0.67  bench_core.Temporaries.time_mid
-      64.7±0.7μs       43.4±0.4μs     0.67  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'd')
-      63.8±0.8μs       42.8±0.4μs     0.67  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'd')
-      64.9±0.6μs       43.4±0.3μs     0.67  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'd')
-        82.4±1μs       54.8±0.4μs     0.66  bench_ufunc.CustomInplace.time_double_add_temp
-     13.0±0.08μs      8.57±0.07μs     0.66  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'D')
-      80.1±0.9μs       52.2±0.3μs     0.65  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'd')
-      66.1±0.5μs       43.0±0.3μs     0.65  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'd')
-      63.0±0.7μs         40.5±1μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'd')
-      41.0±0.6μs       26.3±0.2μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'd')
-      40.8±0.5μs       26.0±0.2μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'd')
-      40.9±0.5μs       26.0±0.2μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'd')
-      68.1±0.7μs       43.3±0.2μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'd')
-      40.8±0.5μs      25.7±0.05μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'd')
-      75.2±0.9μs       47.0±0.3μs     0.63  bench_ufunc.CustomInplace.time_double_add
-      13.2±0.2μs      7.70±0.06μs     0.58  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'D')
-     11.6±0.04μs      6.69±0.06μs     0.58  bench_reduce.MinMax.time_min(<class 'numpy.float32'>)
-     11.8±0.08μs      6.63±0.05μs     0.56  bench_reduce.MinMax.time_max(<class 'numpy.float32'>)
-      80.6±0.7μs         43.6±1μs     0.54  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'd')
-     2.00±0.03ms         1.08±0ms     0.54  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex64')
-         160±1μs       82.4±0.4μs     0.52  bench_reduce.ArgMax.time_argmax(<class 'numpy.float64'>)
-         159±1μs       81.4±0.5μs     0.51  bench_reduce.ArgMin.time_argmin(<class 'numpy.float64'>)
-      66.8±0.4μs       34.0±0.2μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 43)
-      66.7±0.3μs       33.9±0.2μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -43)
-      85.7±0.7μs       42.6±0.1μs     0.50  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
-     20.7±0.07μs       10.2±0.1μs     0.49  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'F')
-      64.5±0.8μs       31.6±0.2μs     0.49  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
-      71.8±0.4μs      34.4±0.04μs     0.48  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -8)
-      72.1±0.4μs       34.4±0.1μs     0.48  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 8)
-         430±3μs          199±2μs     0.46  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 4, 'd')
-      78.7±0.8μs       36.0±0.6μs     0.46  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'Q')
-        78.7±1μs       35.9±0.7μs     0.46  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'l')
-      78.3±0.4μs       35.6±0.4μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'l')
-      78.2±0.6μs       35.3±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'L')
-      78.9±0.5μs       35.6±0.6μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'q')
-        78.6±1μs       35.5±0.3μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'Q')
-      79.1±0.3μs       35.6±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'L')
-      79.1±0.4μs       35.5±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'q')
-      12.7±0.2μs      5.65±0.04μs     0.45  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (0))
-      12.7±0.2μs      5.59±0.02μs     0.44  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (1))
-         438±4μs          192±3μs     0.44  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 4, 'd')
-      20.7±0.1μs      9.06±0.04μs     0.44  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'F')
-     20.8±0.09μs      9.02±0.03μs     0.43  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'F')
-      12.9±0.1μs      5.58±0.02μs     0.43  bench_reduce.MinMax.time_min(<class 'numpy.uint64'>)
-      12.8±0.2μs      5.53±0.03μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (0))
-      12.8±0.2μs      5.51±0.04μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (1))
-      21.0±0.2μs       9.00±0.2μs     0.43  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'F')
-      60.9±0.5μs       25.8±0.1μs     0.42  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'd')
-     13.1±0.04μs      5.55±0.02μs     0.42  bench_reduce.MinMax.time_max(<class 'numpy.uint64'>)
-      36.2±0.3μs       14.4±0.3μs     0.40  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 4, 'F')
-         426±3μs          166±8μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 2, 'd')
-      80.0±0.9μs       31.0±0.2μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'd')
-       424±0.8μs          164±2μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 4, 'd')
-         426±3μs         164±10μs     0.38  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 4, 'd')
-         436±7μs         167±10μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 4, 'd')
-      2.75±0.1ms      1.05±0.03ms     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
-         424±2μs          161±4μs     0.38  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 4, 'd')
-     2.70±0.01ms      1.02±0.02ms     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
-         435±2μs          164±6μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 4, 'd')
-      12.2±0.4μs      4.62±0.06μs     0.38  bench_reduce.MinMax.time_min(<class 'numpy.uint32'>)
-      12.3±0.2μs      4.65±0.03μs     0.38  bench_reduce.MinMax.time_max(<class 'numpy.int32'>)
-         435±4μs          164±6μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 2, 'd')
-        2.72±0ms         1.01±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
-      12.3±0.2μs      4.58±0.01μs     0.37  bench_reduce.MinMax.time_max(<class 'numpy.uint32'>)
-         435±4μs          162±6μs     0.37  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 4, 'd')
-     2.75±0.03ms         1.02±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
-      12.4±0.4μs      4.57±0.03μs     0.37  bench_reduce.MinMax.time_min(<class 'numpy.int32'>)
-     2.77±0.03ms      1.02±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
-     2.76±0.03ms         1.02±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
-     2.73±0.01ms      1.00±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
-      36.6±0.2μs      13.4±0.03μs     0.37  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'F')
-     2.86±0.06ms      1.05±0.03ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
-        523±10μs          192±4μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 1, 4, 'f')
-         505±1μs          185±2μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 1, 2, 'f')
-         425±1μs          155±7μs     0.37  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 1, 'd')
-         507±7μs          185±3μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 4, 'f')
-         504±1μs        184±0.9μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 1, 'f')
-         506±1μs        184±0.9μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 1, 1, 'f')
-     2.74±0.04ms         997±10μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
-         507±3μs          183±1μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 1, 'f')
-         514±4μs          185±1μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 2, 'f')
-         426±7μs          153±3μs     0.36  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 4, 'd')
-         513±5μs        184±0.8μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 2, 'f')
-         510±5μs          182±1μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 4, 'f')
-        440±10μs          156±6μs     0.35  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 4, 'd')
-        449±10μs          159±7μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 4, 'd')
-         436±8μs          152±6μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 1, 'd')
-         434±7μs          150±4μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 4, 'd')
-      36.2±0.3μs       12.5±0.2μs     0.34  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'F')
-     21.0±0.08μs      7.10±0.02μs     0.34  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
-     8.90±0.04μs      3.00±0.03μs     0.34  bench_reduce.MinMax.time_min(<class 'numpy.uint8'>)
-         434±1μs          146±3μs     0.34  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 2, 'd')
-     8.92±0.06μs      2.96±0.03μs     0.33  bench_reduce.MinMax.time_max(<class 'numpy.uint8'>)
-         432±2μs          143±3μs     0.33  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 2, 'd')
-      74.3±0.6μs       24.6±0.7μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'I')
-         421±2μs          139±4μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 2, 'd')
-       157±0.5μs       51.7±0.4μs     0.33  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint64'>)
-      74.9±0.7μs       24.6±0.5μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'i')
-         423±3μs          138±2μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 2, 'd')
-      74.7±0.6μs       24.4±0.5μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'i')
-        440±10μs          143±2μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 4, 'd')
-      74.4±0.5μs       24.2±0.5μs     0.32  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'I')
-         423±3μs          137±6μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 2, 'd')
-       158±0.6μs       51.4±0.4μs     0.32  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint64'>)
-         423±3μs          137±4μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 2, 'd')
-       159±0.6μs       51.3±0.4μs     0.32  bench_reduce.ArgMax.time_argmax(<class 'numpy.int64'>)
-         159±2μs       51.3±0.2μs     0.32  bench_reduce.ArgMin.time_argmin(<class 'numpy.int64'>)
-         425±2μs          136±7μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 1, 'd')
-         422±1μs          135±9μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 1, 'd')
-        451±10μs          143±4μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 4, 'd')
-      79.7±0.3μs       25.1±0.2μs     0.31  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'd')
-         430±3μs          135±4μs     0.31  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 2, 'd')
-         437±3μs          137±7μs     0.31  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 1, 'd')
-        439±20μs          137±6μs     0.31  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 4, 'd')
-        448±10μs          138±3μs     0.31  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 4, 'd')
-      20.7±0.2μs      6.39±0.07μs     0.31  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'F')
-       102±0.3μs       31.2±0.2μs     0.31  bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float32'>)
-       102±0.4μs      31.1±0.09μs     0.30  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
-         431±2μs          129±5μs     0.30  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 2, 'd')
-        525±10μs          155±4μs     0.30  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 4, 'f')
-         507±3μs          150±1μs     0.30  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'f')
-         506±4μs        149±0.7μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'f')
-         432±4μs          126±5μs     0.29  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 1, 'd')
-       158±0.8μs       46.1±0.4μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint32'>)
-         159±1μs       46.2±0.6μs     0.29  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint32'>)
-         506±2μs          147±1μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'f')
-         511±6μs          149±1μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'f')
-         505±3μs        147±0.7μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'f')
-         510±4μs        147±0.6μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'f')
-       159±0.8μs       45.8±0.7μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.int32'>)
-         508±2μs        147±0.4μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
-         518±9μs          149±3μs     0.29  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'f')
-         433±1μs         124±10μs     0.29  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 1, 'd')
-         433±1μs          123±3μs     0.29  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 1, 'd')
-         161±2μs       45.7±0.8μs     0.28  bench_reduce.ArgMin.time_argmin(<class 'numpy.int32'>)
-         428±5μs          119±8μs     0.28  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 1, 'd')
-         419±2μs          115±3μs     0.27  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 1, 'd')
-         187±2μs       51.0±0.5μs     0.27  bench_reduce.ArgMin.time_argmin(<class 'numpy.float32'>)
-         191±2μs       50.8±0.2μs     0.27  bench_reduce.ArgMax.time_argmax(<class 'numpy.float32'>)
-     1.23±0.01ms          321±2μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 4, 4, 'f')
-     1.23±0.01ms          318±2μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 4, 2, 'f')
-     13.7±0.04μs      3.54±0.06μs     0.26  bench_reduce.MinMax.time_max(<class 'numpy.uint16'>)
-        1.23±0ms          317±3μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 2, 4, 'f')
-      13.8±0.1μs      3.50±0.02μs     0.25  bench_reduce.MinMax.time_min(<class 'numpy.uint16'>)
-        437±10μs          111±2μs     0.25  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 4, 'd')
-         432±3μs          110±1μs     0.25  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 2, 'd')
-         428±4μs          109±2μs     0.25  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 2, 'd')
-     1.24±0.01ms          315±2μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 2, 2, 'f')
-     1.25±0.01ms          316±2μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 4, 4, 'f')
-     1.25±0.01ms          313±4μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 2, 4, 'f')
-     1.25±0.01ms          311±5μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 4, 2, 'f')
-     1.23±0.02ms        306±0.8μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 2, 1, 'f')
-     1.23±0.01ms        306±0.8μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 4, 1, 'f')
-        450±20μs          110±2μs     0.24  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 4, 'd')
-     1.28±0.02ms          311±5μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 2, 2, 'f')
-     1.24±0.01ms        302±0.8μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 2, 1, 'f')
-        1.25±0ms        302±0.9μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 4, 1, 'f')
-         197±3μs         47.3±1μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'f')
-         195±2μs         46.7±1μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'f')
-     1.25±0.02ms          300±3μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 1, 4, 'f')
-     12.7±0.04μs      3.03±0.01μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int8'>)
-         197±2μs       46.9±0.7μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
-     1.23±0.01ms          293±1μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 1, 2, 'f')
-         199±2μs       47.2±0.6μs     0.24  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
-     15.1±0.08μs      3.55±0.03μs     0.24  bench_reduce.MinMax.time_max(<class 'numpy.int16'>)
-     15.1±0.04μs      3.55±0.07μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int16'>)
-     12.7±0.07μs      2.95±0.02μs     0.23  bench_reduce.MinMax.time_max(<class 'numpy.int8'>)
-         198±2μs       45.8±0.4μs     0.23  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'f')
-     1.27±0.01ms          293±5μs     0.23  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 1, 4, 'f')
-         476±6μs          109±5μs     0.23  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 4, 'f')
-         475±4μs          109±8μs     0.23  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 4, 'f')
-     1.24±0.01ms          283±1μs     0.23  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'cos'>, 1, 1, 'f')
-     1.26±0.01ms          288±3μs     0.23  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 1, 2, 'f')
-     1.26±0.01ms          279±2μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sin'>, 1, 1, 'f')
-         197±3μs         42.5±2μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
-         195±1μs       41.9±0.4μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'f')
-         882±3μs          187±3μs     0.21  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'd')
-         216±3μs       45.7±0.4μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'f')
-         896±3μs          190±5μs     0.21  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'd')
-         195±2μs         41.2±1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'f')
-         196±3μs         41.0±1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
-         196±1μs       40.8±0.9μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'f')
-         197±1μs       39.3±0.2μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
-         195±1μs       38.7±0.1μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'f')
-         196±1μs       38.9±0.4μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'f')
-       196±0.9μs       38.8±0.5μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'f')
-         196±1μs         38.2±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'f')
-         196±1μs       38.2±0.4μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'f')
-         196±2μs         37.9±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'f')
-       214±0.4μs         41.2±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'f')
-         200±3μs         38.3±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
-       195±0.9μs         37.1±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'f')
-         196±1μs         37.2±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'f')
-         197±2μs         37.3±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'f')
-         201±3μs       37.5±0.4μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'f')
-         200±3μs         37.4±1μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'f')
-         201±3μs       37.2±0.5μs     0.19  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'f')
-         201±3μs         36.9±1μs     0.18  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'f')
-         882±4μs        157±0.6μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'd')
-         880±4μs          156±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'd')
-         215±1μs         38.1±2μs     0.18  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'f')
-         898±3μs          158±3μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'd')
-         896±5μs          156±4μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'd')
-         216±1μs       37.5±0.4μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'f')
-         423±5μs       72.5±0.5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 2, 'd')
-         646±2μs          111±6μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'f')
-        896±10μs          153±4μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'd')
-         429±1μs       73.3±0.4μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 2, 'd')
-         901±4μs          154±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'd')
-         640±8μs          109±7μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'f')
-         421±6μs       70.7±0.5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 1, 'd')
-         879±1μs          147±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'd')
-       196±0.7μs       32.7±0.4μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'f')
-         220±2μs         36.6±1μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'f')
-       195±0.6μs       32.5±0.3μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'f')
-         195±1μs       32.4±0.3μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'f')
-         430±1μs       71.4±0.3μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 1, 'd')
-         196±1μs       32.4±0.3μs     0.17  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'f')
-     24.4±0.07μs      4.03±0.01μs     0.17  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 43)
-        884±10μs          146±7μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'd')
-         892±2μs          147±8μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'd')
-         197±3μs       32.3±0.6μs     0.16  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'f')
-         428±5μs         69.6±2μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 1, 'd')
-         883±4μs          144±5μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'd')
-         892±4μs          145±5μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'd')
-      25.0±0.2μs      4.04±0.02μs     0.16  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 8)
-         422±4μs       68.2±0.4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 2, 'd')
-        913±20μs          147±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'd')
-         433±5μs         69.5±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 1, 'd')
-        82.3±1μs       13.1±0.2μs     0.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'h')
-         874±3μs          139±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'd')
-         878±5μs          140±5μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'd')
-         433±1μs       68.5±0.3μs     0.16  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 2, 'd')
-        897±10μs          140±3μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'd')
-         894±2μs          138±6μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'd')
-      83.3±0.3μs      12.8±0.09μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'h')
-        933±20μs          142±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'd')
-         895±3μs          135±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'd')
-         423±4μs         63.8±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 1, 'd')
-       428±0.8μs       63.9±0.5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 1, 'd')
-      86.4±0.3μs      12.9±0.05μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'H')
-      87.7±0.5μs      13.0±0.05μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'H')
-         873±2μs          129±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'd')
-         215±1μs       31.8±0.5μs     0.15  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'f')
-         475±8μs         68.9±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 2, 'f')
-         473±1μs       68.6±0.5μs     0.14  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 4, 'f')
-         898±3μs          130±3μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'd')
-         883±7μs        128±0.7μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'd')
-         478±5μs         69.0±1μs     0.14  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 4, 'f')
-         470±1μs         67.7±1μs     0.14  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 4, 'f')
-         895±6μs          129±3μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'd')
-         472±2μs       67.7±0.9μs     0.14  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 2, 'f')
-         477±3μs         68.0±1μs     0.14  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 4, 'f')
-        913±30μs          128±2μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'd')
-         898±8μs          123±5μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'd')
-         884±6μs          121±2μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'd')
-         879±4μs          119±6μs     0.14  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'd')
-         422±1μs         57.4±2μs     0.14  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 2, 'd')
-        930±20μs          125±4μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'd')
-        921±30μs          122±4μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'd')
-        934±40μs          124±5μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'd')
-         433±5μs       57.1±0.8μs     0.13  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 2, 'd')
-         893±2μs          117±4μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'd')
-         471±2μs         61.1±1μs     0.13  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 2, 'f')
-         472±4μs       61.1±0.4μs     0.13  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 2, 'f')
-         470±3μs       60.5±0.5μs     0.13  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 4, 'f')
-        476±10μs       61.4±0.6μs     0.13  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 4, 'f')
-         473±4μs       60.7±0.1μs     0.13  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 2, 'f')
-         473±2μs       60.5±0.4μs     0.13  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 2, 'f')
-       197±0.4μs       24.6±0.5μs     0.13  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
-         197±1μs       24.5±0.4μs     0.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'f')
-       196±0.4μs       24.3±0.5μs     0.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'f')
-         467±2μs       57.8±0.7μs     0.12  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 4, 'f')
-       197±0.5μs       24.3±0.8μs     0.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'f')
-         196±1μs       24.1±0.5μs     0.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'f')
-      40.2±0.3μs      4.90±0.08μs     0.12  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float32'>)
-         469±2μs       57.1±0.6μs     0.12  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 2, 'f')
-         476±4μs       57.9±0.7μs     0.12  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 4, 'f')
-         893±6μs          108±4μs     0.12  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'd')
-         476±6μs       56.8±0.4μs     0.12  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 2, 'f')
-         478±3μs         56.7±1μs     0.12  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 4, 'f')
-         881±9μs          104±4μs     0.12  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'd')
-         474±1μs         56.2±1μs     0.12  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 4, 'f')
-       214±0.2μs       24.9±0.6μs     0.12  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'f')
-         885±2μs          100±2μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'd')
-         891±1μs          100±3μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'd')
-         890±1μs         99.5±6μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'd')
-       196±0.6μs       21.7±0.3μs     0.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'f')
-         639±2μs         70.4±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'f')
-        911±30μs          100±2μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'd')
-         472±3μs       51.8±0.5μs     0.11  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 1, 'f')
-         642±3μs         70.0±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'f')
-         196±2μs       21.4±0.3μs     0.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'f')
-         641±4μs         69.9±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'f')
-         884±2μs         96.3±3μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'd')
-        931±30μs          101±2μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'd')
-         196±1μs       21.3±0.4μs     0.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'f')
-       195±0.8μs       21.2±0.3μs     0.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'f')
-       196±0.7μs       21.2±0.3μs     0.11  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'f')
-         642±3μs         69.2±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'f')
-         640±5μs         68.8±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'f')
-         474±5μs       50.9±0.7μs     0.11  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 4, 'f')
-         474±5μs       50.9±0.5μs     0.11  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 1, 'f')
-         478±9μs       51.0±0.8μs     0.11  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 4, 'f')
-         642±4μs         68.4±1μs     0.11  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'f')
-         470±1μs       50.0±0.4μs     0.11  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 4, 'f')
-        472±10μs       50.1±0.7μs     0.11  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 4, 'f')
-      68.6±0.6μs       7.16±0.2μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'b')
-        68.5±3μs       7.05±0.1μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'b')
-         472±2μs       48.0±0.3μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 2, 'f')
-         469±3μs       47.7±0.2μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 1, 'f')
-         471±1μs       47.8±0.4μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 2, 'f')
-         472±3μs       47.9±0.2μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 2, 'f')
-     1.97±0.01ms          197±7μs     0.10  bench_reduce.AddReduceSeparate.time_reduce(0, 'float32')
-         473±2μs       47.4±0.7μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 1, 'f')
-         475±7μs       47.6±0.4μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 2, 'f')
-         158±3μs      15.8±0.09μs     0.10  bench_reduce.ArgMin.time_argmin(<class 'numpy.int16'>)
-       157±0.7μs      15.7±0.08μs     0.10  bench_reduce.ArgMax.time_argmax(<class 'numpy.int16'>)
-         471±4μs       47.1±0.2μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 1, 'f')
-         474±5μs       47.2±0.7μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 1, 'f')
-         419±4μs       41.7±0.3μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 1, 'd')
-         426±2μs       41.8±0.5μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 1, 'd')
-         480±5μs         46.9±1μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 4, 'f')
-         641±2μs         62.3±1μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'f')
-         470±1μs       45.4±0.4μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 1, 'f')
-       215±0.2μs       20.7±0.1μs     0.10  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'f')
-         476±5μs         45.9±1μs     0.10  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 4, 'f')
-         641±8μs       61.5±0.7μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'f')
-         643±3μs         61.6±1μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'f')
-         641±4μs       61.3±0.8μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'f')
-         644±5μs       61.5±0.8μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'f')
-      81.0±0.6μs      7.72±0.02μs     0.10  bench_reduce.ArgMax.time_argmax(<class 'bool'>)
-         473±5μs       45.1±0.3μs     0.10  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 1, 'f')
-         640±3μs       60.9±0.7μs     0.10  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'f')
-         598±7μs       56.7±0.8μs     0.09  bench_ufunc.CustomInplace.time_float_add_temp
-      70.9±0.2μs      6.45±0.05μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -43)
-         468±2μs       42.6±0.1μs     0.09  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 2, 'f')
-      71.0±0.2μs      6.44±0.06μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 43)
-         466±5μs       42.0±0.2μs     0.09  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 2, 'f')
-         642±5μs       57.8±0.5μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'f')
-         640±3μs       57.4±0.8μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'f')
-         644±3μs       57.8±0.4μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'f')
-         643±4μs       57.2±0.8μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'f')
-         472±4μs       41.9±0.4μs     0.09  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 2, 'f')
-         473±7μs       41.8±0.2μs     0.09  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 2, 'f')
-         470±6μs       41.5±0.6μs     0.09  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 1, 'f')
-         468±3μs       41.0±0.2μs     0.09  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 1, 'f')
-         641±4μs       55.6±0.8μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'f')
-         641±2μs       55.5±0.4μs     0.09  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'f')
-      76.1±0.2μs      6.53±0.05μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 8)
-         475±4μs       40.2±0.1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 1, 'f')
-      76.2±0.6μs      6.42±0.05μs     0.08  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -8)
-         473±5μs       39.7±0.3μs     0.08  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 1, 'f')
-         189±2μs       15.8±0.1μs     0.08  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint16'>)
-         190±2μs      15.8±0.07μs     0.08  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint16'>)
-      89.2±0.6μs       7.28±0.2μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'B')
-         469±2μs       38.0±0.3μs     0.08  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 2, 'f')
-         465±2μs       37.6±0.2μs     0.08  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 2, 'f')
-         655±9μs       52.8±0.8μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'f')
-         590±6μs       47.1±0.2μs     0.08  bench_ufunc.CustomInplace.time_float_add
-         654±8μs         52.1±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'f')
-         196±2μs       15.6±0.2μs     0.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'f')
-       195±0.5μs      15.6±0.08μs     0.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'f')
-       196±0.4μs       15.5±0.2μs     0.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'f')
-         196±1μs       15.5±0.2μs     0.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'f')
-      89.0±0.5μs       7.04±0.1μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'B')
-         637±2μs       50.3±0.5μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'f')
-         639±3μs       50.4±0.4μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'f')
-         642±6μs       50.6±0.4μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'f')
-         198±2μs       15.5±0.2μs     0.08  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'f')
-         642±1μs       50.2±0.7μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'f')
-         641±1μs       49.8±0.5μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'f')
-       641±0.9μs       49.2±0.6μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'f')
-         638±2μs       48.9±0.6μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'f')
-         640±1μs         48.4±2μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'f')
-         466±2μs       35.2±0.5μs     0.08  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 1, 'f')
-         643±6μs       48.1±0.3μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'f')
-         642±2μs       47.8±0.4μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'f')
-         475±6μs       34.7±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 1, 'f')
-         640±1μs       46.0±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'f')
-         467±3μs       33.5±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 1, 'f')
-         475±5μs       33.9±0.3μs     0.07  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 1, 'f')
-         639±4μs         45.5±2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'f')
-         639±2μs       45.2±0.5μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'f')
-       638±0.8μs       45.1±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'f')
-      41.7±0.4μs      2.94±0.02μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 43)
-       215±0.9μs       15.2±0.2μs     0.07  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'f')
-         638±2μs       45.0±0.3μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'f')
-         886±7μs       61.7±0.7μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'd')
-         637±1μs       44.3±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'f')
-         876±4μs       60.8±0.8μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'd')
-         639±1μs       44.4±0.3μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'f')
-         890±1μs         61.6±2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'd')
-     42.3±0.08μs      2.92±0.01μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 8)
-     4.91±0.03ms          335±5μs     0.07  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
-         896±5μs       60.9±0.5μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'd')
-         890±3μs       60.3±0.7μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'd')
-         888±6μs         60.0±1μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'd')
-     4.88±0.02ms          329±1μs     0.07  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
-      45.8±0.2μs      3.07±0.02μs     0.07  bench_reduce.FMinMax.time_max(<class 'numpy.float32'>)
-     4.93±0.04ms          330±4μs     0.07  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
-     4.93±0.01ms          328±2μs     0.07  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
-         638±2μs       42.0±0.7μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'f')
-         639±1μs       42.0±0.3μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'f')
-      46.0±0.4μs      3.01±0.01μs     0.07  bench_reduce.FMinMax.time_min(<class 'numpy.float32'>)
-         639±3μs       41.2±0.5μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'f')
-         640±1μs       41.3±0.3μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'f')
-         639±1μs       41.2±0.4μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'f')
-         639±2μs       40.9±0.3μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'f')
-     4.90±0.02ms        310±0.9μs     0.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
-         643±9μs       40.6±0.2μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'f')
-     4.92±0.02ms          309±1μs     0.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
-      65.7±0.5μs      4.12±0.02μs     0.06  bench_reduce.FMinMax.time_min(<class 'numpy.float64'>)
-      65.8±0.4μs      4.08±0.03μs     0.06  bench_reduce.FMinMax.time_max(<class 'numpy.float64'>)
-         642±3μs       39.0±0.2μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'f')
-         638±1μs       38.5±0.2μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'f')
-         158±2μs      9.24±0.07μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint8'>)
-         157±1μs      9.19±0.07μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.int8'>)
-     4.95±0.05ms          288±3μs     0.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
-       158±0.8μs      9.18±0.02μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.int8'>)
-         158±2μs      9.18±0.05μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint8'>)
-         890±6μs         51.3±1μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'd')
-         877±6μs         50.1±1μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'd')
-     4.92±0.03ms          281±2μs     0.06  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
-         465±4μs       26.6±0.5μs     0.06  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 1, 'f')
-         471±4μs       26.6±0.4μs     0.06  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 1, 'f')
-         874±3μs       49.2±0.3μs     0.06  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'd')
-         889±4μs       48.5±0.6μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'd')
-         877±5μs       47.8±0.4μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'd')
-     4.90±0.02ms          263±1μs     0.05  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
-      70.8±0.5μs      3.72±0.04μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 43)
-      41.6±0.3μs         2.19±0μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 8)
-         899±7μs       47.0±0.3μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'd')
-      41.7±0.1μs      2.18±0.01μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 43)
-      71.1±0.4μs      3.71±0.02μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -43)
-         640±2μs       33.2±0.4μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'f')
-         640±2μs       32.5±0.3μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'f')
-         643±5μs       32.5±0.4μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'f')
-         637±2μs       31.9±0.4μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'f')
-      74.5±0.3μs      3.73±0.03μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -8)
-        74.3±1μs      3.70±0.01μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 8)
-         674±3μs       30.0±0.2μs     0.04  bench_lib.Nan.time_nanmax(200000, 0.1)
-         669±9μs       29.7±0.2μs     0.04  bench_lib.Nan.time_nanmin(200000, 0.1)
-         674±4μs       29.9±0.5μs     0.04  bench_lib.Nan.time_nanmax(200000, 2.0)
-         675±6μs       29.7±0.5μs     0.04  bench_lib.Nan.time_nanmin(200000, 0)
-         674±8μs       29.6±0.2μs     0.04  bench_lib.Nan.time_nanmin(200000, 2.0)
-         681±8μs       29.7±0.4μs     0.04  bench_lib.Nan.time_nanmax(200000, 0)
-         639±2μs       24.5±0.7μs     0.04  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'f')
-         640±2μs       24.3±0.7μs     0.04  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'f')
-         888±3μs       31.9±0.3μs     0.04  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'd')
-         878±5μs       31.5±0.6μs     0.04  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'd')
-      71.1±0.3μs      2.37±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -43)
-      72.1±0.2μs      2.35±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 43)
-      75.0±0.5μs      2.34±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -8)
-      75.6±0.6μs      2.36±0.02μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 8)
-     1.10±0.01ms       30.1±0.2μs     0.03  bench_lib.Nan.time_nanmax(200000, 90.0)
-     1.10±0.01ms       29.7±0.2μs     0.03  bench_lib.Nan.time_nanmin(200000, 90.0)
-     1.53±0.01ms       29.8±0.4μs     0.02  bench_lib.Nan.time_nanmax(200000, 50.0)
-     1.55±0.01ms       29.6±0.2μs     0.02  bench_lib.Nan.time_nanmin(200000, 50.0)
VXE
export NPY_DISABLE_CPU_FEATURES="VXE2"
python runtests.py --bench-compare parent/main
    before           after         ratio
     [982fcd38]       [47d54c6d]
     <zsystem_sup~5>       <zsystem_sup>
+      7.37±0.1ms      10.1±0.05ms     1.37  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 1000000)
+      74.3±0.2μs          101±1μs     1.36  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 10000)
+     3.28±0.05ms       4.43±0.1ms     1.35  bench_reduce.AddReduceSeparate.time_reduce(1, 'float16')
+      83.4±0.9μs          102±1μs     1.22  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'h')
+      9.46±0.2ms       11.4±0.2ms     1.21  bench_reduce.AddReduce.time_axis_1
+       127±0.6μs          153±3μs     1.20  bench_function_base.Sort.time_argsort('quick', 'int32', ('reversed',))
+        85.1±1μs        102±0.4μs     1.20  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'h')
+      83.6±0.9μs       99.9±0.9μs     1.20  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'h')
+      84.4±0.3μs        101±0.3μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'h')
+        84.1±1μs       99.9±0.4μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'h')
+      84.5±0.6μs      100.0±0.6μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'h')
+        84.1±1μs       99.4±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'h')
+        84.7±3μs        100±0.4μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'b')
+        84.6±1μs        100.0±1μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'h')
+        560±20ms         661±20ms     1.18  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'wrap')
+        83.5±1μs      98.6±0.09μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'h')
+        84.9±1μs        100±0.4μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'h')
+        85.2±1μs        100±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'h')
+        85.5±2μs        101±0.5μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'h')
+        85.6±1μs        101±0.6μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'h')
+      84.4±0.2μs       99.3±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'h')
+      85.0±0.3μs       99.6±0.6μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'h')
+      84.9±0.7μs       99.6±0.5μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'b')
+      85.9±0.7μs        101±0.5μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'h')
+        85.6±1μs          100±2μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'h')
+        85.3±1μs       99.6±0.2μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'h')
+      85.3±0.9μs       99.5±0.7μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'b')
+      85.6±0.1μs       99.9±0.6μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'h')
+        84.7±1μs       98.8±0.7μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'b')
+        85.5±1μs       99.6±0.4μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'h')
+        86.5±2μs        101±0.4μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'h')
+      85.0±0.9μs       99.0±0.1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'h')
+        85.8±2μs       99.8±0.7μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'h')
+        86.1±1μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'h')
+      85.3±0.8μs       99.2±0.9μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'b')
+        86.3±1μs        100±0.4μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'h')
+      84.6±0.6μs       98.3±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'b')
+      85.0±0.6μs       98.6±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'b')
+        86.5±2μs          100±1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'b')
+        87.0±1μs        101±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'h')
+      84.9±0.3μs       98.5±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'b')
+      86.4±0.4μs        100±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'h')
+        87.0±2μs        101±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'h')
+      86.1±0.8μs       99.7±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'h')
+        86.6±2μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'h')
+      85.7±0.8μs         99.1±1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'b')
+      87.3±0.9μs        101±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'h')
+      85.3±0.7μs       98.7±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'b')
+        85.0±1μs       98.3±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'b')
+      87.5±0.6μs        101±0.7μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'h')
+      86.3±0.8μs       99.8±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'b')
+      86.6±0.9μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'h')
+        85.7±2μs       99.0±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'h')
+        87.4±1μs        101±0.7μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'h')
+        85.2±1μs       98.3±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'b')
+        87.6±1μs        101±0.4μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'h')
+         301±4μs        347±0.8μs     1.15  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 1000))
+        87.5±1μs        101±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'h')
+        86.5±2μs       99.7±0.8μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'h')
+        87.8±2μs        101±0.9μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'b')
+        85.7±1μs       98.6±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'b')
+      85.7±0.9μs       98.6±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'b')
+      87.5±0.7μs        101±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'h')
+      86.1±0.6μs       99.0±0.7μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'b')
+      85.6±0.6μs       98.4±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'b')
+      86.3±0.3μs       99.2±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'b')
+        87.8±2μs        101±0.9μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'h')
+      87.5±0.6μs        100±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'h')
+        88.6±1μs        102±0.4μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'h')
+        86.1±2μs       98.8±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'b')
+      12.3±0.2μs       14.1±0.2μs     1.15  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'D')
+      85.8±0.8μs       98.3±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'b')
+     1.15±0.03μs      1.32±0.02μs     1.15  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 100)
+        86.9±2μs       99.4±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'h')
+        88.2±1μs        101±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'h')
+        88.9±1μs        102±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'h')
+      86.2±0.4μs       98.6±0.1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'b')
+      87.2±0.3μs       99.6±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'b')
+      87.7±0.2μs        100±0.6μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'h')
+      86.2±0.4μs       98.3±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'b')
+        87.0±2μs       99.0±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'b')
+        86.5±1μs       98.4±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'b')
+        89.1±2μs          101±1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'h')
+        86.8±1μs         98.6±1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'b')
+        86.9±1μs       98.8±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'b')
+      87.7±0.4μs       99.7±0.2μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'h')
+        87.2±1μs       99.0±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'b')
+      87.2±0.4μs       99.0±0.2μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'h')
+        86.6±1μs       98.3±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'b')
+      87.5±0.4μs         99.2±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'b')
+      87.2±0.4μs       98.8±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'b')
+      87.7±0.3μs       99.4±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'b')
+        88.3±1μs       99.9±0.5μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'h')
+      87.0±0.3μs       98.4±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'b')
+        89.0±2μs          101±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'b')
+      86.9±0.8μs       98.2±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'b')
+      88.0±0.6μs         99.4±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'b')
+        87.4±2μs       98.6±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'b')
+      88.0±0.9μs       99.3±0.6μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'b')
+         326±1μs          368±2μs     1.13  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 1000))
+        87.3±2μs       98.5±0.1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'b')
+      87.1±0.4μs       98.2±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'b')
+      87.2±0.6μs       98.2±0.4μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'b')
+        87.7±2μs       98.7±0.1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'b')
+        87.4±2μs       98.4±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'b')
+      88.8±0.4μs       99.8±0.7μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'h')
+        88.3±2μs       99.0±0.4μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'b')
+        88.4±2μs       98.9±0.5μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'b')
+        89.2±1μs       99.2±0.4μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'h')
+         120±4μs          133±3μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'L')
+      89.7±0.6μs       99.5±0.5μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'b')
+        88.7±2μs       98.3±0.1μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'b')
+        89.4±2μs       98.8±0.3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'b')
+      79.2±0.1μs       87.4±0.2μs     1.10  bench_function_base.Where.time_interleaved_zeros_x8
+         133±2μs          147±3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'Q')
+      13.4±0.2μs       14.7±0.3μs     1.10  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'D')
+         505±4μs        554±0.6μs     1.10  bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',))
+     6.90±0.05ms      7.56±0.02ms     1.10  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 1000000)
+        89.9±2μs       98.5±0.1μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'b')
+        89.9±3μs       98.4±0.6μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'b')
+         330±3μs          361±3μs     1.09  bench_function_base.Sort.time_argsort('quick', 'int32', ('sorted_block', 1000))
+         134±3μs          146±4μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'q')
+      70.1±0.3μs       76.6±0.6μs     1.09  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 10000)
+         119±2μs          130±2μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'Q')
+         409±2μs        443±0.5μs     1.08  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 100))
+       371±0.4μs          401±3μs     1.08  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 100))
+     2.68±0.03μs      2.89±0.08μs     1.08  bench_core.Core.time_hstack_l
+     17.2±0.04μs       18.6±0.7μs     1.08  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int32'>)
+      86.8±0.3μs       93.1±0.5μs     1.07  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 100))
+         123±2μs          131±2μs     1.07  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'L')
+      13.5±0.1μs       14.4±0.3μs     1.07  bench_ma.MA.time_masked_array_l100_t100
+       139±0.6μs          148±2μs     1.07  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'Q')
+     17.8±0.07μs       19.0±0.4μs     1.07  bench_lib.Nan.time_nanmean(200, 0)
+     5.04±0.06ms       5.37±0.1ms     1.07  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'wrap')
+     10.3±0.06μs       11.0±0.3μs     1.07  bench_ma.MA.time_masked_array_l100
+     19.3±0.06μs       20.6±0.6μs     1.07  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float32'>)
+     1.15±0.02μs      1.23±0.03μs     1.07  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int64')
+     1.20±0.01ms      1.28±0.05ms     1.06  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int16'>)
+        55.1±1ms         58.6±2ms     1.06  bench_ma.Concatenate.time_it('masked', 2000)
+         128±2ms          136±2ms     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'constant')
+      21.0±0.1μs       22.3±0.4μs     1.06  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float32'>)
+        638±10μs         677±20μs     1.06  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'numpy.int16'>)
+     1.32±0.01μs      1.40±0.05μs     1.06  bench_core.Core.time_ones_100
+     18.3±0.08μs       19.4±0.4μs     1.06  bench_ma.UFunc.time_2d(True, True, 10)
+        64.8±1ms         68.7±1ms     1.06  bench_ma.Concatenate.time_it('unmasked+masked', 2000)
+         108±2μs        115±0.4μs     1.06  bench_function_base.Sort.time_argsort('merge', 'int32', ('sorted_block', 100))
+         335±3μs          355±5μs     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'linear_ramp')
+     27.9±0.09μs       29.5±0.5μs     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'constant')
+         492±1ns          520±8ns     1.06  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs([1])
+     9.82±0.04μs       10.4±0.3μs     1.06  bench_lib.Unique.time_unique(200, 90.0)
+         565±2μs          597±3μs     1.06  bench_function_base.Sort.time_argsort('heap', 'float64', ('reversed',))
+         110±1μs          116±4μs     1.06  bench_lib.Pad.time_pad((256, 128, 1), 1, 'reflect')
+      60.7±0.6ms         64.1±2ms     1.06  bench_ma.Concatenate.time_it('ndarray+masked', 2000)
+     4.27±0.01μs       4.50±0.1μs     1.06  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'numpy.int64'>)
+      20.1±0.1μs       21.2±0.6μs     1.06  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float64'>)
+      14.6±0.2μs      15.4±0.04μs     1.06  bench_ma.UFunc.time_scalar_1d(False, False, 100)
+      14.8±0.2μs       15.6±0.2μs     1.06  bench_ma.UFunc.time_scalar_1d(False, False, 1000)
+     10.8±0.02μs      11.4±0.07μs     1.06  bench_lib.Unique.time_unique(200, 0)
+        837±20ns         883±20ns     1.05  bench_io.Copy.time_memcpy('int8')
+       110±0.4μs          116±1μs     1.05  bench_function_base.Sort.time_argsort('merge', 'float32', ('sorted_block', 100))
+      21.2±0.2μs       22.4±0.8μs     1.05  bench_ma.UFunc.time_scalar_1d(False, True, 100)
+      14.5±0.2μs       15.3±0.1μs     1.05  bench_ma.UFunc.time_scalar_1d(False, False, 10)
+     2.18±0.01ms      2.30±0.04ms     1.05  bench_indexing.IndexingSeparate.time_mmap_fancy_indexing
+        673±10ns          709±7ns     1.05  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'numpy.int8'>)
+         532±2μs          559±2μs     1.05  bench_function_base.Sort.time_argsort('heap', 'float32', ('ordered',))
+     2.85±0.01μs      3.00±0.02μs     1.05  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'str'>)
+        1.52±0μs      1.60±0.05μs     1.05  bench_reduce.AnyAll.time_all_fast
+     1.21±0.02ms      1.27±0.05ms     1.05  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int64'>)
+      56.2±0.2μs       59.1±0.9μs     1.05  bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 1000))
-        90.6±1μs       86.2±0.3μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'H')
-         344±5μs          327±1μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 1, 'd')
-       342±0.9μs          325±2μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 2, 'd')
-     1.41±0.01ms      1.34±0.02ms     0.95  bench_lib.Nan.time_nanvar(200000, 0)
-         215±2μs          204±2μs     0.95  bench_function_base.Sort.time_sort('quick', 'float32', ('reversed',))
-     1.07±0.02ms      1.02±0.01ms     0.95  bench_reduce.AddReduceSeparate.time_reduce(0, 'longfloat')
-         718±7μs          682±5μs     0.95  bench_indexing.Indexing.time_op('indexes_rand_', 'np.ix_(I, I)', '=1')
-      13.2±0.4ms      12.5±0.07ms     0.95  bench_linalg.Linalg.time_op('svd', 'complex128')
-        71.4±1μs       67.7±0.9μs     0.95  bench_function_base.Sort.time_sort('quick', 'int32', ('ordered',))
-      88.2±0.5μs       83.6±0.8μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'H')
-         273±6μs          259±6μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'd')
-      62.2±0.9μs       59.0±0.3μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 2, 'd')
-      74.8±0.4μs       70.8±0.1μs     0.95  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 10000)
-         346±4μs          328±1μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 1, 'd')
-      92.7±0.7μs         87.8±1μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'H')
-         347±5μs          328±3μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 2, 'd')
-        88.3±1μs       83.6±0.4μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'H')
-     1.41±0.01ms      1.34±0.01ms     0.95  bench_lib.Nan.time_nanvar(200000, 0.1)
-       151±0.8μs        143±0.8μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('reversed',))
-         246±2ms          232±4ms     0.94  bench_app.LaplaceInplace.time_it('normal')
-        89.0±1μs       83.8±0.6μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'H')
-         347±3μs          327±1μs     0.94  bench_ufunc.UFunc.time_ufunc_types('square')
-     1.54±0.01ms      1.45±0.01ms     0.94  bench_lib.Nan.time_nanargmin(200000, 50.0)
-      91.1±0.9μs       85.5±0.8μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'H')
-        90.3±1μs       84.7±0.9μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'H')
-         430±1μs          404±2μs     0.94  bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 100))
-       526±0.5μs          493±4μs     0.94  bench_function_base.Sort.time_sort('heap', 'float64', ('reversed',))
-         478±7μs          448±3μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 10))
-        92.8±2μs         87.0±1μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'H')
-        63.4±2μs       59.5±0.1μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 2, 1, 'f')
-     2.31±0.01ms      2.16±0.02ms     0.94  bench_lib.Nan.time_nanvar(200000, 90.0)
-        79.5±1μs         74.4±1μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int32', ('ordered',))
-       272±0.7μs          255±3μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'd')
-     2.33±0.01ms      2.18±0.02ms     0.94  bench_lib.Nan.time_nanstd(200000, 90.0)
-      87.9±0.6μs       82.3±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'H')
-         273±2μs          255±2μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'd')
-     16.3±0.08μs         15.3±1μs     0.93  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int64'>)
-     1.43±0.02ms      1.33±0.03ms     0.93  bench_lib.Pad.time_pad((1024, 1024), 1, 'mean')
-        89.1±1μs       83.1±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'H')
-         275±1μs          256±3μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'd')
-         725±3μs          675±3μs     0.93  bench_indexing.Indexing.time_op('indexes_', 'np.ix_(I, I)', '=1')
-        91.8±1μs       85.4±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'H')
-        91.0±2μs       84.6±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'H')
-        92.0±2μs       85.6±0.4μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'H')
-      88.0±0.8μs       81.8±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'H')
-        93.4±2μs         86.7±1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'H')
-      91.9±0.6μs         85.3±1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'H')
-        89.9±1μs       83.4±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'H')
-        92.0±1μs       85.3±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'H')
-      91.1±0.5μs       84.5±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'H')
-         276±4μs          256±4μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'd')
-      88.6±0.7μs       82.1±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'H')
-      92.5±0.8μs       85.6±0.9μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'H')
-        93.4±1μs       86.3±0.5μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'H')
-        90.8±2μs       83.9±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'H')
-      95.0±0.3μs       87.8±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'H')
-        89.6±1μs       82.7±0.2μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'H')
-      91.5±0.5μs       84.5±0.2μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'H')
-        89.4±1μs       82.4±0.3μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'H')
-      89.9±0.8μs       82.9±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'H')
-      92.4±0.9μs       85.1±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'H')
-         420±3μs          386±2μs     0.92  bench_ufunc.UFunc.time_ufunc_types('multiply')
-        91.9±2μs         84.6±1μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'H')
-        93.8±1μs       86.2±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'H')
-     1.38±0.04ms      1.27±0.03ms     0.92  bench_lib.Pad.time_pad((1024, 1024), 8, 'mean')
-      91.8±0.8μs       84.3±0.3μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'H')
-        91.5±1μs       83.9±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'H')
-      95.3±0.9μs       87.4±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'H')
-         265±3μs          243±4μs     0.92  bench_ufunc.UFunc.time_ufunc_types('minimum')
-        94.0±2μs       86.1±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'H')
-        93.6±1μs       85.8±0.5μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'H')
-      90.8±0.8μs       83.1±0.4μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'H')
-        91.5±2μs       83.6±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'H')
-         557±3μs          509±1μs     0.91  bench_function_base.Sort.time_sort('heap', 'float32', ('reversed',))
-      91.2±0.7μs       83.3±0.6μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'H')
-        91.4±1μs       83.4±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'H')
-         277±1μs        253±0.5μs     0.91  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'd')
-     6.03±0.07μs      5.49±0.02μs     0.91  bench_itemselection.PutMask.time_dense(False, 'complex256')
-        91.0±1μs       83.0±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'H')
-         277±1μs          253±2μs     0.91  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'd')
-        92.3±1μs       84.1±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'H')
-      94.2±0.8μs       85.6±0.8μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'H')
-      93.0±0.2μs       84.5±0.4μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'H')
-      87.4±0.9μs       79.4±0.5μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'q')
-      91.4±0.9μs       82.9±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'H')
-      89.9±0.5μs       81.4±0.2μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'H')
-         503±3μs          455±2μs     0.91  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 100))
-        93.1±1μs       84.1±0.4μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'H')
-        92.9±2μs       84.0±0.5μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'H')
-         269±4μs          243±2μs     0.90  bench_ufunc.UFunc.time_ufunc_types('maximum')
-         315±3μs          284±3μs     0.90  bench_ufunc.UFunc.time_ufunc_types('subtract')
-        77.9±1μs       70.0±0.3μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'I')
-         362±4μs          324±2μs     0.90  bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 1000))
-        85.5±2μs       76.5±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'i')
-        78.6±2μs       70.2±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'i')
-        76.6±1μs       68.4±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'I')
-         107±6μs         95.4±1μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'd')
-        78.9±1μs       70.4±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'I')
-      76.6±0.9μs       68.3±0.1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'i')
-         108±4μs         96.5±3μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
-        83.4±1μs         74.3±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'Q')
-        78.2±1μs       69.6±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'i')
-        79.6±2μs       70.8±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'i')
-        82.5±1μs         73.4±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'i')
-        82.0±1μs       72.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'i')
-         938±6μs         833±10μs     0.89  bench_lib.Nan.time_nanargmax(200000, 90.0)
-         413±1μs          367±3μs     0.89  bench_ufunc.UFunc.time_ufunc_types('rint')
-        79.3±1μs       70.4±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'I')
-         110±3μs         97.9±2μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'd')
-         391±5μs          348±2μs     0.89  bench_ufunc.UFunc.time_ufunc_types('fmin')
-      77.3±0.9μs       68.7±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'i')
-        78.5±2μs       69.7±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'i')
-      89.7±0.9μs       79.6±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'L')
-        88.9±2μs       78.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'Q')
-      77.7±0.6μs       68.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'I')
-        78.3±2μs       69.3±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'I')
-      89.4±0.4μs         79.2±2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'q')
-      77.7±0.6μs       68.6±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'i')
-        88.8±2μs         78.5±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'I')
-      82.8±0.6μs       73.2±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'i')
-        81.7±3μs       72.1±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'i')
-         6.63±0s          5.85±0s     0.88  bench_ufunc_strides.LogisticRegression.time_train(<class 'numpy.float32'>)
-      4.01±0.1ms      3.53±0.07ms     0.88  bench_core.VarComplex.time_var(1000000)
-        69.6±1ms       61.4±0.9ms     0.88  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
-        78.7±1μs       69.3±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'i')
-        89.1±1μs         78.5±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'Q')
-        77.1±2μs       67.9±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'I')
-        83.2±1μs       73.2±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'Q')
-        78.5±1μs       69.0±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'i')
-        88.4±1μs       77.7±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'l')
-        88.8±1μs       78.0±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'l')
-      81.8±0.9μs       71.8±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'I')
-        82.0±1μs       72.0±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'i')
-        81.5±2μs       71.6±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'I')
-        90.4±2μs         79.3±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'I')
-        82.3±1μs         72.2±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'I')
-        79.0±1μs       69.3±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'I')
-        78.0±1μs       68.4±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'I')
-        77.6±1μs       68.1±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'i')
-        89.2±2μs       78.2±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'l')
-        82.1±1μs         71.9±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'i')
-        79.7±1μs       69.9±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'I')
-        77.7±1μs       68.2±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'i')
-      81.5±0.2μs       71.4±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'i')
-        90.3±1μs         79.0±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'L')
-         473±3μs          414±2μs     0.88  bench_lib.Nan.time_nanargmax(200000, 0.1)
-      61.4±0.9μs       53.8±0.6μs     0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 1, 'd')
-      78.1±0.2μs       68.3±0.2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'i')
-         150±1μs          131±3μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
-        82.9±2μs       72.5±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'i')
-      81.5±0.6μs       71.3±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'I')
-         639±2μs          559±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'f')
-        79.7±2μs       69.7±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'i')
-        84.4±2μs       73.8±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'i')
-         473±3μs          413±2μs     0.87  bench_lib.Nan.time_nanargmax(200000, 0)
-         637±3μs          556±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'f')
-        82.9±1μs       72.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'i')
-         523±1μs          457±3μs     0.87  bench_lib.Nan.time_nanargmax(200000, 2.0)
-      89.5±0.8μs       78.1±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'Q')
-      89.1±0.8μs         77.8±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'q')
-      80.8±0.7μs       70.5±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'i')
-      62.7±0.9μs       54.7±0.7μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 2, 'd')
-      82.0±0.6μs       71.5±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'I')
-         636±1μs          555±4μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'f')
-        82.5±1μs       71.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'I')
-        86.1±2μs       75.1±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'I')
-         638±1μs          556±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'f')
-        78.0±2μs       67.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'I')
-         639±2μs          557±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'f')
-         108±1ms         93.9±3ms     0.87  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'str'>)
-     1.49±0.01μs      1.29±0.01μs     0.87  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int16')
-         641±2μs          558±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'f')
-        84.9±2μs       73.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'q')
-        90.8±2μs         79.0±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'i')
-         527±5μs          458±9μs     0.87  bench_lib.Nan.time_nanargmin(200000, 2.0)
-        85.5±1μs         74.4±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'I')
-        90.0±1μs         78.3±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'i')
-        90.3±2μs         78.6±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'I')
-      83.1±0.9μs       72.3±0.2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'q')
-         641±1μs          557±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'f')
-      82.6±0.9μs       71.8±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'i')
-         642±3μs          557±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'f')
-         110±3μs         95.6±1μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'd')
-      78.8±0.6μs       68.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'i')
-       639±0.6μs          554±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'f')
-     1.49±0.01μs         1.30±0μs     0.87  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float16')
-        90.3±2μs       78.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'i')
-        82.9±2μs       71.9±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'I')
-         112±2μs         97.2±2μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
-         640±3μs          555±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'f')
-         636±2μs          552±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'f')
-         642±3μs          557±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'f')
-        85.6±1μs       74.3±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'i')
-         641±6μs          555±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'f')
-         640±3μs          555±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'f')
-         642±2μs          557±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'f')
-         641±1μs          556±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'f')
-         490±2μs          425±1μs     0.87  bench_function_base.Sort.time_sort('heap', 'float64', ('ordered',))
-        89.9±1μs         77.9±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'L')
-        84.1±1μs       72.8±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'I')
-      82.2±0.9μs         71.2±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'i')
-         639±3μs          553±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'f')
-         638±3μs          552±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'f')
-        82.6±2μs       71.5±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'I')
-        86.7±1μs       75.0±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'I')
-      81.1±0.4μs       70.2±0.7μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'I')
-         638±1μs        552±0.6μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'f')
-        90.0±2μs         77.9±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'i')
-         642±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'f')
-         638±1μs        551±0.8μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'f')
-        88.7±1μs         76.7±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'I')
-         643±5μs          556±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'f')
-        83.8±1μs         72.5±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'I')
-      90.5±0.9μs       78.2±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'L')
-         644±2μs          557±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'f')
-      88.1±0.5μs       76.1±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'i')
-         642±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'f')
-         659±7μs          570±7μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'f')
-         641±2μs          554±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'f')
-      81.5±0.9μs       70.4±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'I')
-      84.3±0.9μs       72.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'i')
-        89.4±1μs       77.2±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'q')
-         652±8μs          563±5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'f')
-         643±4μs          556±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'f')
-        82.8±2μs         71.5±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'i')
-         639±2μs        551±0.6μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'f')
-        80.8±1μs       69.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'i')
-         642±3μs          554±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'f')
-      83.6±0.7μs       72.2±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'L')
-        81.5±2μs       70.3±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'I')
-         639±3μs          552±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'f')
-        81.0±2μs       69.8±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'i')
-        79.2±2μs       68.3±0.3μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'I')
-         643±2μs        554±0.9μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'f')
-        83.3±1μs       71.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'i')
-         641±2μs          553±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'f')
-         640±2μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'f')
-        83.4±2μs       71.9±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'I')
-         639±2μs        551±0.5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'f')
-       640±0.7μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'f')
-        83.8±1μs       72.3±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'Q')
-        84.9±2μs       73.2±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'L')
-         642±2μs          553±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'f')
-         641±3μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'f')
-         643±2μs          554±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'f')
-         641±3μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'f')
-        84.3±2μs       72.7±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'l')
-         645±4μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'f')
-      86.9±0.7μs       74.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'I')
-        89.7±1μs         77.3±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'l')
-     2.47±0.02μs      2.13±0.02μs     0.86  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float16')
-         642±1μs          553±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'f')
-         641±5μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'f')
-         642±1μs          553±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'f')
-        90.6±1μs         77.9±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'Q')
-        82.6±1μs       71.0±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'i')
-         645±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'f')
-      83.5±0.8μs       71.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'l')
-         391±2μs          336±4μs     0.86  bench_ufunc.UFunc.time_ufunc_types('fmax')
-        83.4±1μs       71.7±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'L')
-        89.3±2μs       76.8±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'Q')
-      86.9±0.8μs       74.7±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'i')
-        85.3±1μs       73.3±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'l')
-      87.0±0.9μs       74.7±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'I')
-         108±2ms         93.0±2ms     0.86  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'str'>)
-        85.5±1μs       73.4±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'i')
-         645±6μs          553±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'f')
-        83.7±1μs         71.9±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'I')
-         644±3μs          553±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'f')
-         643±2μs          551±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'f')
-        91.5±2μs       78.4±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'i')
-        87.6±1μs       75.1±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'i')
-        84.7±1μs       72.6±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'q')
-         644±3μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'f')
-        69.4±2ms         59.5±1ms     0.86  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
-         482±6μs          413±6μs     0.86  bench_lib.Nan.time_nanargmin(200000, 0)
-      79.2±0.5μs       67.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'I')
-         645±5μs          552±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'f')
-        90.0±1μs       77.0±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'L')
-         649±4μs          556±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'f')
-         647±2μs          554±5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'f')
-         521±3μs          446±1μs     0.86  bench_function_base.Sort.time_sort('heap', 'float32', ('ordered',))
-      79.7±0.4μs       68.2±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'I')
-         648±7μs          554±2μs     0.85  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'f')
-        85.0±2μs       72.7±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'l')
-      91.5±0.7μs       78.2±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'q')
-        86.0±1μs         73.4±1μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'Q')
-         149±4μs          127±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
-        84.9±1μs       72.4±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'L')
-      83.5±0.8μs       71.3±0.7μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'i')
-      86.5±0.8μs         73.8±1μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'Q')
-        85.3±2μs       72.8±0.3μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'L')
-        84.7±1μs       72.3±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'I')
-      83.0±0.9μs       70.7±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'I')
-     2.47±0.02μs      2.10±0.02μs     0.85  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int16')
-      58.6±0.6μs       50.0±0.6μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
-        87.3±1μs       74.4±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'i')
-         109±3μs         92.6±4μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'd')
-        91.4±1μs         77.8±2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'I')
-         653±5μs          556±3μs     0.85  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'f')
-      83.9±0.8μs       71.4±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'I')
-      57.8±0.6μs       49.2±0.5μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
-      84.7±0.8μs       72.0±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'i')
-        82.4±2μs       70.1±0.2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'I')
-      82.7±0.8μs       70.3±0.5μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'i')
-         260±1μs          221±2μs     0.85  bench_ufunc.UFunc.time_ufunc_types('floor')
-        86.2±2μs       73.3±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'q')
-        84.4±2μs       71.7±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'I')
-        82.7±2μs       70.2±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'i')
-        87.8±1μs       74.5±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'I')
-        85.6±1μs       72.7±0.2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'I')
-         151±2μs          128±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'd')
-     5.24±0.01μs      4.44±0.04μs     0.85  bench_lib.Nan.time_nanmin(200, 2.0)
-        91.8±1μs       77.8±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'L')
-        92.1±2μs       78.0±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'I')
-      84.1±0.8μs       71.1±0.3μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'I')
-         258±3μs          218±5μs     0.85  bench_ufunc.UFunc.time_ufunc_types('trunc')
-         327±4μs         277±10μs     0.85  bench_ufunc.UFunc.time_ufunc_types('add')
-        78.1±2μs         66.0±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'd')
-        90.4±1μs       76.3±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'Q')
-      91.4±0.8μs         77.1±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'l')
-        83.3±1μs       70.2±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'I')
-      85.3±0.3μs       71.9±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'q')
-        91.5±3μs         77.1±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'l')
-        483±10μs          407±5μs     0.84  bench_lib.Nan.time_nanargmin(200000, 0.1)
-        94.9±1μs       79.9±0.3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'B')
-      94.3±0.6μs       79.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'B')
-      85.0±0.9μs       71.5±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'q')
-      84.3±0.7μs       70.9±0.2μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'I')
-     5.28±0.08μs      4.43±0.07μs     0.84  bench_lib.Nan.time_nanmin(200, 0.1)
-      85.8±0.5μs       72.1±0.4μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'L')
-      85.3±0.3μs       71.6±0.3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'Q')
-      94.5±0.9μs       79.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'B')
-        993±60μs          833±9μs     0.84  bench_lib.Nan.time_nanargmin(200000, 90.0)
-        60.4±1μs       50.7±0.5μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'd')
-        91.5±2μs       76.8±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'q')
-      18.9±0.3ms       15.9±0.9ms     0.84  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'str'>)
-      95.3±0.6μs       79.9±0.9μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'B')
-     1.24±0.02ms      1.04±0.06ms     0.84  bench_core.Temporaries.time_large2
-      94.5±0.5μs       79.1±0.4μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'B')
-      93.4±0.6μs       78.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'B')
-        59.3±1μs       49.7±0.5μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'd')
-      95.9±0.4μs       80.1±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'B')
-      95.3±0.9μs       79.7±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'B')
-        95.1±1μs       79.5±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'B')
-         259±2μs          217±3μs     0.84  bench_ufunc.UFunc.time_ufunc_types('ceil')
-      93.6±0.5μs       78.2±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'B')
-         113±2μs         94.8±3μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'd')
-        85.7±1μs       71.6±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'l')
-        93.6±1μs       78.0±0.9μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'B')
-         113±3μs         94.5±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'd')
-        94.0±1μs       78.3±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'B')
-      93.0±0.1μs       77.4±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'B')
-     5.32±0.01μs      4.43±0.04μs     0.83  bench_lib.Nan.time_nanmax(200, 0)
-      94.4±0.3μs       78.6±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'B')
-     5.31±0.07μs      4.42±0.04μs     0.83  bench_lib.Nan.time_nanmin(200, 0)
-      93.2±0.5μs         77.5±2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'B')
-        76.7±2μs         63.7±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'd')
-         153±3μs          127±3μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
-      92.7±0.2μs       77.0±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'B')
-      95.1±0.8μs       79.0±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'B')
-      94.3±0.9μs       78.3±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'B')
-     94.4±0.09μs       78.4±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'B')
-      95.6±0.9μs         79.4±1μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'B')
-        94.6±1μs       78.5±0.2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'B')
-        75.5±2μs         62.7±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'd')
-        94.6±1μs       78.4±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'B')
-      61.4±0.7μs       50.9±0.4μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 1, 'd')
-      86.9±0.9μs       72.0±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'l')
-     5.34±0.04μs      4.42±0.08μs     0.83  bench_lib.Nan.time_nanmax(200, 0.1)
-        95.1±1μs       78.8±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'B')
-        50.1±1μs       41.5±0.8μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
-      94.3±0.8μs       78.0±0.2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'B')
-      94.7±0.8μs       78.2±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'B')
-      94.3±0.7μs       77.8±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'B')
-        94.6±1μs       78.1±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'B')
-      94.8±0.2μs       78.1±0.1μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'B')
-      94.6±0.4μs       77.8±0.6μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'B')
-        95.6±1μs       78.6±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'B')
-      92.7±0.3μs       76.2±0.8μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'B')
-      93.3±0.4μs       76.7±0.8μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'B')
-      92.7±0.4μs       76.2±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'B')
-      93.1±0.3μs       76.5±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'B')
-      93.5±0.5μs       76.8±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'B')
-         154±4μs          126±1μs     0.82  bench_function_base.Sort.time_argsort('quick', 'uint32', ('reversed',))
-      17.5±0.5μs       14.3±0.3μs     0.82  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'D')
-     1.23±0.03ms      1.01±0.04ms     0.82  bench_core.Temporaries.time_large
-      95.5±0.6μs       78.1±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'B')
-      51.8±0.4μs         42.3±1μs     0.82  bench_core.VarComplex.time_var(10000)
-        79.6±2μs         65.0±3μs     0.82  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'd')
-      94.0±0.8μs       76.7±0.6μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'B')
-        94.6±1μs       77.2±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'B')
-         192±2μs        157±0.3μs     0.82  bench_reduce.ArgMin.time_argmin(<class 'numpy.float32'>)
-        94.1±1μs       76.7±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'B')
-     5.40±0.03μs      4.41±0.01μs     0.82  bench_lib.Nan.time_nanmin(200, 90.0)
-      92.9±0.2μs       75.7±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'B')
-     5.43±0.03μs      4.43±0.07μs     0.81  bench_lib.Nan.time_nanmax(200, 2.0)
-      94.1±0.2μs       76.6±0.2μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'B')
-      92.8±0.3μs       75.5±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'B')
-      93.7±0.6μs         76.0±3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'B')
-         196±3μs        159±0.8μs     0.81  bench_reduce.ArgMax.time_argmax(<class 'numpy.float32'>)
-       121±0.4ms         98.0±2ms     0.81  bench_app.LaplaceInplace.time_it('inplace')
-      95.3±0.9μs       77.1±0.3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'B')
-      94.2±0.8μs       76.2±0.3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'B')
-      93.8±0.4μs       75.8±0.6μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'B')
-        94.8±1μs       76.5±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'B')
-     5.43±0.02μs      4.37±0.02μs     0.81  bench_lib.Nan.time_nanmax(200, 90.0)
-        76.2±2μs         61.3±2μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'd')
-         119±4μs         95.4±2μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'd')
-      94.0±0.8μs       75.6±0.5μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'B')
-         120±2μs         96.3±3μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'd')
-      95.9±0.7μs       77.0±0.4μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'B')
-        95.7±2μs       76.6±0.2μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'B')
-        94.5±1μs       75.7±0.4μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'B')
-      95.3±0.6μs       76.1±0.3μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'B')
-      51.2±0.3μs       40.8±0.4μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'd')
-        50.9±2μs       40.6±0.4μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
-        52.0±1μs       41.3±0.6μs     0.79  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'd')
-         488±5μs          387±3μs     0.79  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 1000))
-         371±6μs          291±5μs     0.79  bench_core.VarComplex.time_var(100000)
-      8.18±0.4ms       6.39±0.3ms     0.78  bench_ufunc.Broadcast.time_broadcast
-      42.6±0.1μs       32.5±0.3μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'd')
-      42.7±0.5μs       32.4±0.2μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
-         168±3μs          127±4μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'd')
-     4.95±0.03ms      3.71±0.06ms     0.75  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
-     4.88±0.02ms      3.63±0.01ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
-     5.90±0.04μs      4.38±0.03μs     0.74  bench_lib.Nan.time_nanmin(200, 50.0)
-     4.87±0.01ms         3.61±0ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
-     4.89±0.03ms      3.63±0.01ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
-        67.3±2μs       49.7±0.9μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
-      43.4±0.5μs       32.0±0.3μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
-     6.05±0.05μs      4.45±0.06μs     0.74  bench_lib.Nan.time_nanmax(200, 50.0)
-        43.5±1μs       32.0±0.2μs     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'd')
-     4.95±0.06ms      3.63±0.02ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
-     4.94±0.06ms      3.62±0.01ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
-     4.95±0.02ms      3.62±0.02ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
-     4.94±0.04ms      3.61±0.01ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
-      18.7±0.6μs      13.5±0.06μs     0.72  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'D')
-     5.04±0.03ms      3.63±0.05ms     0.72  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
-     11.3±0.07μs      8.02±0.05μs     0.71  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float64'>)
-         133±2μs         94.0±1μs     0.71  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'd')
-      14.8±0.3μs       10.4±0.3μs     0.70  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'D')
-      58.7±0.2μs       40.7±0.7μs     0.69  bench_core.Temporaries.time_mid2
-      59.0±0.5μs       40.7±0.9μs     0.69  bench_core.Temporaries.time_mid
-      14.8±0.3μs      10.2±0.09μs     0.69  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'D')
-         540±2μs          371±1μs     0.69  bench_reduce.AddReduceSeparate.time_reduce(0, 'float64')
-         140±4μs         96.0±3μs     0.69  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'd')
-        92.0±3μs         63.1±3μs     0.69  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'd')
-         187±4μs          128±4μs     0.68  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'd')
-      21.0±0.3μs       14.4±0.2μs     0.68  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'F')
-      21.3±0.5μs       14.4±0.3μs     0.68  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'F')
-      12.9±0.2μs       8.68±0.2μs     0.67  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'D')
-      64.6±0.5μs         43.2±1μs     0.67  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'd')
-      81.7±0.5μs       54.6±0.6μs     0.67  bench_ufunc.CustomInplace.time_double_add_temp
-        65.4±1μs       43.1±0.9μs     0.66  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'd')
-      65.7±0.6μs       43.3±0.8μs     0.66  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'd')
-     1.23±0.05ms         803±10μs     0.65  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex128')
-      73.7±0.4μs       47.6±0.2μs     0.65  bench_ufunc.CustomInplace.time_double_add
-      40.9±0.4μs       26.2±0.3μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'd')
-        64.4±1μs       40.9±0.5μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'd')
-      20.9±0.3μs      13.2±0.08μs     0.63  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
-      41.2±0.9μs       26.0±0.1μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'd')
-        68.9±2μs       43.2±0.7μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'd')
-      41.3±0.6μs       25.8±0.4μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'd')
-        69.3±2μs       43.3±0.8μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'd')
-      41.2±0.7μs       25.7±0.2μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'd')
-      20.9±0.4μs       12.8±0.8μs     0.61  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'F')
-      21.1±0.3μs       12.9±0.2μs     0.61  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'F')
-      21.0±0.4μs       12.6±0.4μs     0.60  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'F')
-      81.6±0.6μs       49.0±0.5μs     0.60  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'd')
-      13.6±0.4μs       7.80±0.3μs     0.57  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'D')
-       158±0.8μs         88.8±1μs     0.56  bench_reduce.ArgMax.time_argmax(<class 'numpy.float64'>)
-         157±1μs       88.1±0.3μs     0.56  bench_reduce.ArgMin.time_argmin(<class 'numpy.float64'>)
-     1.96±0.01ms         1.08±0ms     0.55  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex64')
-      66.5±0.3μs      33.8±0.05μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 43)
-     66.7±0.09μs       33.9±0.1μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -43)
-        82.3±2μs       41.7±0.6μs     0.51  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'd')
-      71.7±0.4μs       35.0±0.1μs     0.49  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 8)
-      71.6±0.3μs      34.4±0.06μs     0.48  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -8)
-        66.2±1μs       31.4±0.1μs     0.47  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
-        88.1±3μs         41.3±1μs     0.47  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
-      78.7±0.7μs       35.7±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'Q')
-        79.1±1μs       35.8±0.7μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'l')
-        79.8±1μs       35.8±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'L')
-        79.4±1μs       35.5±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'q')
-         434±1μs          194±4μs     0.45  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 4, 'd')
-        80.5±1μs       35.6±0.5μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'L')
-      12.6±0.2μs      5.57±0.02μs     0.44  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (1))
-         443±2μs          196±3μs     0.44  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 4, 'd')
-      12.6±0.2μs      5.57±0.01μs     0.44  bench_reduce.MinMax.time_min(<class 'numpy.uint64'>)
-      13.2±0.3μs      5.82±0.02μs     0.44  bench_reduce.MinMax.time_max(<class 'numpy.uint64'>)
-      81.0±0.5μs       35.6±0.6μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'Q')
-      81.5±0.6μs       35.7±0.4μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'l')
-        81.4±1μs       35.6±0.8μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'q')
-      12.8±0.2μs      5.54±0.02μs     0.43  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (0))
-      12.9±0.3μs      5.61±0.05μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (1))
-      13.2±0.3μs      5.66±0.02μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (0))
-         505±1μs          216±1μs     0.43  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
-         506±3μs          213±3μs     0.42  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'f')
-      61.7±0.9μs       25.9±0.1μs     0.42  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'd')
-         505±1μs          209±1μs     0.41  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'f')
-         432±2μs         177±10μs     0.41  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 4, 'd')
-         520±9μs          209±5μs     0.40  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 4, 'f')
-         508±6μs          200±4μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'f')
-         436±2μs          170±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 2, 'd')
-         503±1μs          195±2μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'f')
-         438±4μs          170±5μs     0.39  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 2, 'd')
-      36.4±0.2μs       14.1±0.1μs     0.39  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 4, 'F')
-         432±3μs          167±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 4, 'd')
-         432±3μs          167±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 4, 'd')
-         505±2μs          195±4μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'f')
-        81.6±2μs       31.4±0.3μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'd')
-         509±3μs          195±2μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'f')
-         436±2μs          167±5μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 4, 'd')
-         436±3μs          166±3μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 4, 'd')
-         218±2μs         83.0±2μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'f')
-      12.0±0.2μs      4.56±0.02μs     0.38  bench_reduce.MinMax.time_max(<class 'numpy.uint32'>)
-       215±0.9μs       81.0±0.3μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'f')
-         220±2μs         82.5±1μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'f')
-      12.0±0.2μs      4.50±0.05μs     0.38  bench_reduce.MinMax.time_min(<class 'numpy.int32'>)
-         505±2μs          189±1μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'f')
-      36.3±0.1μs      13.6±0.08μs     0.37  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'F')
-     2.71±0.02ms         1.01±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
-     2.76±0.03ms      1.03±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
-         425±3μs          159±6μs     0.37  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 4, 'd')
-         439±1μs         163±10μs     0.37  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 4, 'd')
-       218±0.9μs       80.8±0.6μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'f')
-         217±1μs       80.5±0.5μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'f')
-     2.74±0.08ms      1.01±0.04ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
-     2.74±0.03ms      1.01±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
-      12.2±0.2μs      4.48±0.06μs     0.37  bench_reduce.MinMax.time_min(<class 'numpy.uint32'>)
-       217±0.5μs       79.9±0.5μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'f')
-         219±2μs       80.6±0.4μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'f')
-     2.78±0.03ms      1.02±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
-     2.76±0.04ms      1.02±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
-     2.74±0.04ms         1.01±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
-       216±0.5μs       79.2±0.4μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'f')
-     2.72±0.04ms         996±20μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
-     2.87±0.09ms      1.05±0.03ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
-         219±5μs       79.4±0.4μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'f')
-         432±2μs          156±3μs     0.36  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 4, 'd')
-      36.2±0.2μs      13.0±0.08μs     0.36  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'F')
-      12.2±0.2μs      4.39±0.02μs     0.36  bench_reduce.MinMax.time_max(<class 'numpy.int32'>)
-         448±6μs         160±10μs     0.36  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 4, 'd')
-        455±10μs         161±10μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 4, 'd')
-        443±10μs          156±4μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 1, 'd')
-         428±2μs          150±1μs     0.35  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 2, 'd')
-         430±1μs          148±5μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 2, 'd')
-         433±1μs          149±7μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 1, 'd')
-       430±0.8μs          145±4μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 2, 'd')
-       103±0.3μs      34.5±0.06μs     0.34  bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float32'>)
-         435±5μs          146±2μs     0.34  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 2, 'd')
-         103±1μs       34.5±0.1μs     0.34  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
-       426±0.3μs          143±5μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 2, 'd')
-     8.84±0.04μs      2.96±0.04μs     0.33  bench_reduce.MinMax.time_min(<class 'numpy.uint8'>)
-         430±3μs          143±3μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 1, 'd')
-         428±1μs          142±7μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 1, 'd')
-        442±10μs          147±6μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 4, 'd')
-         438±3μs          145±6μs     0.33  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 2, 'd')
-     8.83±0.03μs      2.92±0.03μs     0.33  bench_reduce.MinMax.time_max(<class 'numpy.uint8'>)
-       158±0.5μs         52.2±1μs     0.33  bench_reduce.ArgMax.time_argmax(<class 'numpy.int64'>)
-      75.5±0.9μs       24.9±0.4μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'I')
-       156±0.2μs       51.5±0.6μs     0.33  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint64'>)
-         433±4μs          142±8μs     0.33  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 2, 'd')
-      77.3±0.5μs       25.2±0.4μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'I')
-      76.2±0.4μs       24.8±0.5μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'i')
-         433±4μs          141±6μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 1, 'd')
-        76.0±2μs       24.7±0.6μs     0.32  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'i')
-         447±9μs        145±0.5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 4, 'd')
-         447±9μs          145±4μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 4, 'd')
-       437±0.9μs          141±7μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 2, 'd')
-       159±0.5μs       51.4±0.4μs     0.32  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint64'>)
-         432±4μs          139±5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 1, 'd')
-       158±0.4μs       50.6±0.1μs     0.32  bench_reduce.ArgMin.time_argmin(<class 'numpy.int64'>)
-      81.3±0.6μs       25.7±0.3μs     0.32  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'd')
-         443±9μs          140±5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 4, 'd')
-         432±4μs         130±10μs     0.30  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 1, 'd')
-       157±0.1μs       46.8±0.4μs     0.30  bench_reduce.ArgMin.time_argmin(<class 'numpy.int32'>)
-         158±1μs       46.4±0.3μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint32'>)
-       157±0.4μs       45.5±0.5μs     0.29  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint32'>)
-       157±0.4μs       45.5±0.3μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.int32'>)
-         434±6μs          123±3μs     0.28  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 1, 'd')
-       426±0.9μs          120±3μs     0.28  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 1, 'd')
-         199±2μs         55.5±2μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'f')
-         199±2μs         55.2±1μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
-         198±2μs         54.9±1μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'f')
-         439±9μs          120±3μs     0.27  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 4, 'd')
-         198±2μs         54.2±1μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
-         199±2μs         54.5±3μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'f')
-         441±9μs          119±2μs     0.27  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 4, 'd')
-         437±2μs          118±6μs     0.27  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 1, 'd')
-      40.3±0.3μs      10.9±0.03μs     0.27  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float32'>)
-       424±0.8μs          114±4μs     0.27  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 2, 'd')
-       195±0.7μs       51.7±0.4μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
-         430±3μs          113±3μs     0.26  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 2, 'd')
-       195±0.7μs       50.7±0.4μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'f')
-     13.6±0.04μs      3.52±0.01μs     0.26  bench_reduce.MinMax.time_max(<class 'numpy.uint16'>)
-         195±1μs       50.3±0.3μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'f')
-         200±2μs         51.0±1μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'f')
-         197±3μs       50.3±0.3μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
-     13.7±0.09μs      3.48±0.03μs     0.25  bench_reduce.MinMax.time_min(<class 'numpy.uint16'>)
-         200±3μs         50.4±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'f')
-         201±3μs         50.5±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'f')
-         200±3μs         50.0±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'f')
-         200±3μs       49.4±0.7μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
-         200±3μs       49.1±0.6μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'f')
-        1.95±0ms          477±3μs     0.24  bench_reduce.AddReduceSeparate.time_reduce(0, 'float32')
-         597±4μs          146±1μs     0.24  bench_ufunc.CustomInplace.time_float_add_temp
-     12.5±0.05μs      2.98±0.02μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int8'>)
-         584±1μs          137±1μs     0.24  bench_ufunc.CustomInplace.time_float_add
-     15.0±0.05μs      3.52±0.02μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int16'>)
-     12.6±0.08μs      2.95±0.02μs     0.23  bench_reduce.MinMax.time_max(<class 'numpy.int8'>)
-     15.0±0.08μs      3.51±0.02μs     0.23  bench_reduce.MinMax.time_max(<class 'numpy.int16'>)
-         913±8μs          203±7μs     0.22  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'd')
-         881±3μs          195±4μs     0.22  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'd')
-       197±0.8μs       42.7±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'f')
-         198±2μs       42.8±0.5μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
-         199±3μs       43.1±0.1μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'f')
-         197±2μs       42.7±0.7μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'f')
-         199±3μs       43.0±0.6μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
-         197±1μs       42.5±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'f')
-       196±0.5μs       42.3±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'f')
-         200±3μs       43.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'f')
-         199±1μs       42.6±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'f')
-       199±0.8μs       42.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'f')
-         195±1μs       41.4±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'f')
-       196±0.9μs       41.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'f')
-         195±1μs       41.2±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'f')
-         195±2μs       41.2±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'f')
-         196±1μs       41.3±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'f')
-       195±0.5μs       41.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'f')
-       194±0.6μs       40.9±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'f')
-       195±0.8μs       41.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'f')
-       195±0.2μs       40.9±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'f')
-       196±0.6μs       40.9±0.4μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'f')
-         196±1μs       40.8±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'f')
-         197±2μs       40.9±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'f')
-       195±0.4μs       40.4±0.1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'f')
-         197±1μs       40.8±0.1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'f')
-       198±0.9μs       40.9±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'f')
-       196±0.4μs       40.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'f')
-         196±2μs       40.3±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'f')
-         196±2μs       40.3±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'f')
-         198±2μs       40.7±0.2μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'f')
-         199±3μs       40.6±0.5μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'f')
-         894±1μs          173±6μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'd')
-         880±2μs          166±2μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'd')
-         878±1μs          164±4μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'd')
-         901±5μs          167±4μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'd')
-         899±5μs          167±7μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'd')
-         424±2μs         78.8±2μs     0.19  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 2, 'd')
-         428±1μs         78.3±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 2, 'd')
-         887±6μs          159±5μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'd')
-         423±1μs         74.6±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 1, 'd')
-         878±2μs          155±6μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'd')
-         427±2μs         75.2±1μs     0.18  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 2, 'd')
-         425±3μs         74.6±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 1, 'd')
-       422±0.8μs       73.6±0.5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 2, 'd')
-         906±6μs          156±6μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'd')
-         434±4μs         74.7±2μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 1, 'd')
-        914±20μs          157±9μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'd')
-         877±1μs          151±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'd')
-        898±10μs          154±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'd')
-        931±20μs         159±10μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'd')
-         429±5μs         73.5±3μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 1, 'd')
-         881±1μs         149±10μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'd')
-         895±3μs          152±4μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'd')
-         896±3μs          150±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'd')
-      24.4±0.2μs      4.06±0.03μs     0.17  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 43)
-         904±5μs          149±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'd')
-         880±2μs          144±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'd')
-      24.9±0.2μs      4.02±0.01μs     0.16  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 8)
-         879±2μs          141±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'd')
-         421±1μs         67.1±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 1, 'd')
-         420±2μs         66.5±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 2, 'd')
-         890±2μs          141±2μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'd')
-      82.5±0.3μs      12.9±0.07μs     0.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'h')
-         875±2μs          136±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'd')
-      83.1±0.7μs       12.9±0.2μs     0.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'h')
-       427±0.8μs         66.1±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 1, 'd')
-         427±2μs         65.3±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 2, 'd')
-        935±20μs          141±4μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'd')
-        921±20μs          139±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'd')
-      86.9±0.6μs       13.1±0.1μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'H')
-        937±20μs          141±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'd')
-         898±3μs          135±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'd')
-        917±20μs          138±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'd')
-         892±2μs          134±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'd')
-      86.3±0.7μs      12.8±0.08μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'H')
-         884±6μs          131±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'd')
-      65.9±0.4μs       9.76±0.1μs     0.15  bench_reduce.FMinMax.time_min(<class 'numpy.float64'>)
-         898±4μs          133±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'd')
-         883±3μs          130±4μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'd')
-         881±4μs          129±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'd')
-      67.1±0.5μs       9.62±0.1μs     0.14  bench_reduce.FMinMax.time_max(<class 'numpy.float64'>)
-         892±1μs          119±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'd')
-         660±6μs         87.8±1μs     0.13  bench_lib.Nan.time_nanmax(200000, 0)
-         672±5μs         88.5±1μs     0.13  bench_lib.Nan.time_nanmin(200000, 0)
-        921±30μs          121±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'd')
-         671±5μs         88.2±2μs     0.13  bench_lib.Nan.time_nanmax(200000, 2.0)
-         874±1μs          115±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'd')
-         669±3μs         87.5±1μs     0.13  bench_lib.Nan.time_nanmax(200000, 0.1)
-        667±10μs         87.3±2μs     0.13  bench_lib.Nan.time_nanmin(200000, 0.1)
-         672±5μs         87.7±2μs     0.13  bench_lib.Nan.time_nanmin(200000, 2.0)
-         877±3μs          114±9μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'd')
-        931±20μs          120±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'd')
-         895±6μs          114±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'd')
-         880±4μs          112±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'd')
-         891±2μs          112±9μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'd')
-      79.7±0.1μs       9.53±0.2μs     0.12  bench_reduce.ArgMax.time_argmax(<class 'bool'>)
-       418±0.9μs       45.2±0.9μs     0.11  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 1, 'd')
-         427±3μs       45.2±0.8μs     0.11  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 1, 'd')
-      68.5±0.3μs      7.19±0.07μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'b')
-      68.3±0.4μs      7.14±0.08μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'b')
-       156±0.2μs       15.9±0.2μs     0.10  bench_reduce.ArgMin.time_argmin(<class 'numpy.int16'>)
-       157±0.3μs       15.8±0.1μs     0.10  bench_reduce.ArgMax.time_argmax(<class 'numpy.int16'>)
-      70.5±0.2μs      6.45±0.04μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 43)
-      70.6±0.2μs      6.40±0.06μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -43)
-      75.7±0.5μs      6.48±0.04μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -8)
-         186±2μs       15.9±0.2μs     0.09  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint16'>)
-      75.9±0.4μs      6.42±0.06μs     0.08  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 8)
-       189±0.3μs      15.8±0.09μs     0.08  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint16'>)
-        901±20μs         74.7±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'd')
-         881±5μs       71.7±0.7μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'd')
-      89.1±0.2μs       7.22±0.1μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'B')
-     1.10±0.01ms         88.9±1μs     0.08  bench_lib.Nan.time_nanmax(200000, 90.0)
-      88.6±0.1μs      7.08±0.07μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'B')
-        1.10±0ms         87.3±1μs     0.08  bench_lib.Nan.time_nanmin(200000, 90.0)
-         881±5μs         69.3±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'd')
-         893±2μs       69.9±0.8μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'd')
-         872±5μs         68.1±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'd')
-         901±4μs         70.3±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'd')
-         890±9μs       65.1±0.6μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'd')
-         889±2μs       64.8±0.8μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'd')
-     41.6±0.07μs      3.02±0.01μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 8)
-         880±5μs         63.7±2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'd')
-      41.6±0.2μs      2.99±0.02μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 43)
-         903±5μs         62.9±1μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'd')
-         876±4μs       60.5±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'd')
-         907±6μs         61.8±1μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'd')
-       156±0.4μs       9.44±0.2μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.int8'>)
-       157±0.7μs       9.41±0.2μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint8'>)
-       157±0.7μs      9.32±0.09μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.int8'>)
-       157±0.6μs      9.25±0.03μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint8'>)
-        1.53±0ms         87.5±2μs     0.06  bench_lib.Nan.time_nanmax(200000, 50.0)
-     1.55±0.01ms         87.0±1μs     0.06  bench_lib.Nan.time_nanmin(200000, 50.0)
-      69.9±0.2μs      3.75±0.02μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -43)
-      70.3±0.4μs      3.77±0.04μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 43)
-      41.3±0.4μs         2.18±0μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 8)
-      41.3±0.3μs      2.18±0.01μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 43)
-        885±10μs       45.9±0.6μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'd')
-         892±5μs       45.3±0.5μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'd')
-      74.5±0.8μs      3.76±0.03μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -8)
-      74.0±0.1μs      3.73±0.05μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 8)
-      71.3±0.3μs      2.43±0.02μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 43)
-      71.3±0.4μs      2.42±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -43)
-      75.6±0.6μs      2.42±0.04μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 8)
-      75.3±0.7μs      2.39±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -8)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
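
For reading the table above: the `ratio` column is the `after` time divided by the `before` time, so values below 1 are speedups and values above 1 are slowdowns. The following is a minimal sketch of that arithmetic, assuming the row layout shown above (marker, before, after, ratio, benchmark name). `parse_row` is a hypothetical helper written for illustration and is not part of asv or NumPy.

```python
import re

# Scale factors that convert each unit suffix seen in the table to microseconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
_UNIT_TO_US = {"ns": 1e-3, "\u03bcs": 1.0, "\u00b5s": 1.0, "ms": 1e3, "s": 1e6}

# A timing cell looks like "75.3±0.7μs": mean, "±", spread, unit.
_CELL = re.compile(r"([\d.]+)±([\d.]+)(ns|[\u03bc\u00b5]s|ms|s)")


def parse_row(row):
    """Return (before_us, after_us, ratio, name) for one comparison row.

    Hypothetical helper: it assumes the first two timing cells in the row
    are the 'before' and 'after' columns, in that order.
    """
    cells = _CELL.findall(row)
    (b_mean, _, b_unit), (a_mean, _, a_unit) = cells[:2]
    before = float(b_mean) * _UNIT_TO_US[b_unit]
    after = float(a_mean) * _UNIT_TO_US[a_unit]
    # The benchmark name is everything after the first four columns.
    name = row.split(None, 4)[-1]
    return before, after, after / before, name


row = ("-      75.3±0.7μs      2.39±0.01μs     0.03  "
       "bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int"
       "(<class 'numpy.int8'>, -8)")
before, after, ratio, name = parse_row(row)
print(f"{name}: ratio {ratio:.2f}")
```

The recomputed ratio is unrounded, so it agrees with the printed column only to two decimal places.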

@seiko2plus force-pushed the zsystem_sup branch 3 times, most recently from b66ba82 to 75d77af, January 28, 2022 05:12
@seiko2plus added the "component: SIMD" label (Issues in SIMD (fast instruction sets) code or machinery), Feb 10, 2022
@seiko2plus force-pushed the zsystem_sup branch 3 times, most recently from 6224f8c to 50407a6, February 12, 2022 07:17
@seiko2plus marked this pull request as ready for review, February 13, 2022 14:28
@seiko2plus added the "56 - Needs Release Note." label (Needs an entry in doc/release/upcoming_changes), Feb 13, 2022
@seiko2plus force-pushed the zsystem_sup branch 2 times, most recently from 873ea68 to 982ba4b, February 16, 2022 16:37
  It covers SIMD operations for all datatypes starting
  from z/Arch11 a.k.a. IBM Z13, except for single-precision,
  which requires at least z/Arch12 a.k.a. IBM Z14 to be dispatched.

  This patch renames the branch /simd/vsx to /simd/vec; the new
  path holds the definitions of universal intrinsics for
  both Power and Z architectures.

  This patch also adds new preprocessor identifiers:

    * NPY_SIMD_BIGENDIAN: 1 if the enabled SIMD extension
    is running in big-endian mode, otherwise 0.

    * NPY_SIMD_F32: 1 if the enabled SIMD extension
    supports single-precision, otherwise 0.
@seiko2plus
Member Author

GitHub doesn't allow such a large PR description body, so the rest follows:

VX
export NPY_DISABLE_CPU_FEATURES="VXE VXE2"
python runtests.py --bench-compare parent/main
       before           after         ratio
     [982fcd38]       [47d54c6d]
     <zsystem_sup~5>       <zsystem_sup>
+      35.8±0.2μs        141±0.7μs     3.95  bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 1000))
+      79.8±0.5μs          264±2μs     3.31  bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 100))
+       183±0.9μs          268±2μs     1.47  bench_function_base.Sort.time_sort('merge', 'uint32', ('sorted_block', 10))
+      7.37±0.1ms      10.1±0.05ms     1.37  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 1000000)
+      74.3±0.2μs          101±1μs     1.36  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 10000)
+     3.28±0.05ms       4.43±0.1ms     1.35  bench_reduce.AddReduceSeparate.time_reduce(1, 'float16')
+      83.4±0.9μs          102±1μs     1.22  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'h')
+      9.46±0.2ms       11.4±0.2ms     1.21  bench_reduce.AddReduce.time_axis_1
+       127±0.6μs          153±3μs     1.20  bench_function_base.Sort.time_argsort('quick', 'int32', ('reversed',))
+        85.1±1μs        102±0.4μs     1.20  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'h')
+      83.6±0.9μs       99.9±0.9μs     1.20  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'h')
+      84.4±0.3μs        101±0.3μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'h')
+        84.1±1μs       99.9±0.4μs     1.19  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'h')
+      84.5±0.6μs      100.0±0.6μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'h')
+        84.1±1μs       99.4±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'h')
+        84.7±3μs        100±0.4μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'b')
+        84.6±1μs        100.0±1μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'h')
+        560±20ms         661±20ms     1.18  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'wrap')
+        83.5±1μs      98.6±0.09μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'h')
+        84.9±1μs        100±0.4μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'h')
+        85.2±1μs        100±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'h')
+        85.5±2μs        101±0.5μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'h')
+        85.6±1μs        101±0.6μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'h')
+      84.4±0.2μs       99.3±0.3μs     1.18  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'h')
+      85.0±0.3μs       99.6±0.6μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'h')
+      84.9±0.7μs       99.6±0.5μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'b')
+      85.9±0.7μs        101±0.5μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'h')
+        85.6±1μs          100±2μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'h')
+        85.3±1μs       99.6±0.2μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'h')
+      85.3±0.9μs       99.5±0.7μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'b')
+      85.6±0.1μs       99.9±0.6μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'h')
+        84.7±1μs       98.8±0.7μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'b')
+        85.5±1μs       99.6±0.4μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'h')
+        86.5±2μs        101±0.4μs     1.17  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'h')
+      85.0±0.9μs       99.0±0.1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'h')
+        85.8±2μs       99.8±0.7μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'h')
+        86.1±1μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'h')
+      85.3±0.8μs       99.2±0.9μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'b')
+        86.3±1μs        100±0.4μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'h')
+      84.6±0.6μs       98.3±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'b')
+      85.0±0.6μs       98.6±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'b')
+        86.5±2μs          100±1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'b')
+        87.0±1μs        101±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'h')
+      84.9±0.3μs       98.5±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'b')
+      86.4±0.4μs        100±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'h')
+        87.0±2μs        101±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'h')
+      86.1±0.8μs       99.7±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'h')
+        86.6±2μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'h')
+      85.7±0.8μs         99.1±1μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'b')
+      87.3±0.9μs        101±0.5μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'h')
+      85.3±0.7μs       98.7±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'b')
+        85.0±1μs       98.3±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'b')
+      87.5±0.6μs        101±0.7μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'h')
+      86.3±0.8μs       99.8±0.3μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'b')
+      86.6±0.9μs        100±0.6μs     1.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'h')
+        85.7±2μs       99.0±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'h')
+        87.4±1μs        101±0.7μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'h')
+        85.2±1μs       98.3±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'b')
+        87.6±1μs        101±0.4μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'h')
+         301±4μs        347±0.8μs     1.15  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 1000))
+        87.5±1μs        101±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'h')
+        86.5±2μs       99.7±0.8μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'h')
+        87.8±2μs        101±0.9μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'b')
+        85.7±1μs       98.6±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'b')
+      85.7±0.9μs       98.6±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'b')
+      87.5±0.7μs        101±0.5μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'h')
+      86.1±0.6μs       99.0±0.7μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'b')
+      85.6±0.6μs       98.4±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'b')
+      86.3±0.3μs       99.2±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'b')
+        87.8±2μs        101±0.9μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'h')
+      87.5±0.6μs        100±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'h')
+        88.6±1μs        102±0.4μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'h')
+        86.1±2μs       98.8±0.2μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'b')
+      12.3±0.2μs       14.1±0.2μs     1.15  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'D')
+      85.8±0.8μs       98.3±0.3μs     1.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'b')
+     1.15±0.03μs      1.32±0.02μs     1.15  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 100)
+        86.9±2μs       99.4±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'h')
+        88.2±1μs        101±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'h')
+        88.9±1μs        102±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'h')
+      86.2±0.4μs       98.6±0.1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'b')
+      87.2±0.3μs       99.6±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'b')
+      87.7±0.2μs        100±0.6μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'h')
+      86.2±0.4μs       98.3±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'b')
+        87.0±2μs       99.0±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'b')
+        86.5±1μs       98.4±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'b')
+        89.1±2μs          101±1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'h')
+        86.8±1μs         98.6±1μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'b')
+        86.9±1μs       98.8±0.3μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'b')
+      87.7±0.4μs       99.7±0.2μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'h')
+        87.2±1μs       99.0±0.5μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'b')
+      87.2±0.4μs       99.0±0.2μs     1.14  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'h')
+        86.6±1μs       98.3±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'b')
+      87.5±0.4μs         99.2±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'b')
+      87.2±0.4μs       98.8±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'b')
+      87.7±0.3μs       99.4±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'b')
+        88.3±1μs       99.9±0.5μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'h')
+      87.0±0.3μs       98.4±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'b')
+        89.0±2μs          101±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'b')
+      86.9±0.8μs       98.2±0.3μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'b')
+      88.0±0.6μs         99.4±1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'b')
+        87.4±2μs       98.6±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'b')
+      88.0±0.9μs       99.3±0.6μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'b')
+         326±1μs          368±2μs     1.13  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 1000))
+        87.3±2μs       98.5±0.1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'b')
+      87.1±0.4μs       98.2±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'b')
+      87.2±0.6μs       98.2±0.4μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'b')
+        87.7±2μs       98.7±0.1μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'b')
+        87.4±2μs       98.4±0.2μs     1.13  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'b')
+      88.8±0.4μs       99.8±0.7μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'h')
+        88.3±2μs       99.0±0.4μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'b')
+        88.4±2μs       98.9±0.5μs     1.12  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'b')
+        89.2±1μs       99.2±0.4μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'h')
+         120±4μs          133±3μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'L')
+      89.7±0.6μs       99.5±0.5μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'b')
+        88.7±2μs       98.3±0.1μs     1.11  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'b')
+        89.4±2μs       98.8±0.3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'b')
+      79.2±0.1μs       87.4±0.2μs     1.10  bench_function_base.Where.time_interleaved_zeros_x8
+         133±2μs          147±3μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'Q')
+      13.4±0.2μs       14.7±0.3μs     1.10  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'D')
+         505±4μs        554±0.6μs     1.10  bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',))
+     6.90±0.05ms      7.56±0.02ms     1.10  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 1000000)
+        89.9±2μs       98.5±0.1μs     1.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'b')
+        89.9±3μs       98.4±0.6μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'b')
+         330±3μs          361±3μs     1.09  bench_function_base.Sort.time_argsort('quick', 'int32', ('sorted_block', 1000))
+         134±3μs          146±4μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'q')
+      70.1±0.3μs       76.6±0.6μs     1.09  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 10000)
+         119±2μs          130±2μs     1.09  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'Q')
+         409±2μs        443±0.5μs     1.08  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 100))
+       371±0.4μs          401±3μs     1.08  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 100))
+     2.68±0.03μs      2.89±0.08μs     1.08  bench_core.Core.time_hstack_l
+     17.2±0.04μs       18.6±0.7μs     1.08  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int32'>)
+      86.8±0.3μs       93.1±0.5μs     1.07  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 100))
+         123±2μs          131±2μs     1.07  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'L')
+      13.5±0.1μs       14.4±0.3μs     1.07  bench_ma.MA.time_masked_array_l100_t100
+       139±0.6μs          148±2μs     1.07  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'Q')
+     17.8±0.07μs       19.0±0.4μs     1.07  bench_lib.Nan.time_nanmean(200, 0)
+     5.04±0.06ms       5.37±0.1ms     1.07  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'wrap')
+     10.3±0.06μs       11.0±0.3μs     1.07  bench_ma.MA.time_masked_array_l100
+     19.3±0.06μs       20.6±0.6μs     1.07  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float32'>)
+     1.15±0.02μs      1.23±0.03μs     1.07  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int64')
+     1.20±0.01ms      1.28±0.05ms     1.06  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int16'>)
+        55.1±1ms         58.6±2ms     1.06  bench_ma.Concatenate.time_it('masked', 2000)
+         128±2ms          136±2ms     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'constant')
+      21.0±0.1μs       22.3±0.4μs     1.06  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float32'>)
+        638±10μs         677±20μs     1.06  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'numpy.int16'>)
+     1.32±0.01μs      1.40±0.05μs     1.06  bench_core.Core.time_ones_100
+     18.3±0.08μs       19.4±0.4μs     1.06  bench_ma.UFunc.time_2d(True, True, 10)
+        64.8±1ms         68.7±1ms     1.06  bench_ma.Concatenate.time_it('unmasked+masked', 2000)
+         108±2μs        115±0.4μs     1.06  bench_function_base.Sort.time_argsort('merge', 'int32', ('sorted_block', 100))
+         335±3μs          355±5μs     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'linear_ramp')
+     27.9±0.09μs       29.5±0.5μs     1.06  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'constant')
+         492±1ns          520±8ns     1.06  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs([1])
+     9.82±0.04μs       10.4±0.3μs     1.06  bench_lib.Unique.time_unique(200, 90.0)
+         565±2μs          597±3μs     1.06  bench_function_base.Sort.time_argsort('heap', 'float64', ('reversed',))
+         110±1μs          116±4μs     1.06  bench_lib.Pad.time_pad((256, 128, 1), 1, 'reflect')
+      60.7±0.6ms         64.1±2ms     1.06  bench_ma.Concatenate.time_it('ndarray+masked', 2000)
+     4.27±0.01μs       4.50±0.1μs     1.06  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'numpy.int64'>)
+      20.1±0.1μs       21.2±0.6μs     1.06  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float64'>)
+      14.6±0.2μs      15.4±0.04μs     1.06  bench_ma.UFunc.time_scalar_1d(False, False, 100)
+      14.8±0.2μs       15.6±0.2μs     1.06  bench_ma.UFunc.time_scalar_1d(False, False, 1000)
+     10.8±0.02μs      11.4±0.07μs     1.06  bench_lib.Unique.time_unique(200, 0)
+        837±20ns         883±20ns     1.05  bench_io.Copy.time_memcpy('int8')
+       110±0.4μs          116±1μs     1.05  bench_function_base.Sort.time_argsort('merge', 'float32', ('sorted_block', 100))
+      21.2±0.2μs       22.4±0.8μs     1.05  bench_ma.UFunc.time_scalar_1d(False, True, 100)
+      14.5±0.2μs       15.3±0.1μs     1.05  bench_ma.UFunc.time_scalar_1d(False, False, 10)
+     2.18±0.01ms      2.30±0.04ms     1.05  bench_indexing.IndexingSeparate.time_mmap_fancy_indexing
+        673±10ns          709±7ns     1.05  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'numpy.int8'>)
+         532±2μs          559±2μs     1.05  bench_function_base.Sort.time_argsort('heap', 'float32', ('ordered',))
+     2.85±0.01μs      3.00±0.02μs     1.05  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'str'>)
+        1.52±0μs      1.60±0.05μs     1.05  bench_reduce.AnyAll.time_all_fast
+     1.21±0.02ms      1.27±0.05ms     1.05  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'numpy.int64'>)
+      56.2±0.2μs       59.1±0.9μs     1.05  bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 1000))
-        90.6±1μs       86.2±0.3μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'H')
-         344±5μs          327±1μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 1, 'd')
-       342±0.9μs          325±2μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 2, 2, 'd')
-     1.41±0.01ms      1.34±0.02ms     0.95  bench_lib.Nan.time_nanvar(200000, 0)
-         215±2μs          204±2μs     0.95  bench_function_base.Sort.time_sort('quick', 'float32', ('reversed',))
-     1.07±0.02ms      1.02±0.01ms     0.95  bench_reduce.AddReduceSeparate.time_reduce(0, 'longfloat')
-         718±7μs          682±5μs     0.95  bench_indexing.Indexing.time_op('indexes_rand_', 'np.ix_(I, I)', '=1')
-      13.2±0.4ms      12.5±0.07ms     0.95  bench_linalg.Linalg.time_op('svd', 'complex128')
-        71.4±1μs       67.7±0.9μs     0.95  bench_function_base.Sort.time_sort('quick', 'int32', ('ordered',))
-      88.2±0.5μs       83.6±0.8μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'H')
-         273±6μs          259±6μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'd')
-      62.2±0.9μs       59.0±0.3μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 2, 'd')
-      74.8±0.4μs       70.8±0.1μs     0.95  bench_ufunc.CustomArrayFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 10000)
-         346±4μs          328±1μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 1, 'd')
-      92.7±0.7μs         87.8±1μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'H')
-         347±5μs          328±3μs     0.95  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sqrt'>, 4, 2, 'd')
-        88.3±1μs       83.6±0.4μs     0.95  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'H')
-     1.41±0.01ms      1.34±0.01ms     0.95  bench_lib.Nan.time_nanvar(200000, 0.1)
-       151±0.8μs        143±0.8μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('reversed',))
-         246±2ms          232±4ms     0.94  bench_app.LaplaceInplace.time_it('normal')
-        89.0±1μs       83.8±0.6μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'H')
-         347±3μs          327±1μs     0.94  bench_ufunc.UFunc.time_ufunc_types('square')
-     1.54±0.01ms      1.45±0.01ms     0.94  bench_lib.Nan.time_nanargmin(200000, 50.0)
-      91.1±0.9μs       85.5±0.8μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'H')
-        90.3±1μs       84.7±0.9μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'H')
-         430±1μs          404±2μs     0.94  bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 100))
-       526±0.5μs          493±4μs     0.94  bench_function_base.Sort.time_sort('heap', 'float64', ('reversed',))
-         478±7μs          448±3μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 10))
-        92.8±2μs         87.0±1μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'H')
-        63.4±2μs       59.5±0.1μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc '_ones_like'>, 2, 1, 'f')
-     2.31±0.01ms      2.16±0.02ms     0.94  bench_lib.Nan.time_nanvar(200000, 90.0)
-        79.5±1μs         74.4±1μs     0.94  bench_function_base.Sort.time_argsort('quick', 'int32', ('ordered',))
-       272±0.7μs          255±3μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'd')
-     2.33±0.01ms      2.18±0.02ms     0.94  bench_lib.Nan.time_nanstd(200000, 90.0)
-      87.9±0.6μs       82.3±0.5μs     0.94  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'H')
-         273±2μs          255±2μs     0.94  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'd')
-     16.3±0.08μs         15.3±1μs     0.93  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'numpy.int64'>)
-     1.43±0.02ms      1.33±0.03ms     0.93  bench_lib.Pad.time_pad((1024, 1024), 1, 'mean')
-        89.1±1μs       83.1±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'H')
-         275±1μs          256±3μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'd')
-         725±3μs          675±3μs     0.93  bench_indexing.Indexing.time_op('indexes_', 'np.ix_(I, I)', '=1')
-        91.8±1μs       85.4±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'H')
-        91.0±2μs       84.6±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'H')
-        92.0±2μs       85.6±0.4μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'H')
-      88.0±0.8μs       81.8±0.3μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'H')
-        93.4±2μs         86.7±1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'H')
-      91.9±0.6μs         85.3±1μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'H')
-        89.9±1μs       83.4±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'H')
-        92.0±1μs       85.3±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'H')
-      91.1±0.5μs       84.5±0.6μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'H')
-         276±4μs          256±4μs     0.93  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'd')
-      88.6±0.7μs       82.1±0.5μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'H')
-      92.5±0.8μs       85.6±0.9μs     0.93  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'H')
-        93.4±1μs       86.3±0.5μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'H')
-        90.8±2μs       83.9±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'H')
-      95.0±0.3μs       87.8±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'H')
-        89.6±1μs       82.7±0.2μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'H')
-      91.5±0.5μs       84.5±0.2μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'H')
-        89.4±1μs       82.4±0.3μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'H')
-      89.9±0.8μs       82.9±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'H')
-      92.4±0.9μs       85.1±0.4μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'H')
-         420±3μs          386±2μs     0.92  bench_ufunc.UFunc.time_ufunc_types('multiply')
-        91.9±2μs         84.6±1μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'H')
-        93.8±1μs       86.2±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'H')
-     1.38±0.04ms      1.27±0.03ms     0.92  bench_lib.Pad.time_pad((1024, 1024), 8, 'mean')
-      91.8±0.8μs       84.3±0.3μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'H')
-        91.5±1μs       83.9±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'H')
-      95.3±0.9μs       87.4±0.6μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'H')
-         265±3μs          243±4μs     0.92  bench_ufunc.UFunc.time_ufunc_types('minimum')
-        94.0±2μs       86.1±0.7μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'H')
-        93.6±1μs       85.8±0.5μs     0.92  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'H')
-      90.8±0.8μs       83.1±0.4μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'H')
-        91.5±2μs       83.6±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'H')
-         557±3μs          509±1μs     0.91  bench_function_base.Sort.time_sort('heap', 'float32', ('reversed',))
-      91.2±0.7μs       83.3±0.6μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'H')
-        91.4±1μs       83.4±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'H')
-         277±1μs        253±0.5μs     0.91  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'd')
-     6.03±0.07μs      5.49±0.02μs     0.91  bench_itemselection.PutMask.time_dense(False, 'complex256')
-        91.0±1μs       83.0±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'H')
-         277±1μs          253±2μs     0.91  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'd')
-        92.3±1μs       84.1±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'H')
-      94.2±0.8μs       85.6±0.8μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'H')
-      93.0±0.2μs       84.5±0.4μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'H')
-      87.4±0.9μs       79.4±0.5μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'q')
-      91.4±0.9μs       82.9±0.3μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'H')
-      89.9±0.5μs       81.4±0.2μs     0.91  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'H')
-         503±3μs          455±2μs     0.91  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 100))
-        93.1±1μs       84.1±0.4μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'H')
-        92.9±2μs       84.0±0.5μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'H')
-         269±4μs          243±2μs     0.90  bench_ufunc.UFunc.time_ufunc_types('maximum')
-         315±3μs          284±3μs     0.90  bench_ufunc.UFunc.time_ufunc_types('subtract')
-        77.9±1μs       70.0±0.3μs     0.90  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'I')
-         362±4μs          324±2μs     0.90  bench_function_base.Sort.time_argsort('quick', 'uint32', ('sorted_block', 1000))
-        85.5±2μs       76.5±0.7μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'i')
-        78.6±2μs       70.2±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'i')
-        76.6±1μs       68.4±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'I')
-         107±6μs         95.4±1μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'd')
-        78.9±1μs       70.4±0.6μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'I')
-      76.6±0.9μs       68.3±0.1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'i')
-         108±4μs         96.5±3μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'd')
-        83.4±1μs         74.3±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'Q')
-        78.2±1μs       69.6±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'i')
-        79.6±2μs       70.8±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'i')
-        82.5±1μs         73.4±1μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'i')
-        82.0±1μs       72.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'i')
-         938±6μs         833±10μs     0.89  bench_lib.Nan.time_nanargmax(200000, 90.0)
-         413±1μs          367±3μs     0.89  bench_ufunc.UFunc.time_ufunc_types('rint')
-        79.3±1μs       70.4±0.9μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'I')
-         110±3μs         97.9±2μs     0.89  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'd')
-         391±5μs          348±2μs     0.89  bench_ufunc.UFunc.time_ufunc_types('fmin')
-      77.3±0.9μs       68.7±0.2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'i')
-        78.5±2μs       69.7±0.3μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'i')
-      89.7±0.9μs       79.6±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'L')
-        88.9±2μs       78.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'Q')
-      77.7±0.6μs       68.9±0.5μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'I')
-        78.3±2μs       69.3±0.4μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'I')
-      89.4±0.4μs         79.2±2μs     0.89  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'q')
-      77.7±0.6μs       68.6±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'i')
-        88.8±2μs         78.5±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'I')
-      82.8±0.6μs       73.2±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'i')
-        81.7±3μs       72.1±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'i')
-         6.63±0s          5.85±0s     0.88  bench_ufunc_strides.LogisticRegression.time_train(<class 'numpy.float32'>)
-      4.01±0.1ms      3.53±0.07ms     0.88  bench_core.VarComplex.time_var(1000000)
-        69.6±1ms       61.4±0.9ms     0.88  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
-        78.7±1μs       69.3±0.5μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'i')
-        89.1±1μs         78.5±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'Q')
-        77.1±2μs       67.9±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'I')
-        83.2±1μs       73.2±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'Q')
-        78.5±1μs       69.0±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'i')
-        88.4±1μs       77.7±0.4μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'l')
-        88.8±1μs       78.0±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'l')
-      81.8±0.9μs       71.8±0.7μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'I')
-        82.0±1μs       72.0±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'i')
-        81.5±2μs       71.6±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'I')
-        90.4±2μs         79.3±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'I')
-        82.3±1μs         72.2±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'I')
-        79.0±1μs       69.3±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'I')
-        78.0±1μs       68.4±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'I')
-        77.6±1μs       68.1±0.2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'i')
-        89.2±2μs       78.2±0.6μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'l')
-        82.1±1μs         71.9±1μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'i')
-        79.7±1μs       69.9±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'I')
-        77.7±1μs       68.2±0.3μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'i')
-      81.5±0.2μs       71.4±0.9μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'i')
-        90.3±1μs         79.0±2μs     0.88  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'L')
-         473±3μs          414±2μs     0.88  bench_lib.Nan.time_nanargmax(200000, 0.1)
-      61.4±0.9μs       53.8±0.6μs     0.88  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 2, 1, 'd')
-      78.1±0.2μs       68.3±0.2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'i')
-         150±1μs          131±3μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'd')
-        82.9±2μs       72.5±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'i')
-      81.5±0.6μs       71.3±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'I')
-         639±2μs          559±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'f')
-        79.7±2μs       69.7±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'i')
-        84.4±2μs       73.8±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'i')
-         473±3μs          413±2μs     0.87  bench_lib.Nan.time_nanargmax(200000, 0)
-         637±3μs          556±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'f')
-        82.9±1μs       72.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'i')
-         523±1μs          457±3μs     0.87  bench_lib.Nan.time_nanargmax(200000, 2.0)
-      89.5±0.8μs       78.1±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'Q')
-      89.1±0.8μs         77.8±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'q')
-      80.8±0.7μs       70.5±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'i')
-      62.7±0.9μs       54.7±0.7μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 2, 'd')
-      82.0±0.6μs       71.5±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'I')
-         636±1μs          555±4μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'f')
-        82.5±1μs       71.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'I')
-        86.1±2μs       75.1±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'I')
-         638±1μs          556±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'f')
-        78.0±2μs       67.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'I')
-         639±2μs          557±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'f')
-         108±1ms         93.9±3ms     0.87  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'str'>)
-     1.49±0.01μs      1.29±0.01μs     0.87  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int16')
-         641±2μs          558±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'f')
-        84.9±2μs       73.9±0.3μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'q')
-        90.8±2μs         79.0±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'i')
-         527±5μs          458±9μs     0.87  bench_lib.Nan.time_nanargmin(200000, 2.0)
-        85.5±1μs         74.4±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'I')
-        90.0±1μs         78.3±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'i')
-        90.3±2μs         78.6±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'I')
-      83.1±0.9μs       72.3±0.2μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'q')
-         641±1μs          557±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'f')
-      82.6±0.9μs       71.8±0.6μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'i')
-         642±3μs          557±7μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'f')
-         110±3μs         95.6±1μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'd')
-      78.8±0.6μs       68.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'i')
-       639±0.6μs          554±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'f')
-     1.49±0.01μs         1.30±0μs     0.87  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float16')
-        90.3±2μs       78.4±0.4μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'i')
-        82.9±2μs       71.9±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'I')
-         112±2μs         97.2±2μs     0.87  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'd')
-         640±3μs          555±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'f')
-         636±2μs          552±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'f')
-         642±3μs          557±3μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'f')
-        85.6±1μs       74.3±0.8μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'i')
-         641±6μs          555±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'f')
-         640±3μs          555±1μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'f')
-         642±2μs          557±5μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'f')
-         641±1μs          556±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'f')
-         490±2μs          425±1μs     0.87  bench_function_base.Sort.time_sort('heap', 'float64', ('ordered',))
-        89.9±1μs         77.9±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'L')
-        84.1±1μs       72.8±0.9μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'I')
-      82.2±0.9μs         71.2±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'i')
-         639±3μs          553±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'f')
-         638±3μs          552±2μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'f')
-        82.6±2μs       71.5±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'I')
-        86.7±1μs       75.0±0.5μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'I')
-      81.1±0.4μs       70.2±0.7μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'I')
-         638±1μs        552±0.6μs     0.87  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'f')
-        90.0±2μs         77.9±1μs     0.87  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'i')
-         642±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'f')
-         638±1μs        551±0.8μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'f')
-        88.7±1μs         76.7±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'I')
-         643±5μs          556±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'f')
-        83.8±1μs         72.5±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'I')
-      90.5±0.9μs       78.2±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'L')
-         644±2μs          557±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'f')
-      88.1±0.5μs       76.1±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'i')
-         642±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'f')
-         659±7μs          570±7μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'f')
-         641±2μs          554±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'f')
-      81.5±0.9μs       70.4±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'I')
-      84.3±0.9μs       72.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'i')
-        89.4±1μs       77.2±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'q')
-         652±8μs          563±5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'f')
-         643±4μs          556±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'f')
-        82.8±2μs         71.5±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'i')
-         639±2μs        551±0.6μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'f')
-        80.8±1μs       69.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'i')
-         642±3μs          554±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'f')
-      83.6±0.7μs       72.2±0.9μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'L')
-        81.5±2μs       70.3±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'I')
-         639±3μs          552±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'f')
-        81.0±2μs       69.8±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'i')
-        79.2±2μs       68.3±0.3μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'I')
-         643±2μs        554±0.9μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'f')
-        83.3±1μs       71.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'i')
-         641±2μs          553±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'f')
-         640±2μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'f')
-        83.4±2μs       71.9±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'I')
-         639±2μs        551±0.5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'f')
-       640±0.7μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'f')
-        83.8±1μs       72.3±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'Q')
-        84.9±2μs       73.2±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'L')
-         642±2μs          553±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'f')
-         641±3μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'f')
-         643±2μs          554±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'f')
-         641±3μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'f')
-        84.3±2μs       72.7±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'l')
-         645±4μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'f')
-      86.9±0.7μs       74.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'I')
-        89.7±1μs         77.3±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'l')
-     2.47±0.02μs      2.13±0.02μs     0.86  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float16')
-         642±1μs          553±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'f')
-         641±5μs          552±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'f')
-         642±1μs          553±3μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'f')
-        90.6±1μs         77.9±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'Q')
-        82.6±1μs       71.0±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'i')
-         645±2μs          555±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'f')
-      83.5±0.8μs       71.8±0.6μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'l')
-         391±2μs          336±4μs     0.86  bench_ufunc.UFunc.time_ufunc_types('fmax')
-        83.4±1μs       71.7±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'L')
-        89.3±2μs       76.8±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'Q')
-      86.9±0.8μs       74.7±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'i')
-        85.3±1μs       73.3±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'l')
-      87.0±0.9μs       74.7±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'I')
-         108±2ms         93.0±2ms     0.86  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'str'>)
-        85.5±1μs       73.4±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'i')
-         645±6μs          553±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'f')
-        83.7±1μs         71.9±1μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'I')
-         644±3μs          553±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'f')
-         643±2μs          551±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'f')
-        91.5±2μs       78.4±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'i')
-        87.6±1μs       75.1±0.7μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'i')
-        84.7±1μs       72.6±0.5μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'q')
-         644±3μs          552±2μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'f')
-        69.4±2ms         59.5±1ms     0.86  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
-         482±6μs          413±6μs     0.86  bench_lib.Nan.time_nanargmin(200000, 0)
-      79.2±0.5μs       67.8±0.4μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'I')
-         645±5μs          552±1μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'f')
-        90.0±1μs       77.0±0.8μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'L')
-         649±4μs          556±4μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'f')
-         647±2μs          554±5μs     0.86  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'f')
-         521±3μs          446±1μs     0.86  bench_function_base.Sort.time_sort('heap', 'float32', ('ordered',))
-      79.7±0.4μs       68.2±0.2μs     0.86  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'I')
-         648±7μs          554±2μs     0.85  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'f')
-        85.0±2μs       72.7±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'l')
-      91.5±0.7μs       78.2±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'q')
-        86.0±1μs         73.4±1μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'Q')
-         149±4μs          127±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'd')
-        84.9±1μs       72.4±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'L')
-      83.5±0.8μs       71.3±0.7μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'i')
-      86.5±0.8μs         73.8±1μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'Q')
-        85.3±2μs       72.8±0.3μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'L')
-        84.7±1μs       72.3±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'I')
-      83.0±0.9μs       70.7±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'I')
-     2.47±0.02μs      2.10±0.02μs     0.85  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int16')
-      58.6±0.6μs       50.0±0.6μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'd')
-        87.3±1μs       74.4±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'i')
-         109±3μs         92.6±4μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'd')
-        91.4±1μs         77.8±2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'I')
-         653±5μs          556±3μs     0.85  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'f')
-      83.9±0.8μs       71.4±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'I')
-      57.8±0.6μs       49.2±0.5μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'd')
-      84.7±0.8μs       72.0±0.6μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'i')
-        82.4±2μs       70.1±0.2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'I')
-      82.7±0.8μs       70.3±0.5μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'i')
-         260±1μs          221±2μs     0.85  bench_ufunc.UFunc.time_ufunc_types('floor')
-        86.2±2μs       73.3±0.9μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'q')
-        84.4±2μs       71.7±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'I')
-        82.7±2μs       70.2±0.4μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'i')
-        87.8±1μs       74.5±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'I')
-        85.6±1μs       72.7±0.2μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'I')
-         151±2μs          128±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'd')
-     5.24±0.01μs      4.44±0.04μs     0.85  bench_lib.Nan.time_nanmin(200, 2.0)
-        91.8±1μs       77.8±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'L')
-        92.1±2μs       78.0±0.8μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'I')
-      84.1±0.8μs       71.1±0.3μs     0.85  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'I')
-         258±3μs          218±5μs     0.85  bench_ufunc.UFunc.time_ufunc_types('trunc')
-         327±4μs         277±10μs     0.85  bench_ufunc.UFunc.time_ufunc_types('add')
-        78.1±2μs         66.0±3μs     0.85  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'd')
-        90.4±1μs       76.3±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'Q')
-      91.4±0.8μs         77.1±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'l')
-        83.3±1μs       70.2±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'I')
-      85.3±0.3μs       71.9±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'q')
-        91.5±3μs         77.1±1μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'l')
-        483±10μs          407±5μs     0.84  bench_lib.Nan.time_nanargmin(200000, 0.1)
-        94.9±1μs       79.9±0.3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 4, 'B')
-      94.3±0.6μs       79.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 4, 'B')
-      85.0±0.9μs       71.5±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'q')
-      84.3±0.7μs       70.9±0.2μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'I')
-     5.28±0.08μs      4.43±0.07μs     0.84  bench_lib.Nan.time_nanmin(200, 0.1)
-      85.8±0.5μs       72.1±0.4μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'L')
-      85.3±0.3μs       71.6±0.3μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'Q')
-      94.5±0.9μs       79.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 1, 'B')
-        993±60μs          833±9μs     0.84  bench_lib.Nan.time_nanargmin(200000, 90.0)
-        60.4±1μs       50.7±0.5μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'd')
-        91.5±2μs       76.8±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'q')
-      18.9±0.3ms       15.9±0.9ms     0.84  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'str'>)
-      95.3±0.6μs       79.9±0.9μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 2, 'B')
-     1.24±0.02ms      1.04±0.06ms     0.84  bench_core.Temporaries.time_large2
-      94.5±0.5μs       79.1±0.4μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 2, 'B')
-      93.4±0.6μs       78.3±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 1, 'B')
-        59.3±1μs       49.7±0.5μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'd')
-      95.9±0.4μs       80.1±0.7μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 4, 'B')
-      95.3±0.9μs       79.7±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 4, 4, 'B')
-        95.1±1μs       79.5±0.5μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 4, 'B')
-         259±2μs          217±3μs     0.84  bench_ufunc.UFunc.time_ufunc_types('ceil')
-      93.6±0.5μs       78.2±0.8μs     0.84  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 1, 'B')
-         113±2μs         94.8±3μs     0.84  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'd')
-        85.7±1μs       71.6±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'l')
-        93.6±1μs       78.0±0.9μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 4, 'B')
-         113±3μs         94.5±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'd')
-        94.0±1μs       78.3±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 1, 'B')
-      93.0±0.1μs       77.4±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 2, 'B')
-     5.32±0.01μs      4.43±0.04μs     0.83  bench_lib.Nan.time_nanmax(200, 0)
-      94.4±0.3μs       78.6±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 1, 'B')
-     5.31±0.07μs      4.42±0.04μs     0.83  bench_lib.Nan.time_nanmin(200, 0)
-      93.2±0.5μs         77.5±2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 1, 'B')
-        76.7±2μs         63.7±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'd')
-         153±3μs          127±3μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'd')
-      92.7±0.2μs       77.0±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 2, 2, 'B')
-      95.1±0.8μs       79.0±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 4, 2, 'B')
-      94.3±0.9μs       78.3±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 2, 'B')
-     94.4±0.09μs       78.4±0.6μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 4, 'B')
-      95.6±0.9μs         79.4±1μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 2, 'B')
-        94.6±1μs       78.5±0.2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 4, 'B')
-        75.5±2μs         62.7±2μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'd')
-        94.6±1μs       78.4±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 4, 'B')
-      61.4±0.7μs       50.9±0.4μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'sign'>, 1, 1, 'd')
-      86.9±0.9μs       72.0±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'l')
-     5.34±0.04μs      4.42±0.08μs     0.83  bench_lib.Nan.time_nanmax(200, 0.1)
-        95.1±1μs       78.8±0.5μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 4, 1, 'B')
-        50.1±1μs       41.5±0.8μs     0.83  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'd')
-      94.3±0.8μs       78.0±0.2μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 4, 'B')
-      94.7±0.8μs       78.2±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 4, 1, 'B')
-      94.3±0.7μs       77.8±0.3μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 2, 'B')
-        94.6±1μs       78.1±0.4μs     0.83  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 4, 'B')
-      94.8±0.2μs       78.1±0.1μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 4, 2, 'B')
-      94.6±0.4μs       77.8±0.6μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 2, 'B')
-        95.6±1μs       78.6±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 4, 4, 'B')
-      92.7±0.3μs       76.2±0.8μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 2, 'B')
-      93.3±0.4μs       76.7±0.8μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 1, 'B')
-      92.7±0.4μs       76.2±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 2, 'B')
-      93.1±0.3μs       76.5±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 2, 'B')
-      93.5±0.5μs       76.8±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 4, 'B')
-         154±4μs          126±1μs     0.82  bench_function_base.Sort.time_argsort('quick', 'uint32', ('reversed',))
-      17.5±0.5μs       14.3±0.3μs     0.82  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'D')
-     1.23±0.03ms      1.01±0.04ms     0.82  bench_core.Temporaries.time_large
-      95.5±0.6μs       78.1±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 2, 4, 'B')
-      51.8±0.4μs         42.3±1μs     0.82  bench_core.VarComplex.time_var(10000)
-        79.6±2μs         65.0±3μs     0.82  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'd')
-      94.0±0.8μs       76.7±0.6μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 2, 1, 'B')
-        94.6±1μs       77.2±0.4μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 4, 'B')
-         192±2μs        157±0.3μs     0.82  bench_reduce.ArgMin.time_argmin(<class 'numpy.float32'>)
-        94.1±1μs       76.7±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 1, 'B')
-     5.40±0.03μs      4.41±0.01μs     0.82  bench_lib.Nan.time_nanmin(200, 90.0)
-      92.9±0.2μs       75.7±0.5μs     0.82  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 2, 'B')
-     5.43±0.03μs      4.43±0.07μs     0.81  bench_lib.Nan.time_nanmax(200, 2.0)
-      94.1±0.2μs       76.6±0.2μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 2, 1, 'B')
-      92.8±0.3μs       75.5±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 1, 'B')
-      93.7±0.6μs         76.0±3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 1, 2, 'B')
-         196±3μs        159±0.8μs     0.81  bench_reduce.ArgMax.time_argmax(<class 'numpy.float32'>)
-       121±0.4ms         98.0±2ms     0.81  bench_app.LaplaceInplace.time_it('inplace')
-      95.3±0.9μs       77.1±0.3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 4, 'B')
-      94.2±0.8μs       76.2±0.3μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 2, 2, 1, 'B')
-      93.8±0.4μs       75.8±0.6μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 2, 'B')
-        94.8±1μs       76.5±0.4μs     0.81  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 4, 1, 2, 'B')
-     5.43±0.02μs      4.37±0.02μs     0.81  bench_lib.Nan.time_nanmax(200, 90.0)
-        76.2±2μs         61.3±2μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'd')
-         119±4μs         95.4±2μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'd')
-      94.0±0.8μs       75.6±0.5μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 1, 'B')
-         120±2μs         96.3±3μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'd')
-      95.9±0.7μs       77.0±0.4μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 2, 2, 'B')
-        95.7±2μs       76.6±0.2μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 2, 1, 4, 'B')
-        94.5±1μs       75.7±0.4μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 4, 1, 1, 'B')
-      95.3±0.6μs       76.1±0.3μs     0.80  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 4, 'B')
-      51.2±0.3μs       40.8±0.4μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'd')
-        50.9±2μs       40.6±0.4μs     0.80  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'd')
-        52.0±1μs       41.3±0.6μs     0.79  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'd')
-         488±5μs          387±3μs     0.79  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 1000))
-         371±6μs          291±5μs     0.79  bench_core.VarComplex.time_var(100000)
-      8.18±0.4ms       6.39±0.3ms     0.78  bench_ufunc.Broadcast.time_broadcast
-      42.6±0.1μs       32.5±0.3μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'd')
-      42.7±0.5μs       32.4±0.2μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'd')
-         168±3μs          127±4μs     0.76  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'd')
-     4.95±0.03ms      3.71±0.06ms     0.75  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'f')
-     4.88±0.02ms      3.63±0.01ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'f')
-     5.90±0.04μs      4.38±0.03μs     0.74  bench_lib.Nan.time_nanmin(200, 50.0)
-     4.87±0.01ms         3.61±0ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'f')
-     4.89±0.03ms      3.63±0.01ms     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'f')
-        67.3±2μs       49.7±0.9μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'd')
-      43.4±0.5μs       32.0±0.3μs     0.74  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'd')
-     6.05±0.05μs      4.45±0.06μs     0.74  bench_lib.Nan.time_nanmax(200, 50.0)
-        43.5±1μs       32.0±0.2μs     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'd')
-     4.95±0.06ms      3.63±0.02ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'f')
-     4.94±0.06ms      3.62±0.01ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'f')
-     4.95±0.02ms      3.62±0.02ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'f')
-     4.94±0.04ms      3.61±0.01ms     0.73  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'f')
-      18.7±0.6μs      13.5±0.06μs     0.72  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'D')
-     5.04±0.03ms      3.63±0.05ms     0.72  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'f')
-     11.3±0.07μs      8.02±0.05μs     0.71  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float64'>)
-         133±2μs         94.0±1μs     0.71  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'd')
-      14.8±0.3μs       10.4±0.3μs     0.70  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'D')
-      58.7±0.2μs       40.7±0.7μs     0.69  bench_core.Temporaries.time_mid2
-      59.0±0.5μs       40.7±0.9μs     0.69  bench_core.Temporaries.time_mid
-      14.8±0.3μs      10.2±0.09μs     0.69  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'D')
-         540±2μs          371±1μs     0.69  bench_reduce.AddReduceSeparate.time_reduce(0, 'float64')
-         140±4μs         96.0±3μs     0.69  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'd')
-        92.0±3μs         63.1±3μs     0.69  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'd')
-         187±4μs          128±4μs     0.68  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'd')
-      21.0±0.3μs       14.4±0.2μs     0.68  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 4, 'F')
-      21.3±0.5μs       14.4±0.3μs     0.68  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 4, 'F')
-      12.9±0.2μs       8.68±0.2μs     0.67  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'D')
-      64.6±0.5μs         43.2±1μs     0.67  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'd')
-      81.7±0.5μs       54.6±0.6μs     0.67  bench_ufunc.CustomInplace.time_double_add_temp
-        65.4±1μs       43.1±0.9μs     0.66  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'd')
-      65.7±0.6μs       43.3±0.8μs     0.66  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'd')
-     1.23±0.05ms         803±10μs     0.65  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex128')
-      73.7±0.4μs       47.6±0.2μs     0.65  bench_ufunc.CustomInplace.time_double_add
-      40.9±0.4μs       26.2±0.3μs     0.64  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'd')
-        64.4±1μs       40.9±0.5μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'd')
-      20.9±0.3μs      13.2±0.08μs     0.63  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 2, 'F')
-      41.2±0.9μs       26.0±0.1μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'd')
-        68.9±2μs       43.2±0.7μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'd')
-      41.3±0.6μs       25.8±0.4μs     0.63  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'd')
-        69.3±2μs       43.3±0.8μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'd')
-      41.2±0.7μs       25.7±0.2μs     0.62  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'd')
-      20.9±0.4μs       12.8±0.8μs     0.61  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 1, 'F')
-      21.1±0.3μs       12.9±0.2μs     0.61  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('add', 2, 'F')
-      21.0±0.4μs       12.6±0.4μs     0.60  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'F')
-      81.6±0.6μs       49.0±0.5μs     0.60  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'd')
-      13.6±0.4μs       7.80±0.3μs     0.57  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('subtract', 1, 'D')
-       158±0.8μs         88.8±1μs     0.56  bench_reduce.ArgMax.time_argmax(<class 'numpy.float64'>)
-         157±1μs       88.1±0.3μs     0.56  bench_reduce.ArgMin.time_argmin(<class 'numpy.float64'>)
-     1.96±0.01ms         1.08±0ms     0.55  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex64')
-      66.5±0.3μs      33.8±0.05μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 43)
-     66.7±0.09μs       33.9±0.1μs     0.51  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -43)
-        82.3±2μs       41.7±0.6μs     0.51  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'd')
-      71.7±0.4μs       35.0±0.1μs     0.49  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, 8)
-      71.6±0.3μs      34.4±0.06μs     0.48  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int64'>, -8)
-        66.2±1μs       31.4±0.1μs     0.47  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'd')
-        88.1±3μs         41.3±1μs     0.47  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'd')
-      78.7±0.7μs       35.7±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'Q')
-        79.1±1μs       35.8±0.7μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'l')
-        79.8±1μs       35.8±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'L')
-        79.4±1μs       35.5±0.5μs     0.45  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'q')
-         434±1μs          194±4μs     0.45  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 4, 'd')
-        80.5±1μs       35.6±0.5μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'L')
-      12.6±0.2μs      5.57±0.02μs     0.44  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (1))
-         443±2μs          196±3μs     0.44  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 4, 'd')
-      12.6±0.2μs      5.57±0.01μs     0.44  bench_reduce.MinMax.time_min(<class 'numpy.uint64'>)
-      13.2±0.3μs      5.82±0.02μs     0.44  bench_reduce.MinMax.time_max(<class 'numpy.uint64'>)
-      81.0±0.5μs       35.6±0.6μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'Q')
-      81.5±0.6μs       35.7±0.4μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'l')
-        81.4±1μs       35.6±0.8μs     0.44  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'q')
-      12.8±0.2μs      5.54±0.02μs     0.43  bench_reduce.MinMax.time_min(<class 'numpy.int64'> (0))
-      12.9±0.3μs      5.61±0.05μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (1))
-      13.2±0.3μs      5.66±0.02μs     0.43  bench_reduce.MinMax.time_max(<class 'numpy.int64'> (0))
-         505±1μs          216±1μs     0.43  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 4, 'f')
-         506±3μs          213±3μs     0.42  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 2, 'f')
-      61.7±0.9μs       25.9±0.1μs     0.42  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'd')
-         505±1μs          209±1μs     0.41  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 1, 'f')
-         432±2μs         177±10μs     0.41  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 4, 'd')
-         520±9μs          209±5μs     0.40  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 4, 'f')
-         508±6μs          200±4μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 4, 'f')
-         436±2μs          170±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 2, 'd')
-         503±1μs          195±2μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 2, 'f')
-         438±4μs          170±5μs     0.39  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 2, 'd')
-      36.4±0.2μs       14.1±0.1μs     0.39  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 4, 'F')
-         432±3μs          167±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 4, 'd')
-         432±3μs          167±3μs     0.39  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 4, 'd')
-         505±2μs          195±4μs     0.39  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 2, 1, 'f')
-        81.6±2μs       31.4±0.3μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'd')
-         509±3μs          195±2μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 4, 1, 'f')
-         436±2μs          167±5μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 4, 'd')
-         436±3μs          166±3μs     0.38  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 4, 'd')
-         218±2μs         83.0±2μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 4, 'f')
-      12.0±0.2μs      4.56±0.02μs     0.38  bench_reduce.MinMax.time_max(<class 'numpy.uint32'>)
-       215±0.9μs       81.0±0.3μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 1, 'f')
-         220±2μs         82.5±1μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 4, 'f')
-      12.0±0.2μs      4.50±0.05μs     0.38  bench_reduce.MinMax.time_min(<class 'numpy.int32'>)
-         505±2μs          189±1μs     0.38  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'reciprocal'>, 1, 2, 'f')
-      36.3±0.1μs      13.6±0.08μs     0.37  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 2, 'F')
-     2.71±0.02ms         1.01±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 1, 'd')
-     2.76±0.03ms      1.03±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 4, 'd')
-         425±3μs          159±6μs     0.37  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 4, 'd')
-         439±1μs         163±10μs     0.37  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 4, 'd')
-       218±0.9μs       80.8±0.6μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 2, 'f')
-         217±1μs       80.5±0.5μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'f')
-     2.74±0.08ms      1.01±0.04ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 4, 'd')
-     2.74±0.03ms      1.01±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 2, 2, 'd')
-      12.2±0.2μs      4.48±0.06μs     0.37  bench_reduce.MinMax.time_min(<class 'numpy.uint32'>)
-       217±0.5μs       79.9±0.5μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 2, 'f')
-         219±2μs       80.6±0.4μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 2, 4, 'f')
-     2.78±0.03ms      1.02±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 2, 'd')
-     2.76±0.04ms      1.02±0.01ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 4, 1, 'd')
-     2.74±0.04ms         1.01±0ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 1, 'd')
-       216±0.5μs       79.2±0.4μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 2, 'f')
-     2.72±0.04ms         996±20μs     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 2, 'd')
-     2.87±0.09ms      1.05±0.03ms     0.37  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'tanh'>, 1, 4, 'd')
-         219±5μs       79.4±0.4μs     0.36  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 4, 1, 'f')
-         432±2μs          156±3μs     0.36  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 4, 'd')
-      36.2±0.2μs      13.0±0.08μs     0.36  bench_ufunc_strides.AVX_cmplx_arithmetic.time_ufunc('multiply', 1, 'F')
-      12.2±0.2μs      4.39±0.02μs     0.36  bench_reduce.MinMax.time_max(<class 'numpy.int32'>)
-         448±6μs         160±10μs     0.36  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 4, 'd')
-        455±10μs         161±10μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 4, 'd')
-        443±10μs          156±4μs     0.35  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 4, 1, 'd')
-         428±2μs          150±1μs     0.35  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 2, 'd')
-         430±1μs          148±5μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 2, 'd')
-         433±1μs          149±7μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 4, 1, 'd')
-       430±0.8μs          145±4μs     0.34  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 2, 'd')
-       103±0.3μs      34.5±0.06μs     0.34  bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float32'>)
-         435±5μs          146±2μs     0.34  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 2, 'd')
-         103±1μs       34.5±0.1μs     0.34  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
-       426±0.3μs          143±5μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 2, 'd')
-     8.84±0.04μs      2.96±0.04μs     0.33  bench_reduce.MinMax.time_min(<class 'numpy.uint8'>)
-         430±3μs          143±3μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 2, 1, 'd')
-         428±1μs          142±7μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 4, 1, 'd')
-        442±10μs          147±6μs     0.33  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 4, 'd')
-         438±3μs          145±6μs     0.33  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 2, 'd')
-     8.83±0.03μs      2.92±0.03μs     0.33  bench_reduce.MinMax.time_max(<class 'numpy.uint8'>)
-       158±0.5μs         52.2±1μs     0.33  bench_reduce.ArgMax.time_argmax(<class 'numpy.int64'>)
-      75.5±0.9μs       24.9±0.4μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'I')
-       156±0.2μs       51.5±0.6μs     0.33  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint64'>)
-         433±4μs          142±8μs     0.33  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 2, 'd')
-      77.3±0.5μs       25.2±0.4μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'I')
-      76.2±0.4μs       24.8±0.5μs     0.33  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'i')
-         433±4μs          141±6μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 2, 1, 'd')
-        76.0±2μs       24.7±0.6μs     0.32  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'i')
-         447±9μs        145±0.5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 4, 'd')
-         447±9μs          145±4μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 4, 'd')
-       437±0.9μs          141±7μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 2, 'd')
-       159±0.5μs       51.4±0.4μs     0.32  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint64'>)
-         432±4μs          139±5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 4, 1, 'd')
-       158±0.4μs       50.6±0.1μs     0.32  bench_reduce.ArgMin.time_argmin(<class 'numpy.int64'>)
-      81.3±0.6μs       25.7±0.3μs     0.32  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'absolute'>, 1, 1, 'd')
-         443±9μs          140±5μs     0.32  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 4, 'd')
-         432±4μs         130±10μs     0.30  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 4, 1, 'd')
-       157±0.1μs       46.8±0.4μs     0.30  bench_reduce.ArgMin.time_argmin(<class 'numpy.int32'>)
-         158±1μs       46.4±0.3μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint32'>)
-       157±0.4μs       45.5±0.5μs     0.29  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint32'>)
-       157±0.4μs       45.5±0.3μs     0.29  bench_reduce.ArgMax.time_argmax(<class 'numpy.int32'>)
-         434±6μs          123±3μs     0.28  bench_ufunc_strides.Binary.time_ufunc('maximum', 4, 1, 1, 'd')
-       426±0.9μs          120±3μs     0.28  bench_ufunc_strides.Binary.time_ufunc('minimum', 4, 1, 1, 'd')
-         199±2μs         55.5±2μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 4, 'f')
-         199±2μs         55.2±1μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 4, 'f')
-         198±2μs         54.9±1μs     0.28  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 4, 'f')
-         439±9μs          120±3μs     0.27  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 4, 'd')
-         198±2μs         54.2±1μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 4, 'f')
-         199±2μs         54.5±3μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 4, 'f')
-         441±9μs          119±2μs     0.27  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 4, 'd')
-         437±2μs          118±6μs     0.27  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 4, 1, 'd')
-      40.3±0.3μs      10.9±0.03μs     0.27  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float32'>)
-       424±0.8μs          114±4μs     0.27  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 2, 'd')
-       195±0.7μs       51.7±0.4μs     0.27  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 4, 'f')
-         430±3μs          113±3μs     0.26  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 2, 'd')
-       195±0.7μs       50.7±0.4μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 4, 'f')
-     13.6±0.04μs      3.52±0.01μs     0.26  bench_reduce.MinMax.time_max(<class 'numpy.uint16'>)
-         195±1μs       50.3±0.3μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 4, 'f')
-         200±2μs         51.0±1μs     0.26  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 4, 'f')
-         197±3μs       50.3±0.3μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 4, 'f')
-     13.7±0.09μs      3.48±0.03μs     0.25  bench_reduce.MinMax.time_min(<class 'numpy.uint16'>)
-         200±3μs         50.4±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 4, 'f')
-         201±3μs         50.5±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 4, 'f')
-         200±3μs         50.0±1μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 4, 'f')
-         200±3μs       49.4±0.7μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 4, 'f')
-         200±3μs       49.1±0.6μs     0.25  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 4, 'f')
-        1.95±0ms          477±3μs     0.24  bench_reduce.AddReduceSeparate.time_reduce(0, 'float32')
-         597±4μs          146±1μs     0.24  bench_ufunc.CustomInplace.time_float_add_temp
-     12.5±0.05μs      2.98±0.02μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int8'>)
-         584±1μs          137±1μs     0.24  bench_ufunc.CustomInplace.time_float_add
-     15.0±0.05μs      3.52±0.02μs     0.24  bench_reduce.MinMax.time_min(<class 'numpy.int16'>)
-     12.6±0.08μs      2.95±0.02μs     0.23  bench_reduce.MinMax.time_max(<class 'numpy.int8'>)
-     15.0±0.08μs      3.51±0.02μs     0.23  bench_reduce.MinMax.time_max(<class 'numpy.int16'>)
-         913±8μs          203±7μs     0.22  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 4, 'd')
-         881±3μs          195±4μs     0.22  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 4, 'd')
-       197±0.8μs       42.7±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 2, 'f')
-         198±2μs       42.8±0.5μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 2, 'f')
-         199±3μs       43.1±0.1μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 1, 'f')
-         197±2μs       42.7±0.7μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 4, 1, 'f')
-         199±3μs       43.0±0.6μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 4, 1, 'f')
-         197±1μs       42.5±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 2, 'f')
-       196±0.5μs       42.3±0.4μs     0.22  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 4, 1, 'f')
-         200±3μs       43.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 4, 2, 'f')
-         199±1μs       42.6±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 1, 'f')
-       199±0.8μs       42.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 4, 2, 'f')
-         195±1μs       41.4±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 2, 'f')
-       196±0.9μs       41.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 2, 'f')
-         195±1μs       41.2±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 2, 'f')
-         195±2μs       41.2±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 2, 'f')
-         196±1μs       41.3±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 2, 'f')
-       195±0.5μs       41.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 2, 1, 'f')
-       194±0.6μs       40.9±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 2, 'f')
-       195±0.8μs       41.0±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 2, 1, 'f')
-       195±0.2μs       40.9±0.5μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 1, 'f')
-       196±0.6μs       40.9±0.4μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 2, 1, 'f')
-         196±1μs       40.8±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 2, 2, 'f')
-         197±2μs       40.9±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 2, 'f')
-       195±0.4μs       40.4±0.1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'floor'>, 1, 1, 'f')
-         197±1μs       40.8±0.1μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 2, 'f')
-       198±0.9μs       40.9±0.3μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 2, 'f')
-       196±0.4μs       40.4±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'ceil'>, 1, 1, 'f')
-         196±2μs       40.3±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'trunc'>, 1, 1, 'f')
-         196±2μs       40.3±0.2μs     0.21  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 1, 1, 'f')
-         198±2μs       40.7±0.2μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'rint'>, 2, 1, 'f')
-         199±3μs       40.6±0.5μs     0.20  bench_ufunc_strides.Unary.time_ufunc(<ufunc 'square'>, 1, 1, 'f')
-         894±1μs          173±6μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 4, 'd')
-         880±2μs          166±2μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 2, 'd')
-         878±1μs          164±4μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 4, 'd')
-         901±5μs          167±4μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 4, 'd')
-         899±5μs          167±7μs     0.19  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 2, 'd')
-         424±2μs         78.8±2μs     0.19  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 2, 'd')
-         428±1μs         78.3±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 2, 'd')
-         887±6μs          159±5μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 4, 'd')
-         423±1μs         74.6±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 2, 1, 'd')
-         878±2μs          155±6μs     0.18  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 4, 'd')
-         427±2μs         75.2±1μs     0.18  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 2, 'd')
-         425±3μs         74.6±2μs     0.18  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 2, 1, 'd')
-       422±0.8μs       73.6±0.5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 2, 'd')
-         906±6μs          156±6μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 4, 'd')
-         434±4μs         74.7±2μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 2, 1, 'd')
-        914±20μs          157±9μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 4, 'd')
-         877±1μs          151±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 4, 1, 'd')
-        898±10μs          154±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 4, 'd')
-        931±20μs         159±10μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 4, 'd')
-         429±5μs         73.5±3μs     0.17  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 2, 1, 'd')
-         881±1μs         149±10μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 4, 'd')
-         895±3μs          152±4μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 4, 1, 'd')
-         896±3μs          150±5μs     0.17  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 2, 'd')
-      24.4±0.2μs      4.06±0.03μs     0.17  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 43)
-         904±5μs          149±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 2, 'd')
-         880±2μs          144±4μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 2, 'd')
-      24.9±0.2μs      4.02±0.01μs     0.16  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint32'>, 8)
-         879±2μs          141±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 2, 'd')
-         421±1μs         67.1±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 2, 1, 1, 'd')
-         420±2μs         66.5±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 2, 'd')
-         890±2μs          141±2μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 4, 1, 'd')
-      82.5±0.3μs      12.9±0.07μs     0.16  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'h')
-         875±2μs          136±1μs     0.16  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 2, 'd')
-      83.1±0.7μs       12.9±0.2μs     0.16  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'h')
-       427±0.8μs         66.1±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 2, 1, 1, 'd')
-         427±2μs         65.3±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 2, 'd')
-        935±20μs          141±4μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 4, 'd')
-        921±20μs          139±1μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 4, 'd')
-      86.9±0.6μs       13.1±0.1μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'H')
-        937±20μs          141±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 4, 'd')
-         898±3μs          135±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 2, 1, 'd')
-        917±20μs          138±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 4, 'd')
-         892±2μs          134±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 2, 'd')
-      86.3±0.7μs      12.8±0.08μs     0.15  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'H')
-         884±6μs          131±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 2, 1, 'd')
-      65.9±0.4μs       9.76±0.1μs     0.15  bench_reduce.FMinMax.time_min(<class 'numpy.float64'>)
-         898±4μs          133±5μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 2, 'd')
-         883±3μs          130±4μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 2, 'd')
-         881±4μs          129±3μs     0.15  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 4, 1, 'd')
-      67.1±0.5μs       9.62±0.1μs     0.14  bench_reduce.FMinMax.time_max(<class 'numpy.float64'>)
-         892±1μs          119±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 4, 1, 1, 'd')
-         660±6μs         87.8±1μs     0.13  bench_lib.Nan.time_nanmax(200000, 0)
-         672±5μs         88.5±1μs     0.13  bench_lib.Nan.time_nanmin(200000, 0)
-        921±30μs          121±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 4, 'd')
-         671±5μs         88.2±2μs     0.13  bench_lib.Nan.time_nanmax(200000, 2.0)
-         874±1μs          115±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 4, 1, 1, 'd')
-         669±3μs         87.5±1μs     0.13  bench_lib.Nan.time_nanmax(200000, 0.1)
-        667±10μs         87.3±2μs     0.13  bench_lib.Nan.time_nanmin(200000, 0.1)
-         672±5μs         87.7±2μs     0.13  bench_lib.Nan.time_nanmin(200000, 2.0)
-         877±3μs          114±9μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 4, 1, 'd')
-        931±20μs          120±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 4, 'd')
-         895±6μs          114±3μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 2, 'd')
-         880±4μs          112±2μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 2, 'd')
-         891±2μs          112±9μs     0.13  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 4, 1, 'd')
-      79.7±0.1μs       9.53±0.2μs     0.12  bench_reduce.ArgMax.time_argmax(<class 'bool'>)
-       418±0.9μs       45.2±0.9μs     0.11  bench_ufunc_strides.Binary.time_ufunc('minimum', 1, 1, 1, 'd')
-         427±3μs       45.2±0.8μs     0.11  bench_ufunc_strides.Binary.time_ufunc('maximum', 1, 1, 1, 'd')
-      68.5±0.3μs      7.19±0.07μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'b')
-      68.3±0.4μs      7.14±0.08μs     0.10  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'b')
-       156±0.2μs       15.9±0.2μs     0.10  bench_reduce.ArgMin.time_argmin(<class 'numpy.int16'>)
-       157±0.3μs       15.8±0.1μs     0.10  bench_reduce.ArgMax.time_argmax(<class 'numpy.int16'>)
-      70.5±0.2μs      6.45±0.04μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 43)
-      70.6±0.2μs      6.40±0.06μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -43)
-      75.7±0.5μs      6.48±0.04μs     0.09  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, -8)
-         186±2μs       15.9±0.2μs     0.09  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint16'>)
-      75.9±0.4μs      6.42±0.06μs     0.08  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int32'>, 8)
-       189±0.3μs      15.8±0.09μs     0.08  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint16'>)
-        901±20μs         74.7±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 2, 1, 'd')
-         881±5μs       71.7±0.7μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 2, 1, 'd')
-      89.1±0.2μs       7.22±0.1μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('maximum', 1, 1, 1, 'B')
-     1.10±0.01ms         88.9±1μs     0.08  bench_lib.Nan.time_nanmax(200000, 90.0)
-      88.6±0.1μs      7.08±0.07μs     0.08  bench_ufunc_strides.BinaryInt.time_ufunc('minimum', 1, 1, 1, 'B')
-        1.10±0ms         87.3±1μs     0.08  bench_lib.Nan.time_nanmin(200000, 90.0)
-         881±5μs         69.3±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 2, 'd')
-         893±2μs       69.9±0.8μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 2, 'd')
-         872±5μs         68.1±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 2, 'd')
-         901±4μs         70.3±1μs     0.08  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 2, 'd')
-         890±9μs       65.1±0.6μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 2, 'd')
-         889±2μs       64.8±0.8μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 2, 'd')
-     41.6±0.07μs      3.02±0.01μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 8)
-         880±5μs         63.7±2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 2, 1, 'd')
-      41.6±0.2μs      2.99±0.02μs     0.07  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint16'>, 43)
-         903±5μs         62.9±1μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 2, 1, 'd')
-         876±4μs       60.5±0.2μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmin', 2, 1, 1, 'd')
-         907±6μs         61.8±1μs     0.07  bench_ufunc_strides.Binary.time_ufunc('fmax', 2, 1, 1, 'd')
-       156±0.4μs       9.44±0.2μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.int8'>)
-       157±0.7μs       9.41±0.2μs     0.06  bench_reduce.ArgMin.time_argmin(<class 'numpy.uint8'>)
-       157±0.7μs      9.32±0.09μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.int8'>)
-       157±0.6μs      9.25±0.03μs     0.06  bench_reduce.ArgMax.time_argmax(<class 'numpy.uint8'>)
-        1.53±0ms         87.5±2μs     0.06  bench_lib.Nan.time_nanmax(200000, 50.0)
-     1.55±0.01ms         87.0±1μs     0.06  bench_lib.Nan.time_nanmin(200000, 50.0)
-      69.9±0.2μs      3.75±0.02μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -43)
-      70.3±0.4μs      3.77±0.04μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 43)
-      41.3±0.4μs         2.18±0μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 8)
-      41.3±0.3μs      2.18±0.01μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.uint8'>, 43)
-        885±10μs       45.9±0.6μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmin', 1, 1, 1, 'd')
-         892±5μs       45.3±0.5μs     0.05  bench_ufunc_strides.Binary.time_ufunc('fmax', 1, 1, 1, 'd')
-      74.5±0.8μs      3.76±0.03μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, -8)
-      74.0±0.1μs      3.73±0.05μs     0.05  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int16'>, 8)
-      71.3±0.3μs      2.43±0.02μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 43)
-      71.3±0.4μs      2.42±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -43)
-      75.6±0.6μs      2.42±0.04μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, 8)
-      75.3±0.7μs      2.39±0.01μs     0.03  bench_ufunc.CustomScalarFloorDivideInt.time_floor_divide_int(<class 'numpy.int8'>, -8)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
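
For reading the table above: the third column is the ratio after/before, so a smaller value means a larger speedup on this branch. The following is a minimal sketch, not part of the patch, that recovers the speedup factor from a single row. The helper names `to_seconds` and `speedup` are hypothetical, and the row layout (marker, before, after, ratio, benchmark name) is assumed from the output shown here.

```python
import re

# Seconds per unit; the micro prefix may be printed as U+03BC or U+00B5.
_UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}

# A timing cell looks like "158±0.5μs": mean, "±", spread, unit.
_CELL = re.compile(r"([0-9.]+)±[0-9.]+(ns|[\u03bc\u00b5]s|ms|s)")


def to_seconds(cell):
    """Convert one timing cell (mean only) to seconds."""
    m = _CELL.fullmatch(cell)
    if m is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def speedup(row):
    """Return (benchmark name, before/after) for one comparison row."""
    # Drop the leading +/- marker, then split into exactly four fields so
    # that spaces inside the benchmark name are preserved.
    before, after, _ratio, name = row.strip().lstrip("+-").split(None, 3)
    return name, to_seconds(before) / to_seconds(after)


row = "-       158±0.5μs         52.2±1μs     0.33  bench_reduce.ArgMax.time_argmax(<class 'numpy.int64'>)"
name, factor = speedup(row)
print(f"{name}: {factor:.1f}x faster")
```

Read this way, the last rows of the table (ratio 0.03, for example 75.3 μs before and 2.39 μs after) correspond to roughly a 31x speedup. Since the numbers come from an unstable VM, these factors are indicative only.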

@seiko2plus
Member Author

cc @mattip

@mattip
Member

mattip commented Jun 12, 2022

Let's put this in now, at the beginning of the 1.24 release cycle, in hopes that some IBMZ users can chime in.

@mattip mattip merged commit e6d5529 into numpy:main Jun 12, 2022
@seiko2plus seiko2plus deleted the zsystem_sup branch June 16, 2022 23:06
Labels
01 - Enhancement
56 - Needs Release Note (needs an entry in doc/release/upcoming_changes)
component: SIMD (issues in SIMD (fast instruction sets) code or machinery)

2 participants