WIP::ENH:SIMD Improve the performance of comparison operators #16960

seiko2plus · 2020-07-27T23:53:10Z

Don't merge! Work in progress

A summary of the changes, the performance results, and a TODO list will be written later.

NOTE: Feel free to leave comments or reviews while I'm working on it.

numpy/core/src/umath/simd.inc.src

Qiyu8 · 2020-07-29T06:50:04Z

numpy/core/src/umath/loops.h.src

@@ -18,6 +18,29 @@
 #define BOOL_fmax BOOL_maximum
 #define BOOL_fmin BOOL_minimum

+/*
+ *****************************************************************************
+ **                          Compersion & Logical                           **


TODO

eric-wieser · 2020-08-22T07:36:42Z

numpy/distutils/pyas_template.py

@@ -0,0 +1,192 @@
+#!/usr/bin/env python3


I suspect this would make sense in a standalone PR, along with tests and documentation.

Yes, indeed. It's experimental right now. I will move it into a separate PR later, along with documentation and unit tests.

It would be good to look around and see if there are existing template solutions that can be reused. tempita I think is used in some places in numpy, and jinja may be an option too.

I tried almost everything before concluding that the most flexible template engine is one that doesn't bring new language syntax or philosophies. That is what "pyas" ("Python as a template language") does: it simply treats Python the way PHP is used, with f-strings as the template. It also provides a simple translation mechanism.

The reason I dropped the repeat template is that the generated source size had almost hit 9 MB before the rest of the work was finished. It also can't be used for generating C macros.
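To illustrate the general idea of "Python as a template language", here is a minimal sketch that uses ordinary Python loops and f-strings to emit C comparison loops. It is not the pyas API from this PR; the names `OPERATORS`, `C_TYPES`, `render_loop`, and `render_all` are hypothetical, and the generated C is scalar code chosen only to keep the example short.

```python
# Hypothetical sketch: NOT the pyas API, only the general idea of using
# plain Python control flow and f-strings to generate C source.

# Comparison operators to generate loops for (illustrative subset).
OPERATORS = {
    "equal": "==",
    "not_equal": "!=",
    "less": "<",
    "less_equal": "<=",
}

# Loop-name prefix -> C element type (illustrative subset).
C_TYPES = {"INT": "int", "FLOAT": "float"}


def render_loop(prefix: str, ctype: str, name: str, op: str) -> str:
    """Render one scalar C comparison loop from an f-string template."""
    # Doubled braces ({{ and }}) produce literal braces in an f-string.
    return (
        f"static void {prefix}_{name}(const {ctype} *a, const {ctype} *b,\n"
        f"        unsigned char *out, long n)\n"
        f"{{\n"
        f"    for (long i = 0; i < n; ++i) {{\n"
        f"        out[i] = a[i] {op} b[i];\n"
        f"    }}\n"
        f"}}\n"
    )


def render_all() -> str:
    """Render every (type, operator) combination, one function each."""
    return "\n".join(
        render_loop(prefix, ctype, name, op)
        for prefix, ctype in C_TYPES.items()
        for name, op in OPERATORS.items()
    )


if __name__ == "__main__":
    print(render_all())
```

Because the looping happens in Python rather than in the template syntax or the C preprocessor, the generator can decide per combination what to emit, which is one plausible way to keep the generated source small.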

mattip · 2020-08-22T19:14:30Z

I wonder if we could refactor the dispatch mechanism to be much more limited: only have two loops: a baseline and an advanced loop, written in C. Then only use these loops via the current ufunc reassign-c-function-loops-at-import rather than the macro-based runtime mechanism via NPY__CPU_DISPATCH_CALL. I am worried about the maintenance burden moving forward and the intricacies of adding yet another C code format to NumPy.
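The "two loops, chosen once at import" idea above can be sketched as follows. This is only an illustration in Python under stated assumptions: in NumPy the loops and the CPU feature detection live in C, and every name here (`less_baseline`, `less_advanced`, `cpu_has_advanced_features`, `LOOPS`, `install_loops`) is hypothetical.

```python
# Hypothetical sketch of choosing between a baseline and an advanced loop
# once, at import time, instead of dispatching on every call.
import operator
from typing import Callable, Dict, List, Sequence

Loop = Callable[[Sequence[float], Sequence[float]], List[bool]]


def less_baseline(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Portable loop that works on every CPU."""
    return [x < y for x, y in zip(a, b)]


def less_advanced(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Stand-in for a loop compiled for newer CPU features."""
    return list(map(operator.lt, a, b))


def cpu_has_advanced_features() -> bool:
    """Placeholder: a real implementation would query the CPU."""
    return False


# Table of active loops; the baseline is always a valid default.
LOOPS: Dict[str, Loop] = {"less": less_baseline}


def install_loops(have_advanced: bool) -> None:
    """Reassign loop entries once, based on detected CPU features."""
    if have_advanced:
        LOOPS["less"] = less_advanced


# Runs once when the module is imported.
install_loops(cpu_has_advanced_features())
```

Callers then always go through `LOOPS["less"]`, so the choice costs nothing per call. The trade-off is that only two code paths exist per loop, rather than one per supported instruction set.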

mattip · 2020-08-22T19:15:49Z

If the generated code is so large, maybe we need to rethink what we are trying to do here.

seiko2plus · 2020-08-22T22:39:14Z

@mattip,

> I wonder if we could refactor the dispatch mechanism to be much more limited: only have two loops: a baseline and an advanced loop, written in C. Then only use these loops via the current ufunc reassign-c-function-loops-at-import rather than the macro-based runtime mechanism via NPY__CPU_DISPATCH_CALL. I am worried about the maintenance burden moving forward and the intricacies of adding yet another C code format to NumPy.

xref, mattip#46 (comment)

seiko2plus · 2020-08-22T23:07:11Z

@mattip,

> If the generated code is so large, maybe we need to rethink what we are trying to do here.

The issue with conv_template (the template repeater) is that we had to rely on the C preprocessor for everything, even for internal looping.
Do you know how much that costs?

$ python numpy/numpy/distutils/conv_template.py einsum_sumprod.c.src
$ du einsum_sumprod.c
4576	einsum_sumprod.c

This PR covers many more kernels than einsum, so the best thing we can do is to cut the problem at its root from the beginning
to avoid increasing the maintenance cost. It wouldn't hurt much to give it a try!

seiko2plus · 2022-05-22T19:54:54Z

Closed in favor of #21483. It doesn't contain all the improvements this PR has, but we can add them later while moving to C++.

rgommers added 01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Jul 28, 2020

Qiyu8 reviewed Jul 29, 2020


numpy/core/src/umath/simd.inc.src

Qiyu8 reviewed Jul 29, 2020


seiko2plus force-pushed the simd_improve_cmp branch 5 times, most recently from 112b5ac to 2e1682c Compare August 1, 2020 17:49

Qiyu8 mentioned this pull request Aug 3, 2020

USIMD: Optimize the performace of np.einsum for all platforms #16641

Closed

seiko2plus added 5 commits August 3, 2020 21:43

ENH:NPYV add pack intrinsics for boolean vectors

2207fea

ENH:NPYV add non-contiguous load/store intrinsics for all vectors types

1fa7b81

ENH:NPYV add logical intrinsics for boolean vectors

a9292b5

msvc bug, TODO: move it into seperate pr

d161288

WIP::ENH:SIMD Improve the performance of comparison operators

95c485a

TODO

seiko2plus force-pushed the simd_improve_cmp branch from 2e1682c to 95c485a Compare August 3, 2020 19:44

seberg marked this pull request as draft August 3, 2020 20:14

tumbling down (test AVX2)

8d4ae79

seiko2plus force-pushed the simd_improve_cmp branch from 9628906 to 8d4ae79 Compare August 11, 2020 04:35

reduce the size, map unsigned to signed

86f2d2b

seiko2plus force-pushed the simd_improve_cmp branch 3 times, most recently from c09eecc to cedc863 Compare August 22, 2020 07:02

eric-wieser reviewed Aug 22, 2020


seiko2plus force-pushed the simd_improve_cmp branch 3 times, most recently from 7602864 to 2c4415b Compare August 22, 2020 15:12

This was referenced Aug 23, 2020

MAINT: refactor _Distutils out of CCompiler_Opt mattip/numpy#46

Closed

ENH, TST: Bring the NumPy C SIMD vectorization interface "NPYV" to Python #16782

Merged

seiko2plus added 2 commits August 23, 2020 18:59

The new template module, TODO: move it to a separate pr

4ad27f6

try the new template

727a2e7

seiko2plus force-pushed the simd_improve_cmp branch from 2c4415b to 727a2e7 Compare August 23, 2020 17:00

seiko2plus mentioned this pull request Aug 31, 2020

MAINT: Make the NPY_CPU_DISPATCH_CALL macros expressions not statements #17201

Merged

seiko2plus mentioned this pull request Sep 8, 2020

ENH:Umath Replace raw SIMD of unary float point(32-64) with NPYV - g0 #16247

Merged

11 tasks

Qiyu8 mentioned this pull request Jan 26, 2021

ENH: add scalar special cases for boolean logical loops #8924

Closed

Base automatically changed from master to main March 4, 2021 02:05

github-actions bot added the 25 - WIP label Mar 4, 2021

seiko2plus mentioned this pull request Aug 23, 2021

MAINT: Replace numpy custom generation engine by raw C++ #19713

Merged

seiko2plus closed this May 22, 2022
