Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP::ENH:SIMD Improve the performance of comparison operators #16960

Closed
wants to merge 9 commits into from

Conversation

seiko2plus
Copy link
Member

Don't merge! Work in progress

Summary of the changes, performance achievements, TODO list will be written later.

NOTE: Feel free to leave comment/review while I'm working on it

@rgommers rgommers added 01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Jul 28, 2020
@@ -18,6 +18,29 @@
#define BOOL_fmax BOOL_maximum
#define BOOL_fmin BOOL_minimum

/*
*****************************************************************************
** Compersion & Logical **
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparison

@seberg seberg marked this pull request as draft August 3, 2020 20:14
@seiko2plus seiko2plus force-pushed the simd_improve_cmp branch 3 times, most recently from c09eecc to cedc863 Compare August 22, 2020 07:02
@@ -0,0 +1,192 @@
#!/usr/bin/env python3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this would make sense in a standalone PR, along with tests and documention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, indeed. it's under experiments right now. sure I will move it later into a seprate pr along with doc and testing unit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to look around and see if there are existing template solutions that can be reused. tempita I think is used in some places in numpy, and jinja may be an option too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried almost everything until I figure out the most flexible template engine is the one who doesn't bring new language syntax or philosophies and that what "pyas" does "Python as a template language", its simply treat Python as a PHP and f-strings as a template. it also provides a simple translation mechanism.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the reason why I drop repeat template is the generated source size almost hit 9mb without finishing the rest of the work also it can't be used for generating C macros.

@seiko2plus seiko2plus force-pushed the simd_improve_cmp branch 3 times, most recently from 7602864 to 2c4415b Compare August 22, 2020 15:12
@mattip
Copy link
Member

mattip commented Aug 22, 2020

I wonder if we could refactor the dispatch mechanism to be much more limited: only have two loops: a baseline and an advanced loop, written in C. Then only use these loops via the current ufunc reassign-c-function-loops-at-import rather than the macro-based runtime mechanism via NPY__CPU_DISPATCH_CALL. I am worried about the maintenance burden moving forward and the intricies of adding yet another C code format to NumPy.

@mattip
Copy link
Member

mattip commented Aug 22, 2020

If the generated code is so large, maybe we need to rethink what we are trying to do here.

@seiko2plus
Copy link
Member Author

@mattip,

I wonder if we could refactor the dispatch mechanism to be much more limited: only have two loops: a baseline and an advanced loop, written in C. Then only use these loops via the current ufunc reassign-c-function-loops-at-import rather than the macro-based runtime mechanism via NPY__CPU_DISPATCH_CALL. I am worried about the maintenance burden moving forward and the intricies of adding yet another C code format to NumPy.

xref, mattip#46 (comment)

@seiko2plus
Copy link
Member Author

@mattip,

If the generated code is so large, maybe we need to rethink what we are trying to do here.

The issue in the conv_template(template repeater) that we had to count on C preprocessors for everything even with internal looping!
Do you know how that cost?

$ python numpy/numpy/distutils/conv_template.py einsum_sumprod.c.src
$ du einsum_sumprod.c
4576	einsum_sumprod.c

This PR is covering more and more kernels than einsum. So the best thing we can do is to cut the roots from the beginning
to avoid increasing the maintenance cost, it wouldn't hurt that much if we give it a try!.

@seiko2plus
Copy link
Member Author

closed in favor of #21483, while it doesn't contains all the improvements this pr has but we could add later during moving to C++

@seiko2plus seiko2plus closed this May 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 25 - WIP component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants