New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

ext/bcmath: Changed the bcmul calculation method #14213

Merged

SakiTakamachi merged 15 commits into php:master from SakiTakamachi:refactor_bcmath_mul

May 20, 2024

+130 −196

Member

SakiTakamachi commented May 13, 2024 •

edited

There are still ways to make it faster, but since it would complicate the PR, I will split it into a follow-up PR (~~#14229~~).

The difference is quite large, so I will only post rough measurements.

Bench code:

<?php

// 1
$num1 = '1.2345678';
$num2 = '2.1234567';

$start = microtime(true);
for ($i = 0; $i < 10000000; $i++) {
    bcmul($num1, $num2, 20);
}
echo microtime(true) - $start . PHP_EOL;


// 2
$num1 = str_repeat('1234567890', 300);
$num2 = str_repeat('9876543210', 300);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    bcmul($num1, $num2, 20);
}
echo microtime(true) - $start . PHP_EOL;


// 3
$num1 = '1.2345678901234567890';
$num2 = '2.12345678901234567890';

$start = microtime(true);
for ($i = 0; $i < 10000000; $i++) {
    bcmul($num1, $num2, 20);
}
echo microtime(true) - $start . PHP_EOL;


// 4
$num1 = '-12345678901234567890.12345678901234567890';
$num2 = '-212345678901234567890.12345678901234567890';

$start = microtime(true);
for ($i = 0; $i < 2000000; $i++) {
    bcmul($num1, $num2, 20);
}
echo microtime(true) - $start . PHP_EOL;


// 5
$num1 = '12345678901234567890.00000000000000000000000000000000000000000000000000';
$num2 = '212345678901234567890.00000000000000000000000000000000000000000000000000';

$start = microtime(true);
for ($i = 0; $i < 10000000; $i++) {
    bcmul($num1, $num2, 20);
}
echo microtime(true) - $start . PHP_EOL;

before

1.6423609256744
1.0373518466949
3.7979400157928
2.4885659217834
4.256236076355

after

1.2081251144409
0.096983194351196
1.9056849479675
0.60840702056885
2.0678749084473

SakiTakamachi requested review from Girgias and nielsdos as code owners

May 13, 2024 08:41

github-actions bot added the Extension: bcmath label

Member Author

SakiTakamachi commented May 13, 2024

CI is red

SakiTakamachi added 4 commits

May 13, 2024 20:04


          Changed the bcmul calculation method

91b9343

Multiplication is performed after converting to unsigned long, resulting in faster calculations.


          Fixed array allocation

ec9f8df


          Fixed array allocation

ac9342d


          Changed unsigned long to BC_UINT_T

899bb39

SakiTakamachi force-pushed the refactor_bcmath_mul branch from 7a944c7 to 899bb39 Compare

May 13, 2024 11:04

Member Author

SakiTakamachi commented May 13, 2024

I need to change some of the constant names and comments.

SakiTakamachi added 2 commits

May 14, 2024 08:21


          Fixed comments and constant names

ea57a9f


          Moved the constant

b23e0a3

Member Author

SakiTakamachi commented May 13, 2024

done!

Member Author

SakiTakamachi commented May 13, 2024

The red one in CI is openssl.


          Fixed variable names and comments

ec51d76

Member Author

SakiTakamachi commented May 14, 2024

There were some corrections that were omitted, so I corrected the variable names and comments. I think everything else is probably fine.

SakiTakamachi mentioned this pull request

ext/bcmath: Improving mul performance 2 #14229

Closed

SakiTakamachi marked this pull request as draft

May 14, 2024 13:29

SakiTakamachi marked this pull request as ready for review

May 14, 2024 13:30

nielsdos reviewed

View reviewed changes

Member

nielsdos left a comment

Nice work. Some comments:

ext/bcmath/libbcmath/src/recmul.c

+              	BC_UINT_T num = 0;
+              	BC_UINT_T base = 1;
+              	for (size_t i = 0; i < len; i++) {

Member

nielsdos May 14, 2024

I once shared a fast way of doing this that I had implemented in a WIP commit: nielsdos@20783c3
Original source: https://kholdstare.github.io/technical/2020/05/26/faster-integer-parsing.html
The article claims it is faster than a regular for loop that multiplies by 10.
(note: I wouldn't use the SSE SIMD approach from the article, just the 4/8 byte fast non-SIMD approach)
However, let's not do that in this PR because this PR is large enough as-is.

ext/bcmath/libbcmath/src/recmul.c Outdated

-                 Let u = u0 + u1*(b^n)
-                 Let v = v0 + v1*(b^n)
-                 Then uv = (B^2n+B^n)*u1*v1 + B^n*(u1-u0)*(v0-v1) + (B^n+1)*u0*v0
+              	BC_UINT_T *buf = emalloc((n1_arr_size + n2_arr_size + prod_arr_size) * sizeof(BC_UINT_T));

Member

nielsdos May 14, 2024

Allocations with multiplications shouldn't be open-coded, because they risk getting overflown.
Use safe_emalloc here please.

Member Author

SakiTakamachi May 15, 2024

Fixed it

ext/bcmath/libbcmath/src/recmul.c

-              		}
-              		*pvptr-- = sum % BASE;
-              		sum = sum / BASE;
+              	while (pend >= pptr) {

Member

nielsdos May 14, 2024

I also had a faster way than this of writing the BCD representation back. See nielsdos@20783c3#diff-5550a7de511402b266a0fc932bd11298d2e02ef720879507b89651add38f380dR177
But we should keep that as a follow-up imo.

ext/bcmath/libbcmath/src/recmul.c Outdated

-              		_bc_rec_mul(u1, u1->n_len, v1, v1->n_len, &m1);
+              	/* Convert n2 to uint[] */
+              	i = 0;
+              	while (n2len > 0) {

Member

nielsdos May 14, 2024

Hmmm, we're doing the same for n1, would be great if this was factored out.

Member Author

SakiTakamachi May 15, 2024

fixed it

ext/bcmath/libbcmath/src/recmul.c

+              	char *pend = pptr + prodlen - 1;
+              	i = 0;
+              	while (i < prod_arr_size - 1) {
+              		for (size_t j = 0; j < BC_MUL_UINT_DIGITS; j++) {

Member

nielsdos May 14, 2024

Also an opportunity for the faster BCD write method for follow-up.

ext/bcmath/libbcmath/src/recmul.c


		/* Do the multiply */
		_bc_rec_mul(n1, len1, n2, len2, &pval);

Member

nielsdos May 14, 2024

One thing we're going to lose here now is the divide-and-conquer approach for very large numbers, and I think we should probably keep that. WDYT?

Member Author

SakiTakamachi May 14, 2024

Good point. Sorry, I should have explained this in the PR description.

The existing implementation uses divide-and-conquer when the number of digits is 80 or more.

This is compared in test case 2, and clearly this PR is faster, so it's not worth keeping the original implementation as is.

However, combining this PR approach with a divide and conquer approach has shown, in my local testing, to probably be nearly twice as effective.

Therefore, we should use the divide and conquer method (commonly called the Karatsuba algorithm in Japanese), but probably separate PRs.

ext/bcmath/libbcmath/src/recmul.c Outdated Show resolved Hide resolved

SakiTakamachi and others added 2 commits

May 15, 2024 08:08


          Fixed array size calculation

7e82dc1

Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>


          address comments

25b33bd

Girgias reviewed

View reviewed changes

ext/bcmath/libbcmath/src/recmul.c Outdated Show resolved Hide resolved

SakiTakamachi and others added 3 commits

May 15, 2024 19:36


          nend to const

888ded6

Co-authored-by: Gina Peter Banyard <girgias@php.net>


          n1end and n2end to const

aa1a677


          n1end and n2end to const

dd401bd

nielsdos reviewed

View reviewed changes

Member

nielsdos left a comment

Well, it's a good thing I allocated time for this, took me a while to think about this.
It looks right except for some small nits.

ext/bcmath/libbcmath/src/recmul.c Outdated Show resolved Hide resolved

ext/bcmath/libbcmath/src/recmul.c Outdated

-              */
-              static void _bc_rec_mul(bc_num u, size_t ulen, bc_num v, size_t vlen, bc_num *prod)
+              /*
+               * If the n_values of n1 and n2 are both 4 (32-bit) or 8 (64-bit) digits or less,

Member

nielsdos May 15, 2024

I can see in my IDE there's zero-width-spaces on this line.

Member Author

SakiTakamachi May 15, 2024

It seems that Google Translate mixed it up. Fixed.

ext/bcmath/libbcmath/src/recmul.c Outdated

+               * e.g. 12345678901234567890 => {34567890, 56789012, 1234}
+               *
+               * Multiply and add these groups of numbers to perform multiplication fast.
+               * How much to shift the digits when adding values can be calculated from the index of the array.

Member

nielsdos May 15, 2024

Also zero-width-spaces on this line.

Member Author

SakiTakamachi May 15, 2024

Fixed

ext/bcmath/libbcmath/src/recmul.c

-              	bc_num d1 = bc_sub(u1, u0, 0);
-              	bc_num d2 = bc_sub(v0, v1, 0);
+              	BC_UINT_T *buf = safe_emalloc(n1_arr_size + n2_arr_size + prod_arr_size, sizeof(BC_UINT_T), 0);

Member

nielsdos May 15, 2024

I think this deserves a comment to explain why the addition can't overflow in practice.
I.e. let's say that N is the max of n1len and n2len (and a multiple of BC_MUL_UINT_DIGITS for simplicity), then this sum is <= N/BC_MUL_UINT_DIGITS + N/BC_MUL_UINT_DIGITS + N/BC_MUL_UINT_DIGITS + N/BC_MUL_UINT_DIGITS - 1 which is equal to N - 1 if BC_MUL_UINT_DIGITS is 4, and N/2 - 1 if BC_MUL_UINT_DIGITS is 8.

Member Author

SakiTakamachi May 15, 2024

Added, thx!

SakiTakamachi and others added 3 commits

May 16, 2024 07:44


          fixed type

c7c977a

Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>


          rm zero space

dab7e15


          added comment

48b96df

nielsdos approved these changes

View reviewed changes

Member

nielsdos left a comment

Seems right. Maybe @Girgias wants to have a final look?

Girgias approved these changes

View reviewed changes

Member

Girgias left a comment

Looks fine to me

SakiTakamachi merged commit 1d38656 into php:master

8 of 10 checks passed

SakiTakamachi deleted the refactor_bcmath_mul branch

May 20, 2024 11:29

nielsdos mentioned this pull request

Bcmath multiply speedup #14276

Merged

Member

iluuu1994 commented May 22, 2024

This broken 32-bit builds, see https://github.com/php/php-src/actions/runs/9183895160/job/25255358600. @SakiTakamachi Please have a look.

Member Author

SakiTakamachi commented May 22, 2024

Okay, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment