Doing multiplication into the result directly. #17

aameen951 · 2020-05-11T13:23:25Z

Sorry for not submitting a pull request, I'm working on a bignum library and I saw your implementation of multiplication so I just want to point out that you can simplify it by doing the multiply and the add in one loop into the result bn as follow:

typedef unsigned int           u32;
typedef unsigned long long int u64;
void _bn_add_mul_u32(u32 *out, u32 *a, size_t a_len, u32 b)
{
  u64 c = 0;
  size_t i;
  for(i=0; i<a_len; i++)
  {
    u64 a_w = a[i];
    u64 out_w = out[i];
    u64 o = out_w + a_w * b + c;
    c = o >> 32;
    out[i] = o & 0xffffffff;
  }
  for(; c; i++)
  {
    u64 o = out[i] + c;
    c = o >> 32;
    out[i] = o & 0xffffffff;
  }
}
void _bn_mul(u32 *out, u32 *a, size_t a_len, u32 *b, size_t b_len)
{
  for(size_t i=0; i<b_len; i++)
  {
    u32 b_w = b[i];
    _bn_add_mul_u32(out+i, a, a_len, b_w);
  }
}

kokke · 2020-06-29T21:05:35Z

Hey @aameen951 - sorry for the slow response.

Thank you for the tip, and no worries about not leaving a PR - a tip like this with code and all, is valuable to me :)

However, "straight from the hip" I am not completely sure what it would translate to, in my library - I would have to take some time to understand each step in your example code, and translate it.

Unfortunately, it is a task that will have to wait a bit - but maybe in my summer's vacation :)

Anyways - thanks for the contribution Ameen. I will leave this issue open and mark it as TODO/help wanted, in case anyone can easily grasp your idea, and make a PR, before I get to it.

JL2014 · 2022-11-09T11:30:04Z

Hi @kokke,

Your library is very useful and clear, I add one ⭐ .

However, I use calculations which require millions of calls to the bignum_* functions
and I notice an extremely slow execution time (hours)
whereas the same algorithm with standard types is instantaneous (one second).

Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?

Here are my stats for one execution of my program :

-------------------------------------
| bignum function | number of calls |
-------------------------------------
|   bignum_inc    |           65972 |
|   bignum_mul    |        19856726 |
|   bignum_cmp    |        20516421 |
|   bignum_add    |        13193836 |
|  bignum_divmod  |        13193836 |
| bignum_from_int |         7256613 |
-------------------------------------

Greetings, JL.

kokke · 2022-11-29T09:20:46Z

Hi @JL2014 and thanks for your kind words :)

This library is using very simplistic algorithms and not much optimization at all.

Multiplication and division is usually quite slow, so that's where I would start looking.

For better performance, you could consider using GMP https://gmplib.org/ -- very clear code as well, but also very optimized and fast.

Do you have any ideas like the one proposed by @aameen951 to optimize the "big" loops in your library ?

A simple optimization would be to add a "counter"/bitmap of which words have bits set, so you can skip them. E.g. when adding 1 +1, only the least-significant-word is needed.

I think several guys have proposed PRs and optimizations in issues over the time, I just haven't had time/energy to incorporate them or even review much of it.

kokke added good first issue help wanted labels Jun 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Doing multiplication into the result directly. #17

Doing multiplication into the result directly. #17

aameen951 commented May 11, 2020

kokke commented Jun 29, 2020

JL2014 commented Nov 9, 2022 •

edited

kokke commented Nov 29, 2022

Doing multiplication into the result directly. #17

Doing multiplication into the result directly. #17

Comments

aameen951 commented May 11, 2020

kokke commented Jun 29, 2020

JL2014 commented Nov 9, 2022 • edited

kokke commented Nov 29, 2022

JL2014 commented Nov 9, 2022 •

edited