Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Split arb_pow_fmpz_binexp into a ui and mpz version and implement proper arb_sqr #396

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

albinahlback
Copy link
Contributor

I am done with the former part, with addition of adding nn_sqr_2 which simplifies nn_mul_2x2 by skipping one umul_ppmm.

I am going to add arf_sqr_rnd_down and then implement arb_sqr from there.


mag_init(resr);
mag_fast_mul(resr, xm, arb_radref(x));
if (resr->exp < COEFF_MAX && resr->exp >= COEFF_MIN)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are not needed; the exponent can be increment with risk of overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume you meant without risk of overflow ;) Och god jul, Fredrik!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God jul :)


mag_init(resr);
mag_mul(resr, xm, arb_radref(x));
fmpz_add_ui(&(resr->exp), &(resr->exp), 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MAG_EXPREF

@@ -914,6 +926,12 @@ int arf_mul_rnd_down(arf_ptr z, arf_srcptr x, arf_srcptr y, slong prec);
? arf_mul_rnd_down(z, x, y, prec) \
: arf_mul_rnd_any(z, x, y, prec, rnd))

int arf_sqr_via_mpfr(arf_ptr res, arf_srcptr x, slong prec, arf_rnd_t rnd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is pointless extra code; arf_mul_via_mpfr already does the job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It removes one branching and removes two/three useless assignments/calculations. My thinking with this PR is to strip all the "unnecessary" stuff away from squaring in order to make it as fast as possible. I know it's pretty small improvement, but you do not think that it's worth it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mul_via_mpfr is only used for products that take 1000+ cycles to compute so a few cycles here will not be measurable.

@@ -111,6 +111,18 @@ Types, macros and constants
test code. If in doubt, simply try with some convenient high precision
instead of using this special value, and check that the result is exact.

.. macro:: nn_mul_2x1(r2, r1, r0, a1, a0, b0)

Sets `(r_2, r_1, r_0)` to `(a_1, a_0)` multiplied by `b_0`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to document that these are limbs. Also, I think these macros don't allow aliasing, which might be worth documenting.

@@ -553,7 +565,8 @@ Addition and multiplication

.. function:: int arf_neg_round(arf_t res, const arf_t x, slong prec, arf_rnd_t rnd)

Sets *res* to `-x`.
Sets *res* to `-x`. Returns 0 if this operation was made exactly and 1 if
truncation occurred.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"truncation" is the wrong term (means rounding down) - should be "rounding".

{
slong cx = *x;

if (!COEFF_IS_MPZ(*res) && (cx > COEFF_MIN / 2 && cx < COEFF_MAX / 2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These conditions are wrong. See _fmpz_add2_fast. Anyway, I think this function is not really needed; the compiler should be able to eliminate the redundant branches in _fmpz_add2_fast when passed the same operand twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has actually one property that I do not think the compiler can transform into, and that is that we now can allow cx to be up to size COEFF_MAX / 2 instead of COEFF_MAX / 4.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_fmpz_add2_fast/_fmpz_sub2_fast need to work for values of c larger than +/- 1. I think it's confusing if _fmpz_2times_fast in contrast to the other functions only supports |c| <= 1.

In practice, it makes no difference to go up to COEFF_MAX / 2 instead of COEFF_MAX / 4 since such huge exponents are very rare. We only care about optimizing for small exponents here. Better to be uniform and conservative.

Considering this, I don't really think this extra function is needed.

Clearly these functions and their assumptions should be documented...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will remove it then.

@fredrik-johansson
Copy link
Collaborator

Did you do some profiling?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants