Optimized BigInt implementation #8932
Conversation
This is great! Thanks for tackling this. I’m on my phone but a few questions:
Are you sure there is test coverage for all these changes? I can imagine we need to be more careful and really exercise all the branches now.
Can we commit any benchmarks to compare to BigInteger and java.lang.Long? The goal here, I imagine, is to be only a bit slower than a boxed Long but have the ability to scale up. Would be nice to demonstrate that in benchmarks (sorry if I overlooked them in this PR).
Ok, MiMa complains about
even though that (new) constructor is private. Is that a problem in practice?
That's not a problem in practice. The constructor is not accessible from Scala from a different file (although it would be accessible from Java), so it cannot break linking across different artifacts.
I believe that's OK. It's not source compatible (the member's not stable anymore, which means you cannot
New encoding which caches a
I like the caching; I noticed something that looks like a bug.
(We use
Changes
Help needed
I'm using the following JMH annotations.
Preliminary benchmark results
I'm comparing:
For the RSA benchmark, I get:
For the factorial benchmark, the data is below:
1. Latest 2.13.x library
2. Optimized BigInt without cache
3. Optimized BigInt with cache
Caching in
Maybe https://github.com/lightbend/benchdb could be helpful? It's very new.
Most errors seem to be around 1%, which seems reasonable, no? scala/scala-dev#338 has information, probably more than you need :-) Generally, I could imagine results being less stable on laptops because they usually don't have enough cooling to keep the CPU frequency stable.
Hmm, 10 minutes is not that much for a benchmark, but obviously annoying if you can't hand it off to a different machine. In fact, you might actually want to do more than 2 forks if you need more results. Did you check (manually, by looking at the numbers) that 10 warmup iterations are enough? Maybe you can do a log scale on the sizes that you measure and thin it out.
I made the same comment over and over: why are we doing `new BigInt(long)` rather than `BigInt.apply(long)` and seeing if we can find it in the cache?
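The apply-with-cache pattern being requested can be sketched as follows, much like `java.lang.Long.valueOf` caches small boxes. All names and bounds here are illustrative, not the PR's actual ones:

```scala
// Hypothetical sketch: route construction through the companion's apply so
// small values hit a cache instead of allocating. The class name, cache
// bounds, and field layout are invented for illustration.
final class SmallCachedBig private (val longValue: Long)

object SmallCachedBig {
  private final val MinCached = -1024L
  private final val MaxCached = 1024L
  private val cache = new Array[SmallCachedBig]((MaxCached - MinCached + 1).toInt)

  def apply(l: Long): SmallCachedBig =
    if (l >= MinCached && l <= MaxCached) {
      val idx = (l - MinCached).toInt
      var v = cache(idx)
      if (v eq null) { v = new SmallCachedBig(l); cache(idx) = v } // benign race
      v
    } else new SmallCachedBig(l) // outside the cached range: allocate
}
```

With this shape, callers that use `apply` get instance sharing for free, while `new` always allocates; which is exactly why the review keeps pushing `apply` over `new`.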
So new that I think I can't access it! Is the project still private?
I'm seeing the std dev magnitude fluctuating a lot between benchmark instances, so I was wondering if it comes from JIT instability, or CPU performance instability (throttling etc)
I'll move the benchmarks to a dedicated machine, and will follow two tips: disable HT and Turbo Boost.
I can reduce the number of test cases now. How many forks do people usually ask for? I'm cargo-culting the JMH parameters for now.

I guess to decide on the number of warmup iterations, I should set a large number like 50, and then manually check the convergence (average, std dev) in the output? And also perform a larger number of forks, and compare the behaviour across forks? And it's time to write a small tool to interpret/plot the benchmark results (I'm cutting and pasting things into OpenOffice for now).

Final question: for such microbenchmarks, is there a difference when the test runs for longer, in that the benchmark will start to include the memory pressure of the code? Imagine two variants, V1 and V2, where V1 doesn't allocate, and V2 allocates a bit. Then the performance profile in the benchmark will depend on whether the benchmark runs until V2 triggers GC, right? Is there an accepted standard on how to benchmark this fairly? (I'm now reading this: https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf but I'd welcome links to resources/books. Any information I get, I'll submit in a scala/scala PR to the bench project README so as not to lose that info.)
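For reference, the iteration and fork counts being discussed map directly onto JMH's standard command-line flags, so they can be overridden per run without touching annotations. The sbt-jmh task name is real; the benchmark regex is made up for illustration:

```shell
# Hypothetical sbt-jmh invocation. -wi = warmup iterations, -i = measurement
# iterations, -f = forks; -prof gc adds allocation-rate figures, which helps
# with the "V2 allocates a bit" question above.
sbt 'jmh:run -wi 50 -i 20 -f 4 -prof gc .*BigIntBenchmark.*'
```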
When benchmarking on a laptop, make sure you have Turbo Boost disabled:
oops 😄 i'll let you know!
that's exactly what benchdb would do
That's what I'd do, yeah. I personally don't have a lot of experience with benchmarking and JMH. Maybe @retronym can debunk some of the bad advice I'm giving here.
Yes, I think that's right. Quantifying the benefit of reducing the allocation rate of an implementation is extremely difficult because the effect on GC is unpredictable. There can even be unexpected side-effects: a lot of short-lived allocation can be cheaper (handled by minor GCs) compared to a long-lived cache that gets promoted to the old gen. I assume JMH would do its best to force a GC before starting to measure a benchmark? I don't know. There's also the option to enable a profiler while running benchmarks: https://hg.openjdk.java.net/code-tools/jmh/file/b6f87aa2a687/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java#l52
The benchdb repository is public now, please take a look and feel free to give it a try! It was developed and used by @szeiger while he was working on the new Vector implementation that landed in 2.13.2 (#8534).
Denis: thanks for the reply about when to use
@denisrosset could you rebase and squash the commits, please? Other than that, would you say this is ready for merge?
A few more comments.
Thanks a lot for pushing on this. It would be really amazing to improve the performance of BigInt.
- All BigInt creations pass through the BigInt.apply method
- Disabled implicit conversions in BigInt's own code
- Introduced BigInt.bigTag as an alias of Long.MinValue
- Removed unnecessary return statements
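The bigTag alias mentioned above suggests a sentinel-tagged encoding; a rough sketch of how such a tag can work is below. Field and method names are invented for illustration and the PR's actual encoding may differ:

```scala
import java.math.BigInteger

// Sketch of a sentinel-tagged encoding: when _long == bigTag (Long.MinValue),
// the value lives in the _big field; otherwise _long holds it directly.
// Long.MinValue itself must therefore be stored on the BigInteger side.
final class TaggedBig private (private val _long: Long, private val _big: BigInteger) {
  import TaggedBig.bigTag
  def isValidLong: Boolean = _long != bigTag
  def underlying: BigInteger =
    if (isValidLong) BigInteger.valueOf(_long) else _big
}

object TaggedBig {
  final val bigTag = Long.MinValue

  def apply(l: Long): TaggedBig =
    if (l != bigTag) new TaggedBig(l, null)
    else new TaggedBig(bigTag, BigInteger.valueOf(l)) // the sentinel value goes big

  def apply(b: BigInteger): TaggedBig =
    if (b.bitLength < 64) TaggedBig(b.longValueExact)  // fits in a signed Long
    else new TaggedBig(bigTag, b)
}
```

The appeal of a tag over a null check on the BigInteger field is that the hot "is this small?" test touches only a primitive field.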
As far as I see, we had the following concerns:
Haven't seen a reply, but I don't see any blockers.
I've looked at all conversations, and made the changes.
I did not merge benchmarking code; it's been a while! I'd be happy to have a second look at it later.
Are we OK?
Agreed! Code coverage-wise (by the ScalaCheck test):
For the testing, are we confident that the BigInt code is exercised by external code (crypto? math?)?
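The kind of cross-check such external code provides can also be approximated in-tree. A library-free sketch (ScalaCheck itself is not used here, and the object name is invented) that compares `BigInt` arithmetic against `java.math.BigInteger` on random inputs:

```scala
object BigIntVsBigInteger {
  // Differential check: the same expression computed via scala.math.BigInt
  // and via java.math.BigInteger must print identically. A fixed seed keeps
  // the run reproducible.
  def check(trials: Int): Unit = {
    val rnd = new scala.util.Random(42)
    var i = 0
    while (i < trials) {
      val a = rnd.nextLong()
      val b = rnd.nextLong()
      val viaScala = (BigInt(a) * BigInt(b) + BigInt(a)).toString
      val viaJava = java.math.BigInteger.valueOf(a)
        .multiply(java.math.BigInteger.valueOf(b))
        .add(java.math.BigInteger.valueOf(a))
        .toString
      assert(viaScala == viaJava, s"mismatch for a=$a, b=$b")
      i += 1
    }
  }
}
```

Because the optimized paths kick in only for Long-sized operands, mixing small and large random inputs is what actually exercises both branches.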
We still need the MiMa filter, as we add a new (private) constructor to a class.
Just a couple more comments.
@denisrosset we may build 2.13.6 as soon as Mon May 17, so there's still a chance this could make it in. Care to respond to Oscar's latest review round?
There is presumably some usage in the community build, which will run after the PR is merged. The license question is still on my plate; I'll get to it this week. Note that before final merge this will need a rebase so that every commit passes CI. Simply squashing to a single commit would be okay, but @denisrosset if you want to preserve a little more history than that, that's fine too as long as every commit passes CI.
@SethTisue If we have the time, I'll rebuild a series of commits in another branch to preserve some history. Otherwise, I'm happy with a squash. Concerning the license: there is another (and likely faster!) implementation of the Long gcd in Guava, which is Apache 2-licensed. Should we switch?
Perfect! Yes
@johnynek @SethTisue The code in this PR and #9628 are now exactly the same, except that #9628 has a cleaner history which should make things easier to review.
Let's merge #9628 on Thursday unless there is further review feedback. |
This augments the `BigInt` class with a `Long` field, so that small integers can be handled without allocating a `java.math.BigInteger`.

The main motivation is to make `BigInt` a "safe integer" type with a reasonable performance profile in the case where it is mainly used to store `Long`-sized integers, whereas the overhead of the introduced special case is dwarfed by the `java.math.BigInteger` machinery.

To maintain binary compatibility even though `BigInt` is a final class, I added a main constructor parameter of type `Long`; that main constructor is private. The original constructor is now a secondary constructor taking a single `BigInteger` argument. Logic inside the constructor decides whether to store the value as a `BigInteger` or as a `Long`.

This introduces a small memory overhead when `BigInt` stores a `BigInteger` (but in turn, `BigInteger` is a pretty mighty class which allocates an array).

Points to be discussed:

1. MiMa doesn't complain, but the `val bigInteger: BigInteger` constructor argument is now a `def bigInteger: BigInteger` body method. Is that OK?
2. This needs to be benchmarked. What would be a relevant benchmark in the Scala community build?
3. The GCD optimization rests on Long GCD code lifted from the Spire library, which is under the MIT License. Is such a code integration permitted by the Scala CLA?
4. The Java API documentation allows for big integers with a bit length bigger than what an `Int` can store. I'm using `bitLength` liberally in the new implementation, and I don't know how the code would behave in those cases. On the other hand, the old code calls `bitLength` a few times before having code to handle the "overly big BigInteger" special case. Shouldn't `bitLength` throw an exception beforehand? (The API specifies that methods should throw an exception instead of returning an invalid value, but this point isn't clear.) What should we do there? (It's interesting to find this type of undefined behaviour in the Java API.)
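For readers of the GCD discussion point above: a primitive Long GCD of the kind lifted from Spire is typically a binary (Stein's algorithm) GCD. The sketch below is an illustrative from-scratch rewrite, not the Spire code, and it deliberately does not handle Long.MinValue, whose absolute value overflows:

```scala
object LongGcd {
  // Binary GCD (Stein's algorithm) on Long. Avoids division entirely,
  // using shifts and subtraction. Illustrative sketch only; Long.MinValue
  // inputs are out of scope (math.abs overflows on them).
  def gcd(a0: Long, b0: Long): Long = {
    var a = math.abs(a0)
    var b = math.abs(b0)
    if (a == 0) return b
    if (b == 0) return a
    // Power of two shared by a and b.
    val shift = java.lang.Long.numberOfTrailingZeros(a | b)
    a >>= java.lang.Long.numberOfTrailingZeros(a)
    while (b != 0) {
      b >>= java.lang.Long.numberOfTrailingZeros(b) // make b odd
      if (a > b) { val t = a; a = b; b = t }        // keep a <= b
      b -= a                                        // gcd(a, b) = gcd(a, b - a)
    }
    a << shift
  }
}
```

On Long-sized operands this stays entirely in primitives, which is the point of special-casing GCD rather than delegating to `java.math.BigInteger.gcd`.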