New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Vector
concatenation
#10159
Faster Vector
concatenation
#10159
Conversation
…concatenation * misleading indentation * `a6 = new Arr6(LASTWIDTH)`: The whole point of LASTWIDTH is that Arr6 may be larger than the other arrays. This is considered in all initializations except here. The added test shows how initFrom failed so far on arrays with size in (Int.MaxValue/2, Int.MaxValue] and works now. * `else if (xor < WIDTH6) { // level = 5`: Similarly to the last one, Arr6 may have a size up to LASTWIDTH, so this check fails for valid Arr6s with size in [32,64]. About WIDTH6, see next point. * `BITS6` and `WIDTH6`: There is no valid usage for them. The BITSn and WIDHTn variables mark which bits of the index correspond to which level, or more precisely Vectors up to level n have sizes of [0, WIDTHn). Thus, the correct value of BITS6 would be (BITS * 6 + 1), thus WIDTH6 would be Int.MaxValue+1, one "more" than the longest possible vector of level 6. The current value is just wrong, as it arbitrarily "cuts" the range of 6 bits which compose the index in Arr6 in 1 high and 5 lower bits. In fact, the line above was the only place where one of `BITS6` and `WIDTH6` was used. I suspect this was a little oversight because of the irregularity of level 6.
* add alignTo and leftAlignPrefix in VectorBuilder: alignTo(n: Int, v: Vector) ensures that if there are added n elements before Vector v, the Builder is aligned to the structure of v. This way, the Arrays of v can be reused in the Builder, which improves performance drastically. Formally, alignTo sets `len1`, `lenRest` and the internal arrays as if there were $x = len1 + lenRest$ elements already added, such that $x + n + prefixLenght(v)$ is a multiple of the maximum prefix length of a Vector of v's level. The number $x$ is stored in `offset` and the new variable `prefixIsRightAligned` is set true to record the fact that in each prefix array, the length is WIDTH and the content is right-aligned. This is different to the normal case, where the prefixes are either full and size WIDTH or, if `initFrom` was used, full and trimmed. To remove the leading null entries in the prefix vectors, a call to leftAlignPrefixes is needed. This is done at the beginning of result(). * prependedAll0 and appendedAll0: In case that an IterableOnce (left side; may be a Vector itself) and a Vector (right side) are concatenated and the left one is shorter than the right Vector, a VectorBuilder aligned to the right Vector is used. `++` and `concat` forward to `appendedAll`, so they profit from this optimization. * add tests and benchmark
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as duplicate.
This comment was marked as duplicate.
Thank you — this looks like it will a fantastic addition to Scala 2.13.11! Finding reviewers might take a bit of time. |
re: variety in test data, how about adding something to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing. Thanks for presenting it so clearly.
I can't think of any performance tradeoffs to consider, am I right in thinking this is all upside?
If so, the risk seems to be a data-depedent bug in the new implementation. I've highlighted a few branches that aren't covered by the s.c.i.VectorTest
(although may well be covered by other broader tests, I didn't run the full suite.). Let's try to expand the unit test coverage to at least cover these.
I'm always a bit wary of overflow bugs, but I don't see anything obvious here. I understand @szeiger might be able to take a look at this change, his experience with this implementation would be more likely to spot that sort of problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand unit test to cover more branches of the new implementation.
We also have generative testing in the https://github.com/scala/scala-collection-laws repository. The Scala 2 community build runs scala-collection-laws against merged PRs, but we can also do runs ourselves locally against the unmerged PR, using the PR validation snapshot (as documented in the README here in scala/scala). In my clone of the scala-collection-laws repo, just now I did:
and it passed, so that's encouraging. We should do the check again just before merge. |
This might be a change where trying enabling scoverage during testing would have good cost/benefit ratio. |
Mostly, yes. The only thing: As the calls to Another (more theoretical) downside I could think of is, that as aligned
Most should be catched by the Something else: Even though this "optimizes Vector concatenation" (maybe asymptotically even as much as possible with the current homogeneous |
9df5774
to
fbd5a55
Compare
This comment was marked as resolved.
This comment was marked as resolved.
fbd5a55
to
a5786df
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as outdated.
This comment was marked as outdated.
* `leftAlignPrefix()` now keeps the length of the outermost array at (LAST)WIDTH. This is necessary because after a call to `result()`, the VB may be used further. * Allow arbitrarily misleading/wrong arguments in `alignTo()` Should not fail now, even if the hint given is completely nonsensical, `leftAlignPrefix()` now compensates for that. Negative alignments are allowed, e.g. if you plan to drop elements of the vector to be added. * Add a bunch of tests to cover previously uncovered branches.
This comment was marked as resolved.
This comment was marked as resolved.
a5786df
to
a30168d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
* To cempensate for alignTo's (small, but constant) overhead, it is only called when the right Vector is mor then 64 elemente longer than the left one. * replaced some `….knownSize` by `k`
In addition to the remaining tests:
|
Sorry for the delay! This looks good to go, thanks for going the extra mile. |
@ansvonwa I gave this PR top billing in the Scala 2.13.11 release notes. also posted some kudos at https://fosstodon.org/@SethTisue/110540603310328217 . thank you again for this valuable work. |
@SethTisue Wow, thank you! |
So far, concatenating two Vectors$m$ ($n$ ($O(n)$ time.
v1
andv2
of lengthsv1
) andv2
) tookThis PR reduces the time needed to$O(\min(m, n))$ .
For$m\lt n$ , instead of pre-filling the
VectorBuilder
withv1
and traversing all ofv2
, we now set an offset, such that after traversingv1
and adding it's elements, the structure of the Builder does exactly matchv2
's structure. This way, we can reuse arrays ofv2
and don't need to iterate over all ofv2
.As an example, take the brown ("old misaligned") and red ("misaligned") line in the following plot. It shows times taken by concatenation of two Vectors with 1 million entries in total. The percentage states the relative size of$20\%$ means $m=2\times 10^5, n=8\times10^5$ .)$\frac{\text{ns}}{\text{operation}}$ . See also benchmark source.
v1
. (So,The values are in
The brown ("old misaligned") line shows the time for worst case (which is the case in 31 of 32 cases) of the old implementation.$m\lt n$ .
The red ("misaligned") line shows times with this optimization. You see the improvement for
The way in which this optimization is implemented, yields another improvement. In special cases, time goes down to$O(\log_{32}(m+n))$ , so basically zero.
In the old implementation, if the data arrays of the Vectors align (i.e.
(v1.suffix1.length + v2.prefix1.length) % 32 == 0
), there was only one call toSystem.arraycopy
for each data array inv2
, instead of two.Thus, there was a small performance improvement for aligned Vectors (green and pale blue).
Now, we check for every slice whether it aligns to the Builder's structure. If that is the case, we reuse the old Array instead of copying it's content. We do this on every level. Thus, if the Vectors align at the lowest level ($\frac1{32}$ chance for random vectors), we get the improvement shown by the yellow line. If the Vectors align at all levels (
(v1.suffix1.length + v2.prefix1.length) % WIDTH == 0
;(v1.suffix1…X.length + v2.prefix1…X.length) % WIDTHX == 0
), we get the dark blue times, so just a hand full of operations.In short, the four commits do
alignTo
) to set the offset and another one to drop leadingnull
s in Arrays before constructing the result,This should improve performance slightly, but I did not really get significantly shorter times in the benchmarks.
It's mostly a matter of taste, but with the switch, it looks and feels more like the rest of the Vector class.
closes scala/bug#4442