Faster `Vector` concatenation #10159

ansvonwa · 2022-09-25T00:27:02Z

So far, concatenating two Vectors v1 and v2 of lengths $m$ (v1) and $n$ (v2) took $O(n)$ time.

This PR reduces the time needed to $O(\min(m, n))$.

For $m\lt n$, instead of pre-filling the VectorBuilder with v1 and traversing all of v2, we now set an offset, such that after traversing v1 and adding it's elements, the structure of the Builder does exactly match v2's structure. This way, we can reuse arrays of v2 and don't need to iterate over all of v2.

As an example, take the brown ("old misaligned") and red ("misaligned") line in the following plot. It shows times taken by concatenation of two Vectors with 1 million entries in total. The percentage states the relative size of v1. (So, $20\%$ means $m=2\times 10^5, n=8\times10^5$.)
The values are in $\frac{\text{ns}}{\text{operation}}$. See also benchmark source.

The brown ("old misaligned") line shows the time for worst case (which is the case in 31 of 32 cases) of the old implementation.
The red ("misaligned") line shows times with this optimization. You see the improvement for $m\lt n$.

The way in which this optimization is implemented, yields another improvement. In special cases, time goes down to $O(\log_{32}(m+n))$, so basically zero.

In the old implementation, if the data arrays of the Vectors align (i.e. (v1.suffix1.length + v2.prefix1.length) % 32 == 0), there was only one call to System.arraycopy for each data array in v2, instead of two.
Thus, there was a small performance improvement for aligned Vectors (green and pale blue).

Now, we check for every slice whether it aligns to the Builder's structure. If that is the case, we reuse the old Array instead of copying it's content. We do this on every level. Thus, if the Vectors align at the lowest level ((v1.suffix1.length + v2.prefix1.length) % WIDTH == 0; $\frac1{32}$ chance for random vectors), we get the improvement shown by the yellow line. If the Vectors align at all levels ((v1.suffix1…X.length + v2.prefix1…X.length) % WIDTHX == 0), we get the dark blue times, so just a hand full of operations.

In short, the four commits do

fix old issues, mostly related to large Vectors,
detect where slices align and reuse arrays,
add a method (alignTo) to set the offset and another one to drop leading nulls in Arrays before constructing the result,
change bit-arithmetics to a switch with hardcoded values.
This should improve performance slightly, but I did not really get significantly shorter times in the benchmarks.
It's mostly a matter of taste, but with the switch, it looks and feels more like the rest of the Vector class.

closes scala/bug#4442

…concatenation * misleading indentation * `a6 = new Arr6(LASTWIDTH)`: The whole point of LASTWIDTH is that Arr6 may be larger than the other arrays. This is considered in all initializations except here. The added test shows how initFrom failed so far on arrays with size in (Int.MaxValue/2, Int.MaxValue] and works now. * `else if (xor < WIDTH6) { // level = 5`: Similarly to the last one, Arr6 may have a size up to LASTWIDTH, so this check fails for valid Arr6s with size in [32,64]. About WIDTH6, see next point. * `BITS6` and `WIDTH6`: There is no valid usage for them. The BITSn and WIDHTn variables mark which bits of the index correspond to which level, or more precisely Vectors up to level n have sizes of [0, WIDTHn). Thus, the correct value of BITS6 would be (BITS * 6 + 1), thus WIDTH6 would be Int.MaxValue+1, one "more" than the longest possible vector of level 6. The current value is just wrong, as it arbitrarily "cuts" the range of 6 bits which compose the index in Arr6 in 1 high and 5 lower bits. In fact, the line above was the only place where one of `BITS6` and `WIDTH6` was used. I suspect this was a little oversight because of the irregularity of level 6.

* add alignTo and leftAlignPrefix in VectorBuilder: alignTo(n: Int, v: Vector) ensures that if there are added n elements before Vector v, the Builder is aligned to the structure of v. This way, the Arrays of v can be reused in the Builder, which improves performance drastically. Formally, alignTo sets `len1`, `lenRest` and the internal arrays as if there were $x = len1 + lenRest$ elements already added, such that $x + n + prefixLenght(v)$ is a multiple of the maximum prefix length of a Vector of v's level. The number $x$ is stored in `offset` and the new variable `prefixIsRightAligned` is set true to record the fact that in each prefix array, the length is WIDTH and the content is right-aligned. This is different to the normal case, where the prefixes are either full and size WIDTH or, if `initFrom` was used, full and trimmed. To remove the leading null entries in the prefix vectors, a call to leftAlignPrefixes is needed. This is done at the beginning of result(). * prependedAll0 and appendedAll0: In case that an IterableOnce (left side; may be a Vector itself) and a Vector (right side) are concatenated and the left one is shorter than the right Vector, a VectorBuilder aligned to the right Vector is used. `++` and `concat` forward to `appendedAll`, so they profit from this optimization. * add tests and benchmark

SethTisue · 2022-10-08T12:40:07Z

Thank you — this looks like it will a fantastic addition to Scala 2.13.11! Finding reviewers might take a bit of time.

src/library/scala/collection/immutable/Vector.scala

SethTisue · 2022-10-25T06:20:18Z

re: variety in test data, how about adding something to test/scalacheck?

retronym

This looks amazing. Thanks for presenting it so clearly.

I can't think of any performance tradeoffs to consider, am I right in thinking this is all upside?

If so, the risk seems to be a data-depedent bug in the new implementation. I've highlighted a few branches that aren't covered by the s.c.i.VectorTest (although may well be covered by other broader tests, I didn't run the full suite.). Let's try to expand the unit test coverage to at least cover these.

I'm always a bit wary of overflow bugs, but I don't see anything obvious here. I understand @szeiger might be able to take a look at this change, his experience with this implementation would be more likely to spot that sort of problem.

retronym

Expand unit test to cover more branches of the new implementation.

SethTisue · 2022-10-25T06:32:08Z

We also have generative testing in the https://github.com/scala/scala-collection-laws repository.

The Scala 2 community build runs scala-collection-laws against merged PRs, but we can also do runs ourselves locally against the unmerged PR, using the PR validation snapshot (as documented in the README here in scala/scala).

In my clone of the scala-collection-laws repo, just now I did:

set Global / resolvers += "pr" at "https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/"
++2.13.9-bin-e4b872a-SNAPSHOT!
test

and it passed, so that's encouraging. We should do the check again just before merge.

SethTisue · 2022-10-25T06:32:49Z

This might be a change where trying enabling scoverage during testing would have good cost/benefit ratio.

ansvonwa · 2022-10-25T12:08:41Z

I can't think of any performance tradeoffs to consider, am I right in thinking this is all upside?

Mostly, yes. The only thing: As the calls to alignTo and leftAlignPrefix take some (small but mostly constant) time, one may want to choose this optimization only if left.length < right.length + c for left ++ right and a small constant c, instead of the current left.length < right.length. This is only relevant for rather small Vectors that are large enough to trigger the optimization. I haven't actually tested how much time alignTo and leftAlignPrefix take, maybe I should.

Another (more theoretical) downside I could think of is, that as aligned Vectors are now much faster to concatenate: If one only tests with those, they're code may run much slower than expected at production time. But imho definitely no reason to slow these cases down ;)

I'm always a bit wary of overflow bugs

Most should be catched by the <= 0 check in advance1, but I could add tests for that as well.
And note that an overflow in the builder could already happen when constructing Vectors of size about (Int.MaxValue - WIDTH5 + 2) if the offset is large enough (i.e. prefix is not full). This also yields an ioob.

Something else: Even though this "optimizes Vector concatenation" (maybe asymptotically even as much as possible with the current homogeneous Vector structure), maybe scala/bug#4442 should stay open?
There is still the possibility to introduce some form of heterogeneous structure that allows $O(log(n))$ concatenation also for misaligned Vectors. I'm currently experimenting with this, and it looks promising, but it's definitely not possible without some tradeoffs in (random) access times, at least for large Vectors that were obtained by concatenating many misaligned medium-sized Vectors.

src/library/scala/collection/immutable/Vector.scala

* `leftAlignPrefix()` now keeps the length of the outermost array at (LAST)WIDTH. This is necessary because after a call to `result()`, the VB may be used further. * Allow arbitrarily misleading/wrong arguments in `alignTo()` Should not fail now, even if the hint given is completely nonsensical, `leftAlignPrefix()` now compensates for that. Negative alignments are allowed, e.g. if you plan to drop elements of the vector to be added. * Add a bunch of tests to cover previously uncovered branches.

* To cempensate for alignTo's (small, but constant) overhead, it is only called when the right Vector is mor then 64 elemente longer than the left one. * replaced some `….knownSize` by `k`

ansvonwa · 2022-11-02T12:44:37Z

In addition to the remaining tests:

I made sure VectorBuilder is reusable again,
Made benchmarks on what the best offset is to compensate for alignTo's time (and came to 64 entries; So only if the RHS is 64 entries longer than the LHS, we use alignTo)
The (previously good) optimization in prependedAll0 and appendedAll0 is not optimal anymore. This is only, because concatenation via VB got faster, so it shouldn't be much of a problem. Anyway, in the long term, that should be adjusted accordingly. I think, as there is no drawback compared to before, this could be done in a different PR?

retronym · 2022-11-30T03:17:18Z

Sorry for the delay! This looks good to go, thanks for going the extra mile.

SethTisue · 2023-06-14T04:09:41Z

@ansvonwa I gave this PR top billing in the Scala 2.13.11 release notes. also posted some kudos at https://fosstodon.org/@SethTisue/110540603310328217 . thank you again for this valuable work.

ansvonwa · 2023-06-17T14:43:45Z

@SethTisue Wow, thank you!
Btw there may be something similar coming within the next year or so :)

ansvonwa added 4 commits September 16, 2022 15:21

reuse Arrays in VectorBuilder.addVector(..) where possible

d7d696b

expand bit-manipulations to switch in Vector.addArrN

e4b872a

scala-jenkins added this to the 2.13.10 milestone Sep 25, 2022

This comment was marked as resolved.

Sign in to view

This comment was marked as duplicate.

Sign in to view

lrytz modified the milestones: 2.13.10, 2.13.11 Sep 26, 2022

SethTisue added release-notes worth highlighting in next release notes library:collections PRs involving changes to the standard collection library labels Sep 29, 2022

SethTisue requested a review from a team September 29, 2022 04:34

SethTisue changed the title ~~Faster Vector concatenation~~ Faster Vector concatenation Sep 29, 2022

retronym reviewed Oct 25, 2022

View reviewed changes

src/library/scala/collection/immutable/Vector.scala Outdated Show resolved Hide resolved

retronym reviewed Oct 25, 2022

View reviewed changes

retronym requested changes Oct 25, 2022

View reviewed changes

ansvonwa commented Oct 25, 2022

View reviewed changes

src/library/scala/collection/immutable/Vector.scala Outdated Show resolved Hide resolved

ansvonwa force-pushed the vector-faster-prepend branch from 9df5774 to fbd5a55 Compare October 26, 2022 00:17

This comment was marked as resolved.

Sign in to view

ansvonwa force-pushed the vector-faster-prepend branch from fbd5a55 to a5786df Compare October 26, 2022 10:38

This comment was marked as resolved.

Sign in to view

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

ansvonwa mentioned this pull request Oct 26, 2022

ZipArchiveTest and ZipAndJarFileLookupFactoryTest fail under _some_ circumstances scala/bug#12677

Closed

ansvonwa force-pushed the vector-faster-prepend branch from a5786df to a30168d Compare October 27, 2022 23:48

This comment was marked as resolved.

Sign in to view

use VB.alignTo only if RHS is significantly larger

7b8c407

* To cempensate for alignTo's (small, but constant) overhead, it is only called when the right Vector is mor then 64 elemente longer than the left one. * replaced some `….knownSize` by `k`

ansvonwa requested a review from retronym November 2, 2022 12:24

retronym approved these changes Nov 30, 2022

View reviewed changes

retronym merged commit 6d7f884 into scala:2.13.x Nov 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster `Vector` concatenation #10159

Faster `Vector` concatenation #10159

ansvonwa commented Sep 25, 2022

This comment was marked as resolved.

This comment was marked as duplicate.

SethTisue commented Oct 8, 2022

SethTisue commented Oct 25, 2022

retronym left a comment

retronym left a comment

SethTisue commented Oct 25, 2022

SethTisue commented Oct 25, 2022 •

edited

ansvonwa commented Oct 25, 2022

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

ansvonwa commented Nov 2, 2022

retronym commented Nov 30, 2022

SethTisue commented Jun 14, 2023 •

edited

ansvonwa commented Jun 17, 2023

Faster Vector concatenation #10159

Faster Vector concatenation #10159

Conversation

ansvonwa commented Sep 25, 2022

This comment was marked as resolved.

This comment was marked as duplicate.

SethTisue commented Oct 8, 2022

SethTisue commented Oct 25, 2022

retronym left a comment

Choose a reason for hiding this comment

retronym left a comment

Choose a reason for hiding this comment

SethTisue commented Oct 25, 2022

SethTisue commented Oct 25, 2022 • edited

ansvonwa commented Oct 25, 2022

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

ansvonwa commented Nov 2, 2022

retronym commented Nov 30, 2022

SethTisue commented Jun 14, 2023 • edited

ansvonwa commented Jun 17, 2023

Faster `Vector` concatenation #10159

Faster `Vector` concatenation #10159

SethTisue commented Oct 25, 2022 •

edited

SethTisue commented Jun 14, 2023 •

edited