Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Vector concatenation #10159

Merged
merged 6 commits into from Nov 30, 2022
Merged

Conversation

ansvonwa
Copy link
Contributor

So far, concatenating two Vectors v1 and v2 of lengths $m$ (v1) and $n$ (v2) took $O(n)$ time.

This PR reduces the time needed to $O(\min(m, n))$.

For $m\lt n$, instead of pre-filling the VectorBuilder with v1 and traversing all of v2, we now set an offset, such that after traversing v1 and adding it's elements, the structure of the Builder does exactly match v2's structure. This way, we can reuse arrays of v2 and don't need to iterate over all of v2.

As an example, take the brown ("old misaligned") and red ("misaligned") line in the following plot. It shows times taken by concatenation of two Vectors with 1 million entries in total. The percentage states the relative size of v1. (So, $20\%$ means $m=2\times 10^5, n=8\times10^5$.)
The values are in $\frac{\text{ns}}{\text{operation}}$. See also benchmark source.

The brown ("old misaligned") line shows the time for worst case (which is the case in 31 of 32 cases) of the old implementation.
The red ("misaligned") line shows times with this optimization. You see the improvement for $m\lt n$.

s c i Vector concat

The way in which this optimization is implemented, yields another improvement. In special cases, time goes down to $O(\log_{32}(m+n))$, so basically zero.

In the old implementation, if the data arrays of the Vectors align (i.e. (v1.suffix1.length + v2.prefix1.length) % 32 == 0), there was only one call to System.arraycopy for each data array in v2, instead of two.
Thus, there was a small performance improvement for aligned Vectors (green and pale blue).

Now, we check for every slice whether it aligns to the Builder's structure. If that is the case, we reuse the old Array instead of copying it's content. We do this on every level. Thus, if the Vectors align at the lowest level ((v1.suffix1.length + v2.prefix1.length) % WIDTH == 0; $\frac1{32}$ chance for random vectors), we get the improvement shown by the yellow line. If the Vectors align at all levels ((v1.suffix1…X.length + v2.prefix1…X.length) % WIDTHX == 0), we get the dark blue times, so just a hand full of operations.


In short, the four commits do

  1. fix old issues, mostly related to large Vectors,
  2. detect where slices align and reuse arrays,
  3. add a method (alignTo) to set the offset and another one to drop leading nulls in Arrays before constructing the result,
  4. change bit-arithmetics to a switch with hardcoded values.
    This should improve performance slightly, but I did not really get significantly shorter times in the benchmarks.
    It's mostly a matter of taste, but with the switch, it looks and feels more like the rest of the Vector class.

closes scala/bug#4442

…concatenation

* misleading indentation
* `a6 = new Arr6(LASTWIDTH)`:
  The whole point of LASTWIDTH is that Arr6 may be larger than the other arrays.
  This is considered in all initializations except here.
  The added test shows how initFrom failed so far on arrays with size in
  (Int.MaxValue/2, Int.MaxValue] and works now.
* `else if (xor < WIDTH6) { // level = 5`:
  Similarly to the last one, Arr6 may have a size up to LASTWIDTH, so this check
  fails for valid Arr6s with size in [32,64]. About WIDTH6, see next point.
* `BITS6` and `WIDTH6`:
  There is no valid usage for them. The BITSn and WIDHTn variables mark which bits
  of the index correspond to which level, or more precisely Vectors up to level n
  have sizes of [0, WIDTHn). Thus, the correct value of BITS6 would be (BITS * 6 + 1),
  thus WIDTH6 would be Int.MaxValue+1, one "more" than the longest possible vector
  of level 6.
  The current value is just wrong, as it arbitrarily "cuts" the range of 6 bits
  which compose the index in Arr6 in 1 high and 5 lower bits.
  In fact, the line above was the only place where one of `BITS6` and `WIDTH6` was used.
  I suspect this was a little oversight because of the irregularity of level 6.
* add alignTo and leftAlignPrefix in VectorBuilder:
  alignTo(n: Int, v: Vector) ensures that if there are added n elements
  before Vector v, the Builder is aligned to the structure of v.
  This way, the Arrays of v can be reused in the Builder, which improves
  performance drastically.
  Formally, alignTo sets `len1`, `lenRest` and the internal arrays as
  if there were $x = len1 + lenRest$ elements already added, such that
  $x + n + prefixLenght(v)$ is a  multiple of the maximum prefix length
  of a Vector of v's level.
  The number $x$ is stored in `offset` and the new variable
  `prefixIsRightAligned` is set true to record the fact that in each
  prefix array, the length is WIDTH and the content is right-aligned.
  This is different to the normal case, where the prefixes are either
  full and size WIDTH or, if `initFrom` was used, full and trimmed.
  To remove the leading null entries in the prefix vectors, a call to
  leftAlignPrefixes is needed. This is done at the beginning of
  result().
* prependedAll0 and appendedAll0:
  In case that an IterableOnce (left side; may be a Vector itself) and a
  Vector (right side) are concatenated and the left one is shorter
  than the right Vector, a VectorBuilder aligned to the right Vector is
  used.
  `++` and `concat` forward to `appendedAll`, so they profit from this
  optimization.
* add tests and benchmark
@scala-jenkins scala-jenkins added this to the 2.13.10 milestone Sep 25, 2022
@ansvonwa

This comment was marked as resolved.

@som-snytt

This comment was marked as duplicate.

@lrytz lrytz modified the milestones: 2.13.10, 2.13.11 Sep 26, 2022
@SethTisue SethTisue added release-notes worth highlighting in next release notes library:collections PRs involving changes to the standard collection library labels Sep 29, 2022
@SethTisue SethTisue requested a review from a team September 29, 2022 04:34
@SethTisue SethTisue changed the title Faster Vector concatenation Faster Vector concatenation Sep 29, 2022
@SethTisue
Copy link
Member

Thank you — this looks like it will a fantastic addition to Scala 2.13.11! Finding reviewers might take a bit of time.

@SethTisue
Copy link
Member

re: variety in test data, how about adding something to test/scalacheck?

Copy link
Member

@retronym retronym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks amazing. Thanks for presenting it so clearly.

I can't think of any performance tradeoffs to consider, am I right in thinking this is all upside?

If so, the risk seems to be a data-depedent bug in the new implementation. I've highlighted a few branches that aren't covered by the s.c.i.VectorTest (although may well be covered by other broader tests, I didn't run the full suite.). Let's try to expand the unit test coverage to at least cover these.

I'm always a bit wary of overflow bugs, but I don't see anything obvious here. I understand @szeiger might be able to take a look at this change, his experience with this implementation would be more likely to spot that sort of problem.

Copy link
Member

@retronym retronym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand unit test to cover more branches of the new implementation.

@SethTisue
Copy link
Member

We also have generative testing in the https://github.com/scala/scala-collection-laws repository.

The Scala 2 community build runs scala-collection-laws against merged PRs, but we can also do runs ourselves locally against the unmerged PR, using the PR validation snapshot (as documented in the README here in scala/scala).

In my clone of the scala-collection-laws repo, just now I did:

set Global / resolvers += "pr" at "https://scala-ci.typesafe.com/artifactory/scala-pr-validation-snapshots/"
++2.13.9-bin-e4b872a-SNAPSHOT!
test

and it passed, so that's encouraging. We should do the check again just before merge.

@SethTisue
Copy link
Member

SethTisue commented Oct 25, 2022

This might be a change where trying enabling scoverage during testing would have good cost/benefit ratio.

@ansvonwa
Copy link
Contributor Author

I can't think of any performance tradeoffs to consider, am I right in thinking this is all upside?

Mostly, yes. The only thing: As the calls to alignTo and leftAlignPrefix take some (small but mostly constant) time, one may want to choose this optimization only if left.length < right.length + c for left ++ right and a small constant c, instead of the current left.length < right.length. This is only relevant for rather small Vectors that are large enough to trigger the optimization. I haven't actually tested how much time alignTo and leftAlignPrefix take, maybe I should.

Another (more theoretical) downside I could think of is, that as aligned Vectors are now much faster to concatenate: If one only tests with those, they're code may run much slower than expected at production time. But imho definitely no reason to slow these cases down ;)

I'm always a bit wary of overflow bugs

Most should be catched by the <= 0 check in advance1, but I could add tests for that as well.
And note that an overflow in the builder could already happen when constructing Vectors of size about (Int.MaxValue - WIDTH5 + 2) if the offset is large enough (i.e. prefix is not full). This also yields an ioob.


Something else: Even though this "optimizes Vector concatenation" (maybe asymptotically even as much as possible with the current homogeneous Vector structure), maybe scala/bug#4442 should stay open?
There is still the possibility to introduce some form of heterogeneous structure that allows $O(log(n))$ concatenation also for misaligned Vectors. I'm currently experimenting with this, and it looks promising, but it's definitely not possible without some tradeoffs in (random) access times, at least for large Vectors that were obtained by concatenating many misaligned medium-sized Vectors.

@retronym

This comment was marked as resolved.

@SethTisue

This comment was marked as resolved.

@ansvonwa

This comment was marked as resolved.

@som-snytt

This comment was marked as resolved.

@ansvonwa

This comment was marked as resolved.

@ansvonwa

This comment was marked as outdated.

* `leftAlignPrefix()` now keeps the length of the outermost array at
  (LAST)WIDTH.
  This is necessary because after a call to `result()`, the VB may be
  used further.
* Allow arbitrarily misleading/wrong arguments in `alignTo()`
  Should not fail now, even if the hint given is completely nonsensical,
  `leftAlignPrefix()` now compensates for that.
  Negative alignments are allowed, e.g. if you plan to drop elements of
  the vector to be added.
* Add a bunch of tests to cover previously uncovered branches.
@som-snytt

This comment was marked as resolved.

@retronym

This comment was marked as resolved.

@SethTisue

This comment was marked as resolved.

* To cempensate for alignTo's (small, but constant) overhead, it is only
called when the right Vector is mor then 64 elemente longer than the
left one.
* replaced some `….knownSize` by `k`
@ansvonwa
Copy link
Contributor Author

ansvonwa commented Nov 2, 2022

In addition to the remaining tests:

  • I made sure VectorBuilder is reusable again,
  • Made benchmarks on what the best offset is to compensate for alignTo's time (and came to 64 entries; So only if the RHS is 64 entries longer than the LHS, we use alignTo)
  • The (previously good) optimization in prependedAll0 and appendedAll0 is not optimal anymore. This is only, because concatenation via VB got faster, so it shouldn't be much of a problem. Anyway, in the long term, that should be adjusted accordingly. I think, as there is no drawback compared to before, this could be done in a different PR?

@retronym
Copy link
Member

Sorry for the delay! This looks good to go, thanks for going the extra mile.

@retronym retronym merged commit 6d7f884 into scala:2.13.x Nov 30, 2022
@SethTisue
Copy link
Member

SethTisue commented Jun 14, 2023

@ansvonwa I gave this PR top billing in the Scala 2.13.11 release notes. also posted some kudos at https://fosstodon.org/@SethTisue/110540603310328217 . thank you again for this valuable work.

@ansvonwa
Copy link
Contributor Author

@SethTisue Wow, thank you!
Btw there may be something similar coming within the next year or so :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library:collections PRs involving changes to the standard collection library release-notes worth highlighting in next release notes
Projects
None yet
6 participants