Improving performance on Chunk #3301

ahoy-jon · 2020-04-05T23:44:09Z

Chunk could be improved in different ways.

Concat could be specialized and improved with:

case class ConcatArr[A](body: Array[Arr[A]], tail:Array[Arr[A]]) extends NonEmpty[A]
case class Concat[A](body: Array[NonEmpty[A]], tail:Array[NonEmpty[A]]) extends NonEmpty[A]

This change could help a lot with #3233 .
The tail of those structures would be used to hold the element just added to the end of the structure.

For example:

val c1 = Arr(Array(1,2,3))
val c2 = c1 + 4
//ConcatArr(body = Array(c),  tail = Array(4))
val c3 = c2 + 5
//ConcatArr(body = Array(c), tail = Array(4,5))
val c4 = c2 + 6 + 7 + 8 + 9 + 10 + 11
//ConcatArr(body = Array(c, Arr(Array(4, 5, 6, 7, 8, 9, 10, 11)), tail = Array())

Slice could be removed and pushed to Arr and VectorChunk

case class Arr[A](array: Array[A], offset: Int, l: Int) extends NonEmpty[A]
case class VectorChunk[A](private val vector: Vector[A], offset: Int, l: Int) extends NonEmpty[A]

materialize could be optimized, keep Concat and group substructures.
most of the complex components should have private "smart constructors".

private def arr[A](array: Array[A]): Arr[A] =
    Arr(array, ClassTag(array.getClass.getComponentType))

private case class Arr[A](private val array: Array[A], ct: ClassTag[A]) extends NonEmpty[A] {

What do you think about those changes?

The text was updated successfully, but these errors were encountered:

Vilkina · 2020-04-06T09:42:03Z

I'll take that ticket

ahoy-jon · 2020-04-06T17:31:46Z

Interesting resources :

jdegoes · 2020-04-13T17:57:50Z

Materialize should not be changed, it's purpose is to force the chunk to be strict (no laziness), so making it lazy which just defeat the point of its existence.
How would Concat specialization improve anything?
Removing Slice could be useful. "Inlining" it into Concat (concat would have to force it down into the left and right children, discarding, if necessary, left or right when they are entirely sliced out), Vector, and Array chunks would create more flattening and reduce nesting.

ahoy-jon · 2020-04-13T18:13:36Z

it could be changed so it doesn't rewrite already optimized parts of the structures.
The specialization could help with the different layers of the Tree, Level0 (ConcatArr) for structures up to 1024 elements, LevelN (Concat). The tail helps with append on the right.
was done then removed in a previous PR (NonEmptyChunk), will reintroduce it when possible.

jdegoes · 2020-04-13T19:08:48Z

The goal of materialize is to allocate an array, and copy all the data inside there, then wrap the array into a chunk. I think it can be optimized somewhat, but mainly by using System.arraycopy to copy the array parts into the target array.
Ok, I see. Yes, I think Concat could benefit from richer structure. See here for what Scala's vector is currently using (a radix balanced finger tree).
Benchmarks should guide the deletion of Slice. I'd try to slice a bit, then slice some more, then slice some more, etc., building up Slice(Slice(Slice(...))) very deeply, exercising all possible cases, and then show the performance improvement possible with "slice fusion".

Looking forward to seeing this work!

ahoy-jon · 2020-04-14T01:17:36Z

Scala Vector and Clojure Vector are using a tree structure with leaf at 32 elements. It seems that the page size makes is great bellow 256 elements. Materializing on large chunks could have a cost (to be proven with benchmarks).
I took the time to read it before proposing the issue. A version with 1 level and a tail (like the PersistentVector from Clojure) could be faster to implement.
Slice fusion requires a lot less allocation + ... I will do a PR just for that one!

ahoy-jon mentioned this issue Apr 7, 2020

WIP, Balancing on ++ #3311

Closed

2 tasks

ahoy-jon closed this as completed Apr 13, 2020

jdegoes reopened this Apr 14, 2020

ahoy-jon mentioned this issue Apr 21, 2020

Implement fast single-element append for Chunk #3371

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving performance on Chunk #3301

Improving performance on Chunk #3301

ahoy-jon commented Apr 5, 2020 •

edited

Vilkina commented Apr 6, 2020

ahoy-jon commented Apr 6, 2020

jdegoes commented Apr 13, 2020

ahoy-jon commented Apr 13, 2020

jdegoes commented Apr 13, 2020

ahoy-jon commented Apr 14, 2020

Improving performance on Chunk #3301

Improving performance on Chunk #3301

Comments

ahoy-jon commented Apr 5, 2020 • edited

Vilkina commented Apr 6, 2020

ahoy-jon commented Apr 6, 2020

jdegoes commented Apr 13, 2020

ahoy-jon commented Apr 13, 2020

jdegoes commented Apr 13, 2020

ahoy-jon commented Apr 14, 2020

ahoy-jon commented Apr 5, 2020 •

edited