Contributor

@joshlemer joshlemer commented Feb 12, 2019

A common criticism of Vector is that small Vectors consume a lot of memory (see Li Haoyi's post here). Before this PR, a Vector of any size in the range [1, 31] would always round up and allocate an Array of size 32. Now, when creating a Vector with a known size < 32, an Array of exactly that size is allocated.
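As a rough illustration of the difference described above, here is a minimal sketch. It is not the PR's code: `LeafAllocation`, `oldLeafLength`, `newLeafLength` and `buildLeaf` are hypothetical names, and the sketch only models the length of the single backing array for a Vector of known size in [1, 32].

```scala
object LeafAllocation {
  // Width of a full leaf block, per the description above.
  final val BlockSize = 32

  // Before the PR (as described): any known size in [1, 32] gets a full block.
  def oldLeafLength(knownSize: Int): Int = {
    require(knownSize >= 1 && knownSize <= BlockSize)
    BlockSize
  }

  // After the PR (as described): allocate exactly the known size.
  def newLeafLength(knownSize: Int): Int = {
    require(knownSize >= 1 && knownSize <= BlockSize)
    knownSize
  }

  // Hypothetical helper: copies `elems` into a leaf array whose length is
  // chosen by `leafLength`; unused trailing slots stay null.
  def buildLeaf(elems: Seq[AnyRef], leafLength: Int => Int): Array[AnyRef] = {
    val leaf = new Array[AnyRef](leafLength(elems.length))
    val it = elems.iterator
    var i = 0
    while (it.hasNext) {
      leaf(i) = it.next()
      i += 1
    }
    leaf
  }
}
```

For five elements, the old strategy yields an array with 32 slots (27 of them unused) and the new one an array with 5 slots.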

In addition, when a Vector is created from varargs of AnyRefs (a collection.immutable.ArraySeq[T <: AnyRef]) of size <= 32, the underlying Array[AnyRef] is reused.
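The reuse idea can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `VarargsReuse.leafFor` is a hypothetical helper, and it assumes nobody else mutates the array wrapped by the `ArraySeq`. It relies only on `ArraySeq.ofRef` and its `unsafeArray` field from the 2.13 standard library.

```scala
import scala.collection.immutable.ArraySeq

object VarargsReuse {
  // Hypothetical helper: returns the array that would back a small Vector.
  def leafFor(elems: Seq[AnyRef]): Array[AnyRef] = elems match {
    case refs: ArraySeq.ofRef[_] if refs.length <= 32 =>
      // Reuse the wrapped array instead of copying it.
      // Assumption: the array is not mutated by anyone else afterwards.
      refs.unsafeArray.asInstanceOf[Array[AnyRef]]
    case other =>
      // Fallback for every other input: copy into a fresh array.
      other.toArray[AnyRef]
  }
}
```

Passing an `ArraySeq` that wraps an `Array[AnyRef]` returns that same array instance, while a `List` is copied into a new array.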

Performance of creating a Vector from varargs is greatly improved, for both AnyRef and AnyVal types.

Benchmarks

Code:

    @Benchmark def apply5String(bh: Blackhole): Unit = {
      bh.consume(Vector("1", "2", "3", "4", "5"))
    }

    @Benchmark def apply5Int(bh: Blackhole): Unit = {
      bh.consume(Vector(1,2,3,4,5))
    }

    @Benchmark def apply5StringOld(bh: Blackhole): Unit = {
      bh.consume(Vector.applyOld("1", "2", "3", "4", "5"))
    }

    @Benchmark def apply5IntOld(bh: Blackhole): Unit = {
      bh.consume(Vector.applyOld(1,2,3,4,5))
    }

GC-profiled benchmark:

[info] Benchmark                                                         Mode Cnt      Score      Error  Units
[info] VectorBenchmark.apply5Int                                         avgt  10     37.862 ±    0.941  ns/op
[info] VectorBenchmark.apply5Int:·gc.alloc.rate                          avgt  10   3223.910 ±   77.807  MB/sec
[info] VectorBenchmark.apply5Int:·gc.alloc.rate.norm                     avgt  10    192.000 ±    0.001  B/op
[info] VectorBenchmark.apply5Int:·gc.churn.PS_Eden_Space                 avgt  10   3222.198 ±  273.634  MB/sec
[info] VectorBenchmark.apply5Int:·gc.churn.PS_Eden_Space.norm            avgt  10    191.965 ±   17.591  B/op
[info] VectorBenchmark.apply5Int:·gc.churn.PS_Survivor_Space             avgt  10      0.110 ±    0.071  MB/sec
[info] VectorBenchmark.apply5Int:·gc.churn.PS_Survivor_Space.norm        avgt  10      0.007 ±    0.004  B/op
[info] VectorBenchmark.apply5Int:·gc.count                               avgt  10    106.000             counts
[info] VectorBenchmark.apply5Int:·gc.time                                avgt  10     83.000             ms
[info] VectorBenchmark.apply5IntOld                                      avgt  10     73.371 ±    0.792  ns/op
[info] VectorBenchmark.apply5IntOld:·gc.alloc.rate                       avgt  10   2979.859 ±   32.103  MB/sec
[info] VectorBenchmark.apply5IntOld:·gc.alloc.rate.norm                  avgt  10    344.000 ±    0.001  B/op
[info] VectorBenchmark.apply5IntOld:·gc.churn.PS_Eden_Space              avgt  10   2984.372 ±  249.212  MB/sec
[info] VectorBenchmark.apply5IntOld:·gc.churn.PS_Eden_Space.norm         avgt  10    344.551 ±   29.454  B/op
[info] VectorBenchmark.apply5IntOld:·gc.churn.PS_Survivor_Space          avgt  10      0.100 ±    0.061  MB/sec
[info] VectorBenchmark.apply5IntOld:·gc.churn.PS_Survivor_Space.norm     avgt  10      0.012 ±    0.007  B/op
[info] VectorBenchmark.apply5IntOld:·gc.count                            avgt  10    106.000             counts
[info] VectorBenchmark.apply5IntOld:·gc.time                             avgt  10     80.000             ms
[info] VectorBenchmark.apply5String                                      avgt  10     10.452 ±    0.080  ns/op
[info] VectorBenchmark.apply5String:·gc.alloc.rate                       avgt  10   5835.117 ±   45.558  MB/sec
[info] VectorBenchmark.apply5String:·gc.alloc.rate.norm                  avgt  10     96.000 ±    0.001  B/op
[info] VectorBenchmark.apply5String:·gc.churn.PS_Eden_Space              avgt  10   5730.019 ±  452.277  MB/sec
[info] VectorBenchmark.apply5String:·gc.churn.PS_Eden_Space.norm         avgt  10     94.270 ±    7.385  B/op
[info] VectorBenchmark.apply5String:·gc.churn.PS_Survivor_Space          avgt  10      0.104 ±    0.049  MB/sec
[info] VectorBenchmark.apply5String:·gc.churn.PS_Survivor_Space.norm     avgt  10      0.002 ±    0.001  B/op
[info] VectorBenchmark.apply5String:·gc.count                            avgt  10    112.000             counts
[info] VectorBenchmark.apply5String:·gc.time                             avgt  10     91.000             ms
[info] VectorBenchmark.apply5StringOld                                   avgt  10     71.918 ±    1.024  ns/op
[info] VectorBenchmark.apply5StringOld:·gc.alloc.rate                    avgt  10   3110.849 ±   44.149  MB/sec
[info] VectorBenchmark.apply5StringOld:·gc.alloc.rate.norm               avgt  10    352.000 ±    0.001  B/op
[info] VectorBenchmark.apply5StringOld:·gc.churn.PS_Eden_Space           avgt  10   3125.554 ±  234.281  MB/sec
[info] VectorBenchmark.apply5StringOld:·gc.churn.PS_Eden_Space.norm      avgt  10    353.583 ±   23.029  B/op
[info] VectorBenchmark.apply5StringOld:·gc.churn.PS_Survivor_Space       avgt  10      0.100 ±    0.063  MB/sec
[info] VectorBenchmark.apply5StringOld:·gc.churn.PS_Survivor_Space.norm  avgt  10      0.011 ±    0.007  B/op
[info] VectorBenchmark.apply5StringOld:·gc.count                         avgt  10    119.000             counts
[info] VectorBenchmark.apply5StringOld:·gc.time                          avgt  10     88.000             ms

Edit:
A benchmark for Vector.from(ArrayBuffer):

    val ab = collection.mutable.ArrayBuffer(1,2,3,4,5)

    @Benchmark def from5Old(bh: Blackhole): Unit = {
      bh.consume(Vector.fromOld(ab))
    }

    @Benchmark def from5(bh: Blackhole): Unit = {
      bh.consume(Vector.from(ab))
    }

[info] Benchmark                                                  Mode Cnt      Score      Error  Units
[info] VectorBenchmark.from5                                      avgt  10     39.882 ±    0.785  ns/op
[info] VectorBenchmark.from5:·gc.alloc.rate                       avgt  10   2295.068 ±   46.150  MB/sec
[info] VectorBenchmark.from5:·gc.alloc.rate.norm                  avgt  10    144.000 ±    0.001  B/op
[info] VectorBenchmark.from5:·gc.churn.PS_Eden_Space              avgt  10   2325.590 ±  232.257  MB/sec
[info] VectorBenchmark.from5:·gc.churn.PS_Eden_Space.norm         avgt  10    145.898 ±   13.829  B/op
[info] VectorBenchmark.from5:·gc.churn.PS_Survivor_Space          avgt  10      0.106 ±    0.052  MB/sec
[info] VectorBenchmark.from5:·gc.churn.PS_Survivor_Space.norm     avgt  10      0.007 ±    0.003  B/op
[info] VectorBenchmark.from5:·gc.count                            avgt  10    104.000             counts
[info] VectorBenchmark.from5:·gc.time                             avgt  10     94.000             ms
[info] VectorBenchmark.from5Old                                   avgt  10     68.592 ±    1.548  ns/op
[info] VectorBenchmark.from5Old:·gc.alloc.rate                    avgt  10   2743.165 ±   61.659  MB/sec
[info] VectorBenchmark.from5Old:·gc.alloc.rate.norm               avgt  10    296.000 ±    0.001  B/op
[info] VectorBenchmark.from5Old:·gc.churn.PS_Eden_Space           avgt  10   2717.767 ±  422.149  MB/sec
[info] VectorBenchmark.from5Old:·gc.churn.PS_Eden_Space.norm      avgt  10    293.202 ±   44.510  B/op
[info] VectorBenchmark.from5Old:·gc.churn.PS_Survivor_Space       avgt  10      0.090 ±    0.052  MB/sec
[info] VectorBenchmark.from5Old:·gc.churn.PS_Survivor_Space.norm  avgt  10      0.010 ±    0.005  B/op
[info] VectorBenchmark.from5Old:·gc.count                         avgt  10     86.000             counts
[info] VectorBenchmark.from5Old:·gc.time                          avgt  10     79.000             ms
@joshlemer joshlemer added the WIP, performance, and library:collections labels Feb 12, 2019
@scala-jenkins scala-jenkins added this to the 2.13.1 milestone Feb 12, 2019
@joshlemer joshlemer changed the title from "Vectors smaller than 32 allocate or reuse small arrays when possible." to "[WIP][do not merge]Vectors smaller than 32 allocate or reuse small arrays when possible." Feb 12, 2019
@joshlemer joshlemer force-pushed the small-vectors branch 3 times, most recently from ef7b1bd to f8c01a6 February 13, 2019 04:40
@joshlemer
Contributor Author

Unfortunately I am seeing about a 40% slowdown in this code:

    @Benchmark def appended0To33(bh: Blackhole): Unit = {
      var v: Vector[Int] = Vector.empty[Int]
      var i = 0
      while (i < 33) {
        v = v.appended(i)
        i += 1
      }
      bh.consume(v)
    }

hoping to avoid that..

@joshlemer
Contributor Author

joshlemer commented Feb 14, 2019

❗ (tentatively) Fixed it, turned that around to a 50-70% speedup over 2.13.x, and on par with ArraySeq

// this branch
[info] Benchmark                              Mode Cnt      Score      Error  Units
[info] VectorBenchmark.appended0To32          avgt  10    853.292 ±   17.276  ns/op
[info] VectorBenchmark.appended0To32ArraySeq  avgt  10    813.642 ±  204.851  ns/op
[info] VectorBenchmark.appended0To33          avgt  10    942.780 ±    8.932  ns/op

// 2.13.x
[info] Benchmark                              Mode Cnt      Score      Error  Units
[info] VectorBenchmark.appended0To32          avgt  10   3371.617 ± 1899.937  ns/op
[info] VectorBenchmark.appended0To33          avgt  10   2699.577 ±   85.111  ns/op
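For reference, the relative change implied by the appended0To33 scores above can be computed with a small sketch like the one below. `Speedup` and `reduction` are hypothetical names, and the sketch assumes "speedup" is read as the fraction of time per operation that was saved. The appended0To32 score on 2.13.x carries a very large error (± 1899.937 ns/op), so only the appended0To33 pair is used here.

```scala
object Speedup {
  // Fraction of time saved when going from `before` to `after` (both in ns/op).
  def reduction(before: Double, after: Double): Double = 1.0 - after / before

  // Scores copied from the appended0To33 rows: 2.13.x vs. this branch.
  val appended0To33: Double = reduction(before = 2699.577, after = 942.780)
}
```

With these inputs the reduction comes out at roughly 0.65, which falls inside the 50-70% range quoted in the comment.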

in the more general case...

    @Param(Array("1", "10", "100", "1000", "10000"))
    var size: Int = _

    @Benchmark def appended0ToN(bh: Blackhole): Unit = {
      var v: Vector[Int] = Vector.empty[Int]
      var i = 0
      while (i < size) {
        v = v.appended(i)
        i += 1
      }
      bh.consume(v)
    }

branch small-vectors
[info] Benchmark                     (size)  Mode Cnt       Score       Error  Units
[info] VectorBenchmark.appended0ToN       1  avgt  10       9.776 ±     0.220  ns/op
[info] VectorBenchmark.appended0ToN      10  avgt  10     229.207 ±    11.370  ns/op
[info] VectorBenchmark.appended0ToN     100  avgt  10    7135.687 ±   301.753  ns/op
[info] VectorBenchmark.appended0ToN    1000  avgt  10   90524.928 ±   867.974  ns/op
[info] VectorBenchmark.appended0ToN   10000  avgt  10  946733.072 ± 14830.945  ns/op

branch 2.13.x
[info] Benchmark                     (size)  Mode Cnt       Score       Error  Units
[info] VectorBenchmark.appended0ToN       1  avgt  10      17.879 ±     0.230  ns/op
[info] VectorBenchmark.appended0ToN      10  avgt  10     840.940 ±     9.183  ns/op
[info] VectorBenchmark.appended0ToN     100  avgt  10    9266.415 ±   277.926  ns/op
[info] VectorBenchmark.appended0ToN    1000  avgt  10  106326.725 ±  4853.579  ns/op
[info] VectorBenchmark.appended0ToN   10000  avgt  10  935528.214 ± 11043.096  ns/op
@joshlemer joshlemer force-pushed the small-vectors branch 4 times, most recently from 821065a to 07f352f February 15, 2019 02:50
s.display0(lo) = value.asInstanceOf[AnyRef]
val thisLength = length
val result =
if (depth == 1 && thisLength < 32) {
Contributor Author

This branch and the Vector.single branch are the only changed branches; the rest of this method is just indentation.

} else {
val shift = startIndex & ~((1 << (5 * (depth - 1))) - 1)
val shiftBlocks = startIndex >>> (5 * (depth - 1))
} else if (thisLength > 0) {
Contributor Author

@joshlemer joshlemer Feb 15, 2019

Mostly indentation, plus changing the condition to make the logic a bit more obvious:

old:
if (startIndex != endIndex)
new:
if (thisLength > 0) // thisLength already computed anyways
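The two conditions should be interchangeable as long as the length is the distance between the two indices. A minimal sketch of that reasoning, assuming `length == endIndex - startIndex` and `startIndex <= endIndex` (`Window` and `agreeUpTo` are hypothetical names used only for this illustration):

```scala
object EmptinessCheck {
  // Assumption: length is endIndex - startIndex, and startIndex <= endIndex.
  final case class Window(startIndex: Int, endIndex: Int) {
    require(startIndex <= endIndex)
    val length: Int = endIndex - startIndex

    def oldCondition: Boolean = startIndex != endIndex
    def newCondition: Boolean = length > 0
  }

  // Compares both conditions over every window with 0 <= start <= end <= n.
  def agreeUpTo(n: Int): Boolean =
    (for { s <- 0 to n; e <- s to n } yield Window(s, e))
      .forall(w => w.oldCondition == w.newCondition)
}
```

Under those assumptions the two checks agree for every window, so the change only affects readability.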

@joshlemer joshlemer changed the title from "[WIP][do not merge]Vectors smaller than 32 allocate or reuse small arrays when possible." to "Vectors smaller than 32 allocate or reuse small arrays when possible." Mar 22, 2019
@joshlemer joshlemer removed the WIP label Mar 22, 2019
@adriaanm adriaanm modified the milestones: 2.13.1, 2.13.0-RC1 Mar 26, 2019
System.arraycopy(display0, startIndex, newDisplay0, 1, thisLength)
newDisplay0(0) = value.asInstanceOf[AnyRef]
s.display0 = newDisplay0
s
Member

👍 Already covered by the releaseFence below.

System.arraycopy(display0, startIndex, newDisplay0, 0, thisLength)
newDisplay0(thisLength) = value.asInstanceOf[AnyRef]
s.display0 = newDisplay0
s
Member

👍 Already covered by the releaseFence below.
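The two hunks above follow the same pattern: allocate an array one slot longer, copy the existing elements with System.arraycopy, and write the new value at the front or the back. A simplified sketch of that pattern on a bare array follows. It is not the PR's code: `SmallLeafOps` is a hypothetical name, the sketch assumes the elements start at index 0 (the PR copies from `startIndex`), and it leaves out the Vector wrapper and the releaseFence mentioned in the review.

```scala
object SmallLeafOps {
  // Append to a leaf shorter than 32 by copying into an array one slot longer.
  def appended(leaf: Array[AnyRef], value: AnyRef): Array[AnyRef] = {
    val thisLength = leaf.length
    require(thisLength < 32)
    val newLeaf = new Array[AnyRef](thisLength + 1)
    System.arraycopy(leaf, 0, newLeaf, 0, thisLength)
    newLeaf(thisLength) = value
    newLeaf
  }

  // Prepend: same idea, but existing elements are shifted right by one slot.
  def prepended(leaf: Array[AnyRef], value: AnyRef): Array[AnyRef] = {
    val thisLength = leaf.length
    require(thisLength < 32)
    val newLeaf = new Array[AnyRef](thisLength + 1)
    System.arraycopy(leaf, 0, newLeaf, 1, thisLength)
    newLeaf(0) = value
    newLeaf
  }
}
```

Neither operation touches the input array, which is what allows the old Vector to keep sharing it.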

@retronym
Member

Just pushed a commit with the fences

Member

@retronym retronym left a comment

Looking forward to these slimmer Vectors!

@SethTisue
Member

SethTisue commented Mar 27, 2019

rebased (trivial merge conflict involving imports), let's merge once CI likes it

@SethTisue SethTisue merged commit 12cde30 into scala:2.13.x Mar 27, 2019
@joshlemer joshlemer deleted the small-vectors branch March 27, 2019 17:22
@SethTisue
Member

SethTisue commented Mar 30, 2019

> Looking forward to these slimmer Vectors!

yeah, it's awesome this got in, thanks Josh

@SethTisue SethTisue added the release-notes label Apr 4, 2019
@SethTisue SethTisue changed the title from "Vectors smaller than 32 allocate or reuse small arrays when possible." to "Vectors smaller than 32 allocate or reuse small arrays when possible" Apr 4, 2019

Labels

library:collections, performance, release-notes

6 participants