Go Optimizations 101 #214

Open
ivanburak opened this issue Feb 4, 2022 · 4 comments

@ivanburak

ivanburak commented Feb 4, 2022

2.7 Value copy scenarios

Example 1
This variant is more efficient:

func Sum_RangeSliceIdx(a []int) (r int) {
	for i := range a {
		r += a[i]
	}
	return
}
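Here is a minimal sketch for reproducing this comparison outside `go test`, using the standard `testing.Benchmark` function. The body of Sum_RangeSlice is an assumption (it ranges over element values and ignores the index). Only Sum_RangeSliceIdx is quoted in this comment. Both functions are called through the function value `f`, so the compiler will generally not inline them into the benchmark loop.

```go
package main

import (
	"fmt"
	"testing"
)

const N = 1024

// Sum_RangeSlice: assumed shape, not quoted from the book. Each element
// is copied into the iteration variable v.
func Sum_RangeSlice(a []int) (r int) {
	for _, v := range a {
		r += v
	}
	return
}

// Sum_RangeSliceIdx as quoted above: only the index is iterated and the
// elements are read through a[i].
func Sum_RangeSliceIdx(a []int) (r int) {
	for i := range a {
		r += a[i]
	}
	return
}

// sink keeps the results live so the calls are not optimized away.
var sink int

// bench times f on a slice of N ints filled like buildArray later in this thread.
func bench(f func([]int) int) testing.BenchmarkResult {
	a := make([]int, N)
	for i := range a {
		a[i] = (N - i) & i
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = f(a)
		}
	})
}

func main() {
	fmt.Println("RangeSlice:   ", bench(Sum_RangeSlice))
	fmt.Println("RangeSliceIdx:", bench(Sum_RangeSliceIdx))
}
```

Results depend on the CPU and the toolchain version, as the rest of this thread shows.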


@go101
Owner

go101 commented Feb 5, 2022

This is true for Go toolchain 1.17. The reason might be that functions containing for-range loops cannot be inlined before Go 1.18.
For Go toolchain 1.18 beta 2, the benchmark results are similar if the //go:noinline directives are added before all these functions.
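For comparison under Go 1.17: since Go 1.16 the compiler can inline functions containing plain (non-labeled) three-clause for loops, but functions containing range loops only became inline candidates in Go 1.18. The sketch below uses a hypothetical Sum_ForIdx (not from the book) written that way. `go build -gcflags=-m` lists the functions the compiler can inline, so you can check whether it qualifies with a given toolchain.

```go
package main

import "fmt"

// Sum_ForIdx is a hypothetical variant of Sum_RangeSliceIdx that uses a
// three-clause for loop instead of for-range. Functions with such loops
// are inline candidates since Go 1.16; functions with range loops only
// since Go 1.18.
func Sum_ForIdx(a []int) (r int) {
	for i := 0; i < len(a); i++ {
		r += a[i]
	}
	return
}

func main() {
	fmt.Println(Sum_ForIdx([]int{1, 2, 3, 4})) // prints 10
}
```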

But here, the example is intended to explain value copy costs, so I don't want to make it more complicated.
What I really do need to do is add //go:noinline directives to these functions in the next release of the book.
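The sketch below shows that placement on three array-based variants. The function names come from the benchmark tables in the next comment, but the bodies are assumptions about the book's code and are not copied from it. Each directive sits on the line directly above its func declaration. It only stops the compiler from inlining that function, so each call still pays its own copy costs.

```go
package main

import "fmt"

const N = 1024

// Sum_RangeArray receives the array by value, so all N ints are copied
// when it is called.
//
//go:noinline
func Sum_RangeArray(a [N]int) (r int) {
	for _, v := range a {
		r += v
	}
	return
}

// Sum_RangeArrayPtr1 ranges over the dereferenced array *a. The range
// expression is an array value, which may be copied before the loop.
//
//go:noinline
func Sum_RangeArrayPtr1(a *[N]int) (r int) {
	for _, v := range *a {
		r += v
	}
	return
}

// Sum_RangeArrayPtr2 ranges over the pointer itself, which does not
// require copying the array.
//
//go:noinline
func Sum_RangeArrayPtr2(a *[N]int) (r int) {
	for _, v := range a {
		r += v
	}
	return
}

func main() {
	var a [N]int
	for i := range a {
		a[i] = (N - i) & i
	}
	// All three variants compute the same sum; only the copying differs.
	fmt.Println(Sum_RangeArray(a), Sum_RangeArrayPtr1(&a), Sum_RangeArrayPtr2(&a))
}
```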

@go101
Owner

go101 commented Feb 5, 2022

It looks like there is some weirdness in the inlining module of the standard Go compiler v1.18.
Here are the benchmark results for different compiler versions and whether or not the functions are inlined.

(for 1.17, with //go:noinline directives):

Benchmark_Sum_RangeArray-4       	 1307961	       944.6 ns/op
Benchmark_Sum_RangeArrayPtr1-4   	 1457169	       773.3 ns/op
Benchmark_Sum_RangeArrayPtr2-4   	 2106256	       562.2 ns/op
Benchmark_Sum_RangeSlice-4       	 2093174	       559.3 ns/op
Benchmark_Sum_RangeSliceIdx-4    	 2271555	       550.1 ns/op

(for 1.17, without //go:noinline directives):

Benchmark_Sum_RangeArray-4       	 1307254	       914.8 ns/op
Benchmark_Sum_RangeArrayPtr1-4   	 1549593	       813.3 ns/op
Benchmark_Sum_RangeArrayPtr2-4   	 2113741	       588.4 ns/op
Benchmark_Sum_RangeSlice-4       	 2091764	       558.0 ns/op
Benchmark_Sum_RangeSliceIdx-4    	 2236640	       547.1 ns/op

(for 1.18 beta 2, with //go:noinline directives):

Benchmark_Sum_RangeArray-4       	 1223260	       983.8 ns/op
Benchmark_Sum_RangeArrayPtr1-4   	 1402024	       794.7 ns/op
Benchmark_Sum_RangeArrayPtr2-4   	 2094290	       571.6 ns/op
Benchmark_Sum_RangeSlice-4       	 2140164	       560.0 ns/op
Benchmark_Sum_RangeSliceIdx-4    	 2277823	       525.3 ns/op

(for 1.18 beta 2, without //go:noinline directives):

Benchmark_Sum_RangeArray-4       	 1556587	       760.2 ns/op
Benchmark_Sum_RangeArrayPtr1-4   	 1582272	       772.8 ns/op
Benchmark_Sum_RangeArrayPtr2-4   	 2286620	       550.6 ns/op
Benchmark_Sum_RangeSlice-4       	 2187531	       550.0 ns/op
Benchmark_Sum_RangeSliceIdx-4    	 2007074	       623.0 ns/op

The results for 1.17 are as expected.
For 1.18, the results for Sum_RangeSliceIdx are somewhat weird:
the inlined Sum_RangeSliceIdx (the one without //go:noinline) is noticeably slower than expected.
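One way to quantify the anomaly is to compare the Benchmark_Sum_RangeSliceIdx row without //go:noinline against the row with it, for each toolchain. The helper name `slowdown` is just for illustration.

```go
package main

import "fmt"

// slowdown returns how much slower, in percent, the inlined run is than
// the //go:noinline run; a negative value means the inlined run was faster.
func slowdown(inlined, noinline float64) float64 {
	return (inlined/noinline - 1) * 100
}

func main() {
	// ns/op for Benchmark_Sum_RangeSliceIdx, copied from the tables above.
	fmt.Printf("1.17: %+.1f%%\n", slowdown(547.1, 550.1))        // prints "1.17: -0.5%"
	fmt.Printf("1.18 beta 2: %+.1f%%\n", slowdown(623.0, 525.3)) // prints "1.18 beta 2: +18.6%"
}
```

Under 1.17 the function cannot be inlined anyway, so a near-zero difference is expected there. The roughly 19% gap under 1.18 beta 2 is the surprising part.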

@go101
Owner

go101 commented Feb 5, 2022

The simplified benchmark code:

package copycost

import "testing"

const N = 1024

func Sum_RangeSliceIdx_Inline(a []int) (r int) {
	for i := range a {
		r += a[i]
	}
	return
}

//go:noinline
func Sum_RangeSliceIdx_NoInline(a []int) (r int) {
	for i := range a {
		r += a[i]
	}
	return
}

func buildArray() [N]int {
	var a [N]int
	for i := 0; i < N; i++ {
		a[i] = (N - i) & i
	}
	return a
}

var r [128]int

func Benchmark_Sum_RangeSliceIdx_Inline(b *testing.B) {
	var a = buildArray()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r[i&127] = Sum_RangeSliceIdx_Inline(a[:])
	}
}

func Benchmark_Sum_RangeSliceIdx_NoInline(b *testing.B) {
	var a = buildArray()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r[i&127] = Sum_RangeSliceIdx_NoInline(a[:])
	}
}

The benchmark results:

$ go version
go version go1.18beta2 linux/amd64

$ go test -bench=.
goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_Sum_RangeSliceIdx_Inline-4     	 1996579	       603.2 ns/op
Benchmark_Sum_RangeSliceIdx_NoInline-4   	 2176748	       545.0 ns/op

$ go version
go version go1.17.6 linux/amd64

$ go test -bench=.
...
Benchmark_Sum_RangeSliceIdx_Inline-4     	 2201970	       534.3 ns/op
Benchmark_Sum_RangeSliceIdx_NoInline-4   	 2250338	       535.2 ns/op

@go101
Owner

go101 commented Feb 5, 2022

I submitted an issue here: golang/go#51028
