Go 1.17: Why Performance Improves by About 5%

The Go 1.17 release blog post describes the general performance improvement in this major release as follows:

This release brings additional improvements to the compiler, namely a new way of passing function arguments and results. This change has shown about a 5% performance improvement in Go programs and reduction in binary sizes of around 2% for amd64 platforms. Support for more platforms will come in future releases.

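Put simply, a new way of passing function arguments and return values delivers a performance improvement of about 5%. For a summary of the new features in Go 1.17, see: Go 1.17 release.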

Verifying the claim

Let's use a benchmark to compare the performance of Go 1.16 and Go 1.17. The test code is as follows:

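package main

import (
	"testing"
)

// FibRecursion returns the n-th Fibonacci number using naive recursion.
// It makes a very large number of function calls, which is what we want to measure.
func FibRecursion(n int) int {
	switch {
	case n < 2:
		return n
	default:
		return FibRecursion(n-1) + FibRecursion(n-2)
	}
}

// BenchmarkFibRecursion measures the cost of computing FibRecursion(20).
func BenchmarkFibRecursion(b *testing.B) {
	for i := 0; i < b.N; i++ {
		FibRecursion(20)
	}
}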

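The function under test, FibRecursion, computes Fibonacci numbers. It was chosen because, according to the official documentation, the performance gain comes from optimizing how arguments and return values are passed. FibRecursion is recursive and makes a large number of function calls, so any difference in call overhead shows up clearly.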

Below are the benchmark results under Go 1.16 and Go 1.17.

Results with Go 1.16:

PS E:\liuwei\go1617> gov16
Switched to: go1.16.3
PS E:\liuwei\go1617> go version
go version go1.16.3 windows/amd64
PS E:\liuwei\go1617> go test --bench=. --benchmem=true
goos: windows
goarch: amd64
pkg: go1617
cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
BenchmarkFibRecursion-4            25707             48020 ns/op               0 B/op          0 allocs/op
PASS
ok      go1617  1.825s

Results with Go 1.17:

PS E:\liuwei\go1617> gov17
Switched to: go1.17
PS E:\liuwei\go1617> go version
go version go1.17 windows/amd64
PS E:\liuwei\go1617> go test --bench=. --benchmem=true
goos: windows
goarch: amd64
pkg: go1617
cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
BenchmarkFibRecursion-4            33096             35767 ns/op               0 B/op          0 allocs/op
PASS
ok      go1617  1.625s

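Go 1.16 takes 48020 ns/op and Go 1.17 takes 35767 ns/op, which is roughly a 25% reduction in time per operation and far more than 5%. This is not a contradiction: the 5% figure is an average across Go programs in general, whereas this benchmark consists almost entirely of function calls, so it benefits much more from cheaper calls.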
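To make the comparison concrete, here is a minimal sketch that works out the improvement from the two ns/op figures measured above. The helper name improvement is hypothetical and introduced only for this illustration.

```go
package main

import "fmt"

// improvement is a hypothetical helper for this illustration.
// Given the old and new time per operation, it returns the relative
// reduction in time per operation and the speedup factor.
func improvement(oldNs, newNs float64) (reduction, speedup float64) {
	reduction = (oldNs - newNs) / oldNs
	speedup = oldNs / newNs
	return reduction, speedup
}

func main() {
	// ns/op values taken from the benchmark output above.
	r, s := improvement(48020, 35767)
	fmt.Printf("time per op reduced by %.1f%%, speedup %.2fx\n", r*100, s)
	// prints: time per op reduced by 25.5%, speedup 1.34x
}
```

Note that these numbers come from a single run on one machine; repeating the benchmark several times would give a more reliable estimate.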

Where the optimization comes from

When in doubt, read the documentation. The official notes on this optimization are at https://golang.org/doc/go1.17#compiler. They open with this sentence:

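Go 1.17 implements a new way of passing function arguments and results using registers instead of the stack.

In other words, Go 1.17 passes arguments and return values in registers rather than on the stack. Registers are faster than memory: even when the stack memory is held in the CPU cache, a register access is still faster.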

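To see this in practice, let's disassemble a small program and look at how argument and return value passing actually changed.

The example code is as follows: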

package main

func main() {
	foo(3)
}

//go:noinline
func foo(n int) int {
	n = 5
	return n
}

The assembly produced by Go 1.16 is as follows:

PS E:\liuwei\go1617> go version
go version go1.16.3 windows/amd64
PS E:\liuwei\go1617> go build -o main.exe main.go
PS E:\liuwei\go1617> go tool objdump -s main main.exe


TEXT main.main(SB) E:/liuwei/go1617/main.go
  main.go:3             0x4626c0                65488b0c2528000000      MOVQ GS:0x28, CX
  main.go:3             0x4626c9                488b8900000000          MOVQ 0(CX), CX
  main.go:3             0x4626d0                483b6110                CMPQ 0x10(CX), SP
  main.go:3             0x4626d4                762a                    JBE 0x462700
  main.go:3             0x4626d6                4883ec18                SUBQ $0x18, SP
  main.go:3             0x4626da                48896c2410              MOVQ BP, 0x10(SP)
  main.go:3             0x4626df                488d6c2410              LEAQ 0x10(SP), BP
  main.go:4             0x4626e4                48c7042403000000        MOVQ $0x3, 0(SP)    // write the argument for foo to stack memory
  main.go:4             0x4626ec                e82f000000              CALL main.foo(SB)   // call foo
  main.go:5             0x4626f1                488b6c2410              MOVQ 0x10(SP), BP
  main.go:5             0x4626f6                4883c418                ADDQ $0x18, SP
  main.go:5             0x4626fa                c3                      RET
  main.go:3             0x4626fb                0f1f440000              NOPL 0(AX)(AX*1)
  main.go:3             0x462700                e89b87ffff              CALL runtime.morestack_noctxt(SB)
  main.go:3             0x462705                ebb9                    JMP main.main(SB)


TEXT main.foo(SB) E:/liuwei/go1617/main.go
  main.go:10            0x462720                48c744241005000000      MOVQ $0x5, 0x10(SP) // write the return value to stack memory
  main.go:10            0x462729                c3                      RET

The assembly produced by Go 1.17 is as follows:

PS E:\liuwei\go1617> gov17
Switched to: go1.17
PS E:\liuwei\go1617> go version
go version go1.17 windows/amd64
PS E:\liuwei\go1617> go build -o main.exe main.go
PS E:\liuwei\go1617> go tool objdump -s main main.exe

TEXT main.main(SB) E:/liuwei/go1617/main.go
  main.go:3             0x45aae0                493b6610                CMPQ 0x10(R14), SP
  main.go:3             0x45aae4                7622                    JBE 0x45ab08
  main.go:3             0x45aae6                4883ec10                SUBQ $0x10, SP
  main.go:3             0x45aaea                48896c2408              MOVQ BP, 0x8(SP)
  main.go:3             0x45aaef                488d6c2408              LEAQ 0x8(SP), BP
  main.go:4             0x45aaf4                b803000000              MOVL $0x3, AX       // write the argument for foo to a register
  main.go:4             0x45aaf9                e822000000              CALL main.foo(SB)   // call foo
  main.go:5             0x45aafe                488b6c2408              MOVQ 0x8(SP), BP
  main.go:5             0x45ab03                4883c410                ADDQ $0x10, SP
  main.go:5             0x45ab07                c3                      RET
  main.go:3             0x45ab08                e8f386ffff              CALL runtime.morestack_noctxt.abi0(SB)
  main.go:3             0x45ab0d                ebd1                    JMP main.main(SB)


TEXT main.foo(SB) E:/liuwei/go1617/main.go
  main.go:10            0x45ab20                b805000000              MOVL $0x5, AX       // write the return value to a register
  main.go:10            0x45ab25                c3                      RET

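Go 1.16 passes arguments and return values through stack memory, whereas Go 1.17 passes them in registers.

This helps in two ways: the CPU accesses registers much faster than stack memory, and passing arguments in registers also reduces the number of stack writes and reads around each call.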
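The same experiment can be repeated with several arguments and results. The sketch below uses a hypothetical function, sumAndScale, that is not part of the original example. On amd64 with Go 1.17 one would expect its three integer arguments to arrive in AX, BX and CX and its two results to come back in AX and BX, but that is an expectation to check against the disassembly rather than a guarantee, since the register ABI is an internal detail of the compiler.

```go
package main

import "fmt"

// sumAndScale is a hypothetical helper used only for this illustration.
// It takes three integer arguments and returns two integer results, so
// there is more than one value to track in the disassembly.
//
//go:noinline
func sumAndScale(a, b, c int) (sum, scaled int) {
	sum = a + b + c
	scaled = sum * 2
	return sum, scaled
}

func main() {
	s, d := sumAndScale(1, 2, 3)
	fmt.Println(s, d) // prints: 6 12
}
```

Building this with go build -o main.exe main.go and running go tool objdump -s main.sumAndScale main.exe under each Go version should show stack offsets such as 0x8(SP) in Go 1.16 and plain register operands in Go 1.17.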
