forked from golang/go
- Notifications
You must be signed in to change notification settings - Fork 0
cmd/internal/obj/riscv: support zawrs assembly on riscv64 #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
…kages Intrinsifying things inside the module (crypto/internal/fips140/subtle) is asking for trouble, as the import paths are rewritten by the GOFIPS140 mechanism, and we might have to support multiple modules in the future. Importing crypto/subtle from inside a FIPS 140-3 module is not allowed, and is basically asking for circular dependencies. Instead, break off the intrinsics into their own package (crypto/internal/constanttime), and keep the byte slice operations in crypto/internal/fips140/subtle. crypto/subtle then becomes a thin dispatch layer. Change-Id: I6a6a6964cd5cb5ad06e9d1679201447f5a811da4 Reviewed-on: https://go-review.googlesource.com/c/go/+/716120 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Since we always can get the address of `CALL runtime.deferreturn(SB)` from the unwinder, so it is not necessary to record the caller's pc in the _defer struct. For the stack allocated _defer, this CL makes the frame smaller. Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/716720 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Fixes golang#73794 Change-Id: I0a57db05aacfa805213fe8278fc727e76eb8a65e GitHub-Last-Rev: 3494d93 GitHub-Pull-Request: golang#73795 Reviewed-on: https://go-review.googlesource.com/c/go/+/674415 Reviewed-by: Sean Liao <sean@liao.dev> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
In our case, it greatly improves the performance of continuously collecting diff profiles from the net/http/pprof endpoint, such as /debug/pprof/allocs?seconds=30. This CL is a cherry-pick of my PR upstream: google/pprof#951 Benchmark of profile Parse func: goos: linux goarch: amd64 pkg: github.com/google/pprof/profile cpu: 13th Gen Intel(R) Core(TM) i7-1360P │ old-parse.txt │ new-parse.txt │ │ sec/op │ sec/op vs base │ Parse-16 62.07m ± 13% 55.54m ± 13% -10.52% (p=0.035 n=10) │ old-parse.txt │ new-parse.txt │ │ B/op │ B/op vs base │ Parse-16 47.56Mi ± 0% 41.09Mi ± 0% -13.59% (p=0.000 n=10) │ old-parse.txt │ new-parse.txt │ │ allocs/op │ allocs/op vs base │ Parse-16 272.9k ± 0% 175.8k ± 0% -35.58% (p=0.000 n=10) Change-Id: I737ff9b9f815fdc56bc3b5743403717c4b6f07fd GitHub-Last-Rev: a09108f GitHub-Pull-Request: golang#76145 Reviewed-on: https://go-review.googlesource.com/c/go/+/717081 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
Change-Id: I26302d801732f40b1fe6b30ff69d222047bca490 Reviewed-on: https://go-review.googlesource.com/c/go/+/716740 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change-Id: I0ea4d15da163cec6fe2a703376ce5a6032e15484 Reviewed-on: https://go-review.googlesource.com/c/go/+/714861 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
Knowing how many times cgo is used is useful information to have in the local telemetry database. It also opens the door for uploading them in the future if desired. Change-Id: Ia92b11fc489f015bbface7f28ed5a5c2871c44f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/707055 Reviewed-by: Michael Matloob <matloob@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Findley <rfindley@google.com> Reviewed-by: Michael Matloob <matloob@google.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
Move global variable to a field on the State type. Change-Id: I1edd32e1d28ce814bcd75501098ee4b22227546b Reviewed-on: https://go-review.googlesource.com/c/go/+/716162 Reviewed-by: Michael Matloob <matloob@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@google.com>
[git-generate] cd src/cmd/go/internal/modload rf ' mv InitWorkfile State.InitWorkfile mv FindGoWork State.FindGoWork mv WillBeEnabled State.WillBeEnabled mv Enabled State.Enabled mv inWorkspaceMode State.inWorkspaceMode mv HasModRoot State.HasModRoot mv MustHaveModRoot State.MustHaveModRoot mv ModFilePath State.ModFilePath ' Change-Id: I207113868af037c9c0049f4207c3d3b4c19468bb Reviewed-on: https://go-review.googlesource.com/c/go/+/716602 Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: I43ea575aff87a3e420477cb26d35185d03df5ccc Reviewed-on: https://go-review.googlesource.com/c/go/+/713283 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
We've been seeing the flakes where we get a 'no errors' output on freebsd in addition to windows and solaris. Also allow that case to avoid flakes. For golang#73976 Change-Id: I6a6a696445ec908b55520d8d75e7c1f867b9c092 Reviewed-on: https://go-review.googlesource.com/c/go/+/715640 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Matloob <matloob@google.com> Reviewed-by: Ian Alexander <jitsu@google.com>
Support arm64 FMOVQ with large offset in immediate which is encoded using register offset instruction in opldrr or opstrr. This will help allowing folding immediate into new ssa ops FMOVQload and FMOVQstore. For example: FMOVQ F0, -20000(R0) is encoded as following: MOVD 3(PC), R27 FMOVQ F0, (R0)(R27) RET ffff b1e0 # constant value Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523 Reviewed-on: https://go-review.googlesource.com/c/go/+/716960 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
This commit adds test coverage of path building and name constraint verification using the suite of test data provided by Netflix's BetterTLS project. Since the uncompressed raw JSON test data exported by BetterTLS for external test integrations is ~31MB we use a similar approach to the BoGo and ACVP test integrations and fetch the BetterTLS Go module, and run its export tool on-the-fly to generate the test data in a tempdir. As expected, all tests pass currently and this coverage is mainly helpful in catching regressions, especially with tricky/cursed name constraints. Change-Id: I23d7c24232e314aece86bcbfd133b7f02c9e71b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/717420 TryBot-Bypass: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Roland Shoemaker <roland@golang.org> Auto-Submit: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Michael Pratt <mpratt@google.com>
This makes the user experience better, before users would receive an unknown godebug error message, now we explicitly mention that it was removed and link to go.dev/doc/godebug where users can find more information about the removal. Additionally we keep all the removed GODEBUGs in the source, making sure we do not reuse such GODEBUG after it is removed. Updates golang#72111 Updates golang#75316 Change-Id: I6a6a6964cce1c100108fdba4bfba7d13cd9a893a Reviewed-on: https://go-review.googlesource.com/c/go/+/701875 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Mateusz Poliwczak <mpoliwczak34@gmail.com> Reviewed-by: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Matloob <matloob@google.com>
When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is returned from Go to C, 1 package main 2 3 import ( 4 "C" 5 ) 6 7 //export foo 8 func foo(CLine *C.char) string { 9 return C.GoString(CLine) 10 } 11 12 13 func main() { 14 } The error message mentions the file/line of the cgo wrapper, panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer goroutine 17 [running, locked to thread]: panic({0x798f2341a4c0?, 0xc000112000?}) /usr/lib/go/src/runtime/panic.go:802 +0x168 runtime.cgoCheckArg(0x798f23417e20, 0xc000066e50, 0x0?, 0x0, {0x798f233f5a62, 0x42}) /usr/lib/go/src/runtime/cgocall.go:679 +0x35b runtime.cgoCheckResult({0x798f23417e20, 0xc000066e50}) /usr/lib/go/src/runtime/cgocall.go:795 +0x4b _cgoexp_3c910ddb72c4_foo(0x7ffc9fa9bfa0) _cgo_gotypes.go:65 +0x5d runtime.cgocallbackg1(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0) /usr/lib/go/src/runtime/cgocall.go:446 +0x289 runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0) /usr/lib/go/src/runtime/cgocall.go:350 +0x132 runtime.cgocallbackg(0x798f233ec780, 0x7ffc9fa9bfa0, 0x0) <autogenerated>:1 +0x2b runtime.cgocallback(0x0, 0x0, 0x0) /usr/lib/go/src/runtime/asm_amd64.s:1082 +0xcd runtime.goexit({}) /usr/lib/go/src/runtime/asm_amd64.s:1693 +0x1 The cgo wrapper (_cgoexp_3c910ddb72c4_foo) is located in a temporary build artifact (_cgo_gotypes.go) $ go tool cgo -objdir objdir parse.go $ cat -n objdir/_cgo_gotypes.go | sed -n '55,70p' 55 //go:cgo_export_dynamic foo 56 //go:linkname _cgoexp_d48770e267d1_foo _cgoexp_d48770e267d1_foo 57 //go:cgo_export_static _cgoexp_d48770e267d1_foo 58 func _cgoexp_d48770e267d1_foo(a *struct { 59 p0 *_Ctype_char 60 r0 string 61 }) { 62 a.r0 = foo(a.p0) 63 _cgoCheckResult(a.r0) 64 } The file/line of the export'ed function is expected in the error message. Use it in error messages. panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer goroutine 17 [running, locked to thread]: panic({0x7df72b1d8ae0?, 0x3ec8a1790030?}) /mnt/go/src/runtime/panic.go:877 +0x16f runtime.cgoCheckArg(0x7df72b1d62c0, 0x3ec8a16eee50, 0x68?, 0x0, {0x7df72b1ad44c, 0x42}) /mnt/go/src/runtime/cgocall.go:679 +0x35b runtime.cgoCheckResult({0x7df72b1d62c0, 0x3ec8a16eee50}) /mnt/go/src/runtime/cgocall.go:795 +0x4b _cgoexp_3c910ddb72c4_foo(0x7ffca1b21020) /mnt/tmp/parse.go:8 +0x5d runtime.cgocallbackg1(0x7df72b1a4360, 0x7ffca1b21020, 0x0) /mnt/go/src/runtime/cgocall.go:446 +0x289 runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0) /mnt/go/src/runtime/cgocall.go:350 +0x132 runtime.cgocallbackg(0x7df72b1a4360, 0x7ffca1b21020, 0x0) <autogenerated>:1 +0x2b runtime.cgocallback(0x0, 0x0, 0x0) /mnt/go/src/runtime/asm_amd64.s:1101 +0xcd runtime.goexit({}) /mnt/go/src/runtime/asm_amd64.s:1712 +0x1 So doing, fix typos in comments. Link: https://web.archive.org/web/20251008114504/https://dave.cheney.net/2018/01/08/gos-hidden-pragmas Suggested-by: Keith Randall <khr@golang.org> For golang#75856 Change-Id: I0bf36d5c8c5c0c7df13b00818bc4641009058979 GitHub-Last-Rev: e65839c GitHub-Pull-Request: golang#76118 Reviewed-on: https://go-review.googlesource.com/c/go/+/716441 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> The panic calls gopanic which may have write barriers, but castogscanstatus is called from //go:nowritebarrier contexts. The panic is dead code anyway, and appears immediately before a call to 'throw'. Change-Id: I4a8e296b71bf002295a3aa1db4f723c305ed939a Reviewed-on: https://go-review.googlesource.com/c/go/+/717406 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of all goroutines, and searches for a specific line in that stack trace. It currently sometimes fails because it encounters the goroutine its looking for in the small window where a goroutine might be in _Grunning while in a syscall, introduced in CL 646198. In that case, the traceback will give up, failing to print the stack TestCopyError is expecting. This represents a general regression, since previously runtime.Stack could never fail to take a goroutine's stack; giving up was only possible in fatal panic cases. Fix this the same way we fixed goroutine profiles: allow the stack trace to proceed if the g's syscallsp != 0. This is safe in any stop-the-world-related context, because syscallsp won't be mutated while the goroutine fails to acquire a P, and thus fails to fully exit the syscall context. This also means the stack below syscallsp won't be mutated, and thus taking a traceback is also safe. Fixes golang#66639. Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d Reviewed-on: https://go-review.googlesource.com/c/go/+/716680 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Should make cmd/link/internal/ld.TestAbstractOriginSanity happier. Change-Id: I121927d42e527ff23d996e7387066f149b11cc59 Reviewed-on: https://go-review.googlesource.com/c/go/+/717480 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
…upport Go asm syntax: VPERMIW $0x1b, vj, vd XVPERMI{W,V,Q} $0x1b, xj, xd Equivalent platform assembler syntax: vpermi.w vd, vj, $0x1b xvpermi.{w,d,q} xd, xj, $0x1b Change-Id: Ie23b2fdd09b4c93801dc804913206f1c5a496268 Reviewed-on: https://go-review.googlesource.com/c/go/+/716800 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> …en vector registers Go asm syntax: VMOVQ Vj, Vd XVMOVQ Xj, Xd Equivalent platform assembler syntax: vslli.d vd, vj, 0x0 xvslli.d xd, xj, 0x0 Change-Id: Ifddc3d4d3fbaa6fee2e079bf2ebfe96a2febaa1c Reviewed-on: https://go-review.googlesource.com/c/go/+/716801 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The exact meaning of pow10 was not defined nor tested directly. Define it as pow10(e) returns mant, exp where mant/2^128 * 2**exp = 10^e. This is the most natural definition but is off-by-one from what it had been returning. Fix the off-by-one and then adjust the call sites to stop compensating for it. Change-Id: I9ee475854f30be4bd0d4f4d770a6b12ec68281fe Reviewed-on: https://go-review.googlesource.com/c/go/+/717180 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
ftoaFixed is in the next CL; this proves the tests are correct against the current implementation, and it adds a benchmark for comparison with the new implementation. Change-Id: I7ac8a1f699b693ea6d11a7122b22fc70cc135af6 Reviewed-on: https://go-review.googlesource.com/c/go/+/717181 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The fixed-precision ftoa algorithm is not actually documented in the Ryū paper, and it is fairly straightforward: multiply by a power of 10 to get an integer that contains the digits we need. There is also no need for separate float32 and float64 implementations. This CL implements a new fixedFtoa, separate from Ryū. The overall algorithm is the same, but the new code is simpler, faster, and better documented. Now ftoaryu.go is only about shortest-output formatting, so if and when yet another algorithm comes along, it will be clearer what should be replaced (all of ftoaryu.go) and what should not (all of ftoafixed.go). benchmark \ host linux-arm64 local linux-amd64 s7 linux-386 s7:GOARCH=386 vs base vs base vs base vs base vs base vs base AppendFloat/Decimal -0.18% ~ ~ -0.68% +0.49% -0.79% AppendFloat/Float +0.09% ~ +1.50% +0.84% -0.37% -0.69% AppendFloat/Exp -0.51% ~ ~ +1.20% -1.27% -1.01% AppendFloat/NegExp -1.01% ~ +3.43% +1.35% -2.33% ~ AppendFloat/LongExp -1.22% +0.77% ~ ~ -1.48% ~ AppendFloat/Big -2.07% ~ -2.07% -1.97% -2.89% -2.93% AppendFloat/BinaryExp -0.28% +1.06% ~ +1.35% -0.64% -1.64% AppendFloat/32Integer ~ ~ ~ -0.79% ~ -0.66% AppendFloat/32ExactFraction -0.50% ~ +5.69% ~ -1.24% +0.69% AppendFloat/32Point ~ -1.19% +2.59% +1.03% -1.37% +0.80% AppendFloat/32Exp -3.39% -2.79% -8.36% -0.94% -5.72% -5.92% AppendFloat/32NegExp -0.63% ~ ~ +0.98% -1.34% -0.73% AppendFloat/32Shortest -1.00% +1.36% +2.94% ~ ~ ~ AppendFloat/32Fixed8Hard -5.91% -12.45% -6.62% ~ +18.46% +11.61% AppendFloat/32Fixed9Hard -6.53% -11.35% -6.01% -0.97% -18.31% -9.16% AppendFloat/64Fixed1 -13.84% -16.90% -13.13% -10.71% -24.52% -18.94% AppendFloat/64Fixed2 -11.12% -16.97% -12.13% -9.88% -22.73% -15.48% AppendFloat/64Fixed2.5 -21.98% -20.75% -19.08% -14.74% -28.11% -24.92% AppendFloat/64Fixed3 -11.53% -16.21% -10.75% -7.53% -23.11% -15.78% AppendFloat/64Fixed4 -12.89% -12.36% -11.07% -9.79% -14.51% -13.44% AppendFloat/64Fixed5Hard -47.62% -38.59% -40.83% -37.06% -60.51% -55.29% AppendFloat/64Fixed12 -7.40% ~ -8.56% -4.31% -13.82% -8.61% AppendFloat/64Fixed16 -9.10% -8.95% -6.92% -3.92% -12.99% -9.03% AppendFloat/64Fixed12Hard -9.14% -5.24% -6.23% -4.82% -13.58% -8.99% AppendFloat/64Fixed17Hard -6.80% ~ -4.03% -2.84% -19.81% -10.27% AppendFloat/64Fixed18Hard -0.12% ~ ~ ~ ~ ~ AppendFloat/64FixedF1 ~ ~ ~ ~ -0.40% +2.72% AppendFloat/64FixedF2 -0.18% ~ -1.98% -0.95% ~ +1.25% AppendFloat/64FixedF3 -0.29% ~ ~ ~ ~ +1.22% AppendFloat/Slowpath64 -1.16% ~ ~ ~ ~ -2.16% AppendFloat/SlowpathDenormal64 -1.09% ~ ~ -0.88% -0.83% ~ host: linux-arm64 goos: linux goarch: arm64 pkg: internal/strconv cpu: unknown │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-8 60.35n ± 0% 60.24n ± 0% -0.18% (p=0.000 n=20) AppendFloat/Float-8 88.83n ± 0% 88.91n ± 0% +0.09% (p=0.000 n=20) AppendFloat/Exp-8 93.55n ± 0% 93.06n ± 0% -0.51% (p=0.000 n=20) AppendFloat/NegExp-8 94.01n ± 0% 93.06n ± 0% -1.01% (p=0.000 n=20) AppendFloat/LongExp-8 101.00n ± 0% 99.77n ± 0% -1.22% (p=0.000 n=20) AppendFloat/Big-8 106.1n ± 0% 103.9n ± 0% -2.07% (p=0.000 n=20) AppendFloat/BinaryExp-8 47.48n ± 0% 47.35n ± 0% -0.28% (p=0.000 n=20) AppendFloat/32Integer-8 60.45n ± 0% 60.43n ± 0% ~ (p=0.150 n=20) AppendFloat/32ExactFraction-8 86.65n ± 0% 86.22n ± 0% -0.50% (p=0.000 n=20) AppendFloat/32Point-8 83.26n ± 0% 83.21n ± 0% ~ (p=0.046 n=20) AppendFloat/32Exp-8 92.55n ± 0% 89.42n ± 0% -3.39% (p=0.000 n=20) AppendFloat/32NegExp-8 87.89n ± 0% 87.34n ± 0% -0.63% (p=0.000 n=20) AppendFloat/32Shortest-8 77.05n ± 0% 76.28n ± 0% -1.00% (p=0.000 n=20) AppendFloat/32Fixed8Hard-8 55.73n ± 0% 52.44n ± 0% -5.91% (p=0.000 n=20) AppendFloat/32Fixed9Hard-8 64.80n ± 0% 60.57n ± 0% -6.53% (p=0.000 n=20) AppendFloat/64Fixed1-8 53.72n ± 0% 46.29n ± 0% -13.84% (p=0.000 n=20) AppendFloat/64Fixed2-8 52.64n ± 0% 46.79n ± 0% -11.12% (p=0.000 n=20) AppendFloat/64Fixed2.5-8 56.01n ± 0% 43.70n ± 0% -21.98% (p=0.000 n=20) AppendFloat/64Fixed3-8 53.38n ± 0% 47.23n ± 0% -11.53% (p=0.000 n=20) AppendFloat/64Fixed4-8 50.62n ± 0% 44.10n ± 0% -12.89% (p=0.000 n=20) AppendFloat/64Fixed5Hard-8 98.94n ± 0% 51.82n ± 0% -47.62% (p=0.000 n=20) AppendFloat/64Fixed12-8 84.70n ± 0% 78.44n ± 0% -7.40% (p=0.000 n=20) AppendFloat/64Fixed16-8 71.68n ± 0% 65.16n ± 0% -9.10% (p=0.000 n=20) AppendFloat/64Fixed12Hard-8 68.41n ± 0% 62.16n ± 0% -9.14% (p=0.000 n=20) AppendFloat/64Fixed17Hard-8 79.31n ± 0% 73.92n ± 0% -6.80% (p=0.000 n=20) AppendFloat/64Fixed18Hard-8 4.290µ ± 0% 4.285µ ± 0% -0.12% (p=0.000 n=20) AppendFloat/64FixedF1-8 216.0n ± 0% 216.1n ± 0% ~ (p=0.090 n=20) AppendFloat/64FixedF2-8 228.2n ± 0% 227.8n ± 0% -0.18% (p=0.000 n=20) AppendFloat/64FixedF3-8 208.8n ± 0% 208.2n ± 0% -0.29% (p=0.000 n=20) AppendFloat/Slowpath64-8 98.56n ± 0% 97.42n ± 0% -1.16% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-8 95.81n ± 0% 94.77n ± 0% -1.09% (p=0.000 n=20) geomean 93.81n 87.87n -6.33% host: local goos: darwin cpu: Apple M3 Pro │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-12 21.14n ± 0% 21.15n ± 0% ~ (p=0.963 n=20) AppendFloat/Float-12 32.48n ± 1% 32.43n ± 0% ~ (p=0.358 n=20) AppendFloat/Exp-12 31.85n ± 0% 31.94n ± 1% ~ (p=0.634 n=20) AppendFloat/NegExp-12 31.75n ± 0% 32.04n ± 0% ~ (p=0.004 n=20) AppendFloat/LongExp-12 33.55n ± 0% 33.81n ± 0% +0.77% (p=0.000 n=20) AppendFloat/Big-12 35.62n ± 1% 35.73n ± 1% ~ (p=0.888 n=20) AppendFloat/BinaryExp-12 19.26n ± 0% 19.46n ± 1% +1.06% (p=0.000 n=20) AppendFloat/32Integer-12 21.41n ± 0% 21.46n ± 1% ~ (p=0.733 n=20) AppendFloat/32ExactFraction-12 31.23n ± 1% 31.30n ± 1% ~ (p=0.857 n=20) AppendFloat/32Point-12 31.39n ± 1% 31.02n ± 0% -1.19% (p=0.000 n=20) AppendFloat/32Exp-12 32.42n ± 1% 31.52n ± 1% -2.79% (p=0.000 n=20) AppendFloat/32NegExp-12 30.66n ± 1% 30.66n ± 1% ~ (p=0.380 n=20) AppendFloat/32Shortest-12 26.88n ± 1% 27.25n ± 1% +1.36% (p=0.000 n=20) AppendFloat/32Fixed8Hard-12 19.52n ± 0% 17.09n ± 1% -12.45% (p=0.000 n=20) AppendFloat/32Fixed9Hard-12 21.55n ± 2% 19.11n ± 1% -11.35% (p=0.000 n=20) AppendFloat/64Fixed1-12 18.64n ± 0% 15.49n ± 0% -16.90% (p=0.000 n=20) AppendFloat/64Fixed2-12 18.65n ± 0% 15.49n ± 0% -16.97% (p=0.000 n=20) AppendFloat/64Fixed2.5-12 19.23n ± 1% 15.24n ± 0% -20.75% (p=0.000 n=20) AppendFloat/64Fixed3-12 18.61n ± 0% 15.59n ± 1% -16.21% (p=0.000 n=20) AppendFloat/64Fixed4-12 17.55n ± 1% 15.38n ± 0% -12.36% (p=0.000 n=20) AppendFloat/64Fixed5Hard-12 29.27n ± 1% 17.97n ± 0% -38.59% (p=0.000 n=20) AppendFloat/64Fixed12-12 28.26n ± 1% 28.17n ± 10% ~ (p=0.941 n=20) AppendFloat/64Fixed16-12 23.56n ± 0% 21.46n ± 0% -8.95% (p=0.000 n=20) AppendFloat/64Fixed12Hard-12 21.85n ± 2% 20.70n ± 1% -5.24% (p=0.000 n=20) AppendFloat/64Fixed17Hard-12 26.91n ± 1% 27.10n ± 0% ~ (p=0.059 n=20) AppendFloat/64Fixed18Hard-12 2.197µ ± 1% 2.169µ ± 1% ~ (p=0.013 n=20) AppendFloat/64FixedF1-12 103.7n ± 1% 103.3n ± 0% ~ (p=0.035 n=20) AppendFloat/64FixedF2-12 114.8n ± 1% 114.1n ± 1% ~ (p=0.234 n=20) AppendFloat/64FixedF3-12 107.8n ± 1% 107.1n ± 1% ~ (p=0.180 n=20) AppendFloat/Slowpath64-12 32.05n ± 1% 32.00n ± 0% ~ (p=0.952 n=20) AppendFloat/SlowpathDenormal64-12 29.98n ± 1% 30.20n ± 0% ~ (p=0.004 n=20) geomean 33.83n 31.91n -5.68% host: linux-amd64 goos: linux goarch: amd64 cpu: Intel(R) Xeon(R) CPU @ 2.30GHz │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-16 64.00n ± 1% 63.67n ± 1% ~ (p=0.784 n=20) AppendFloat/Float-16 95.99n ± 1% 97.42n ± 1% +1.50% (p=0.000 n=20) AppendFloat/Exp-16 97.59n ± 1% 97.72n ± 1% ~ (p=0.984 n=20) AppendFloat/NegExp-16 97.80n ± 1% 101.15n ± 1% +3.43% (p=0.000 n=20) AppendFloat/LongExp-16 103.1n ± 1% 104.5n ± 1% ~ (p=0.006 n=20) AppendFloat/Big-16 110.8n ± 1% 108.5n ± 1% -2.07% (p=0.000 n=20) AppendFloat/BinaryExp-16 47.82n ± 1% 47.33n ± 1% ~ (p=0.007 n=20) AppendFloat/32Integer-16 63.65n ± 1% 63.51n ± 0% ~ (p=0.560 n=20) AppendFloat/32ExactFraction-16 91.81n ± 1% 97.03n ± 1% +5.69% (p=0.000 n=20) AppendFloat/32Point-16 89.84n ± 1% 92.16n ± 1% +2.59% (p=0.000 n=20) AppendFloat/32Exp-16 103.80n ± 1% 95.12n ± 1% -8.36% (p=0.000 n=20) AppendFloat/32NegExp-16 93.70n ± 1% 94.87n ± 1% ~ (p=0.003 n=20) AppendFloat/32Shortest-16 83.98n ± 1% 86.45n ± 1% +2.94% (p=0.000 n=20) AppendFloat/32Fixed8Hard-16 61.91n ± 1% 57.81n ± 1% -6.62% (p=0.000 n=20) AppendFloat/32Fixed9Hard-16 71.08n ± 0% 66.81n ± 1% -6.01% (p=0.000 n=20) AppendFloat/64Fixed1-16 59.27n ± 2% 51.49n ± 1% -13.13% (p=0.000 n=20) AppendFloat/64Fixed2-16 57.89n ± 1% 50.87n ± 1% -12.13% (p=0.000 n=20) AppendFloat/64Fixed2.5-16 61.04n ± 1% 49.40n ± 1% -19.08% (p=0.000 n=20) AppendFloat/64Fixed3-16 58.42n ± 1% 52.14n ± 1% -10.75% (p=0.000 n=20) AppendFloat/64Fixed4-16 56.52n ± 1% 50.27n ± 1% -11.07% (p=0.000 n=20) AppendFloat/64Fixed5Hard-16 97.79n ± 1% 57.86n ± 1% -40.83% (p=0.000 n=20) AppendFloat/64Fixed12-16 90.78n ± 1% 83.01n ± 1% -8.56% (p=0.000 n=20) AppendFloat/64Fixed16-16 76.11n ± 1% 70.84n ± 0% -6.92% (p=0.000 n=20) AppendFloat/64Fixed12Hard-16 73.56n ± 1% 68.98n ± 2% -6.23% (p=0.000 n=20) AppendFloat/64Fixed17Hard-16 83.20n ± 1% 79.85n ± 1% -4.03% (p=0.000 n=20) AppendFloat/64Fixed18Hard-16 4.947µ ± 1% 4.915µ ± 1% ~ (p=0.229 n=20) AppendFloat/64FixedF1-16 242.4n ± 1% 239.4n ± 1% ~ (p=0.038 n=20) AppendFloat/64FixedF2-16 257.7n ± 2% 252.6n ± 1% -1.98% (p=0.000 n=20) AppendFloat/64FixedF3-16 237.5n ± 0% 237.5n ± 1% ~ (p=0.440 n=20) AppendFloat/Slowpath64-16 99.75n ± 1% 99.78n ± 1% ~ (p=0.995 n=20) AppendFloat/SlowpathDenormal64-16 97.41n ± 1% 98.20n ± 1% ~ (p=0.006 n=20) geomean 100.7n 95.60n -5.05% host: s7 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-32 22.19n ± 0% 22.04n ± 0% -0.68% (p=0.000 n=20) AppendFloat/Float-32 34.59n ± 0% 34.88n ± 0% +0.84% (p=0.000 n=20) AppendFloat/Exp-32 34.47n ± 0% 34.88n ± 0% +1.20% (p=0.000 n=20) AppendFloat/NegExp-32 34.85n ± 0% 35.32n ± 0% +1.35% (p=0.000 n=20) AppendFloat/LongExp-32 37.23n ± 0% 37.09n ± 0% ~ (p=0.003 n=20) AppendFloat/Big-32 39.27n ± 0% 38.50n ± 0% -1.97% (p=0.000 n=20) AppendFloat/BinaryExp-32 17.38n ± 0% 17.61n ± 0% +1.35% (p=0.000 n=20) AppendFloat/32Integer-32 22.26n ± 0% 22.08n ± 0% -0.79% (p=0.000 n=20) AppendFloat/32ExactFraction-32 32.82n ± 0% 32.91n ± 0% ~ (p=0.018 n=20) AppendFloat/32Point-32 32.88n ± 0% 33.22n ± 0% +1.03% (p=0.000 n=20) AppendFloat/32Exp-32 34.95n ± 0% 34.62n ± 0% -0.94% (p=0.000 n=20) AppendFloat/32NegExp-32 33.23n ± 0% 33.55n ± 0% +0.98% (p=0.000 n=20) AppendFloat/32Shortest-32 30.19n ± 0% 30.12n ± 0% ~ (p=0.122 n=20) AppendFloat/32Fixed8Hard-32 22.94n ± 0% 22.88n ± 0% ~ (p=0.124 n=20) AppendFloat/32Fixed9Hard-32 26.20n ± 0% 25.94n ± 1% -0.97% (p=0.000 n=20) AppendFloat/64Fixed1-32 21.10n ± 0% 18.84n ± 0% -10.71% (p=0.000 n=20) AppendFloat/64Fixed2-32 20.75n ± 0% 18.70n ± 0% -9.88% (p=0.000 n=20) AppendFloat/64Fixed2.5-32 21.07n ± 0% 17.96n ± 0% -14.74% (p=0.000 n=20) AppendFloat/64Fixed3-32 21.24n ± 0% 19.64n ± 0% -7.53% (p=0.000 n=20) AppendFloat/64Fixed4-32 20.63n ± 0% 18.61n ± 0% -9.79% (p=0.000 n=20) AppendFloat/64Fixed5Hard-32 34.48n ± 0% 21.70n ± 0% -37.06% (p=0.000 n=20) AppendFloat/64Fixed12-32 32.26n ± 0% 30.87n ± 1% -4.31% (p=0.000 n=20) AppendFloat/64Fixed16-32 27.95n ± 0% 26.86n ± 0% -3.92% (p=0.000 n=20) AppendFloat/64Fixed12Hard-32 27.30n ± 0% 25.98n ± 1% -4.82% (p=0.000 n=20) AppendFloat/64Fixed17Hard-32 30.80n ± 0% 29.93n ± 0% -2.84% (p=0.000 n=20) AppendFloat/64Fixed18Hard-32 1.833µ ± 0% 1.831µ ± 0% ~ (p=0.663 n=20) AppendFloat/64FixedF1-32 83.42n ± 1% 84.00n ± 1% ~ (p=0.003 n=20) AppendFloat/64FixedF2-32 90.10n ± 0% 89.23n ± 1% -0.95% (p=0.001 n=20) AppendFloat/64FixedF3-32 84.42n ± 1% 84.39n ± 0% ~ (p=0.878 n=20) AppendFloat/Slowpath64-32 35.72n ± 0% 35.59n ± 0% ~ (p=0.007 n=20) AppendFloat/SlowpathDenormal64-32 35.36n ± 0% 35.05n ± 0% -0.88% (p=0.000 n=20) geomean 36.05n 34.69n -3.77% host: linux-386 goarch: 386 cpu: Intel(R) Xeon(R) CPU @ 2.30GHz │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-16 132.8n ± 0% 133.5n ± 0% +0.49% (p=0.001 n=20) AppendFloat/Float-16 242.6n ± 0% 241.7n ± 0% -0.37% (p=0.000 n=20) AppendFloat/Exp-16 252.2n ± 0% 249.1n ± 0% -1.27% (p=0.000 n=20) AppendFloat/NegExp-16 253.6n ± 0% 247.7n ± 0% -2.33% (p=0.000 n=20) AppendFloat/LongExp-16 260.9n ± 0% 257.1n ± 0% -1.48% (p=0.000 n=20) AppendFloat/Big-16 293.7n ± 0% 285.2n ± 0% -2.89% (p=0.000 n=20) AppendFloat/BinaryExp-16 89.63n ± 1% 89.06n ± 0% -0.64% (p=0.000 n=20) AppendFloat/32Integer-16 132.6n ± 0% 133.2n ± 0% ~ (p=0.016 n=20) AppendFloat/32ExactFraction-16 216.9n ± 0% 214.2n ± 0% -1.24% (p=0.000 n=20) AppendFloat/32Point-16 205.0n ± 0% 202.2n ± 0% -1.37% (p=0.000 n=20) AppendFloat/32Exp-16 250.2n ± 0% 235.9n ± 0% -5.72% (p=0.000 n=20) AppendFloat/32NegExp-16 213.5n ± 0% 210.6n ± 0% -1.34% (p=0.000 n=20) AppendFloat/32Shortest-16 198.3n ± 0% 197.8n ± 0% ~ (p=0.147 n=20) AppendFloat/32Fixed8Hard-16 114.9n ± 1% 136.0n ± 1% +18.46% (p=0.000 n=20) AppendFloat/32Fixed9Hard-16 189.8n ± 0% 155.0n ± 1% -18.31% (p=0.000 n=20) AppendFloat/64Fixed1-16 175.8n ± 0% 132.7n ± 0% -24.52% (p=0.000 n=20) AppendFloat/64Fixed2-16 166.6n ± 0% 128.7n ± 0% -22.73% (p=0.000 n=20) AppendFloat/64Fixed2.5-16 176.5n ± 0% 126.8n ± 0% -28.11% (p=0.000 n=20) AppendFloat/64Fixed3-16 165.3n ± 0% 127.1n ± 0% -23.11% (p=0.000 n=20) AppendFloat/64Fixed4-16 141.3n ± 0% 120.8n ± 1% -14.51% (p=0.000 n=20) AppendFloat/64Fixed5Hard-16 344.6n ± 0% 136.0n ± 0% -60.51% (p=0.000 n=20) AppendFloat/64Fixed12-16 184.2n ± 0% 158.7n ± 0% -13.82% (p=0.000 n=20) AppendFloat/64Fixed16-16 174.0n ± 0% 151.3n ± 0% -12.99% (p=0.000 n=20) AppendFloat/64Fixed12Hard-16 169.7n ± 0% 146.7n ± 0% -13.58% (p=0.000 n=20) AppendFloat/64Fixed17Hard-16 207.7n ± 0% 166.6n ± 0% -19.81% (p=0.000 n=20) AppendFloat/64Fixed18Hard-16 10.66µ ± 0% 10.63µ ± 0% ~ (p=0.030 n=20) AppendFloat/64FixedF1-16 615.9n ± 0% 613.5n ± 0% -0.40% (p=0.000 n=20) AppendFloat/64FixedF2-16 846.6n ± 0% 847.4n ± 0% ~ (p=0.551 n=20) AppendFloat/64FixedF3-16 609.9n ± 0% 609.5n ± 0% ~ (p=0.213 n=20) AppendFloat/Slowpath64-16 254.1n ± 0% 252.6n ± 1% ~ (p=0.048 n=20) AppendFloat/SlowpathDenormal64-16 251.5n ± 0% 249.4n ± 0% -0.83% (p=0.000 n=20) geomean 249.2n 225.4n -9.54% host: s7:GOARCH=386 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 14b7e09f493 │ f9bf7fcb8e2 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-32 42.65n ± 0% 42.31n ± 0% -0.79% (p=0.000 n=20) AppendFloat/Float-32 71.56n ± 0% 71.06n ± 0% -0.69% (p=0.000 n=20) AppendFloat/Exp-32 75.61n ± 1% 74.85n ± 1% -1.01% (p=0.000 n=20) AppendFloat/NegExp-32 74.36n ± 0% 74.30n ± 0% ~ (p=0.482 n=20) AppendFloat/LongExp-32 75.82n ± 0% 75.73n ± 0% ~ (p=0.490 n=20) AppendFloat/Big-32 85.10n ± 0% 82.61n ± 0% -2.93% (p=0.000 n=20) AppendFloat/BinaryExp-32 33.02n ± 0% 32.48n ± 1% -1.64% (p=0.000 n=20) AppendFloat/32Integer-32 41.54n ± 1% 41.27n ± 1% -0.66% (p=0.000 n=20) AppendFloat/32ExactFraction-32 62.48n ± 0% 62.91n ± 0% +0.69% (p=0.000 n=20) AppendFloat/32Point-32 60.17n ± 0% 60.65n ± 0% +0.80% (p=0.000 n=20) AppendFloat/32Exp-32 73.34n ± 0% 68.99n ± 0% -5.92% (p=0.000 n=20) AppendFloat/32NegExp-32 63.29n ± 0% 62.83n ± 0% -0.73% (p=0.000 n=20) AppendFloat/32Shortest-32 58.97n ± 0% 59.07n ± 0% ~ (p=0.029 n=20) AppendFloat/32Fixed8Hard-32 37.42n ± 0% 41.76n ± 1% +11.61% (p=0.000 n=20) AppendFloat/32Fixed9Hard-32 55.18n ± 0% 50.13n ± 1% -9.16% (p=0.000 n=20) AppendFloat/64Fixed1-32 50.89n ± 1% 41.25n ± 0% -18.94% (p=0.000 n=20) AppendFloat/64Fixed2-32 48.33n ± 1% 40.85n ± 1% -15.48% (p=0.000 n=20) AppendFloat/64Fixed2.5-32 52.46n ± 0% 39.39n ± 0% -24.92% (p=0.000 n=20) AppendFloat/64Fixed3-32 48.28n ± 1% 40.66n ± 0% -15.78% (p=0.000 n=20) AppendFloat/64Fixed4-32 44.57n ± 0% 38.58n ± 0% -13.44% (p=0.000 n=20) AppendFloat/64Fixed5Hard-32 96.16n ± 0% 42.99n ± 1% -55.29% (p=0.000 n=20) AppendFloat/64Fixed12-32 56.84n ± 0% 51.95n ± 1% -8.61% (p=0.000 n=20) AppendFloat/64Fixed16-32 54.23n ± 0% 49.33n ± 0% -9.03% (p=0.000 n=20) AppendFloat/64Fixed12Hard-32 53.47n ± 0% 48.67n ± 0% -8.99% (p=0.000 n=20) AppendFloat/64Fixed17Hard-32 61.76n ± 0% 55.42n ± 1% -10.27% (p=0.000 n=20) AppendFloat/64Fixed18Hard-32 3.998µ ± 1% 4.001µ ± 0% ~ (p=0.449 n=20) AppendFloat/64FixedF1-32 161.8n ± 0% 166.2n ± 1% +2.72% (p=0.000 n=20) AppendFloat/64FixedF2-32 223.4n ± 2% 226.2n ± 1% +1.25% (p=0.000 n=20) AppendFloat/64FixedF3-32 159.6n ± 0% 161.6n ± 1% +1.22% (p=0.000 n=20) AppendFloat/Slowpath64-32 76.69n ± 0% 75.03n ± 0% -2.16% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-32 75.02n ± 0% 74.36n ± 1% ~ (p=0.003 n=20) geomean 74.66n 69.39n -7.06% Change-Id: I9db46471a93bd2aab3c2796e563d154cb531d4cb Reviewed-on: https://go-review.googlesource.com/c/go/+/717182 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
We re-slice the data being processed at the stat of each loop. If the var that we use to calculate where to re-slice is < 0 or > the length of the remaining data, return instead of attempting to re-slice. Change-Id: I1d6c2b6c596feedeea8feeaace370ea73ba02c4c Reviewed-on: https://go-review.googlesource.com/c/go/+/715260 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Roland Shoemaker <roland@golang.org> Reviewed-by: Damien Neil <dneil@google.com>
This lets us remove useAvg and useHmul from the division rules. The compiler is simpler and the generated code is faster. goos: wasip1 goarch: wasm pkg: internal/strconv │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal 192.8n ± 1% 194.6n ± 0% +0.91% (p=0.000 n=10) AppendFloat/Float 328.6n ± 0% 279.6n ± 0% -14.93% (p=0.000 n=10) AppendFloat/Exp 335.6n ± 1% 289.2n ± 1% -13.80% (p=0.000 n=10) AppendFloat/NegExp 336.0n ± 0% 289.1n ± 1% -13.97% (p=0.000 n=10) AppendFloat/LongExp 332.4n ± 0% 285.2n ± 1% -14.20% (p=0.000 n=10) AppendFloat/Big 348.2n ± 0% 300.1n ± 0% -13.83% (p=0.000 n=10) AppendFloat/BinaryExp 137.4n ± 0% 138.2n ± 0% +0.55% (p=0.001 n=10) AppendFloat/32Integer 193.3n ± 1% 196.5n ± 0% +1.66% (p=0.000 n=10) AppendFloat/32ExactFraction 283.3n ± 0% 268.9n ± 1% -5.08% (p=0.000 n=10) AppendFloat/32Point 279.9n ± 0% 266.5n ± 0% -4.80% (p=0.000 n=10) AppendFloat/32Exp 300.1n ± 0% 288.3n ± 1% -3.90% (p=0.000 n=10) AppendFloat/32NegExp 288.2n ± 1% 277.9n ± 1% -3.59% (p=0.000 n=10) AppendFloat/32Shortest 261.7n ± 0% 250.2n ± 0% -4.39% (p=0.000 n=10) AppendFloat/32Fixed8Hard 173.3n ± 1% 158.9n ± 1% -8.31% (p=0.000 n=10) AppendFloat/32Fixed9Hard 180.0n ± 0% 167.9n ± 2% -6.70% (p=0.000 n=10) AppendFloat/64Fixed1 167.1n ± 0% 149.6n ± 1% -10.50% (p=0.000 n=10) AppendFloat/64Fixed2 162.4n ± 1% 146.5n ± 0% -9.73% (p=0.000 n=10) AppendFloat/64Fixed2.5 165.5n ± 0% 149.4n ± 1% -9.70% (p=0.000 n=10) AppendFloat/64Fixed3 166.4n ± 1% 150.2n ± 0% -9.74% (p=0.000 n=10) AppendFloat/64Fixed4 163.7n ± 0% 149.6n ± 1% -8.62% (p=0.000 n=10) AppendFloat/64Fixed5Hard 182.8n ± 1% 167.1n ± 1% -8.61% (p=0.000 n=10) AppendFloat/64Fixed12 222.2n ± 0% 208.8n ± 0% -6.05% (p=0.000 n=10) AppendFloat/64Fixed16 197.6n ± 1% 181.7n ± 0% -8.02% (p=0.000 n=10) AppendFloat/64Fixed12Hard 194.5n ± 0% 181.0n ± 0% -6.99% (p=0.000 n=10) AppendFloat/64Fixed17Hard 205.1n ± 1% 191.9n ± 0% -6.44% (p=0.000 n=10) AppendFloat/64Fixed18Hard 6.269µ ± 0% 6.643µ ± 0% +5.97% (p=0.000 n=10) AppendFloat/64FixedF1 211.7n ± 1% 197.0n ± 0% -6.95% (p=0.000 n=10) AppendFloat/64FixedF2 189.4n ± 0% 174.2n ± 0% -8.08% (p=0.000 n=10) AppendFloat/64FixedF3 169.0n ± 0% 154.9n ± 0% -8.32% (p=0.000 n=10) AppendFloat/Slowpath64 321.2n ± 0% 274.2n ± 1% -14.63% (p=0.000 n=10) AppendFloat/SlowpathDenormal64 307.4n ± 1% 261.2n ± 0% -15.03% (p=0.000 n=10) AppendInt 3.367µ ± 1% 3.376µ ± 0% ~ (p=0.517 n=10) AppendUint 675.5n ± 0% 676.9n ± 0% ~ (p=0.196 n=10) AppendIntSmall 28.13n ± 1% 28.17n ± 0% +0.14% (p=0.015 n=10) AppendUintVarlen/digits=1 20.70n ± 0% 20.51n ± 1% -0.89% (p=0.018 n=10) AppendUintVarlen/digits=2 20.43n ± 0% 20.27n ± 0% -0.81% (p=0.001 n=10) AppendUintVarlen/digits=3 38.48n ± 0% 37.93n ± 0% -1.43% (p=0.000 n=10) AppendUintVarlen/digits=4 41.10n ± 0% 38.78n ± 1% -5.62% (p=0.000 n=10) AppendUintVarlen/digits=5 42.25n ± 1% 42.11n ± 0% -0.32% (p=0.041 n=10) AppendUintVarlen/digits=6 45.40n ± 1% 43.14n ± 0% -4.98% (p=0.000 n=10) AppendUintVarlen/digits=7 46.81n ± 1% 46.03n ± 0% -1.66% (p=0.000 n=10) AppendUintVarlen/digits=8 48.88n ± 1% 46.59n ± 1% -4.68% (p=0.000 n=10) AppendUintVarlen/digits=9 49.94n ± 2% 49.41n ± 1% -1.06% (p=0.000 n=10) AppendUintVarlen/digits=10 57.28n ± 1% 56.92n ± 1% -0.62% (p=0.045 n=10) AppendUintVarlen/digits=11 60.09n ± 1% 58.11n ± 2% -3.30% (p=0.000 n=10) AppendUintVarlen/digits=12 62.22n ± 0% 61.85n ± 0% -0.59% (p=0.000 n=10) AppendUintVarlen/digits=13 64.94n ± 0% 62.92n ± 0% -3.10% (p=0.000 n=10) AppendUintVarlen/digits=14 65.42n ± 1% 65.19n ± 1% -0.34% (p=0.005 n=10) AppendUintVarlen/digits=15 68.17n ± 0% 66.13n ± 0% -2.99% (p=0.000 n=10) AppendUintVarlen/digits=16 70.21n ± 1% 70.09n ± 1% ~ (p=0.517 n=10) AppendUintVarlen/digits=17 72.93n ± 0% 70.49n ± 0% -3.34% (p=0.000 n=10) AppendUintVarlen/digits=18 73.01n ± 0% 72.75n ± 0% -0.35% (p=0.000 n=10) AppendUintVarlen/digits=19 79.27n ± 1% 79.49n ± 1% ~ (p=0.671 n=10) AppendUintVarlen/digits=20 82.18n ± 0% 80.43n ± 1% -2.14% (p=0.000 n=10) geomean 143.4n 136.0n -5.20% Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267 Reviewed-on: https://go-review.googlesource.com/c/go/+/717407 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Everyone writes papers about fast shortest-output formatting. Eventually we also sped up fixed-length formatting %e and %g. But we've neglected %f, which falls back to the slow general code even for relatively trivial things like %.2f on 1.23. This CL uses the fast path fixedFtoa for %f when possible by estimating the number of digits needed. benchmark \ host linux-arm64 local linux-amd64 s7 linux-386 s7:GOARCH=386 vs base vs base vs base vs base vs base vs base AppendFloat/Decimal ~ ~ ~ +0.30% ~ ~ AppendFloat/Float -0.45% ~ -2.20% ~ -2.19% ~ AppendFloat/Exp +0.12% ~ +4.11% ~ ~ ~ AppendFloat/NegExp +0.53% ~ ~ ~ ~ ~ AppendFloat/LongExp +0.41% -1.42% +4.50% ~ ~ ~ AppendFloat/Big ~ -1.25% +3.69% ~ ~ ~ AppendFloat/BinaryExp +0.38% +1.68% ~ ~ +2.65% +0.97% AppendFloat/32Integer ~ ~ ~ ~ ~ ~ AppendFloat/32ExactFraction ~ ~ -2.61% ~ ~ ~ AppendFloat/32Point -0.41% ~ -2.65% ~ ~ ~ AppendFloat/32Exp ~ ~ +5.35% ~ +1.44% +0.39% AppendFloat/32NegExp +0.30% ~ +2.31% ~ ~ +0.82% AppendFloat/32Shortest +0.28% -0.85% ~ ~ -3.20% ~ AppendFloat/32Fixed8Hard -0.29% ~ ~ -1.75% ~ +4.30% AppendFloat/32Fixed9Hard ~ ~ ~ ~ ~ +1.52% AppendFloat/64Fixed1 +0.61% -2.03% ~ ~ ~ +4.36% AppendFloat/64Fixed2 ~ -3.43% ~ ~ ~ +1.03% AppendFloat/64Fixed2.5 +0.57% -2.23% ~ ~ ~ +2.66% AppendFloat/64Fixed3 ~ -1.64% ~ +0.31% +2.32% +2.10% AppendFloat/64Fixed4 +0.15% -2.11% ~ ~ +1.48% +1.58% AppendFloat/64Fixed5Hard +0.45% ~ +1.58% ~ ~ +1.73% AppendFloat/64Fixed12 -0.16% ~ +1.63% -1.23% +3.93% +2.42% AppendFloat/64Fixed16 -0.33% -0.49% ~ ~ +3.67% +2.33% AppendFloat/64Fixed12Hard -0.58% ~ ~ ~ +4.98% +0.62% AppendFloat/64Fixed17Hard +0.27% -0.94% ~ ~ +2.07% +1.79% AppendFloat/64Fixed18Hard ~ ~ ~ ~ ~ ~ AppendFloat/64FixedF1 -69.59% -76.08% -70.94% -68.26% -75.27% -69.88% AppendFloat/64FixedF2 -76.28% -81.82% -76.95% -77.34% -83.53% -80.04% AppendFloat/64FixedF3 -77.30% -84.51% -77.82% -77.81% -78.77% -73.69% AppendFloat/Slowpath64 ~ -1.30% +1.64% ~ -2.66% -0.44% AppendFloat/SlowpathDenormal64 +0.11% -1.69% ~ ~ -2.90% ~ host: linux-arm64 goos: linux goarch: arm64 pkg: internal/strconv cpu: unknown │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-8 60.22n ± 0% 60.21n ± 0% ~ (p=0.416 n=20) AppendFloat/Float-8 88.93n ± 0% 88.53n ± 0% -0.45% (p=0.000 n=20) AppendFloat/Exp-8 93.09n ± 0% 93.20n ± 0% +0.12% (p=0.000 n=20) AppendFloat/NegExp-8 93.06n ± 0% 93.56n ± 0% +0.53% (p=0.000 n=20) AppendFloat/LongExp-8 99.79n ± 0% 100.20n ± 0% +0.41% (p=0.000 n=20) AppendFloat/Big-8 103.9n ± 0% 104.0n ± 0% ~ (p=0.004 n=20) AppendFloat/BinaryExp-8 47.34n ± 0% 47.52n ± 0% +0.38% (p=0.000 n=20) AppendFloat/32Integer-8 60.43n ± 0% 60.40n ± 0% ~ (p=0.006 n=20) AppendFloat/32ExactFraction-8 86.21n ± 0% 86.24n ± 0% ~ (p=0.634 n=20) AppendFloat/32Point-8 83.20n ± 0% 82.87n ± 0% -0.41% (p=0.000 n=20) AppendFloat/32Exp-8 89.43n ± 0% 89.45n ± 0% ~ (p=0.193 n=20) AppendFloat/32NegExp-8 87.31n ± 0% 87.58n ± 0% +0.30% (p=0.000 n=20) AppendFloat/32Shortest-8 76.28n ± 0% 76.49n ± 0% +0.28% (p=0.000 n=20) AppendFloat/32Fixed8Hard-8 52.44n ± 0% 52.29n ± 0% -0.29% (p=0.000 n=20) AppendFloat/32Fixed9Hard-8 60.57n ± 0% 60.54n ± 0% ~ (p=0.285 n=20) AppendFloat/64Fixed1-8 46.27n ± 0% 46.55n ± 0% +0.61% (p=0.000 n=20) AppendFloat/64Fixed2-8 46.77n ± 0% 46.80n ± 0% ~ (p=0.060 n=20) AppendFloat/64Fixed2.5-8 43.70n ± 0% 43.95n ± 0% +0.57% (p=0.000 n=20) AppendFloat/64Fixed3-8 47.22n ± 0% 47.19n ± 0% ~ (p=0.008 n=20) AppendFloat/64Fixed4-8 44.07n ± 0% 44.13n ± 0% +0.15% (p=0.000 n=20) AppendFloat/64Fixed5Hard-8 51.81n ± 0% 52.04n ± 0% +0.45% (p=0.000 n=20) AppendFloat/64Fixed12-8 78.41n ± 0% 78.29n ± 0% -0.16% (p=0.000 n=20) AppendFloat/64Fixed16-8 65.14n ± 0% 64.93n ± 0% -0.33% (p=0.000 n=20) AppendFloat/64Fixed12Hard-8 62.12n ± 0% 61.76n ± 0% -0.58% (p=0.000 n=20) AppendFloat/64Fixed17Hard-8 73.93n ± 0% 74.13n ± 0% +0.27% (p=0.000 n=20) AppendFloat/64Fixed18Hard-8 4.285µ ± 0% 4.283µ ± 0% ~ (p=0.039 n=20) AppendFloat/64FixedF1-8 216.10n ± 0% 65.71n ± 0% -69.59% (p=0.000 n=20) AppendFloat/64FixedF2-8 227.70n ± 0% 54.02n ± 0% -76.28% (p=0.000 n=20) AppendFloat/64FixedF3-8 208.20n ± 1% 47.25n ± 0% -77.30% (p=0.000 n=20) AppendFloat/Slowpath64-8 97.40n ± 0% 97.45n ± 0% ~ (p=0.018 n=20) AppendFloat/SlowpathDenormal64-8 94.75n ± 0% 94.86n ± 0% +0.11% (p=0.000 n=20) geomean 87.86n 76.99n -12.37% host: local goos: darwin cpu: Apple M3 Pro │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-12 21.05n ± 1% 20.91n ± 1% ~ (p=0.051 n=20) AppendFloat/Float-12 32.13n ± 0% 32.04n ± 1% ~ (p=0.457 n=20) AppendFloat/Exp-12 31.84n ± 0% 31.72n ± 0% ~ (p=0.151 n=20) AppendFloat/NegExp-12 31.78n ± 1% 31.79n ± 1% ~ (p=0.867 n=20) AppendFloat/LongExp-12 33.70n ± 0% 33.22n ± 1% -1.42% (p=0.000 n=20) AppendFloat/Big-12 35.52n ± 1% 35.07n ± 1% -1.25% (p=0.000 n=20) AppendFloat/BinaryExp-12 19.32n ± 1% 19.64n ± 0% +1.68% (p=0.000 n=20) AppendFloat/32Integer-12 21.32n ± 0% 21.18n ± 1% ~ (p=0.025 n=20) AppendFloat/32ExactFraction-12 30.88n ± 0% 31.07n ± 0% ~ (p=0.087 n=20) AppendFloat/32Point-12 30.88n ± 0% 30.95n ± 1% ~ (p=0.250 n=20) AppendFloat/32Exp-12 31.57n ± 0% 31.67n ± 2% ~ (p=0.126 n=20) AppendFloat/32NegExp-12 30.50n ± 1% 30.76n ± 1% ~ (p=0.087 n=20) AppendFloat/32Shortest-12 27.14n ± 0% 26.91n ± 1% -0.85% (p=0.001 n=20) AppendFloat/32Fixed8Hard-12 17.11n ± 0% 17.08n ± 0% ~ (p=0.027 n=20) AppendFloat/32Fixed9Hard-12 19.16n ± 1% 19.31n ± 1% ~ (p=0.062 n=20) AppendFloat/64Fixed1-12 15.50n ± 0% 15.18n ± 1% -2.03% (p=0.000 n=20) AppendFloat/64Fixed2-12 15.46n ± 0% 14.93n ± 0% -3.43% (p=0.000 n=20) AppendFloat/64Fixed2.5-12 15.28n ± 0% 14.94n ± 1% -2.23% (p=0.000 n=20) AppendFloat/64Fixed3-12 15.58n ± 0% 15.32n ± 1% -1.64% (p=0.000 n=20) AppendFloat/64Fixed4-12 15.39n ± 0% 15.06n ± 1% -2.11% (p=0.000 n=20) AppendFloat/64Fixed5Hard-12 18.00n ± 0% 18.07n ± 1% ~ (p=0.011 n=20) AppendFloat/64Fixed12-12 27.97n ± 8% 29.05n ± 3% ~ (p=0.107 n=20) AppendFloat/64Fixed16-12 21.48n ± 0% 21.38n ± 0% -0.49% (p=0.000 n=20) AppendFloat/64Fixed12Hard-12 20.79n ± 1% 21.05n ± 2% ~ (p=0.784 n=20) AppendFloat/64Fixed17Hard-12 27.21n ± 1% 26.95n ± 1% -0.94% (p=0.000 n=20) AppendFloat/64Fixed18Hard-12 2.166µ ± 1% 2.182µ ± 1% ~ (p=0.031 n=20) AppendFloat/64FixedF1-12 103.35n ± 0% 24.72n ± 0% -76.08% (p=0.000 n=20) AppendFloat/64FixedF2-12 114.30n ± 1% 20.78n ± 0% -81.82% (p=0.000 n=20) AppendFloat/64FixedF3-12 107.10n ± 0% 16.58n ± 0% -84.51% (p=0.000 n=20) AppendFloat/Slowpath64-12 32.01n ± 0% 31.59n ± 0% -1.30% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-12 30.21n ± 0% 29.70n ± 0% -1.69% (p=0.000 n=20) geomean 31.84n 27.00n -15.20% host: linux-amd64 goos: linux goarch: amd64 cpu: Intel(R) Xeon(R) CPU @ 2.30GHz │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-16 63.62n ± 1% 64.05n ± 1% ~ (p=0.753 n=20) AppendFloat/Float-16 97.12n ± 1% 94.98n ± 1% -2.20% (p=0.000 n=20) AppendFloat/Exp-16 98.12n ± 1% 102.15n ± 1% +4.11% (p=0.000 n=20) AppendFloat/NegExp-16 101.1n ± 1% 101.5n ± 1% ~ (p=0.089 n=20) AppendFloat/LongExp-16 104.5n ± 1% 109.2n ± 1% +4.50% (p=0.000 n=20) AppendFloat/Big-16 108.5n ± 0% 112.5n ± 1% +3.69% (p=0.000 n=20) AppendFloat/BinaryExp-16 47.68n ± 1% 47.44n ± 1% ~ (p=0.143 n=20) AppendFloat/32Integer-16 63.77n ± 2% 63.45n ± 1% ~ (p=0.015 n=20) AppendFloat/32ExactFraction-16 97.69n ± 1% 95.14n ± 1% -2.61% (p=0.000 n=20) AppendFloat/32Point-16 92.17n ± 1% 89.72n ± 1% -2.65% (p=0.000 n=20) AppendFloat/32Exp-16 95.63n ± 1% 100.75n ± 1% +5.35% (p=0.000 n=20) AppendFloat/32NegExp-16 94.53n ± 1% 96.72n ± 0% +2.31% (p=0.000 n=20) AppendFloat/32Shortest-16 86.43n ± 0% 86.95n ± 0% ~ (p=0.010 n=20) AppendFloat/32Fixed8Hard-16 57.75n ± 1% 57.95n ± 1% ~ (p=0.098 n=20) AppendFloat/32Fixed9Hard-16 66.56n ± 2% 66.97n ± 1% ~ (p=0.380 n=20) AppendFloat/64Fixed1-16 51.02n ± 1% 50.99n ± 1% ~ (p=0.473 n=20) AppendFloat/64Fixed2-16 50.94n ± 1% 51.01n ± 1% ~ (p=0.136 n=20) AppendFloat/64Fixed2.5-16 49.27n ± 1% 49.37n ± 1% ~ (p=0.218 n=20) AppendFloat/64Fixed3-16 51.85n ± 1% 52.55n ± 1% ~ (p=0.045 n=20) AppendFloat/64Fixed4-16 50.30n ± 1% 50.43n ± 1% ~ (p=0.794 n=20) AppendFloat/64Fixed5Hard-16 57.57n ± 1% 58.48n ± 1% +1.58% (p=0.000 n=20) AppendFloat/64Fixed12-16 82.67n ± 1% 84.02n ± 1% +1.63% (p=0.000 n=20) AppendFloat/64Fixed16-16 71.10n ± 1% 70.94n ± 1% ~ (p=0.569 n=20) AppendFloat/64Fixed12Hard-16 68.36n ± 1% 68.64n ± 1% ~ (p=0.155 n=20) AppendFloat/64Fixed17Hard-16 80.16n ± 1% 80.10n ± 1% ~ (p=0.836 n=20) AppendFloat/64Fixed18Hard-16 4.916µ ± 1% 4.919µ ± 1% ~ (p=0.507 n=20) AppendFloat/64FixedF1-16 239.75n ± 1% 69.67n ± 1% -70.94% (p=0.000 n=20) AppendFloat/64FixedF2-16 252.50n ± 1% 58.20n ± 1% -76.95% (p=0.000 n=20) AppendFloat/64FixedF3-16 238.00n ± 1% 52.79n ± 1% -77.82% (p=0.000 n=20) AppendFloat/Slowpath64-16 100.4n ± 1% 102.0n ± 1% +1.64% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-16 97.92n ± 1% 98.01n ± 1% ~ (p=0.304 n=20) geomean 95.58n 84.00n -12.12% host: s7 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-32 22.00n ± 0% 22.06n ± 0% +0.30% (p=0.001 n=20) AppendFloat/Float-32 34.83n ± 0% 34.76n ± 0% ~ (p=0.159 n=20) AppendFloat/Exp-32 34.91n ± 0% 34.89n ± 0% ~ (p=0.188 n=20) AppendFloat/NegExp-32 35.24n ± 0% 35.32n ± 0% ~ (p=0.026 n=20) AppendFloat/LongExp-32 37.02n ± 0% 37.02n ± 0% ~ (p=0.317 n=20) AppendFloat/Big-32 38.51n ± 0% 38.43n ± 0% ~ (p=0.060 n=20) AppendFloat/BinaryExp-32 17.57n ± 0% 17.59n ± 0% ~ (p=0.278 n=20) AppendFloat/32Integer-32 22.06n ± 0% 22.09n ± 0% ~ (p=0.762 n=20) AppendFloat/32ExactFraction-32 32.91n ± 0% 33.00n ± 0% ~ (p=0.055 n=20) AppendFloat/32Point-32 33.24n ± 0% 33.18n ± 0% ~ (p=0.068 n=20) AppendFloat/32Exp-32 34.50n ± 0% 34.55n ± 0% ~ (p=0.030 n=20) AppendFloat/32NegExp-32 33.53n ± 0% 33.61n ± 0% ~ (p=0.045 n=20) AppendFloat/32Shortest-32 30.10n ± 0% 30.10n ± 0% ~ (p=0.931 n=20) AppendFloat/32Fixed8Hard-32 22.89n ± 0% 22.49n ± 0% -1.75% (p=0.000 n=20) AppendFloat/32Fixed9Hard-32 25.82n ± 0% 25.75n ± 1% ~ (p=0.143 n=20) AppendFloat/64Fixed1-32 18.80n ± 0% 18.70n ± 0% ~ (p=0.004 n=20) AppendFloat/64Fixed2-32 18.64n ± 1% 18.54n ± 0% ~ (p=0.001 n=20) AppendFloat/64Fixed2.5-32 17.89n ± 0% 17.81n ± 0% ~ (p=0.001 n=20) AppendFloat/64Fixed3-32 19.62n ± 0% 19.68n ± 0% +0.31% (p=0.000 n=20) AppendFloat/64Fixed4-32 18.64n ± 0% 18.82n ± 0% ~ (p=0.010 n=20) AppendFloat/64Fixed5Hard-32 21.62n ± 0% 21.57n ± 0% ~ (p=0.058 n=20) AppendFloat/64Fixed12-32 30.98n ± 1% 30.61n ± 1% -1.23% (p=0.000 n=20) AppendFloat/64Fixed16-32 26.89n ± 0% 27.08n ± 1% ~ (p=0.003 n=20) AppendFloat/64Fixed12Hard-32 26.03n ± 0% 26.20n ± 1% ~ (p=0.344 n=20) AppendFloat/64Fixed17Hard-32 30.03n ± 1% 29.72n ± 1% ~ (p=0.001 n=20) AppendFloat/64Fixed18Hard-32 1.824µ ± 0% 1.825µ ± 1% ~ (p=0.567 n=20) AppendFloat/64FixedF1-32 83.58n ± 1% 26.52n ± 0% -68.26% (p=0.000 n=20) AppendFloat/64FixedF2-32 89.68n ± 1% 20.32n ± 1% -77.34% (p=0.000 n=20) AppendFloat/64FixedF3-32 84.84n ± 0% 18.82n ± 0% -77.81% (p=0.000 n=20) AppendFloat/Slowpath64-32 35.55n ± 0% 35.61n ± 0% ~ (p=0.394 n=20) AppendFloat/SlowpathDenormal64-32 35.03n ± 0% 35.02n ± 0% ~ (p=0.733 n=20) geomean 34.67n 30.31n -12.56% host: linux-386 goarch: 386 cpu: Intel(R) Xeon(R) CPU @ 2.30GHz │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-16 133.6n ± 1% 130.5n ± 1% ~ (p=0.002 n=20) AppendFloat/Float-16 242.3n ± 1% 237.0n ± 1% -2.19% (p=0.000 n=20) AppendFloat/Exp-16 249.1n ± 3% 252.5n ± 1% ~ (p=0.005 n=20) AppendFloat/NegExp-16 248.7n ± 3% 253.8n ± 2% ~ (p=0.006 n=20) AppendFloat/LongExp-16 258.4n ± 2% 253.0n ± 6% ~ (p=0.185 n=20) AppendFloat/Big-16 285.6n ± 1% 279.2n ± 5% ~ (p=0.012 n=20) AppendFloat/BinaryExp-16 89.47n ± 1% 91.85n ± 2% +2.65% (p=0.000 n=20) AppendFloat/32Integer-16 133.5n ± 1% 129.9n ± 1% ~ (p=0.004 n=20) AppendFloat/32ExactFraction-16 213.7n ± 1% 212.2n ± 2% ~ (p=0.071 n=20) AppendFloat/32Point-16 202.0n ± 0% 200.4n ± 1% ~ (p=0.223 n=20) AppendFloat/32Exp-16 236.4n ± 1% 239.8n ± 1% +1.44% (p=0.000 n=20) AppendFloat/32NegExp-16 212.5n ± 1% 211.9n ± 1% ~ (p=0.995 n=20) AppendFloat/32Shortest-16 200.3n ± 1% 193.9n ± 1% -3.20% (p=0.000 n=20) AppendFloat/32Fixed8Hard-16 136.0n ± 1% 133.2n ± 4% ~ (p=0.323 n=20) AppendFloat/32Fixed9Hard-16 155.6n ± 1% 156.7n ± 2% ~ (p=0.022 n=20) AppendFloat/64Fixed1-16 132.8n ± 1% 133.0n ± 3% ~ (p=0.199 n=20) AppendFloat/64Fixed2-16 128.9n ± 1% 129.7n ± 3% ~ (p=0.018 n=20) AppendFloat/64Fixed2.5-16 127.0n ± 1% 126.5n ± 3% ~ (p=0.825 n=20) AppendFloat/64Fixed3-16 127.3n ± 1% 130.3n ± 4% +2.32% (p=0.001 n=20) AppendFloat/64Fixed4-16 121.4n ± 1% 123.2n ± 2% +1.48% (p=0.000 n=20) AppendFloat/64Fixed5Hard-16 136.2n ± 1% 136.2n ± 3% ~ (p=0.256 n=20) AppendFloat/64Fixed12-16 159.0n ± 1% 165.2n ± 2% +3.93% (p=0.000 n=20) AppendFloat/64Fixed16-16 151.4n ± 0% 156.9n ± 1% +3.67% (p=0.000 n=20) AppendFloat/64Fixed12Hard-16 146.5n ± 1% 153.8n ± 1% +4.98% (p=0.000 n=20) AppendFloat/64Fixed17Hard-16 166.3n ± 1% 169.8n ± 1% +2.07% (p=0.001 n=20) AppendFloat/64Fixed18Hard-16 10.59µ ± 2% 10.60µ ± 0% ~ (p=0.499 n=20) AppendFloat/64FixedF1-16 614.4n ± 1% 152.0n ± 1% -75.27% (p=0.000 n=20) AppendFloat/64FixedF2-16 845.0n ± 0% 139.1n ± 1% -83.53% (p=0.000 n=20) AppendFloat/64FixedF3-16 608.8n ± 1% 129.3n ± 1% -78.77% (p=0.000 n=20) AppendFloat/Slowpath64-16 251.7n ± 1% 245.0n ± 1% -2.66% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-16 248.4n ± 1% 241.2n ± 1% -2.90% (p=0.000 n=20) geomean 225.7n 193.8n -14.14% host: s7:GOARCH=386 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 1cc918cc725 │ b66c604f523 │ │ sec/op │ sec/op vs base │ AppendFloat/Decimal-32 41.88n ± 0% 42.02n ± 1% ~ (p=0.004 n=20) AppendFloat/Float-32 71.05n ± 0% 71.24n ± 0% ~ (p=0.044 n=20) AppendFloat/Exp-32 74.91n ± 1% 74.80n ± 0% ~ (p=0.433 n=20) AppendFloat/NegExp-32 74.10n ± 0% 74.20n ± 0% ~ (p=0.867 n=20) AppendFloat/LongExp-32 75.73n ± 0% 75.84n ± 0% ~ (p=0.147 n=20) AppendFloat/Big-32 82.47n ± 0% 82.36n ± 0% ~ (p=0.490 n=20) AppendFloat/BinaryExp-32 32.31n ± 1% 32.62n ± 0% +0.97% (p=0.000 n=20) AppendFloat/32Integer-32 41.38n ± 1% 41.40n ± 1% ~ (p=0.106 n=20) AppendFloat/32ExactFraction-32 62.72n ± 0% 62.92n ± 0% ~ (p=0.009 n=20) AppendFloat/32Point-32 60.36n ± 0% 60.33n ± 0% ~ (p=0.050 n=20) AppendFloat/32Exp-32 68.97n ± 0% 69.24n ± 0% +0.39% (p=0.000 n=20) AppendFloat/32NegExp-32 62.63n ± 0% 63.15n ± 0% +0.82% (p=0.000 n=20) AppendFloat/32Shortest-32 58.76n ± 0% 58.87n ± 0% ~ (p=0.053 n=20) AppendFloat/32Fixed8Hard-32 41.67n ± 1% 43.46n ± 1% +4.30% (p=0.000 n=20) AppendFloat/32Fixed9Hard-32 49.78n ± 1% 50.53n ± 1% +1.52% (p=0.000 n=20) AppendFloat/64Fixed1-32 41.15n ± 0% 42.95n ± 1% +4.36% (p=0.000 n=20) AppendFloat/64Fixed2-32 40.83n ± 1% 41.24n ± 1% +1.03% (p=0.000 n=20) AppendFloat/64Fixed2.5-32 39.42n ± 0% 40.47n ± 1% +2.66% (p=0.000 n=20) AppendFloat/64Fixed3-32 40.73n ± 1% 41.58n ± 1% +2.10% (p=0.000 n=20) AppendFloat/64Fixed4-32 38.68n ± 0% 39.29n ± 0% +1.58% (p=0.000 n=20) AppendFloat/64Fixed5Hard-32 42.88n ± 1% 43.62n ± 1% +1.73% (p=0.000 n=20) AppendFloat/64Fixed12-32 51.67n ± 1% 52.92n ± 1% +2.42% (p=0.000 n=20) AppendFloat/64Fixed16-32 49.15n ± 0% 50.30n ± 0% +2.33% (p=0.000 n=20) AppendFloat/64Fixed12Hard-32 48.51n ± 0% 48.81n ± 0% +0.62% (p=0.001 n=20) AppendFloat/64Fixed17Hard-32 54.62n ± 1% 55.60n ± 1% +1.79% (p=0.000 n=20) AppendFloat/64Fixed18Hard-32 3.979µ ± 1% 3.980µ ± 1% ~ (p=0.569 n=20) AppendFloat/64FixedF1-32 165.90n ± 1% 49.97n ± 0% -69.88% (p=0.000 n=20) AppendFloat/64FixedF2-32 225.50n ± 0% 45.02n ± 1% -80.04% (p=0.000 n=20) AppendFloat/64FixedF3-32 160.20n ± 1% 42.16n ± 1% -73.69% (p=0.000 n=20) AppendFloat/Slowpath64-32 75.55n ± 0% 75.23n ± 0% -0.44% (p=0.000 n=20) AppendFloat/SlowpathDenormal64-32 74.84n ± 0% 75.00n ± 0% ~ (p=0.268 n=20) geomean 69.22n 61.13n -11.69% Change-Id: I722d2e2621e74e32cb3fc34a2df5b16cc595715c Reviewed-on: https://go-review.googlesource.com/c/go/+/717183 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
Now that the bootstrap compiler is 1.24, it's no longer needed. Change-Id: I9b3d6b7176af10fbc580173d50130120b542e7f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/717060 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Propagate "unread" across OpMoves. If the addr of this auto is only used by an OpMove as its source arg, and the OpMove's target arg is the addr of another auto. If the 2nd auto can be eliminated, this one can also be eliminated. This CL eliminates unnecessary memory copies and makes the frame smaller in the following code snippet: func contains(m map[string][16]int, k string) bool { _, ok := m[k] return ok } These are the benchmark results followed by the benchmark code: goos: linux goarch: amd64 cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Map1Access2Ok-8 9.582n ± 2% 9.226n ± 0% -3.72% (p=0.000 n=20) Map2Access2Ok-8 13.79n ± 1% 10.24n ± 1% -25.77% (p=0.000 n=20) Map3Access2Ok-8 68.68n ± 1% 12.65n ± 1% -81.58% (p=0.000 n=20) package main_test import "testing" var ( m1 = map[int]int{} m2 = map[int][16]int{} m3 = map[int][256]int{} ) func init() { for i := range 1000 { m1[i] = i m2[i] = [16]int{15:i} m3[i] = [256]int{255:i} } } func BenchmarkMap1Access2Ok(b *testing.B) { for i := range b.N { _, ok := m1[i%1000] if !ok { b.Errorf("%d not found", i) } } } func BenchmarkMap2Access2Ok(b *testing.B) { for i := range b.N { _, ok := m2[i%1000] if !ok { b.Errorf("%d not found", i) } } } func BenchmarkMap3Access2Ok(b *testing.B) { for i := range b.N { _, ok := m3[i%1000] if !ok { b.Errorf("%d not found", i) } } } Fixes golang#75398 Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771 Reviewed-on: https://go-review.googlesource.com/c/go/+/702875 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Change-Id: I9c5a8f8a031e368bda312c830dc266f5986e8b1a GitHub-Last-Rev: 23145e8 GitHub-Pull-Request: golang#76160 Reviewed-on: https://go-review.googlesource.com/c/go/+/717341 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
We don't need to check that the bit patterns of the constants match, it is sufficient to just check the constant is equal to the given value. While we're here also change the FCLASSD rules to use a bit pattern for the mask. I think this improves readability, particularly as more uses of FCLASSD get added (e.g. CL 717560). These changes should not affect codegen. Change-Id: I92a6338dc71e6a71e04306f67d7d86016c6e9c47 Reviewed-on: https://go-review.googlesource.com/c/go/+/717580 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
0e1bd8b to c37e118 Compare 9f3a108 to c37e118 Compare Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
This patch add support for Zawrs extension on RISCV64
🔄 This is a mirror of upstream PR golang#76178