Member

@bwplotka bwplotka commented Apr 12, 2022

Fixes #983

Also:

  • Improved docs
  • Fixed TestMemStatsEquivalence; it was a no-op before (:
  • Removed the gc_cpu_fraction metric completely, since it does not work properly on Go 1.17+

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka bwplotka marked this pull request as ready for review April 12, 2022 22:24
@bwplotka bwplotka requested review from beorn7 and kakkoyun April 12, 2022 22:25
@bwplotka
Member Author

cc @mknyszek - we would love your eyes on this (:

@@ -227,9 +251,15 @@ func TestMemStatsEquivalence(t *testing.T) {
	for i := 0; i < msRealValue.NumField(); i++ {
		fr := msRealValue.Field(i)
		ff := msFakeValue.Field(i)
		switch typ.Kind() {
Member Author

Fun fact: this test was a no-op, because typ.Kind() was always struct. We want the field's kind instead 🙈

Contributor

oops! good catch.
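
For readers following the thread, here is a minimal sketch of why switching on the struct type's kind makes such a loop a no-op. The `stats` struct and the `countHandled` helper are hypothetical stand-ins for illustration; they are not the code from the test in this PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// stats is a hypothetical stand-in for a struct such as runtime.MemStats.
type stats struct {
	Alloc uint64
	Frees uint64
	NumGC uint32
}

// countHandled counts how many fields reach the uint64 branch of the switch.
// With useFieldKind=false it switches on the kind of the struct type itself,
// which is always reflect.Struct, so no field is ever handled.
func countHandled(v interface{}, useFieldKind bool) int {
	val := reflect.ValueOf(v)
	typ := val.Type()
	handled := 0
	for i := 0; i < val.NumField(); i++ {
		kind := typ.Kind() // kind of the whole struct, not of field i
		if useFieldKind {
			kind = val.Field(i).Kind() // kind of the individual field
		}
		switch kind {
		case reflect.Uint64:
			handled++
		}
	}
	return handled
}

func main() {
	fmt.Println("switch on struct kind:", countHandled(stats{}, false))
	fmt.Println("switch on field kind: ", countHandled(stats{}, true))
}
```

In this sketch the struct-kind variant handles zero fields, while the field-kind variant handles the two uint64 fields.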

@bwplotka
Member Author

After this is merged I will attempt to fix other issues:

#994
#995
#1026

I would also (in a separate PR) propose defaulting to just goRuntimeMemStats for now, instead of the goRuntimeMemStats | goRuntimeMetrics set. I think migrating to the new metrics is important, but it has to be opt-in for now, IMO. We impacted a lot of people by adding a two-digit number of extra metrics, and I am kind of missing the reason (users don't know how to leverage them yet) 🤔 Not sure yet, let's discuss. (:
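
To make the proposal concrete, here is a rough sketch of how a bit-flag set like goRuntimeMemStats | goRuntimeMetrics can be combined with an opt-in default. Only the two flag names come from this discussion; the `collection` type, `withCollections`, `newOptions`, and `has` are hypothetical names for illustration and may not match the real client_golang API.

```go
package main

import "fmt"

// collection is a hypothetical bit-flag type; each constant occupies one bit.
type collection uint32

const (
	goRuntimeMemStats collection = 1 << iota // classic runtime.MemStats based metrics
	goRuntimeMetrics                         // newer runtime/metrics based metrics
)

// options holds the set of enabled collections.
type options struct {
	enabled collection
}

// withCollections is a hypothetical functional option that replaces the set.
func withCollections(c collection) func(*options) {
	return func(o *options) { o.enabled = c }
}

// newOptions starts from the proposed default (MemStats only) and then
// applies any opt-in overrides.
func newOptions(opts ...func(*options)) options {
	o := options{enabled: goRuntimeMemStats}
	for _, apply := range opts {
		apply(&o)
	}
	return o
}

// has reports whether the given collection bit is set.
func (o options) has(c collection) bool { return o.enabled&c != 0 }

func main() {
	def := newOptions()
	both := newOptions(withCollections(goRuntimeMemStats | goRuntimeMetrics))
	fmt.Println("default: memstats =", def.has(goRuntimeMemStats), "runtime/metrics =", def.has(goRuntimeMetrics))
	fmt.Println("opt-in:  memstats =", both.has(goRuntimeMemStats), "runtime/metrics =", both.has(goRuntimeMetrics))
}
```

With this shape, users who do nothing keep the familiar metric set, and the extra metrics only show up when someone asks for them explicitly.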

Member

@kakkoyun kakkoyun left a comment

Overall looks good.

The prometheus/collectors/go_collector.go file is missing build tags to exclude Go versions older than 1.17, I guess, considering the CI failures.
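
As an illustration of the kind of build tag meant here, the sketch below shows a file that is compiled only by Go 1.17 or newer. The file contents and the `supportedMetricCount` helper are hypothetical; only the constraint lines are the point. The `//go:build` form is understood by Go 1.17+, and the `// +build` line carries the same constraint for older toolchains.

```go
//go:build go1.17
// +build go1.17

// Hypothetical example file: everything below is excluded from the build
// when the toolchain is older than Go 1.17.
package main

import (
	"fmt"
	"runtime/metrics"
)

// supportedMetricCount returns how many metrics the running Go version
// exposes through the runtime/metrics package.
func supportedMetricCount() int {
	return len(metrics.All())
}

func main() {
	fmt.Println("runtime/metrics descriptions available:", supportedMetricCount())
}
```

A sibling file guarded by the negated constraint (`!go1.17`) would then typically provide the fallback for older Go versions, so that exactly one of the two files is compiled.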

Fixes #983

Also:

* fixed TestMemStatsEquivalence, it was noop before (:
* Removed gc_cpu_fraction metric completely, since it's not working completely for Go1.17+

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka
Member Author

Fixed @kakkoyun

@mknyszek I don't know much about runtime/metrics.Read, and nothing about the "stop-the-world" problem is written in its documentation (Go docs). I assume we can remove the section about the consequences of the stop-the-world logic from the Go Collector docs, right? Could you confirm? (: I mean the following one:

// NewGoCollector returns a collector that exports metrics about the current Go
// process. This includes memory stats. To collect those, runtime.ReadMemStats
// is called. This requires to “stop the world”, which usually only happens for
// garbage collection (GC). Take the following implications into account when
// deciding whether to use the Go collector:
//
// 1. The performance impact of stopping the world is the more relevant the more
// frequently metrics are collected. However, with Go1.9 or later the
// stop-the-world time per metrics collection is very short (~25µs) so that the
// performance impact will only matter in rare cases. However, with older Go
// versions, the stop-the-world duration depends on the heap size and can be
// quite significant (~1.7 ms/GiB as per
// https://go-review.googlesource.com/c/go/+/34937).
//
// 2. During an ongoing GC, nothing else can stop the world. Therefore, if the
// metrics collection happens to coincide with GC, it will only complete after
// GC has finished. Usually, GC is fast enough to not cause problems. However,
// with a very large heap, GC might take multiple seconds, which is enough to
// cause scrape timeouts in common setups. To avoid this problem, the Go
// collector will use the memstats from a previous collection if
// runtime.ReadMemStats takes more than 1s. However, if there are no previously
// collected memstats, or their collection is more than 5m ago, the collection
// will block until runtime.ReadMemStats succeeds.
@bwplotka bwplotka merged commit 2417284 into main Apr 13, 2022
@bwplotka bwplotka deleted the gocolloptions branch April 13, 2022 08:55
@mknyszek
Contributor

Hey sorry for the delay. If it matters anymore, LGTM! Thanks, and sorry I never got around to this.

@mknyszek
Contributor

@mknyszek I don't know much about runtime/metrics.Read, and nothing about the "stop-the-world" problem is written in its documentation (Go docs). I assume we can remove the section about the consequences of the stop-the-world logic from the Go Collector docs, right? Could you confirm? (: I mean the following one:

As of Go 1.15 (golang/go#19812) I believe that problem is resolved for ReadMemStats. Before then, ReadMemStats would block until the GC mark phase was complete, meaning it could wait a while. Now it can stop-the-world at any time.

That's still less efficient than runtime/metrics.Read which never stops the world; it ensures metric consistency in a much more efficient way. Note that ReadMemStats still incurs a whole-program latency hit (usually not a problem in practice if you're not calling it very often) that runtime/metrics.Read does not.
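
For anyone comparing the two APIs mentioned above, here is a small sketch that reads heap usage both ways. The `readMetric` helper is a hypothetical wrapper, and the metric name used is the one I believe is the closest counterpart to MemStats.HeapAlloc; treat that pairing as an assumption rather than a guarantee.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// readMetric reads one uint64 sample via runtime/metrics. It returns false
// if the running Go version does not support the name or the value has a
// different kind.
func readMetric(name string) (uint64, bool) {
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	// Older API: fills the whole MemStats struct and stops the world to do so.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Println("ReadMemStats HeapAlloc:", ms.HeapAlloc)

	// Newer API: reads only the requested samples.
	if v, ok := readMetric("/memory/classes/heap/objects:bytes"); ok {
		fmt.Println("runtime/metrics heap objects bytes:", v)
	}
}
```

The two numbers are sampled at slightly different moments, so they are not expected to match exactly.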

@bwplotka bwplotka changed the title from "gocollector: Added options to Go Collector for changing the" to "gocollector: Added options to Go Collector for changing the collections" Apr 13, 2022
kakkoyun pushed a commit that referenced this pull request May 13, 2022
* Renamed files.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* gocollector: Added options to Go Collector for different collections.

Fixes #983

Also:

* fixed TestMemStatsEquivalence, it was noop before (:
* Removed gc_cpu_fraction metric completely, since it's not working completely for Go1.17+

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
kakkoyun added a commit that referenced this pull request Jul 6, 2022
* Cut v1.12.0

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Bump the day

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Make the Go 1.17 collector thread-safe (#969)

* Use simpler locking in the Go 1.17 collector (#975)

A previous PR made it so that the Go 1.17 collector locked only around uses of rmSampleBuf, but really that means that Metric values may be sent over the channel containing some values from future metrics.Read calls. While generally-speaking this isn't a problem, we lose any consistency guarantees provided by the runtime/metrics package.

Also, that optimization to not just lock around all of Collect was premature. Truthfully, Collect is called relatively infrequently, and its critical path is fairly fast (10s of µs). To prove it, this change also adds a benchmark.

    name            old time/op  new time/op  delta
    GoCollector-16  43.7µs ± 2%  43.2µs ± 2%  ~  (p=0.190 n=9+9)

Note that because the benchmark is single-threaded it actually looks like it might be getting *slightly* faster, because all those Collect calls for the Metrics are direct calls instead of interface calls.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>

* API client: make http reads more efficient (#976)

Replace `io.ReadAll` with `bytes.Buffer.ReadFrom`. Both need to resize a buffer until they have finished reading; the former increases by 1.25x each time while the latter uses 2x.

Also added a benchmark to demonstrate the benefit:

    name             old time/op    new time/op    delta
    Client/4KB-8     35.9µs ± 4%    35.3µs ± 3%    ~        (p=0.310 n=5+5)
    Client/50KB-8    83.1µs ± 8%    69.5µs ± 1%    -16.37%  (p=0.008 n=5+5)
    Client/1000KB-8  891µs ± 6%     750µs ± 0%     -15.83%  (p=0.016 n=5+4)
    Client/2000KB-8  1.74ms ± 2%    1.35ms ± 1%    -22.72%  (p=0.008 n=5+5)

    name             old alloc/op   new alloc/op   delta
    Client/4KB-8     20.2kB ± 0%    20.4kB ± 0%    +1.26%   (p=0.008 n=5+5)
    Client/50KB-8    218kB ± 0%     136kB ± 0%     -37.65%  (p=0.008 n=5+5)
    Client/1000KB-8  5.88MB ± 0%    2.11MB ± 0%    -64.10%  (p=0.008 n=5+5)
    Client/2000KB-8  11.7MB ± 0%    4.2MB ± 0%     -63.93%  (p=0.008 n=5+5)

    name             old allocs/op  new allocs/op  delta
    Client/4KB-8     75.0 ± 0%      72.0 ± 0%      -4.00%   (p=0.008 n=5+5)
    Client/50KB-8    109 ± 0%       98 ± 0%        -10.09%  (p=0.079 n=4+5)
    Client/1000KB-8  617 ± 0%       593 ± 0%       -3.89%   (p=0.008 n=5+5)
    Client/2000KB-8  1.13k ± 0%     1.09k ± 0%     -3.27%   (p=0.008 n=5+5)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Reduce granularity of histogram buckets for Go 1.17 collector (#974)

The Go runtime/metrics package currently exports extremely granular histograms. Exponentially bucket any histogram with unit "seconds" or "bytes" instead to dramatically reduce the number of buckets, and thus the number of metrics.

This change also adds a test to check for expected cardinality to prevent cardinality surprises in the future.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>

* Cut v1.12.1 (#978)

* Cut v1.12.1

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Apply review suggestions

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix deprecated `NewBuildInfoCollector` API

Update `examples/random/main.go`: `prometheus.NewBuildInfoCollector` is deprecated. Use `collectors.NewBuildInfoCollector` instead.

Signed-off-by: alissa-tung <alissa-tung@outlook.com>

* gocollector: Added options to Go Collector for changing the (#1031)

* Renamed files.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* gocollector: Added options to Go Collector for different collections.

Fixes #983

Also:

* fixed TestMemStatsEquivalence, it was noop before (:
* Removed gc_cpu_fraction metric completely, since it's not working completely for Go1.17+

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* gocollector: Reverted client_golang v1.12 addition of runtime/metrics metrics by default. (#1033)

Fixes #967

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* prometheus: Fix convention violating names for generated collector metrics (#1048)

* Fix convention violating names for generated collector metrics

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add new Go collector example

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove -Inf buckets from go collector histograms (#1049)

* Remove -Inf buckets from go collector histograms

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update prometheus/collectors/go_collector_latest_test.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Simplify

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Cut v1.12.2 (#1052)

* Cut v1.12.2

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Apply suggestions

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Kemal Akkoyun <kakkoyun@gmail.com>
Co-authored-by: Michael Knyszek <mknyszek@google.com>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Kemal Akkoyun <kakkoyun@users.noreply.github.com>
Co-authored-by: alissa-tung <alissa-tung@outlook.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>