This repository was archived by the owner on Mar 18, 2025. It is now read-only.

Conversation

codebien
Contributor

Running the output with the race detector enabled (via xk6), a data race is caught:

==================
WARNING: DATA RACE
Read at 0x00c00011fd20 by goroutine 71:
  go.k6.io/k6/metrics.(*CounterSink).Add()
      go.k6.io/k6@v0.38.0/metrics/sink.go:50 +0x5a
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*metricsStorage).update()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/metrics.go:43 +0x54f
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*PrometheusMapping).MapCounter()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/prometheus.go:16 +0xe4
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*metricsStorage).transform()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/metrics.go:55 +0x218
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*Output).convertToTimeSeries()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/remotewrite.go:149 +0x4c8
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*Output).flush()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/remotewrite.go:112 +0x104
  github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite.(*Output).flush-fm()
      github.com/grafana/xk6-output-prometheus-remote@v0.0.0-00010101000000-000000000000/pkg/remotewrite/remotewrite.go:84 +0x39
  go.k6.io/k6/output.(*PeriodicFlusher).run()
      go.k6.io/k6@v0.38.0/output/helpers.go:89 +0xe7
  go.k6.io/k6/output.NewPeriodicFlusher·dwrap·5()
      go.k6.io/k6@v0.38.0/output/helpers.go:122 +0x39

Previous write at 0x00c00011fd20 by goroutine 72:
  go.k6.io/k6/metrics.(*CounterSink).Add()
      go.k6.io/k6@v0.38.0/metrics/sink.go:50 +0x77
  go.k6.io/k6/metrics/engine.(*outputIngester).flushMetrics()
      go.k6.io/k6@v0.38.0/metrics/engine/ingester.go:80 +0x3a1
  go.k6.io/k6/metrics/engine.(*outputIngester).flushMetrics-fm()
      go.k6.io/k6@v0.38.0/metrics/engine/ingester.go:52 +0x39
  go.k6.io/k6/output.(*PeriodicFlusher).run()
      go.k6.io/k6@v0.38.0/output/helpers.go:89 +0xe7
  go.k6.io/k6/output.NewPeriodicFlusher·dwrap·5()
      go.k6.io/k6@v0.38.0/output/helpers.go:122 +0x39

The global metrics referenced in the samples are managed by k6's ingester output, which already executes the sink operation on them. Executing the sink concurrently on the same metrics from this output therefore generates a data race.

The fix uses new dedicated metrics in the metricsStorage so the sink is executed locally in the extension.
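For illustration, a minimal sketch of the pattern in Go (Sample, CounterSink, and the field layout here are simplified stand-ins for the types in go.k6.io/k6/metrics and the extension's actual metricsStorage, not the real code):

package remotewrite

import "sync"

// Sample and CounterSink are simplified stand-ins for k6's metric types;
// they only model what this sketch needs.
type Sample struct {
	MetricName string
	Value      float64
}

type CounterSink struct{ Sum float64 }

func (c *CounterSink) Add(s Sample) { c.Sum += s.Value }

// metricsStorage owns its sinks exclusively, so flushing from the output's
// goroutine never touches the global sinks that k6's ingester is updating.
type metricsStorage struct {
	mu    sync.Mutex
	sinks map[string]*CounterSink
}

func (ms *metricsStorage) update(s Sample) *CounterSink {
	ms.mu.Lock()
	defer ms.mu.Unlock()
	sink, ok := ms.sinks[s.MetricName]
	if !ok {
		sink = &CounterSink{}
		ms.sinks[s.MetricName] = sink
	}
	sink.Add(s) // local, dedicated sink: no data race with the ingester
	return sink
}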

@codebien codebien requested review from mstoykov and yorugac May 31, 2022 09:31
@codebien codebien marked this pull request as draft May 31, 2022 09:39
The global metric is controlled by k6's ingester output, so the sink operation on the global metric in the sample is already executed there. Use a local metric in the metricsStorage to execute a local, dedicated sink in the extension. This fixes the data race where both the extension and the k6 ingester call Add and sink the same metrics.
@codebien codebien marked this pull request as ready for review May 31, 2022 10:08
@codebien codebien self-assigned this May 31, 2022
Collaborator

@yorugac yorugac left a comment


LGTM!

Comment on lines 54 to 58
// TODO: this is just avoiding duplicates with the previous
// Implement a better and complete solution
// maybe discard any timestamp < latest?
//
//current.Time = sample.Time // to avoid duplicates in timestamps
Collaborator


Just a note regarding solutions: can k6 guarantee to always send samples with timestamps in strictly increasing order?

current.Time = sample.Time // to avoid duplicates in timestamps

This assignment was mostly meant to avoid encountering an error on write (and subsequently losing data) and the Prometheus bug mentioned below, where such an error is thrown seemingly when it shouldn't be. See also #11 for further investigation.
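As a side note, one hypothetical shape for the "discard any timestamp < latest" idea from the TODO above (shouldKeep and lastSeen are illustrative names, not code from this PR):

// shouldKeep drops samples whose timestamp does not strictly advance for a
// given metric; Prometheus remote write rejects duplicate timestamps for the
// same series. lastSeen maps metric name to the last accepted timestamp (ms).
func shouldKeep(lastSeen map[string]int64, metric string, tsMillis int64) bool {
	if last, ok := lastSeen[metric]; ok && tsMillis <= last {
		return false // duplicate or out-of-order sample: discard
	}
	lastSeen[metric] = tsMillis
	return true
}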

Contributor Author

@codebien codebien Jun 2, 2022


I only just found the time to check this comment. The Prometheus timestamp conversion function that we are using to generate the timestamp,

Timestamp: timestamp.FromTime(sample.Time),

has millisecond precision, which could be too low if we consider our highly concurrent world. If remote write supports higher precision we could just use sample.Time.UnixNano(); if not, then we need to discard duplicates or wait for the new data modelling (hopefully available soon).

WDYT? Could it explain the issue?
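To make the precision concern concrete, a small standalone sketch (fromTime here mirrors the documented behaviour of Prometheus' timestamp.FromTime, i.e. Unix milliseconds; the sample times are made up):

package main

import (
	"fmt"
	"time"
)

// fromTime converts to Unix milliseconds, like Prometheus' timestamp.FromTime.
func fromTime(t time.Time) int64 {
	return t.Unix()*1000 + int64(t.Nanosecond())/int64(time.Millisecond)
}

func main() {
	t1 := time.UnixMilli(1654000000000)  // an arbitrary instant
	t2 := t1.Add(200 * time.Microsecond) // a second sample in the same millisecond
	fmt.Println(fromTime(t1) == fromTime(t2))   // true: identical ms timestamps collide
	fmt.Println(t1.UnixNano() == t2.UnixNano()) // false: ns precision still distinguishes them
}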

Collaborator


Yes, I agree that improving precision could help with this particular problem, but even if it's possible, I'm not sure about the drawbacks it would bring. E.g. do people actually need precision higher than ms? What else would it impact during processing? Etc. Also, I believe the precision question is valid for both the Prometheus and k6 implementations, IIRC.
So IMO, precision is a multi-level problem here.

This change feels like a bit of an experiment to me 🙂 Definitely 👍 for making more experiments and figuring out when exactly duplicate timestamps happen.

Contributor Author


I removed all the commented lines and I'm going to merge this PR so we can fix the data race. We need to find the root cause of the duplicates and fix it in a dedicated PR. I expect to address it by implementing the k6 metrics refactoring.

Contributor

@javaducky javaducky left a comment


LGTM!

The specific hack is not required with the new local model. Regarding the original issue, we need to identify the real problem and apply a complete solution. We expect to fix it with a PR for #11.