felixbarny
Member

A short-term workaround for #99123

If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be duplicates. The _metric_names_hash field will be set by the OTel ES exporter (see open-telemetry/opentelemetry-collector-contrib#37511). As it's mapped as a time_series_dimension, documents with different sets of metrics get a different _tsid. The tradeoff is that if the composition of the metrics grouping changes over time, a different _tsid will be created, which affects the rate aggregation for counters.
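
To make the mechanism concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Elasticsearch or exporter code: the helpers `tsid`, `metric_names_hash`, and `index_docs`, the use of SHA-256, and the sample documents are all assumptions made for illustration. The real `_tsid` derivation and the exporter's hash function may differ.

```python
import hashlib
import json


def tsid(dimensions: dict) -> str:
    """Toy stand-in for _tsid: a hash over the sorted dimension key/value pairs."""
    payload = json.dumps(dimensions, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:16]


def metric_names_hash(metric_names) -> str:
    """Hypothetical 8-hex-digit hash of the sorted metric names in one document."""
    joined = ",".join(sorted(metric_names)).encode()
    return hashlib.sha256(joined).hexdigest()[:8]


def index_docs(docs, with_names_hash: bool):
    """Keep one doc per (_tsid, @timestamp); later docs with the same key are rejected."""
    stored, rejected = {}, []
    for doc in docs:
        dims = dict(doc["dimensions"])
        if with_names_hash:
            # The extra dimension makes the _tsid depend on the set of metric names.
            dims["_metric_names_hash"] = metric_names_hash(doc["metrics"].keys())
        key = (tsid(dims), doc["@timestamp"])
        if key in stored:
            rejected.append(doc)
        else:
            stored[key] = doc
    return stored, rejected


# Two documents with the same timestamp and dimensions but different metrics,
# i.e. metrics that were not grouped into a single document.
docs = [
    {"@timestamp": 1000, "dimensions": {"host": "a"}, "metrics": {"cpu.user": 0.4}},
    {"@timestamp": 1000, "dimensions": {"host": "a"}, "metrics": {"mem.used": 512}},
]

stored_without, rejected_without = index_docs(docs, with_names_hash=False)
stored_with, rejected_with = index_docs(docs, with_names_hash=True)
print(len(stored_without), len(rejected_without))
print(len(stored_with), len(rejected_with))
```

Without the extra dimension, the second document collides with the first and is rejected as a duplicate. With it, the two documents land in different toy time series and both are kept.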

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the external-contributor label Jan 27, 2025
@elasticsearchmachine
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

priority: 10
# workaround for https://github.com/elastic/elasticsearch/issues/99123
_metric_names_hash:
  type: keyword
Member

Q: would a number be more lightweight, since you're using an 8-digit hex anyway?

Member Author

At the moment, numbers can't leverage run-length encoding. So it's actually lighter to use a keyword here: all dimensions are incorporated into the _tsid, which we sort by. Therefore, all values for the same _tsid are equal and can be compressed very efficiently.
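
As a rough illustration of why sorting by _tsid helps here, the following Python sketch run-length encodes a column of hash values before and after sorting. It only demonstrates the general idea of run-length encoding. It does not model how Elasticsearch or Lucene actually store keyword fields, and the `_tsid` and hash values are made up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical documents as (_tsid, _metric_names_hash) pairs in arrival order.
docs = [
    ("tsid-b", "9f3c21aa"),
    ("tsid-a", "0b7e44d1"),
    ("tsid-b", "9f3c21aa"),
    ("tsid-a", "0b7e44d1"),
    ("tsid-a", "0b7e44d1"),
    ("tsid-b", "9f3c21aa"),
]

# The dimension column in arrival order versus sorted by _tsid.
unsorted_column = [names_hash for _, names_hash in docs]
sorted_column = [names_hash for _, names_hash in sorted(docs)]

unsorted_runs = run_length_encode(unsorted_column)
sorted_runs = run_length_encode(sorted_column)
print(len(unsorted_runs), len(sorted_runs))
```

Because the dimension value is part of the sort key, every distinct value forms a single run once the documents are sorted, so the encoded column has one entry per time series rather than one per value change.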

@felixbarny
Member Author

I had a discussion with @martijnvg about this last week. The conclusion was that this change makes the consequences of imperfect grouping much less severe, so we should move forward with it. It's not a replacement for improving the grouping logic, but it's much better to end up with a different time series than to drop metrics. Debugging why the rate aggregation isn't working properly in some cases is a much less stressful situation than debugging a data loss scenario. Longer-term, it seems we'll go down the one-metric-per-document route, where grouping of metrics isn't required anymore.
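
The rate tradeoff can be sketched with a toy model in Python. This is not how the Elasticsearch rate aggregation is implemented. `increase_per_series`, the series ids, and the sample values are assumptions, and the only point is that increases are computed within a series, so a change of _tsid hides the increase across the boundary.

```python
from collections import defaultdict


def increase_per_series(samples):
    """Sum counter increases within each series; nothing is computed across series."""
    by_series = defaultdict(list)
    for series_id, timestamp, value in samples:
        by_series[series_id].append((timestamp, value))
    total = 0
    for points in by_series.values():
        points.sort()
        for (_, previous), (_, current) in zip(points, points[1:]):
            total += max(current - previous, 0)  # toy handling of counter resets
    return total


# The same counter sampled four times, all in one (hypothetical) time series.
stable = [("s1", 0, 100), ("s1", 10, 110), ("s1", 20, 120), ("s1", 30, 130)]

# The grouping changes at t=20, so later samples get a different (hypothetical) _tsid.
split = [("s1", 0, 100), ("s1", 10, 110), ("s2", 20, 120), ("s2", 30, 130)]

stable_increase = increase_per_series(stable)
split_increase = increase_per_series(split)
print(stable_increase, split_increase)
```

In this model the stable series reports an increase of 30, while the split series reports 20: the step from 110 to 120 falls between two series and is not counted. No sample is lost, though, which is the point made above about preferring a skewed rate over dropped metrics.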

@felixbarny felixbarny requested a review from martijnvg February 17, 2025 12:56
Member

@martijnvg martijnvg left a comment

LGTM

@felixbarny felixbarny merged commit 5e8865d into elastic:main Feb 18, 2025
17 checks passed
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Feb 18, 2025
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 9.0, 8.x, 8.16, 8.17
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Feb 18, 2025
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Feb 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 18, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 18, 2025
andrzej-stencel pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Feb 21, 2025
…tions (#37511)

If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be a duplicate. This adds a hash of the metric names that will be mapped as a dimension in Elasticsearch. The tradeoff is that if the composition of the metrics grouping changes over time, a new time series will be created. That has an impact on the rate aggregation for counters.

ES mapping changes: elastic/elasticsearch#120952

---------

Co-authored-by: Carson Ip <carsonip@users.noreply.github.com>
@carsonip
Member

💚 All backports created successfully

Branch: 8.18

Questions ?

Please refer to the Backport tool documentation

carsonip pushed a commit to carsonip/elasticsearch that referenced this pull request Apr 15, 2025
(cherry picked from commit 5e8865d)
carsonip added a commit that referenced this pull request Apr 15, 2025
(cherry picked from commit 5e8865d) Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
carsonip added a commit that referenced this pull request Apr 16, 2025
carsonip added a commit to carsonip/elasticsearch that referenced this pull request Apr 16, 2025
carsonip added a commit to carsonip/elasticsearch that referenced this pull request Apr 16, 2025
carsonip added a commit to carsonip/elasticsearch that referenced this pull request Apr 16, 2025
carsonip added a commit to carsonip/elasticsearch that referenced this pull request Apr 16, 2025
…elastic#126850) Bump otel-data plugin version as elastic#120952 missed the bump. (cherry picked from commit 5860ccb) # Conflicts: #	x-pack/plugin/otel-data/src/main/resources/resources.yaml
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
carsonip added a commit that referenced this pull request Apr 17, 2025
…#126850) (#126900) Bump otel-data plugin version as #120952 missed the bump. (cherry picked from commit 5860ccb)
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025

Labels

auto-backport (Automatically create backport pull requests when merged)
>bug
:Data Management/Data streams (Data streams and their lifecycles)
external-contributor (Pull request authored by a developer outside the Elasticsearch team)
Team:Data Management (Meta label for data/management team)
v8.16.5
v8.17.3
v8.19.0
v9.0.0
v9.1.0

4 participants