TSDS Downsampling data loss with aggregate_metric_double types #96076

@xyu

Description

Elasticsearch Version

8.6.2

Installed Plugins

No response

Java Version

bundled

OS Version

3.16.0-4-amd64

Problem Description

When an aggregate_metric_double field is created with only a subset of the metrics sub-fields, the mapping for the downsampled index is created with the full complement of sub-fields, which results in the downsampling process dropping all docs.

Steps to Reproduce

Create a TSDS index template with mappings like so:

{ "template": { "mappings": { "properties": { "@timestamp": { "type": "date", "format": "strict_date_optional_time" }, "host": { "type": "keyword", "time_series_dimension": true }, "calls": { "type": "aggregate_metric_double", "metrics": [ "sum" ], "default_metric": "sum", "time_series_metric": "gauge" } } } } }

Now create the TSDS data stream; you should end up with a set of backing indices whose mapping looks something like:

{ ".ds-tsds-test-2023.05.11-000002": { "mappings": { "_data_stream_timestamp": { "enabled": true }, "properties": { "@timestamp": { "type": "date", "format": "strict_date_optional_time" }, "host": { "type": "keyword", "time_series_dimension": true }, "calls": { "type": "aggregate_metric_double", "metrics": [ "sum" ], "default_metric": "sum", "time_series_metric": "gauge" } } } } }

If you then have an ILM policy that downsamples in one of its phases, e.g.:

"warm": { "min_age": "7d", "actions": { "set_priority": { "priority": 40 }, "downsample": { "fixed_interval": "5m" }, "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } }

And trigger it, either by waiting out the configured time or by adjusting the index age, you will end up with a downsampled index that has a mapping like the following and 0 docs:

{ "shrink-ujei-downsample-te_v-.ds-tsds-test-2023.05.11-000001": { "mappings": { "_data_stream_timestamp": { "enabled": true }, "dynamic_templates": [ { "strings": { "match_mapping_type": "string", "mapping": { "type": "keyword" } } } ], "properties": { "@timestamp": { "type": "date", "meta": { "fixed_interval": "5m", "time_zone": "UTC" } }, "host": { "type": "keyword", "time_series_dimension": true }, "calls": { "type": "aggregate_metric_double", "metrics": [ "min", "max", "sum", "value_count" ], "default_metric": "max", "time_series_metric": "gauge" } } } } }

Notice how the min, max, and value_count sub-fields exist in the downsampled mapping but not in the template / original mapping. This in turn makes the downsampled docs unindexable, so all data is dropped.
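Since the ILM downsample action delegates to the downsample API, the same mapping mismatch should be reproducible without waiting on ILM by downsampling a rolled-over backing index by hand. A sketch, with an illustrative target index name:

# Ensure the source backing index is no longer the write index
POST tsds-test/_rollover

# The downsample API requires the source index to be read-only
PUT .ds-tsds-test-2023.05.11-000001/_block/write

# Downsample directly, then compare the resulting mapping and doc count
POST .ds-tsds-test-2023.05.11-000001/_downsample/tsds-test-downsample-5m
{ "fixed_interval": "5m" }

GET tsds-test-downsample-5m/_mapping
GET tsds-test-downsample-5m/_count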

Logs (if relevant)

[2023-05-11T20:11:25,576][ERROR][o.e.x.d.RollupShardIndexer] [es1.test.example.com] Shard [[.ds-tsds-test-2023.05.11-000001][4]] failed to populate rollup index. Failures: [{null=org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [calls] of type [aggregate_metric_double] in a time series document at [2023-05-10T07:35:00.000Z]. Preview of field's value: 'null',... 
