
Conversation

@JonasKunz (Contributor) commented Aug 6, 2025

Adds a new field type and mapper for OpenTelemetry exponential histograms.

The implementation was heavily inspired by (aka partly copy-pasted from) the existing HistogramFieldMapper.
Aggregations are not (yet) supported, but documents with this field type can be stored and loaded. I also did some manual testing with Discover in Kibana and things look fine there.

The new type looks like this:

{ "scale": 12, "zero": { "threshold": 0.123456, "count": 42 }, "positive": { "indices": [-1000000, -10, 25, 26, 99999999], "counts": [1, 2, 3, 4, 5] }, "negative": { "indices": [-1000, -999, 123456], "counts": [10, 20, 11] } } 

Every sub-field except for scale is optional.

The indices array makes this not the most readable format, but this sparse representation is required because a dense representation might not be feasible, e.g. for aggregation results.

However, if desired we could support a dense representation as additional input format:

{ "scale": 12, "zero": { "threshold": 0.123456, "count": 42 }, "positive": { "offset": -12345, "counts": [1, 2, 3, 0, 0, 0, 0, 4, 5] }, "negative": { "offset": 6789, "counts": [10, 0, 20, 0, 0, 0, 11] } } 

This would be exactly the OpenTelemetry representation. However, it adds some complexity:

  • External consumers would need to understand both notations
  • We'd need some heuristic on when to choose which format for synthetic source / query results. E.g. if at least 80% of buckets are empty, use sparse. Otherwise use dense.

I've not implemented this in this PR, as it is not strictly required. However, it would be great if you could give some feedback on whether we want to support this notation or not.
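For illustration only, here is a minimal sketch (not part of the PR; all class and method names are hypothetical) of how the dense OpenTelemetry buckets could be converted to the sparse indices/counts notation, and how the 80%-empty heuristic mentioned above could decide which notation to emit:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not PR code: converts OTel-style dense buckets (offset + counts)
// into the sparse (indices, counts) notation and applies the 80%-empty heuristic.
final class BucketNotation {

    record SparseBuckets(long[] indices, long[] counts) {}

    static SparseBuckets denseToSparse(long offset, long[] denseCounts) {
        List<Long> indices = new ArrayList<>();
        List<Long> counts = new ArrayList<>();
        for (int i = 0; i < denseCounts.length; i++) {
            if (denseCounts[i] != 0) {            // drop empty buckets
                indices.add(offset + i);
                counts.add(denseCounts[i]);
            }
        }
        return new SparseBuckets(
            indices.stream().mapToLong(Long::longValue).toArray(),
            counts.stream().mapToLong(Long::longValue).toArray()
        );
    }

    // Example heuristic: prefer the sparse notation if at least 80% of the covered
    // index range consists of empty buckets, otherwise prefer the dense notation.
    static boolean preferSparse(long[] sparseIndices) {
        if (sparseIndices.length == 0) {
            return true;
        }
        long denseLength = sparseIndices[sparseIndices.length - 1] - sparseIndices[0] + 1;
        double emptyRatio = 1.0 - (double) sparseIndices.length / denseLength;
        return emptyRatio >= 0.8;
    }
}
```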

@elasticsearchmachine added the v9.2.0 and external-contributor labels Aug 6, 2025
@JonasKunz marked this pull request as ready for review August 8, 2025 12:14
@JonasKunz requested a review from not-napoleon August 8, 2025 12:14
@elasticsearchmachine added the needs:triage label Aug 8, 2025
@felixbarny added the >non-issue and :StorageEngine/Mapping labels Aug 8, 2025
@elasticsearchmachine added the Team:StorageEngine label and removed the needs:triage label Aug 8, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@JonasKunz removed the request for review from martijnvg August 14, 2025 08:56
Merge commit with conflicts in:
  • benchmarks/src/main/java/org/elasticsearch/benchmark/exponentialhistogram/ExponentialHistogramMergeBench.java
  • libs/exponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/ZeroBucket.java
  • libs/exponential-histogram/src/test/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramMergerTests.java
@JonasKunz requested a review from kkrik-es August 15, 2025 12:56
@martijnvg (Member) left a comment:

I looked at the encoding and how it gets stored and left a question around that.

```java
CompressedExponentialHistogram.writeHistogramBytes(histogramBytesOutput, scale, negativeBuckets, positiveBuckets);
BytesRef histoBytes = histogramBytesOutput.bytes().toBytesRef();

Field histoField = new BinaryDocValuesField(fullPath(), histoBytes);
```
Member commented:

Using binary doc values today means this would never be compressed (across multiple documents). I suspect the storage cost would be high for exponential histograms? Did we consider alternatives?

Looks like the encoding logic essentially stores a number of IndexWithCount entries. Did we look into encoding each index and count pair using sorted numeric doc values? We could have two sorted numeric doc values fields, one for the indices and one for the counts. Then we could have another sorted numeric doc values field that keeps track of which count belongs to which index. Maybe this compresses better? We would need a similar structure for the negative buckets.
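A rough sketch of this suggestion (not code from the PR; the field names are hypothetical), using one multi-valued numeric doc values field per bucket component:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;

// Sketch of the suggested alternative encoding: one multi-valued numeric doc values
// field for the bucket indices and one for the counts (field names are made up).
final class SortedNumericBucketEncoding {

    static void addPositiveBuckets(Document doc, long[] bucketIndices, long[] bucketCounts) {
        for (int i = 0; i < bucketIndices.length; i++) {
            doc.add(new SortedNumericDocValuesField("histo.positive.index", bucketIndices[i]));
            doc.add(new SortedNumericDocValuesField("histo.positive.count", bucketCounts[i]));
        }
        // Open question raised below: SortedNumericDocValues sorts a document's values,
        // so a further field would be needed to recover which count belongs to which index.
    }
}
```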

@JonasKunz (Contributor, author) replied:

I took inspiration from HistogramFieldMapper, which also stores (value, count) tuples as encoded binary doc values, with the difference that value is a double there. For that reason I assumed that this is the best option for storing tuples.

I don't quite understand how your suggested approach would work.

Let's say we have the following (index, count) tuples to store: [(1, 10), (2, 30), (3, 20)].
The SortedNumericDocValuesField for the indices would then store 1, 2, 3 and the one for the counts would store 10, 20, 30.

How would we store the relation between the two? We'd need a "position" array storing, for each index, the position of its count within the sorted counts, which in this example would be 0, 2, 1. I assumed there is no way of storing this, because SortedNumericDocValues would reorder it as well?
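As a worked version of this example (hypothetical helper, not part of the PR), the position array can be derived by locating each original count within the sorted counts; duplicate counts would need extra care:

```java
import java.util.Arrays;

// Hypothetical helper illustrating the example above: derive the "position" array
// needed to re-associate each bucket index with its count after the counts have been
// sorted by SortedNumericDocValues. Assumes distinct counts for simplicity.
final class PositionArrayExample {

    static int[] positionsAfterSorting(long[] countsInIndexOrder) {
        long[] sorted = countsInIndexOrder.clone();
        Arrays.sort(sorted);                              // what the doc values field stores
        int[] positions = new int[countsInIndexOrder.length];
        for (int i = 0; i < countsInIndexOrder.length; i++) {
            positions[i] = Arrays.binarySearch(sorted, countsInIndexOrder[i]);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Counts in bucket-index order for the tuples [(1, 10), (2, 30), (3, 20)]:
        System.out.println(Arrays.toString(positionsAfterSorting(new long[] { 10, 30, 20 })));
        // prints [0, 2, 1], matching the example in the comment
    }
}
```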

@martijnvg (Member) replied Aug 18, 2025:

Yes, sorted numeric doc values re-orders the numbers at the document level (lowest to highest). We would need an extra doc values field to keep track of the original order. We could maybe store it as bytes in a SortedDocValues field.

Something similar was done to keep track of the original array order for synthetic source when we synthesize from doc values. See FieldArrayContext and how it gets used.

I think this could reduce storage required for exponential histograms, but we would need to test it. The current format is much simpler.

Member replied:

Maybe in order to test it we could do a small experiment: generate some semi-realistic exponential histograms, store them directly into a Lucene index with both formats, and then compare the index sizes?
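A minimal sketch of what such an experiment could look like for the binary-doc-values variant (class and field names are assumptions; the encoded bytes would come from the histogram encoding under discussion). A second run would index the same histograms with the sorted-numeric layout and compare the resulting directory sizes:

```java
import java.nio.file.Files;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Rough sketch of the suggested size experiment for one of the candidate encodings.
final class EncodingSizeExperiment {

    static long indexAndMeasure(Iterable<BytesRef> encodedHistograms) throws Exception {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("histo-size-test"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (BytesRef histoBytes : encodedHistograms) {
                Document doc = new Document();
                doc.add(new BinaryDocValuesField("histogram", histoBytes));
                writer.addDocument(doc);
            }
            writer.forceMerge(1);   // single segment, so sizes are comparable
            writer.commit();
            long totalBytes = 0;
            for (String file : dir.listAll()) {
                totalBytes += dir.fileLength(file);
            }
            return totalBytes;
        }
    }
}
```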

Member replied:

Given that this adds the new field mapper behind a feature flag, we can also explore this in a follow-up. But I'd like to look into this before the feature flag gets removed.

@JonasKunz (Contributor, author) replied:

Here are the experiment results:

  • The custom encoding stored in a single BinaryDocValuesField yields 1.59 bytes per populated bucket
  • Storing the indices and counts each in SortedNumericDocValues with FieldArrayContext gives 3.34 bytes per bucket
  • The custom encoding stored in a single SortedDocValuesField yields 1.2 bytes per bucket, presumably due to LZ4 compression of the terms dictionary

So I suggest we go with the custom encoding, but stored in a SortedDocValuesField for now?
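Concretely, this proposal would amount to keeping the custom byte encoding and only swapping the doc values field type; a sketch, assuming the same histoBytes as in the snippet quoted above:

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.util.BytesRef;

// Sketch: the same custom histogram byte encoding as before, but stored in a
// SortedDocValuesField so the terms dictionary can benefit from LZ4 compression.
final class SortedDocValuesVariant {

    static Field histogramField(String fieldName, BytesRef histoBytes) {
        return new SortedDocValuesField(fieldName, histoBytes);
    }
}
```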

@JonasKunz (Contributor, author) replied:

I tried doing this (branch), but it causes a weird test failure, which seems to be fixed by this Lucene change.

So I guess we should do the switch to SortedDocValuesField later, after upgrading to Lucene 10.3?

Member replied:

Using a terms dictionary for this kinda feels wrong. It adds more ingest overhead to create and de-duplicate the dictionary and adds another layer of indirection. But the compression does look compelling. As commented above, I think it would be interesting to try compressing values with (double) delta encoding to see if it brings benefits. Also, maybe there's a way to compress binary doc values with lz4 or zstd somehow, without having to create a terms dictionary?

```java
private int negativeBucketsLength;
private int positiveBucketsLength;

void load(BytesRef data) throws IOException {
```
Contributor commented:

Nit: decode?

Contributor commented:

Should this be a factory method, to avoid mutations?

@JonasKunz (Contributor, author) replied:

This was done intentionally to avoid allocation when iterating over stored histograms. The same pattern is used in HistogramFieldMapper.

If this were a factory method, it would also give a wrong sense of lifetime/safety IINM: the byte array backing the binaryValue() returned from BinaryDocValues is reused internally when advancing to the next document. So if we wanted to avoid mutation side effects here, we'd need to copy the bytes, which we want to avoid for performance reasons.
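A sketch of the reuse pattern being described (the holder type's no-arg constructor and its usage are assumed here; only the Lucene iteration is standard API):

```java
import java.io.IOException;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.DocIdSetIterator;

// A single mutable holder is re-populated per document instead of allocating a new
// object, because the BytesRef returned by binaryValue() is only valid until the
// iterator advances to the next document.
final class HistogramDocValuesReader {

    static void readAll(LeafReader reader, String field) throws IOException {
        BinaryDocValues docValues = DocValues.getBinary(reader, field);
        EncodedHistogramData holder = new EncodedHistogramData(); // constructor assumed
        for (int doc = docValues.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = docValues.nextDoc()) {
            holder.load(docValues.binaryValue()); // mutates the shared holder in place
            // ... consume `holder` before advancing; its contents are overwritten next iteration
        }
    }
}
```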

JonasKunz and others added 6 commits August 19, 2025 11:47
  • …lasticsearch/xpack/exponentialhistogram/EncodedHistogramData.java (Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>)
  • …lasticsearch/xpack/exponentialhistogram/EncodedHistogramData.java (Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>)
  • …-mapper (merge; conflicts in x-pack/plugin/mapper-exponential-histogram/src/main/java/org/elasticsearch/xpack/exponentialhistogram/EncodedHistogramData.java)
```java
}
long minIndex = buckets.getFirst().index();
out.writeZLong(minIndex);
long prevIndex = minIndex - 1;
```
Contributor commented:

Why do we need - 1 here and below?

@JonasKunz (Contributor, author) replied:

Here it is to avoid special-casing the write of the first bucket. But I can see how this can be confusing. I'll instead write out the count of the first bucket explicitly and start the for loop below at the second list element. That should make this clearer.

Below, I thought it should be self-explanatory: if you have a bucket at index 42 and the next bucket is at index 44, the indexDelta is 44 - 42 = 2. The number of empty buckets in between is 2 - 1 = 1.
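A sketch of the revised write path described in this reply (not the exact PR code; `IndexWithCount` stands in for the type mentioned earlier, and `out` is assumed to be an org.elasticsearch.common.io.stream.StreamOutput, whose writeVInt/writeVLong/writeZLong are the standard varint and zigzag writers):

```java
import java.io.IOException;
import java.util.List;
import org.elasticsearch.common.io.stream.StreamOutput;

// Sketch of the described encoding: the first bucket is written explicitly, every
// further bucket as a delta to its predecessor, so runs of empty buckets only cost
// a small varint.
final class BucketDeltaEncoding {

    record IndexWithCount(long index, long count) {}

    static void writeBuckets(StreamOutput out, List<IndexWithCount> buckets) throws IOException {
        out.writeVInt(buckets.size());
        if (buckets.isEmpty()) {
            return;
        }
        IndexWithCount first = buckets.get(0);
        out.writeZLong(first.index());   // absolute (possibly negative) index of the first bucket
        out.writeVLong(first.count());
        long prevIndex = first.index();
        for (int i = 1; i < buckets.size(); i++) {
            IndexWithCount bucket = buckets.get(i);
            // e.g. previous index 42, current index 44 -> delta 2, i.e. one empty bucket in between
            out.writeVLong(bucket.index() - prevIndex);
            out.writeVLong(bucket.count());
            prevIndex = bucket.index();
        }
    }
}
```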

@kkrik-es (Contributor) left a comment:

Nice.


JonasKunz and others added 2 commits August 20, 2025 10:38
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
@JonasKunz merged commit 23ee3b6 into elastic:main Aug 20, 2025 (34 checks passed)
@JonasKunz deleted the exp-histo-mapper branch August 20, 2025 10:58
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 20, 2025

Labels

external-contributor, >non-issue, :StorageEngine/Mapping, Team:StorageEngine, v9.2.0

5 participants