Skip to content

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented May 7, 2025

Currently, time-series aggregations use the values aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using values, and the inputs are likely ordinal blocks. This change speeds up the values aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op 
@dnhatn dnhatn force-pushed the support-ordinals-values-aggs branch from 5a4fbba to 1fad50d Compare May 7, 2025 17:49
@dnhatn dnhatn requested review from ivancea and nik9000 May 7, 2025 18:21
@dnhatn dnhatn marked this pull request as ready for review May 7, 2025 18:21
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels May 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

}

$if(BytesRef)$
public static GroupingAggregatorFunction.AddInput wrapAddInput(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it a static method somewhere else and reference it? So we don't need to edit it without the IDE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea :)

@dnhatn dnhatn force-pushed the support-ordinals-values-aggs branch from a8253f1 to 001fe81 Compare May 7, 2025 18:54
@dnhatn
Copy link
Member Author

dnhatn commented May 8, 2025

Thanks Nik!

@dnhatn dnhatn merged commit 7b87266 into elastic:main May 8, 2025
17 checks passed
@dnhatn dnhatn deleted the support-ordinals-values-aggs branch May 8, 2025 01:24
@dnhatn dnhatn mentioned this pull request May 8, 2025
28 tasks
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request May 9, 2025
Currently, time-series aggregations use the `values` aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using `values`, and the inputs are likely ordinal blocks. This change speeds up the `values` aggregation when the inputs are ordinal-based. Execution time reduced from 461ms to 192ms for 1000 groups. ``` ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op ```
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request May 12, 2025
Currently, time-series aggregations use the `values` aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using `values`, and the inputs are likely ordinal blocks. This change speeds up the `values` aggregation when the inputs are ordinal-based. Execution time reduced from 461ms to 192ms for 1000 groups. ``` ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op ```
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jun 5, 2025
Currently, time-series aggregations use the `values` aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using `values`, and the inputs are likely ordinal blocks. This change speeds up the `values` aggregation when the inputs are ordinal-based. Execution time reduced from 461ms to 192ms for 1000 groups. ``` ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op ```
@dnhatn dnhatn added the v8.19.0 label Jun 5, 2025
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jun 5, 2025
Currently, time-series aggregations use the `values` aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using `values`, and the inputs are likely ordinal blocks. This change speeds up the `values` aggregation when the inputs are ordinal-based. Execution time reduced from 461ms to 192ms for 1000 groups. ``` ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op ```
dnhatn added a commit that referenced this pull request Jun 6, 2025
Currently, time-series aggregations use the `values` aggregation to collect dimension values. While we might introduce a specialized aggregation for this in the future, for now, we are using `values`, and the inputs are likely ordinal blocks. This change speeds up the `values` aggregation when the inputs are ordinal-based. Execution time reduced from 461ms to 192ms for 1000 groups. ``` ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 461.938 ± 6.089 ms/op ValuesAggregatorBenchmark.run BytesRef 10000 avgt 7 192.898 ± 1.781 ms/op ```
dnhatn added a commit that referenced this pull request Jul 21, 2025
Similar to #127849, this change adds an optimized path for leveraging ordinal blocks of intermediate input pages in the Values aggregator. Below are the micro-benchmark results. Before: ``` // 1 raw input page + 1000 intermediate input pages Benchmark (dataType) (groups) Mode Cnt Score Error Units ValuesAggregatorBenchmark.run BytesRef 1 avgt 2 0.382 ms/op ValuesAggregatorBenchmark.run BytesRef 1000 avgt 2 112.293 ms/op ValuesAggregatorBenchmark.run BytesRef 1000000 avgt 2 113182.908 ms/op ``` ``` After: // 1 raw input page + 1000 intermediate input pages Benchmark (dataType) (groups) Mode Cnt Score Error Units ValuesAggregatorBenchmark.run BytesRef 1 avgt 2 0.378 ms/op ValuesAggregatorBenchmark.run BytesRef 1000 avgt 2 34.410 ms/op ValuesAggregatorBenchmark.run BytesRef 1000000 avgt 2 64654.830 ms/op ``` 1K groups: 112 ms -> 34.4ms 1M groups: 113s -> 64s More to come with #130510 Relates #127849
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v8.19.0 v9.1.0

3 participants