Skip to content

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented May 26, 2025

Closes #101163

Fixes the significant_terms aggregation not working correctly on nested fields.

It was returning buckets with bg_count: 0, meaning it wasn't detecting any background document.
The filter it used to check for background documents was a plain TermsQuery, which wouldn't work on Nested fields.

The PR adds an extra NestedQuery wrapping the Terms in case the field is inside a Nested parent.

SignificantText was also checked, but it's explicitly documented that it doesn't work on text fields, as it needs to access the source.

TBD: Backports

@ivancea ivancea requested a review from nik9000 May 26, 2025 15:26
@ivancea ivancea added >bug :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0 labels May 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help a bit with the review, these tests have:

  • Setup: Index with a type ("normal" and "outlier"), and a value (1 and 2). That value is replicated 6 times (integer and keyword, and then as a nested and doubly nested field. Every "value" field has the same value in each document, so testing against each of them should render the same results.
  • A first "Checking" test to ensure all data ingested is correct
  • Test cases for the 3 kinds of values (normal, nested, doubly nested). Each of them do a sig_terms expecting the same results (except for the background_filter one)
@ivancea ivancea marked this pull request as ready for review May 27, 2025 12:07
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Comment on lines +225 to +226
var nestedParentField = context.nestedLookup().getNestedParent(fieldType.name());
if (nestedParentField != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best way to detect a nested field, but it worked well, and I didn't find any edge case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg, do you know the most right way to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if buildQuery should take the nested context into account? That sounds like a bigger change than I'd like to make to aggs at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been some time since I worked on nested related functionality. Assuming that fieldType.name() returns a path, then I think this is the right way to figure out the parent field.

@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've updated the changelog YAML for you.

@ivancea ivancea requested a review from iverase May 27, 2025 13:05
@ivancea ivancea changed the title Aggs: Fix significant terms not finding background docuemnts for nested fields Aggs: Fix significant terms not finding background documents for nested fields May 28, 2025
@ivancea ivancea added auto-backport Automatically create backport pull requests when merged v8.19.0 v9.0.3 labels May 30, 2025
@ivancea ivancea enabled auto-merge (squash) June 3, 2025 15:04
@ivancea ivancea merged commit 36828e2 into elastic:main Jun 4, 2025
17 of 18 checks passed
@ivancea ivancea deleted the significant-terms-nested-investigation branch June 4, 2025 12:54
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.19
9.0
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 4, 2025
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 4, 2025
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
elasticsearchmachine pushed a commit that referenced this pull request Jun 4, 2025
…ed fields (#128472) (#128896) Closes #101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
elasticsearchmachine pushed a commit that referenced this pull request Jun 4, 2025
…ed fields (#128472) (#128897) Closes #101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 5, 2025
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit to ivancea/elasticsearch that referenced this pull request Jun 5, 2025
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit that referenced this pull request Jun 5, 2025
…ed fields (#128472) (#128970) Closes #101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
ivancea added a commit that referenced this pull request Jun 5, 2025
…ed fields (#128472) (#128969) Closes #101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.8 v8.18.3 v8.19.0 v9.0.3 v9.1.0

4 participants