Skip to content

Conversation

@kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented May 4, 2023

This is used when the top_hits output is passed to pipeline aggregators
like bucket selectors. The logic retrieves the requested field from the
source of the first SearchHit. This implies that (a) the spec of the
wrapping aggregator (e.g. bucket_path) points to an appropriate field
using a bracketed reference (e.g. my_top_hits[my_metric]) and (b) the
top_hits contains a size: 1 setting.

This PR also includes extensions to YAML tests for top_metrics and
top_hits to cover the cases where these are used in pipeline
aggregations through bucket_selector, similar to a HAVING clause in
SQL.

Related to #73429.

kkrik-es added 5 commits May 2, 2023 11:37
TopMetrics aggregation currently requires explicitly providing the name of the "leaf" metric to be returned. When combined with a bucket filtering aggregation like BucketSelector, the latter uses the TopMetric aggregation name instead of the leaf metric to retrieve the field to apply filtering to. To support this, TopMetrics aggregation is extended to directly return its leaf metric if it only contains a single one. Related to elastic#73429.
The change is wrong and redundant. Update the YAML to denote the proper syntax for applying BucketSelector to TopMetrics.
Turns out they're not needed, the REST test in YAML passes without them.
This is used when `top_hits` output is passed to pipeline aggregators like bucket selectors. The logic retrieves the requested field from the source of the first SearchHit. This implies that (a) the spec of the wrapping aggregator (e.g. `bucket_path`) points to an appropriate field using a bracketed reference (e.g. `my_top_hits[my_metric]`) and (b) the `top_hits` contains a `size: 1` setting. This PR also includes extensions to YAML tests for `top_metrics` and `top_hits` to cover the cases where these are used in pipeline aggregations through `bucket_selector`, similar to a HAVING clause in SQL. Related to elastic#73429.
@kkrik-es kkrik-es added >enhancement :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels May 4, 2023
@kkrik-es kkrik-es self-assigned this May 4, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@kkrik-es kkrik-es requested a review from martijnvg May 4, 2023 10:44
@csoulios csoulios added >feature and removed >feature labels May 4, 2023
@mark-vieira
Copy link
Contributor

@elasticsearchmachine generate changelog

1 similar comment
@kkrik-es
Copy link
Contributor Author

kkrik-es commented May 4, 2023

@elasticsearchmachine generate changelog

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es
Copy link
Contributor Author

kkrik-es commented May 4, 2023

@elasticsearchmachine run elasticsearch-ci/part-3

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I left a question about whether other properties of search hit should be accessible. I also think we should update the top_hits aggregation documentation?

}
assert path.size() == 1 : "property paths for top_hits can only contain a single field";
assert searchHits.getHits().length == 1 : "property paths should only resolve against top_hits with size == 1.";
Map<String, Object> sourceAsMap = searchHits.getAt(0).getSourceAsMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether other properties of search hit should be accessible? Like the score or sort values.
Maybe the source should be accessible if _source as key is used? For example _source.field_name to access a specific field in the source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support for specifying _source.field, as well as _sort and _score. Ptal, is this what you had in mind?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few more comments.

@kkrik-es
Copy link
Contributor Author

Thanks Martijn, comments addressed. I also need to update the documentation.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment, otherwise LGTM.
I think this issue can also be closed when this lands: #94967

@kkrik-es
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/docs

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kkrik-es kkrik-es added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 15, 2023
@kkrik-es
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

@cjbottaro
Copy link

@kkrik-es Can we get this backported to Elasticsearch 7?

@kkrik-es
Copy link
Contributor Author

This is an enhancement so we don't normally backport to previous versions. You can upgrade to Elasticsearch 8 to make use of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.9.0

6 participants