Skip to content

Commit deffa80

Browse files
authored
Support value retrieval in top_hits (#95828)
This is used when the `top_hits` output is passed to pipeline aggregators like bucket selectors. The logic retrieves the requested field from the source of the first SearchHit. This implies that (a) the spec of the wrapping aggregator (e.g. `bucket_path`) points to an appropriate field using a bracketed reference (e.g. `my_top_hits[my_metric]`) and (b) the `top_hits` contains a `size: 1` setting. This PR also includes extensions to YAML tests for `top_metrics` and `top_hits` to cover the cases where these are used in pipeline aggregations through `bucket_selector`, similar to a HAVING clause in SQL. Related to #73429.
1 parent a02ffaf commit deffa80

File tree

7 files changed

+659
-2
lines changed

7 files changed

+659
-2
lines changed

docs/changelog/95828.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 95828
2+
summary: Support value retrieval in `top_hits`
3+
area: Aggregations
4+
type: enhancement
5+
issues: []

docs/reference/aggregations/metrics/top-metrics-aggregation.asciidoc

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -416,3 +416,44 @@ Which returns the much more expected:
416416
}
417417
----
418418
// TESTRESPONSE
419+
420+
===== Use in pipeline aggregations
421+
422+
`top_metrics` can be used in pipeline aggregations that consume a single value per bucket, such as `bucket_selector`
423+
that applies per bucket filtering, similar to using a HAVING clause in SQL. This requires setting `size` to 1, and
424+
specifying the right path for the (single) metric to be passed to the wrapping aggregator. For example:
425+
426+
[source,console]
427+
----
428+
POST /test*/_search?filter_path=aggregations
429+
{
430+
"aggs": {
431+
"ip": {
432+
"terms": {
433+
"field": "ip"
434+
},
435+
"aggs": {
436+
"tm": {
437+
"top_metrics": {
438+
"metrics": {"field": "m"},
439+
"sort": {"s": "desc"},
440+
"size": 1
441+
}
442+
},
443+
"having_tm": {
444+
"bucket_selector": {
445+
"buckets_path": {
446+
"top_m": "tm[m]"
447+
},
448+
"script": "params.top_m < 1000"
449+
}
450+
}
451+
}
452+
}
453+
}
454+
}
455+
----
456+
// TEST[continued]
457+
458+
The `bucket_path` uses the `top_metrics` name `tm` and a keyword for the metric providing the aggregate value,
459+
namely `m`.

docs/reference/aggregations/metrics/tophits-aggregation.asciidoc

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -415,3 +415,56 @@ the second slow of the `nested_child_field` field:
415415
...
416416
--------------------------------------------------
417417
// NOTCONSOLE
418+
419+
==== Use in pipeline aggregations
420+
421+
`top_hits` can be used in pipeline aggregations that consume a single value per bucket, such as `bucket_selector`
422+
that applies per bucket filtering, similar to using a HAVING clause in SQL. This requires setting `size` to 1, and
423+
specifying the right path for the value to be passed to the wrapping aggregator. The latter can be a `_source`, a
424+
`_sort` or a `_score` value. For example:
425+
426+
[source,console]
427+
--------------------------------------------------
428+
POST /sales/_search?size=0
429+
{
430+
"aggs": {
431+
"top_tags": {
432+
"terms": {
433+
"field": "type",
434+
"size": 3
435+
},
436+
"aggs": {
437+
"top_sales_hits": {
438+
"top_hits": {
439+
"sort": [
440+
{
441+
"date": {
442+
"order": "desc"
443+
}
444+
}
445+
],
446+
"_source": {
447+
"includes": [ "date", "price" ]
448+
},
449+
"size": 1
450+
}
451+
},
452+
"having.top_salary": {
453+
"bucket_selector": {
454+
"buckets_path": {
455+
"tp": "top_sales_hits[_source.price]"
456+
},
457+
"script": "params.tp < 180"
458+
}
459+
}
460+
}
461+
}
462+
}
463+
}
464+
465+
--------------------------------------------------
466+
// TEST[setup:sales]
467+
468+
The `bucket_path` uses the `top_hits` name `top_sales_hits` and a keyword for the field providing the aggregate value,
469+
namely `_source` field `price` in the example above. Other options include `top_sales_hits[_sort]`, for filtering on the
470+
sort value `date` above, and `top_sales_hits[_score]`, for filtering on the score of the top hit.

0 commit comments

Comments
 (0)