Prefilter nodes/shard queries #99073

@costin

Description

Blindly contacting nodes and shards to run a plan or query, even a cheap one, is detrimental, especially in large clusters, because of the associated cost:

  • nodes might be busy and have limited threads
  • the shards might be frozen and contacting them is expensive
  • the shards might be idle (with a refresh pending).

ES has several optimizations in this area, some of which are straightforward.
This meta ticket lists a number of options, with the plan of incorporating them into ESQL:

  • @timestamp filter
    Determine whether any filter in the query applies to @timestamp. This could/should be combined with the filter parameter, if present. See ESQL: Improve detection of @timestamp inside query #99146
  • can_match query
    Before executing the actual local query, check whether the plan can match anything at all.
  • reduce field_caps search space - QL: EQL and ESQL to use only the necessary fields in the internal field_caps calls #98987
    Currently QL asks for all fields from an index pattern and does NOT apply any specified filter, in order to get a full view of the data. This makes validation simple and has some nice side effects (such as "did you mean" suggestions in case of typos). The big downside, however, is the response time and how expensive field_caps is.
    To improve both latency and memory consumption, we need a better approach; several options are on the table:
    ~ pre-analyze the plan, find the fields it needs, and ask field_caps just for these (complexity: medium)
    ~ stop using field_caps (complexity: big)
    ~ improve the performance of field_caps (complexity: minimal - nothing we need to do)
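
A rough sketch of the "pre-analyze the plan" option, to make the idea concrete. It is written in Python for brevity and is not the QL/ESQL implementation: `PlanNode`, `referenced_fields` and `field_caps_request` are hypothetical names, and the plan is a toy tree. The only real API detail relied on is that the field capabilities endpoint accepts a `fields` parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical plan node: the fields it references plus its children."""
    name: str
    fields: tuple = ()
    children: tuple = ()


def referenced_fields(node):
    """Collect every field name referenced anywhere in the plan tree."""
    found = set(node.fields)
    for child in node.children:
        found |= referenced_fields(child)
    return found


def field_caps_request(index_pattern, plan):
    """Describe a field_caps call restricted to the fields the plan needs.

    Returns a plain dict instead of contacting a cluster, so the sketch
    runs without Elasticsearch.
    """
    fields = sorted(referenced_fields(plan))
    return {
        "method": "GET",
        "path": f"/{index_pattern}/_field_caps",
        "params": {"fields": ",".join(fields)},
    }


# Toy plan: source -> filter on @timestamp and status -> limit
plan = PlanNode("limit", children=(
    PlanNode("filter", fields=("@timestamp", "status"), children=(
        PlanNode("source"),
    )),
))
request = field_caps_request("logs-*", plan)
```

Here `request["params"]["fields"]` contains only `@timestamp` and `status`. The trade-off noted above still applies: asking only for the referenced fields gives up the full view of the data that makes "did you mean" suggestions possible.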

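The @timestamp filter and can_match options can be pictured together as a range-overlap check done before any shard is contacted. The following is a minimal sketch under stated assumptions, not how Elasticsearch implements can_match: `TimeRange`, `ShardInfo` and `shards_that_can_match` are hypothetical names, and it assumes the coordinator already knows each shard's minimum and maximum @timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """Closed interval of epoch milliseconds."""
    start: int
    end: int

    def overlaps(self, other):
        # Two closed intervals overlap when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


@dataclass(frozen=True)
class ShardInfo:
    """Hypothetical per-shard metadata: the @timestamp range the shard holds."""
    shard_id: str
    timestamps: TimeRange


def shards_that_can_match(shards, query_range):
    """Keep only shards whose @timestamp range overlaps the query's range.

    If the query has no @timestamp filter (query_range is None), no shard
    can be ruled out, so all of them are returned.
    """
    if query_range is None:
        return list(shards)
    return [s for s in shards if s.timestamps.overlaps(query_range)]


shards = [
    ShardInfo("frozen-2022", TimeRange(0, 100)),
    ShardInfo("hot-2023", TimeRange(200, 300)),
]
# Only "hot-2023" overlaps the queried range, so "frozen-2022" is never contacted.
to_contact = shards_that_can_match(shards, TimeRange(150, 250))
```

The point of the sketch is the ordering: the cheap metadata check runs first, and busy, frozen, or idle shards that cannot match are skipped entirely.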