Speed up `exists` and `missing` filters on high-cardinality fields

The `exists` filter works by merging all postings lists of the field. `missing` just wraps an `exists` filter in a `not` filter.
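To make the cost concrete, here is a minimal sketch of that behavior on a toy inverted index. The `postings` dictionary, the terms, and the doc IDs are made up for illustration; this is not the actual Lucene or Elasticsearch code.

```python
import heapq

# Toy inverted index for one field: term -> sorted list of doc IDs (a postings list).
# Terms and doc IDs are hypothetical.
postings = {
    "alice": [0, 3],
    "bob": [1],
    "carol": [3, 5],
}
max_doc = 6  # doc IDs range over 0..5


def exists(postings):
    """Doc IDs with at least one term in the field: merge every postings list."""
    result = []
    for doc in heapq.merge(*postings.values()):
        if not result or result[-1] != doc:  # drop duplicates across terms
            result.append(doc)
    return result


def missing(postings, max_doc):
    """Negation of exists() over all doc IDs."""
    present = set(exists(postings))
    return [doc for doc in range(max_doc) if doc not in present]


print(exists(postings))            # [0, 1, 3, 5]
print(missing(postings, max_doc))  # [2, 4]
```

The merge has to visit one postings list per distinct term, so the work grows with the cardinality of the field.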

Merging all postings lists can, however, be very slow on high-cardinality fields. I think there are two ways to fix this:

- make these filters run on top of field data,
- or add a new metadata field, which we could call e.g. `_field_names`, that would index all field names of a document.

Working on field data has two drawbacks: it requires a lot of data to be loaded into memory if the field doesn't have doc values, and the returned filter cannot skip.
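A rough sketch of the field data option, assuming field data can be modeled as one in-memory entry per document (the `field_data` list and its values are hypothetical):

```python
# Hypothetical field data for one field: one slot per doc ID, None when the
# document has no value. The whole column has to be available in memory.
field_data = ["alice", "bob", None, "carol", None, "carol"]


def exists_from_field_data(field_data):
    """Check every document in turn; there is no way to jump ahead."""
    return [doc for doc, value in enumerate(field_data) if value is not None]


print(exists_from_field_data(field_data))  # [0, 1, 3, 5]
```

The check per document is cheap, but the filter still has to look at every document and needs the column loaded first.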

I tend to prefer indexing field names because it would not load anything into memory with a default setup, and the returned filter could skip efficiently since it would be based on a postings list. Unfortunately, it could not be used on indices created before this new metadata field is introduced.
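A minimal sketch of the field names idea, under the assumption that every field name of a document is indexed as a term of a `_field_names`-style field. The documents, `index_field_names`, and `advance` are hypothetical names used only for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical documents; the list position is the doc ID.
docs = [
    {"user": "alice"},
    {"user": "bob", "age": 31},
    {"age": 40},
    {"user": "carol"},
]


def index_field_names(docs):
    """Build field name -> sorted postings list of doc IDs containing that field."""
    field_names = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for name in doc:
            field_names[name].append(doc_id)
    return field_names


def advance(postings_list, target):
    """Skip to the first doc ID >= target, or None if the list is exhausted."""
    i = bisect_left(postings_list, target)
    return postings_list[i] if i < len(postings_list) else None


field_names = index_field_names(docs)
print(field_names["user"])              # [0, 1, 3]
print(advance(field_names["user"], 2))  # 3
```

With this layout, an `exists` filter on a field reads a single postings list regardless of how many distinct values the field has, and it can skip ahead the same way any term-based filter can.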

Speed up `exists` and `missing` filters on high-cardinality fields #5659
