Speed up exists and missing filters on high-cardinality fields #5659

@jpountz

The exists filter works by merging the postings lists of all terms in the field. The missing filter just wraps an exists filter in a not filter.
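To make the cost concrete, here is a minimal sketch of that behaviour on a toy inverted index. It is not Lucene or Elasticsearch code: the function names, the `user_id_postings` data, and the dict-of-lists layout are all assumptions made for illustration. The point is only that exists has to touch one postings list per distinct term, so the work grows with the field's cardinality.

```python
# Toy inverted index for ONE field: term -> sorted list of doc IDs (a postings list).
# All names here are hypothetical; this is not Lucene or Elasticsearch code.

def exists_docs(postings_by_term):
    """Union of every term's postings list: docs with at least one value."""
    docs = set()
    for postings in postings_by_term.values():  # one visit per distinct term
        docs.update(postings)
    return sorted(docs)


def missing_docs(postings_by_term, max_doc):
    """Complement of exists over doc IDs 0..max_doc-1, like a not wrapper."""
    present = set(exists_docs(postings_by_term))
    return [doc for doc in range(max_doc) if doc not in present]


# A high-cardinality field has roughly one term per document.
user_id_postings = {"u1": [0], "u2": [2], "u3": [3], "u4": [5]}

print(exists_docs(user_id_postings))      # [0, 2, 3, 5]
print(missing_docs(user_id_postings, 6))  # [1, 4]
```

With four documents and four distinct terms, the loop already visits four postings lists; on a field with millions of distinct values it would visit millions.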

Merging all postings lists can however be very slow on high-cardinality fields. I think there are two ways to fix it:

  1. make these filters run on top of field data,
  2. or add a new metadata field, which we could call e.g. _field_names, that would index all the field names of a document.

Working on field data has two drawbacks: if the field doesn't have doc values, the field's values have to be loaded into memory, and the returned filter cannot skip.

I tend to like indexing field names because it would not load anything into memory with a default setup, and the returned filter could skip efficiently since it would be based on a postings list. Unfortunately, it could not be used on indices that were created before we introduce this new metadata field.
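A rough sketch of the field-names idea, again on a toy index rather than real Lucene structures. The helper names (`index_field_names`, `advance`) and the sample documents are hypothetical. It assumes each document's non-null field names are indexed as terms of a single metadata field, so an exists check reads exactly one postings list, and skipping is a binary search over that list.

```python
from bisect import bisect_left

# Hypothetical model of a _field_names metadata field:
# field name -> sorted list of doc IDs that have a value for that field.

def index_field_names(docs):
    """Index every non-null field name of each document as a term."""
    field_names = {}
    for doc_id, doc in enumerate(docs):  # doc IDs increase, so lists stay sorted
        for name, value in doc.items():
            if value is not None:
                field_names.setdefault(name, []).append(doc_id)
    return field_names


def advance(postings, target):
    """Skip to the first doc ID >= target, or None if the list is exhausted."""
    i = bisect_left(postings, target)
    return postings[i] if i < len(postings) else None


docs = [
    {"user_id": "u1", "tag": "a"},
    {"tag": "b"},
    {"user_id": "u2"},
    {"user_id": "u3", "tag": "a"},
]
field_names = index_field_names(docs)

# exists on user_id is a single postings lookup, however many distinct
# user_id values there are.
print(field_names["user_id"])              # [0, 2, 3]
print(advance(field_names["user_id"], 1))  # 2
```

The sketch also shows the limitation mentioned above: documents indexed before the field-names mapping existed would simply have no entries in `field_names`, so the lookup would wrongly report them as missing.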
