Indicate why a field has been _ignored #101153

@felixbarny

At the moment, a field can be ignored and land in the _ignored metadata field either because of ignore_malformed or because of ignore_above.

@ckauf reported feedback from a user who really likes the new default setting for ignore_malformed but finds it non-trivial to determine why a field has been ignored. The cause could be either ignore_malformed or ignore_above.

This ambiguity will get worse with #96235 where fields can also be _ignored if the field limit is hit.

In this issue, I'd like to discuss options for indicating the reason a field ended up being _ignored.


A potential solution would be to store an additional _ignored_reason metadata field alongside the _ignored field. Both fields would contain an array of strings. We can line up the positions of the two arrays so that we know exactly why each field has been ignored.

For example, if field foo has been ignored because of ignore_malformed and bar has been ignored because of ignore_above, we can store something like this:

{ "_ignored": ["foo", "bar"], "_ignored_reason": ["ignore_malformed", "ignore_above"] }
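To illustrate how a client could consume the two position-aligned arrays, here is a minimal Python sketch. It assumes the proposed `_ignored_reason` field exists with exactly the shape shown above (it is only a proposal in this issue, not an existing Elasticsearch field), and `ignored_reasons` is a hypothetical helper name.

```python
import json

# Document metadata in the shape proposed above. `_ignored_reason` is a
# proposal from this issue, not a field that Elasticsearch provides today.
doc = json.loads(
    '{ "_ignored": ["foo", "bar"],'
    ' "_ignored_reason": ["ignore_malformed", "ignore_above"] }'
)


def ignored_reasons(metadata):
    """Pair each ignored field with the reason at the same array position."""
    fields = metadata.get("_ignored", [])
    reasons = metadata.get("_ignored_reason", [])
    # The scheme only works if both arrays keep the same length and order.
    if len(fields) != len(reasons):
        raise ValueError("_ignored and _ignored_reason are not aligned")
    return list(zip(fields, reasons))


pairs = ignored_reasons(doc)
for field, reason in pairs:
    print(f"{field} was ignored because of {reason}")
```

The length check reflects the main fragility of the approach: the pairing is only meaningful while both arrays preserve insertion order and duplicates.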

You might think, doesn't Lucene de-duplicate and sort keyword doc_values? Yes, it does, but the _ignored field isn't stored in doc_values but in a stored field. While we'll want to add doc_values to _ignored in the future (see #59946), we don't necessarily need to remove the stored field. Keeping it would come at the expense of storage, but it would greatly simplify these troubleshooting workflows.
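The following Python sketch shows why the positional scheme depends on order-preserving storage. It is a simplified model of sorted, de-duplicated value storage, not Lucene's API; `as_sorted_set` and the field names are hypothetical.

```python
def as_sorted_set(values):
    """Simplified model of storage that sorts and de-duplicates values."""
    return sorted(set(values))


# Three ignored fields; two of them share the same reason.
fields = ["foo", "bar", "baz"]
reasons = ["ignore_above", "ignore_malformed", "ignore_above"]

# Order-preserving storage keeps the positional pairing intact.
stored_pairs = dict(zip(fields, reasons))

# Sorted, de-duplicated storage reorders both arrays independently and
# collapses the repeated reason, so the positions no longer line up.
sorted_fields = as_sorted_set(fields)
sorted_reasons = as_sorted_set(reasons)
broken_pairs = dict(zip(sorted_fields, sorted_reasons))

print(stored_pairs)
print(broken_pairs)
```

In this model the de-duplicated reasons array is shorter than the fields array, and the surviving positions pair fields with the wrong reasons, which is why the proposal relies on the stored field rather than doc_values.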
