Skip to content

QL: Reduce number of nested bool queries #96236

@costin

Description

@costin

Description

Certain OR expressions are translated into a number of nested bool queries which, unless balanced, end up being too deep and rejected by ES/Lucene.

Reproduced on 8.7.1 thanks to @e-parth-pathak:

  1. Create an Index and ingest data:
PUT /library/_bulk?refresh {"index":{"_id":"Leviathan Wakes"}} {"name":"Leviathan Wakes","author":"James S.A. Corey","release_date":"2011-06-02","page_count":561} {"index":{"_id":"Hyperion"}} {"name":"Hyperion","author":"Dan Simmons","release_date":"1989-05-26","page_count":482} {"index":{"_id":"Dune"}} {"name":"Dune","author":"Frank Herbert","release_date":"1965-06-01","page_count":604} {"index":{"_id":"Persuation"}} {"name":"Persuation","author":"Robert Cialdini","release_date":"2023-06-01","page_count":304} {"index":{"_id":"Sapiens"}} {"name":"Sapiens","author":"Yuval Noah Harari","release_date":"1969-06-01","page_count":204} {"index":{"_id":"Hooked"}} {"name":"Hooked","author":"Nir Eyal","release_date":"2022-06-01","page_count":408} 
  1. Run a query:
POST /_sql/translate { "query": "SELECT * FROM library WHERE name like '%nir%' OR name like '%yuval%' OR name like '%dan%' OR name like '%frank%' OR name like '%robert%' OR name like '%leviathan%'" } 
  1. The response that I got:
{ "size": 1000, "query": { "bool": { "should": [ { "bool": { "should": [ { "bool": { "should": [ { "bool": { "should": [ { "bool": { "should": [ { "wildcard": { "name.keyword": { "wildcard": "*nir*", "boost": 1 } } }, { "wildcard": { "name.keyword": { "wildcard": "*yuval*", "boost": 1 } } } ], "boost": 1 } }, { "wildcard": { "name.keyword": { "wildcard": "*dan*", "boost": 1 } } } ], "boost": 1 } }, { "wildcard": { "name.keyword": { "wildcard": "*frank*", "boost": 1 } } } ], "boost": 1 } }, { "wildcard": { "name.keyword": { "wildcard": "*robert*", "boost": 1 } } } ], "boost": 1 } }, { "wildcard": { "name.keyword": { "wildcard": "*leviathan*", "boost": 1 } } } ], "boost": 1 } }, "_source": false, "fields": [ { "field": "author" }, { "field": "name" }, { "field": "page_count" }, { "field": "release_date", "format": "strict_date_optional_time_nanos" } ], "sort": [ { "_doc": { "order": "asc" } } ], "track_total_hits": -1 }

Note that while the lowest nested bool query uses two entries, the rest do not and instead rely on another bool wrapper.
Either all entries should be put into a should clause OR, all the OR entries need to be balanced and be wrapped on the same level.

P.S. This has been addressed in the past by #30267 however it still seems to occur when dealing with like comparisons.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions