Description
Elasticsearch version (bin/elasticsearch --version):
Version: 6.3.0, Build: default/tar/424e937/2018-06-11T23:38:03.357887Z, JVM: 10.0.1
Plugins installed:
analysis-icu ingest-geoip ingest-user-agent mapper-size repository-gcs
JVM version (java -version):
openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10)
OpenJDK 64-Bit Server VM (build 10.0.1+10, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux es-master-0 4.4.111+ #1 SMP Sat May 5 12:48:47 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
When searching an index with a query_string query, if a wildcard (prefix) term consists only of characters that the tokenizer discards (e.g. @*), Elasticsearch 5.5 dropped that term from the query entirely (the expected behavior). In 6.3.0 the term is preserved as-is, causing the query to match nothing when the same tokenization is applied at index time, since no indexed token contains those characters.
Steps to reproduce:
- Create the index:
curl -X PUT localhost:9200/punct-wildcard-test -H 'Content-Type: application/json' -d '{ "settings": { "analysis": { "analyzer": { "icu_analyzer": { "type": "custom", "tokenizer": "icu_tokenizer" } } } }, "mappings": { "doc": { "properties": { "txt": { "type": "text", "analyzer": "icu_analyzer" } } } } }'
- Analyze text (what would happen during indexing):
curl -X POST localhost:9200/punct-wildcard-test/_analyze -H 'Content-Type: application/json' -d '{ "text": ["foo @bar baz@qux"], "tokenizer": "icu_tokenizer" }'
Result:
{ "tokens": [ { "token": "foo", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0 }, { "token": "bar", "start_offset": 5, "end_offset": 8, "type": "<ALPHANUM>", "position": 1 }, { "token": "baz", "start_offset": 9, "end_offset": 12, "type": "<ALPHANUM>", "position": 2 }, { "token": "qux", "start_offset": 13, "end_offset": 16, "type": "<ALPHANUM>", "position": 3 } ] }
- Validate/explain a problem query:
curl -X POST "localhost:9200/punct-wildcard-test/_validate/query?explain" -H 'Content-Type: application/json' -d '{ "query": { "query_string": { "query": "foo @* @bar* baz@*", "analyzer": "icu_analyzer", "default_field": "txt", "analyze_wildcard": true } } }'
Elasticsearch 5.5 Response:
{ "valid": true, "_shards": { "total": 1, "successful": 1, "failed": 0 }, "explanations": [ { "index": "punct-wildcard-test", "valid": true, "explanation": "txt:foo txt:bar* txt:baz*" } ] }
Elasticsearch 6.3.0 Response:
{ "_shards": { "total": 1, "successful": 1, "failed": 0 }, "valid": true, "explanations": [ { "index": "punct-wildcard-test", "valid": true, "explanation": "txt:foo txt:@* txt:bar* txt:baz*" } ] }
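The only difference between the two explanations is the @* term, whose prefix has no characters that survive tokenization. As a rough client-side illustration (not part of Elasticsearch, and not necessarily how 5.5 handled this internally), the hypothetical shell function drop_empty_prefix_terms below removes such terms from a query string before it is sent. It assumes terms are separated by whitespace and approximates the tokenizer by treating only alphanumeric characters as token content; icu_tokenizer follows Unicode segmentation rules, so this is a simplification.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): drop wildcard terms whose prefix
# contains no alphanumeric characters, e.g. "@*".
# Assumption: "alphanumeric" stands in for "survives icu_tokenizer".
drop_empty_prefix_terms() {
  out=""
  set -f                            # disable globbing so "@*" is not expanded
  for term in $1; do                # split the query on whitespace
    case $term in
      *\*)                          # term ends with a wildcard
        prefix=${term%\*}           # strip the trailing "*"
        stripped=$(printf '%s' "$prefix" | tr -cd '[:alnum:]')
        if [ -z "$stripped" ]; then
          continue                  # nothing left to match on: skip the term
        fi
        ;;
    esac
    out="${out:+$out }$term"        # keep the term unchanged
  done
  set +f                            # re-enable globbing
  printf '%s\n' "$out"
}

# Usage with the query from the steps above:
# drop_empty_prefix_terms 'foo @* @bar* baz@*'
# prints: foo @bar* baz@*
```

Terms such as @bar* and baz@* are left untouched, since the explanations above show that both versions already reduce them to bar* and baz*.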