Support for pattern replace filter in keyword normalizer #96588
| Pinging @elastic/es-search (Team:Search) |
| Hi @Kiriakos1998, thanks for looking at this issue and opening a PR. Should be accepted and working. This would need some additional tests of course, but again it should be sufficient to add a case for using the "pattern_replace" filter in the rest tests under |
| Hello @cbuescher, thanks for the feedback. I want to modify this PR and keep working on this issue. I misunderstood what had to be done at first, but I think it's clear now. We need to be able to define a filter of type pattern_replace and add it to a normalizer of type custom. |
This reverts commit 1e97c93.
| Great, have a look at TrimTokenFilterTests and how it tests that everything is working in a normalizer. It should be possible to do a very similar test for PatternReplace. |
| Hi @cbuescher, just pushed the changes. Indeed, implementing the NormalizingTokenFilterFactory did the job. |
| Looks great, thanks. I will run our CI tests next. |
…n_keyword_normalizer
| @cbuescher I think it failed because the branch was many commits behind main (an 8.81. tar for the Linux distribution was not created). I updated the branch with the latest commits. |
| Yes, let's try again. |
| Looks like the tests are okay now. Sorry, I forgot to ask for one more little change. Could you add the "pattern_replace" filter to the list of token filters allowed in normalizers in the docs at |
| Sure no problem |
| Done |
| @elasticmachine test this please |
| @Kiriakos1998, Thanks a lot for your PR, it is merged to the 8.9 line now. |
Adds support for the pattern_replace type in a keyword normalizer. After these changes, the following settings no longer throw an exception:

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "pattern_replace",
          "pattern": "^0+",
          "replacement": "",
          "all": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

The implementation uses CustomAnalyzerProvider, adding two fields for the TokenFilterFactories and CharFilterFactories that are not pre-configured. In buildMapping, when the type pattern_replace is encountered, its filters are retrieved from the charFilters and tokenFilters registered in AnalysisRegistry.

Closes #83005
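Following the review discussion, the approach that was eventually merged appears to allow pattern_replace as a token filter inside a normalizer of type custom, rather than as a normalizer type of its own. A minimal sketch of what such index settings might look like, assuming the usual custom-normalizer syntax; the filter name `example_strip_leading_zeros` is hypothetical and chosen only for illustration:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_strip_leading_zeros": {
          "type": "pattern_replace",
          "pattern": "^0+",
          "replacement": "",
          "all": false
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": ["example_strip_leading_zeros"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

With settings along these lines, a value such as `000123` indexed into `pagerank` would be expected to be normalized to `123`, since the pattern matches only the leading run of zeros.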