Conditional stemming for 'persian' analyzer #113482
In Lucene 10, the 'persian' analyzer includes PersianStemFilter as the last token filter by default. To maintain compatibility with old indices, we use the new analyzer only for newly created indices and configure a legacy analyzer with the old behaviour for older index versions.
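To illustrate the version gating described above, here is a minimal sketch in plain Java. It is not the code from this PR: the class name, the `FIRST_STEMMING_INDEX_VERSION` constant and its value, and the `tokenFilters` helper are all hypothetical, and the filter chain is modelled as a list of Lucene class names rather than real filter instances. The legacy filter list reflects my understanding of Lucene's PersianAnalyzer and should be treated as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class PersianAnalyzerSelection {

    // Hypothetical version id standing in for the first index version built on Lucene 10.
    static final int FIRST_STEMMING_INDEX_VERSION = 9_000_000;

    // Token filters of the legacy chain, listed by Lucene class name for illustration only.
    static final List<String> LEGACY_FILTERS = List.of(
        "LowerCaseFilter",
        "DecimalDigitFilter",
        "ArabicNormalizationFilter",
        "PersianNormalizationFilter",
        "StopFilter");

    // Returns the filter chain for an index created at the given version:
    // older indices keep the legacy chain, newer ones get the stemmer appended last.
    static List<String> tokenFilters(int indexCreatedVersion) {
        List<String> filters = new ArrayList<>(LEGACY_FILTERS);
        if (indexCreatedVersion >= FIRST_STEMMING_INDEX_VERSION) {
            filters.add("PersianStemFilter");
        }
        return List.copyOf(filters);
    }

    public static void main(String[] args) {
        System.out.println(tokenFilters(8_500_000)); // index created before the cutoff
        System.out.println(tokenFilters(9_000_000)); // index created at the cutoff
    }
}
```

Keying the decision on the index creation version, rather than on the node version, means an existing index keeps producing the same tokens at index and search time after an upgrade.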
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I like the testing and most of the changes. I do think the node feature we need to add is a single "Lucene 10 Release" feature, rather than individual features for each change that the Lucene 10 release requires.
@elasticmachine run elasticsearch-ci/part-1
The CI failures on 8.16-bwc are issues coming from "main", so I'm inclined to merge this and wait for a fix/awaitsFix with the next upstream merge.
I think that even though we provide backward compatibility with existing indices with this change, it should be marked as "breaking" and have a changelog entry. Users who don't want the additional stemming need to move away from the default analyzer and build their own.
@benwtrent I added a changelog entry, let me know if that reads alright to you.
…st {p0=range/20_synthetic_source/Date range} elastic#113874
@cbuescher is this PR relevant to the serverless changelog? [FYI this question is based on 9.0 breaking changes]
Yes, I think so.
Closes #113050