@cbuescher cbuescher commented Sep 24, 2024

The 'persian' analyzer for Lucene 10 comes with PersianStemFilter as the last token filter by default. In order to maintain compatibility for old indices, we only use the new analyzer for newly created indices but configure a legacy analyzer with the old behaviour for older index versions.

Closes #113050
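
For illustration, the version gate described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code from this PR: the class name, the `FIRST_LUCENE_10_INDEX_VERSION` constant and its value, and the `filtersFor` helper are all hypothetical, and the filter chain is listed by Lucene class name purely as labels, following the documented composition of the Persian analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Elasticsearch's actual code.
public class PersianAnalyzerSelection {

    // Assumed placeholder for "first index version created on Lucene 10".
    static final int FIRST_LUCENE_10_INDEX_VERSION = 9_000_000;

    // Token filter chain of the legacy 'persian' analyzer (no stemming),
    // written as Lucene class names for readability only.
    static final List<String> LEGACY_FILTERS = List.of(
        "LowerCaseFilter",
        "DecimalDigitFilter",
        "ArabicNormalizationFilter",
        "PersianNormalizationFilter",
        "StopFilter"
    );

    // Returns the filter chain to use for an index created on the given version.
    static List<String> filtersFor(int indexCreatedVersion) {
        if (indexCreatedVersion >= FIRST_LUCENE_10_INDEX_VERSION) {
            // New indices: same chain, with PersianStemFilter as the last filter.
            List<String> withStem = new ArrayList<>(LEGACY_FILTERS);
            withStem.add("PersianStemFilter");
            return List.copyOf(withStem);
        }
        // Older indices keep the legacy behaviour so existing tokens still match.
        return LEGACY_FILTERS;
    }

    public static void main(String[] args) {
        System.out.println("old index: " + filtersFor(FIRST_LUCENE_10_INDEX_VERSION - 1));
        System.out.println("new index: " + filtersFor(FIRST_LUCENE_10_INDEX_VERSION));
    }
}
```

The point of keying on the index creation version rather than the node version is that an index written without stemming keeps being analyzed without stemming after an upgrade, so query-time and index-time tokens stay consistent.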

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Sep 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Member

@benwtrent benwtrent left a comment

I like the testing and most of the changes. I do think the node feature we need to add is a single "Lucene 10 Release" feature rather than individual features for each change required by the Lucene 10 release.

@cbuescher
Member Author

@elasticmachine run elasticsearch-ci/part-1

@cbuescher
Member Author

CI failures on 8.16-bwc are issues coming from "main", so I'm inclined to merge this and wait for a fix/awaitsFix with the next upstream merge.

@cbuescher
Member Author

I think that even though this change keeps backward compatibility with existing indices, it should be marked as "breaking" and have a changelog entry. Users who don't want the additional stemming need to move away from the default analyzer and build their own.
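
For anyone who lands here looking for what "build their own" means in practice, below is a rough sketch of index settings for a custom analyzer that mirrors the old, non-stemming behaviour. It follows the "rebuilt" Persian analyzer layout from the language analyzers documentation; the names `persian_no_stem`, `zero_width_spaces` and `persian_stop` are just illustrative labels. Treat it as a starting point under those assumptions, not as the exact legacy chain.

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "zero_width_spaces": {
          "type": "mapping",
          "mappings": [ "\\u200C=>\\u0020" ]
        }
      },
      "filter": {
        "persian_stop": {
          "type": "stop",
          "stopwords": "_persian_"
        }
      },
      "analyzer": {
        "persian_no_stem": {
          "tokenizer": "standard",
          "char_filter": [ "zero_width_spaces" ],
          "filter": [
            "lowercase",
            "decimal_digit",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop"
          ]
        }
      }
    }
  }
}
```

These settings would be supplied when creating the index, and the relevant text fields would then reference `persian_no_stem` instead of `persian`.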

@cbuescher
Member Author

@benwtrent I added a changelog entry, let me know if that reads alright to you.

@cbuescher cbuescher merged commit 7089ff3 into elastic:lucene_snapshot Oct 2, 2024
15 checks passed
@cbuescher cbuescher deleted the persian-analyzer-l10 branch October 2, 2024 12:06
@leemthompo
Contributor

@cbuescher is this PR relevant to the serverless changelog? [FYI this question is based on 9.0 breaking changes]

@cbuescher
Member Author

> is this PR relevant to the serverless changelog?

Yes I think so.


Labels

>breaking, >enhancement, :Search Relevance/Analysis, Team:Search Relevance, >upgrade, v9.0.0

5 participants