- Notifications
You must be signed in to change notification settings - Fork 25.6k
Create .synonyms system index #95548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create .synonyms system index #95548
Conversation
Create .synonyms system index that is exposed under es.synonyms_api_feature_flag.
| Pinging @elastic/es-search (Team:Search) |
| Hi @mayya-sharipova, I've created a changelog YAML for you. |
| @elasticsearchmachine run elasticsearch-ci/part-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question about creating a separate plugin for managing synonyms - besides that, LGTM
...n/src/main/java/org/elasticsearch/analysis/common/synonyms/SynonymsManagementAPIService.java Outdated Show resolved Hide resolved
...es/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java Show resolved Hide resolved
...es/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as a first step. I have a question regarding future plans to where the rest of the API is going to live. Are the endpoints supposed to end up in x-pack, the main server project or the common-analysis module? Since synonyms management is strongly tied to synonym filters and they currently are part of commons-analysis, it makes sense to me to put the system index there as well, but I wonder if putting the API there will work as well, especially for testing etc... Not a blocker, just interested in the more general plan here. Also I left a question around the proposed mapping for the index.
| builder.startObject("synonyms"); | ||
| { | ||
| builder.field("type", "object"); | ||
| builder.field("enabled", "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, this looks like we're leaning towards storing synonyms source-only, that is e.g. in this format:
{ "00001" : "foo => bar", "00002": "lol, laughing out loud", "00003": "i-phone, i phone => iphone" } Or maybe with more fields, but basically it would mean nothing in there is searchable and we'd have to load the whole source at once and parse stuff from a map? Also for updates this means we'd need to modify the source for every rule that is changed, appended etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format, we are currently leaning towards is this:
{ "synonyms" : [ "foo => bar", "lol, laughing out loud", "i-phone, i phone => iphone" ] }Where indeed synonyms are stored in source only inside a single field. And indeed for updates we modify the entire document (with its implications).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, so that would mean a slightly different mapping then I guess (probably synonyms as keyword type array then?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it looks more like keyword, but I chose object field type because of performance:
enabledparameter can only be applied toobject. This will skip parsing of the contents of the field entirely, which is very performant and what we need. While doing:
"synonyms": { "type": "keyword", "doc_values": false, "index" : false }will still parse every synonym rule and does some extra work for indexing before retuning with empty index options.
-
Event it is defined as object, by setting
enabled:false, we accept any value even arrays, as we don't do any parsing. -
Having object with
enabled:false, allows us to modify document structure later without modifying mapping. For example, in case we decide to adopt a format with fields.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, I didn't think about not parsing the docs here at all but its a nice idea as long as we don't have to implement any fast filtering or search on anything. So the parsing logic would only live in whatever piece of code is reading the synonyms from index instead of from file like right now and basically treats source like a blob. Sounds good to me given the current scope of the UI etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: should we consider properly disabling parsing for fields that don't have any lucene footprint like in this example, rather than using type object? The current solution looks ok to me, but I wonder if it's solid enough given we discussed rethinking how to disable fields in the past (see #63369). This should not block progress on synonyms management API though.
| @cbuescher Thanks for your review. Answering your questions:
We have split opinion here. @carlosdelest is in favour to put endpoints (and probably management of .synonym index) into a separate plugin. I am in favour to keep everything under the common-analysis module for 2 reasons: 1) synonym management API going to be a general feature available to everyone 2) this feature is strongly tied to synonym filters which live in the common-analysis module. |
...n/src/main/java/org/elasticsearch/analysis/common/synonyms/SynonymsManagementAPIService.java Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayya-sharipova thanks for the additional explanations, LGTM
| @mayya-sharipova according to this PR's labels, it should not have a changelog YAML, but I can't delete it because it is closed. Please either delete the changelog from the appropriate branch yourself, or adjust the labels. |
Create .synonyms system index that is exposed
under es.synonyms_api_feature_flag.
This is the first task for creating Synonyms API management,
where synonyms will be stored in the .synonyms system index.
Relates to #38523